
Geo Graph Bots


A trio of small, wheeled robots, each beholden to a particular geo-tagged social web service, tethered together with elastic string, each attempting to pull the others towards the physical location of the most recent event on its particular social network.

A number of web services — Flickr, Twitter, etc. — receive updates with geo-tagged data at a remarkable rate. The proposed robots will receive wireless updates from a laptop with this latitude and longitude information (probably on the order of a few times per second). Using this data and an onboard compass, they will steer toward the location of the most recent photograph / tweet / whatever, and then drive furiously in this direction. This will continue until they receive the latest geo data a bit later, at which point they will set a new course and proceed in that direction.
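The steering math is straightforward: given the bot's assumed position (how the bots would know their own location isn't specified above, so treat it as a given) and the incoming latitude / longitude, the initial great-circle bearing yields the compass heading to chase. A Python sketch:

```python
import math

def bearing_to(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees (0 = north, 90 = east)
    from the bot's position to the latest geo-tagged event."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2) -
         math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360

# e.g. from New York toward a tweet geo-tagged in Los Angeles:
# a heading a little north of due west
print(bearing_to(40.71, -74.01, 34.05, -118.24))
```

The bot would compare this bearing against its onboard compass reading and steer to close the gap.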

Since the three bots will be tethered to one another with a length of string, the hope is that they will occasionally get pulled in one direction or another by their neighbors, and perhaps eventually get tangled in the string to the point where they can’t move at all.

Alternately, the bots could lay down string in their wake… sketching their path, overlap, etc.

Parts List

  • 3x bot chassis (probably laser cut wood or plexi)
  • 6x stepper motors
  • 6x wheels
  • 3x small casters
  • 3x arduinos
  • 3x digital compass modules
  • 4x XBees (3 for the bots, 1 for the laptop)
  • 1x XBee Explorer
  • 1x length of elastic string (6 feet?)
  • 3x eyelets (for string)
  • 3x rechargeable batteries
April 8 2010 at 6 PM

How to Hack Toy EEGs

Arturo Vidich, Sofy Yuditskaya, and I needed a way to read brains for our Mental Block project last fall. After looking at the options, we decided that hacking a toy EEG would be the cheapest / fastest way to get the data we wanted. Here’s how we did it.

The Options

A non-exhaustive list of the consumer-level options for building a brain-computer interface:

  • Open EEG — Plans and software for building an EEG from scratch. Attention / Meditation values: No. EEG power band values: Yes (roll your own FFT). Raw wave values: Yes. Cost: $200+
  • Force Trainer — Levitating ball game from Uncle Milton. Attention / Meditation values: Yes. EEG power band values: No. Raw wave values: No. Cost: $75 (street)
  • Mind Flex — Levitating ball game from Mattel. Attention / Meditation values: Yes. EEG power band values: Yes. Raw wave values: No. Cost: $80 (street)
  • MindSet — Official headset from NeuroSky. Attention / Meditation values: Yes. EEG power band values: Yes. Raw wave values: Yes. Cost: $200

Open EEG offers a wealth of hardware schematics, notes, and free software for building your own EEG system. It’s a great project, but the trouble is that the hardware costs add up quickly, and there isn’t a plug-and-play implementation comparable to the EEG toys.

The NeuroSky MindSet is a reasonable deal as well — it’s wireless, supported, and plays nicely with the company’s free developer tools.

For our purposes, though, it was still a bit spendy. Since NeuroSky supplies the EEG chip and hardware for the Force Trainer and Mind Flex toys, these options represent a cheaper (if less convenient) way to get the same data. The silicon may be the same between the three, but our tests show that each runs slightly different firmware which accounts for some variations in data output. The Force Trainer, for example, doesn’t output EEG power band values — the Mind Flex does. The MindSet, unlike the toys, also gives you access to raw wave data. However, since we’d probably end up running an FFT on the wave anyway (and that’s essentially what the EEG power bands represent), we didn’t particularly miss this data in our work with the Mind Flex.

Given all of this, I think the Mind Flex represents a sweet spot on the price / performance curve. It gives you almost all of the data the MindSet does for less than half the cost. The hack and accompanying software presented below work fine for the Force Trainer as well, but you’ll end up with less data, since the EEG power band values are disabled in the Force Trainer’s firmware at the factory.

Of course, the Mind Flex is supposed to be a black-box toy, not an officially supported development platform — so in order to access the actual sensor data for use in other contexts, we’ll need to make some hardware modifications and write some software to help things along. Here’s how.

But first, the inevitable caveat: Use extreme caution when working with any kind of voltage around your brain, particularly when wall power is involved. The risks are small, but to be on the safe side you should only plug the Arduino + Mind Flex combo into a laptop running on batteries alone. (My thanks to Viadd for pointing out this risk in the comments.) Also, performing the modifications outlined below means that you’ll void your warranty. If you make a mistake you could damage the unit beyond repair. The modifications aren’t easily reversible, and they may interfere with the toy’s original ball-levitating functionality.

However, I’ve confirmed that when the hack is executed properly, the toy will continue to function — and perhaps more interestingly, you can skim data from the NeuroSky chip without interfering with gameplay. In this way, we’ve confirmed that the status lights and ball-levitating fan in the Mind Flex are simply mapped to the “Attention” value coming out of the NeuroSky chip.

The Hardware

Here’s the basic layout of the Mind Flex hardware. Most of the action is in the headband, which holds the EEG hardware. A microcontroller in the headband parses data from the EEG chip and sends updates wirelessly to a base station, where a fan levitates the ball and several LEDs illuminate to represent your current attention level.

Mind Flex Schematic

This schematic immediately suggests several approaches to data extraction. The most common strategy we’ve seen is to use the LEDs on the base station to get a rough sense of the current attention level. This is nice and simple, but five levels of attention just don’t provide the granularity we were looking for.

A quick aside: Unlike the Mind Flex, the Force Trainer has some header pins (probably for programming / testing / debugging) which seem like an ideal place to grab some data. Others have reported success with this approach. We could never get it to work.

We decided to take a higher-level approach by grabbing serial data directly from the NeuroSky EEG chip and cutting the rest of the game hardware out of the loop, leaving a schematic that looks more like this:

Mind Flex Schematic Hacked

The Hack

Parts list:

  • 1 x Mind Flex
  • 3 x AAA batteries for the headset
  • 1 x Arduino (any variety), with USB cable
  • 2 x 12” lengths of solid core hookup wire (around #22 or #24 gauge is best).
  • A PC or Mac to monitor the serial data

Software list:

  • Arduino IDE
  • Arduino Brain Library (linked in step 6 below)
  • Processing and the Brain Grapher sketch (optional, for the visualizer in step 8)

The video below walks through the whole process. Detailed instructions and additional commentary follow after the video.


1. Disassembly.

Grab a screwdriver and crack open the left pod of the Mind Flex headset. (The right pod holds the batteries.)

Mind Flex internal layout

2. The T Pin.

The NeuroSky board is the small daughterboard towards the bottom of the headset. If you look closely, you should see conveniently labeled T and R pins — these are the pins the EEG board uses to communicate serially with the microcontroller on the main board, and they’re the pins we’ll use to eavesdrop on the brain data. Solder a length of wire (carefully) to the “T” pin. Thin wire is fine; we used #24 gauge. Be careful not to short the neighboring pins.

The T Pin
T Pin with soldered lead

3. Common ground.

Your Arduino will want to share ground with the Mind Flex circuit. Solder another length of wire to ground — any grounding point will do, but using the large solder pad where the battery’s ground connection arrives at the board makes the job easier. A note on power: We’ve found the Mind Flex to be inordinately sensitive to power… our initial hope was to power the NeuroSky board from the Arduino’s 3.3v supply, but this proved unreliable. For now we’re sticking with the factory configuration and powering the Arduino and Mind Flex independently.

Ground lead

4. Strain relief and wire routing.

We used a dab of hot glue to act as strain relief for the new wires, and drilled a hole in the case for the two wires to poke through after the case was closed. This step is optional.

Strain relief
Wire routing

5. Hook up the Arduino.

The wire from the Mind Flex’s “T” pin goes into the Arduino’s RX pin. The ground goes… to ground. You may wish to secure the Arduino to the side of the Mind Flex as a matter of convenience. (We used zip ties.)

Finished hack

That’s the extent of the hardware hack. Now on to the software. The data from the NeuroSky chip is not in a particularly friendly format: it’s a stream of raw bytes that will need to be parsed before it makes any sense. Fate is on our side: the packets coming from the Mind Flex match the structure described in NeuroSky’s official MindSet documentation. (See the mindset_communications_protocol.pdf document in the MindSet developer kit if you’re interested.) You don’t need to worry about this, since I’ve written an Arduino library that makes the parsing process as painless as possible.

Essentially, the library takes the raw byte data from the NeuroSky chip, and turns it into a nice ASCII string of comma-separated values.
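For the curious, the parsing the library performs can be sketched in Python. The framing below follows the MindSet protocol document mentioned above: two 0xAA sync bytes, a length byte, the payload, and a checksum equal to the inverted low byte of the payload sum. The function name and dictionary keys are my own:

```python
def parse_packet(data):
    """Parse one NeuroSky packet, given the bytes that follow the
    two 0xAA sync bytes: [length, payload..., checksum]."""
    plen = data[0]
    payload = data[1:1 + plen]
    checksum = data[1 + plen]
    if ((~sum(payload)) & 0xFF) != checksum:
        raise ValueError("bad checksum")

    out = {}
    i = 0
    while i < plen:
        code = payload[i]
        if code == 0x02:                      # poor-signal quality (0 = good)
            out["signal"] = payload[i + 1]; i += 2
        elif code == 0x04:                    # attention, 0-100
            out["attention"] = payload[i + 1]; i += 2
        elif code == 0x05:                    # meditation, 0-100
            out["meditation"] = payload[i + 1]; i += 2
        elif code == 0x83:                    # eight big-endian 3-byte EEG power values
            n = payload[i + 1]                # value length (24 bytes)
            raw = payload[i + 2:i + 2 + n]
            out["bands"] = [raw[j] << 16 | raw[j + 1] << 8 | raw[j + 2]
                            for j in range(0, n, 3)]
            i += 2 + n
        else:
            # single-byte codes carry one value byte; extended codes
            # (>= 0x80) carry a length byte and then that many bytes
            i += (2 + payload[i + 1]) if code >= 0x80 else 2
    return out

# example: a payload carrying signal quality, attention, and meditation
payload = [0x02, 0x00, 0x04, 53, 0x05, 40]
packet = [len(payload)] + payload + [(~sum(payload)) & 0xFF]
print(parse_packet(packet))  # {'signal': 0, 'attention': 53, 'meditation': 40}
```

The Arduino library does essentially this on the microcontroller, then joins the values into the CSV string.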

6. Load up the Arduino.

Download and install the Arduino Brain Library — it’s available here. Open the BrainSerialOut example and upload it to your board. (You may need to disconnect the RX pin during the upload.) The example code looks like this:

#include <Brain.h>

// Set up the brain parser, pass it the hardware serial object you want to listen on.
Brain brain(Serial);

void setup() {
        // Start the hardware serial.
        Serial.begin(9600);
}

void loop() {
        // Expect packets about once per second.
        // The .readCSV() function returns a string (well, char*) listing the most recent brain data, in the following format:
        // "signal strength, attention, meditation, delta, theta, low alpha, high alpha, low beta, high beta, low gamma, high gamma"
        if (brain.update()) {
                Serial.println(brain.readCSV());
        }
}

7. Test.

Turn on the Mind Flex, make sure the Arduino is plugged into your computer, and then open up the Serial Monitor. If all went well, you should see the following:

Brain library serial test

Here’s how the CSV breaks down: “signal strength, attention, meditation, delta, theta, low alpha, high alpha, low beta, high beta, low gamma, high gamma”

(More on what these values are supposed to mean later in the article. Also, note that if you are hacking a Force Trainer instead of a Mind Flex, you will only see the first three values — signal strength, attention, and meditation.)

If you put the unit on your head, you should see the “signal strength” value drop to 0 (confusingly, this means the connection is good), and the rest of the numbers start to fluctuate.
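If you want to consume this output in your own code rather than in Processing, splitting the CSV up is trivial. Here is a minimal Python sketch; the field names are mine, following the order given above, and reading the actual serial port would involve a library like pySerial:

```python
FIELDS = ["signal", "attention", "meditation", "delta", "theta",
          "low_alpha", "high_alpha", "low_beta", "high_beta",
          "low_gamma", "high_gamma"]

def parse_brain_csv(line):
    """Turn one line of BrainSerialOut output into a dict of ints."""
    values = [int(v) for v in line.strip().split(",")]
    return dict(zip(FIELDS, values))

sample = "0,53,40,123456,98765,4321,1234,567,89,12,3"
reading = parse_brain_csv(sample)
# signal == 0 means a good electrode connection
print(reading["attention"])  # 53
```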

8. Visualize.

As exciting as the serial monitor is, you might think, “Surely there’s a more intuitive way to visualize this data!” You’re in luck: I’ve written a quick, open-source visualizer in Processing which graphs your brain activity over time (download). It’s designed to work with the BrainSerialOut Arduino code you’ve already loaded.

Download the code, and then open up the brain_grapher.pde file in Processing. With the Mind Flex powered on and the Arduino plugged in via USB, go ahead and run the Processing sketch. (Just make sure the Arduino IDE’s serial monitor is closed; otherwise Processing won’t be able to read from the Mind Flex.) You may need to change the index of the serial list array in the brain_grapher.pde file, in case your Arduino is not the first serial object on your machine:

serial = new Serial(this, Serial.list()[0], 9600);

You should end up with a screen like this:

Processing visualizer test

About the data

So what, exactly, do the numbers coming in from the NeuroSky chip mean?

The Mind Flex (but not the Force Trainer) provides eight values representing the amount of electrical activity at different frequencies. This data is heavily filtered / amplified, so where a conventional medical-grade EEG would give you absolute voltage values for each band, NeuroSky instead gives you relative measurements which aren’t easily mapped to real-world units. A rundown of the frequencies involved follows, along with a grossly oversimplified summary of the associated mental states:

  • Delta (roughly 1–3 Hz): deep, dreamless sleep
  • Theta (4–7 Hz): drowsiness, daydreaming, light meditation
  • Low alpha / high alpha (8–12 Hz): relaxed, calm wakefulness
  • Low beta / high beta (13–30 Hz): alertness, focus, active thinking
  • Low gamma / high gamma (31–50 Hz): higher-order cognitive processing

In addition to these power-band values, the NeuroSky chip provides a pair of proprietary, black-box data values dubbed “attention” and “meditation”. These are intended to provide an easily-grokked reduction of the brainwave data, and they’re what the Force Trainer and Mind Flex actually use to control the game state. We’re a bit skeptical of these values, since NeuroSky won’t disclose how they work, but a white paper they’ve released suggests that the values are at least statistically distinguishable from nonsense.

Here’s the company line on each value:

  • Attention:

    Indicates the intensity of a user’s level of mental “focus” or “attention”, such as that which occurs during intense concentration and directed (but stable) mental activity. Distractions, wandering thoughts, lack of focus, or anxiety may lower the Attention meter levels.

  • Meditation:

    Indicates the level of a user’s mental “calmness” or “relaxation”. Meditation is related to reduced activity by the active mental processes in the brain, and it has long been an observed effect that closing one’s eyes turns off the mental activities which process images from the eyes, so closing the eyes is often an effective method for increasing the Meditation meter level. Distractions, wandering thoughts, anxiety, agitation, and sensory stimuli may lower the Meditation meter levels.

At least that’s how it’s supposed to work. We’ve found that the degree of mental control over the signal varies from person to person. Ian Cleary, a peer of ours at ITP, used the Mind Flex in a recent project. He reports that about half of the people who tried the game were able to exercise control by consciously changing their mental state.

The most reasonable test of the device’s legitimacy would be a comparison with a medical-grade EEG. While we have not been able to test this ourselves, NeuroSky has published the results of such a comparison. Their findings suggest that the NeuroSky chip delivers a comparable signal. Of course, NeuroSky has a significant stake in a positive outcome for this sort of test.

And there you have it. If you’d like to develop hardware or software around this data, I recommend reading the documentation that comes with the brain library for more information — or browse through the visualizer source to see how to work with the serial data. If you make something interesting using these techniques, I’d love to hear about it.

March 2013 Update:

Almost three years on, I think I need to close the comments since I don’t have the time (or hardware on hand) to keep up with support. Please post future issues on the GitHub page of the relevant project:

Arduino Brain Library

Processing Brain Grapher

Most issues I’m seeing in the comments seem like the result of either soldering errors or compatibility-breaking changes to the Processing and Arduino APIs. I’ll try to stay ahead of these on GitHub and will be happy to accept pull requests to keep the code up to date and working.

Thanks everyone for your feedback and good luck with your projects.

April 7 2010 at 2 PM

ASCII Cellular Automata: CAd nauseam

There’s a slightly pathetic anticlimax when a cellular automaton bound for infinity runs into the edge of a computer screen and halts. This unfortunate behavior can be averted by the most trivial of interface elements: the scroll bar.

So, I created an HTML / JavaScript implementation of Wolfram’s one-dimensional binary cellular automata: CAd nauseam. The name is intended to reflect the sense of conceptual exhaustion around this particular set of 256 rules, which has been researched, poked, and prodded within an inch of its life.

Give it a try: http://frontiernerds.com/cad-nauseam

As the CA is rendered, the page simply grows to accommodate as much output as you have patience for. It’s easy to pause, scroll back up, and reminisce about past lines. (If you’re into that sort of thing…)

In addition to the browser’s native scrollability, I added a few knobs and dials that let you manipulate the rules in real time.

In this context, ASCII doesn’t offer many endearing qualities beyond a certain nostalgic cheekiness, but I suppose one could argue that the output is easy to cut / paste and it allows the simulation to run relatively quickly in the browser. (At least compared to rendering the pixels with a table or using the canvas object.)

The code is based heavily on Dan Shiffman’s Processing example of the same CA. Just view the source to get a sense of how it works — although most of my contributions to the code are just interface-related cruft.

There are two ways to set the active rule. You can click each four-square icon to toggle that particular rule on or off. (The top three squares represent the seed condition, the single bottom square indicates whether that condition will turn the next pixel on or off.) Alternately, you can type the rule number you’d like to see directly into the text box at the top of the palette. Hit enter, or click outside the box to activate the rule.
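Under the hood, applying a rule amounts to a single table lookup: the rule number's eight bits map each three-cell neighborhood to the next cell's state. Here's a Python sketch of that logic (the actual implementation is in JavaScript; treat this as an illustration):

```python
def step(row, rule):
    """Advance one generation of a Wolfram elementary CA.
    Cells beyond the edges are treated as 0."""
    padded = [0] + row + [0]
    # each neighborhood (left, center, right) indexes a bit of the rule number
    return [(rule >> (padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1])) & 1
            for i in range(1, len(padded) - 1)]

# Rule 30 from a single seeded cell
row = [0, 0, 1, 0, 0]
print(step(row, 30))  # [0, 1, 1, 1, 0]
```

Rendering is then just a matter of mapping 1s and 0s to characters, line after line.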

As you change the rules, the URL will change to represent a direct-link to the current rule set. For example, you can visit http://frontiernerds.com/cad-nauseam/#30 to see the output of rule 30.

The rest of the interface should be self-explanatory. “Reseed” clears the latest line and replaces it with a new line with a single X in the center. “Go / Stop” freezes the simulation so you can scroll through the history more easily. “Rule Info” takes you to the Wolfram|Alpha page describing the current rule.

It runs best in Safari; the experience is much slower and stickier in Firefox, IE, and Chrome.

April 7 2010 at 12 AM

Free Body Diagram

I sketched a free-body diagram of a hot glue gun’s stick-advance mechanism in an attempt to formalize its action. (I was also surprised to see hot glue used to secure wires inside the hot glue gun… a textbook chicken / egg paradox.)

April 1 2010 at 2 PM


Charles Babbage’s Brain, Photo: James Wheare


Can a machine accumulate enough information about your patterns of communication to create an effective digital doppelgänger? Could we use the data left behind on Google’s servers and our own hard disks to effectively replace ourselves with an artificial intelligence born and bred of our online conversations and quirks? What might it be like to have a conversation with a past representation of ourselves, what would a hypothetical exchange between two digitally-reconstructed individuals look like?

Michael Edgcumbe and I approached these questions with Caprica, our rough attempt to commit to code some of the ideas of digital reincarnation put forth in the (reportedly mediocre) eponymous television series.

Both Michael and I have managed to retain a good portion of our instant messenger chat logs. My archives represent just over a half-million lines of conversation logged from about 2001 to 2004. Michael’s are a bit more recent, and weigh in at 34,000 lines. So data is in relative abundance.

The goal was to build an autonomous chat bot that would draw from the content of our logs to construct an infinite stream of back-and-forth conversation between our younger selves. Ideally, these conversations should be reasonably cogent and reflect whatever personality / themes we left behind in our logs.


Our initial approach to an algorithm was simple — the entire chat log can be considered a kind of question / answer training set. There’s a bit of latent intelligence built right into the log, since it literally documents how you responded to a wide range of queries. By finding which line in the log is the closest match to a given query, we should be able to walk forward a few lines and retrieve a reasonable response. This turns the problem into one of sentence similarity and avoids the issue of extracting and classifying meaning from the logs.

There are some peculiarities about instant messenger conversations which needed to be considered:

  • Typos are rampant
  • Netspeak doesn’t play well with NLTK dictionaries and algorithms trained on more formal corpora
  • A new line of conversation often acts as a comma; single line responses and serial responses from one person are common

With these points in mind, we tried a number of techniques for ranking similarity between a query string and lines of logged conversation. First, we wanted to increase the opportunities for a match between the query and the log, so we used lemmatization / synonym lookup to expand the query.

For example, for the query “how about the weather”, each word is expanded into a list of synonymous terms:

['about', 'astir', 'approximately', 'close_to', 'just_about', 'some', 'roughly', 'more_or_less', 'around', 'or_so', 'almost', 'most', 'nearly', 'near', 'nigh', 'virtually', 'well-nigh'],
['weather', 'weather_condition', 'conditions', 'atmospheric_condition', 'endure', 'brave', 'brave_out', 'upwind']]

From there, the chat log is searched for lines containing these synonyms — each word match improves the score of a particular line, which means it’s more likely to wind up as the best match to the query.

Other methods we attempted include turning the logs into bigrams, to give a bit more weight to pairs of words used in context — this proved too slow to run in real time; we would need to set up a cache or database of bigrams for each log to use this approach in the future. (It’s currently scrapped from the working implementation.)

We also attempted to ignore line breaks in the logs and instead treat each stream of replies from one individual as a single chunk. This left us with unnaturally long-winded responses, slower searches (since the queries were much longer) and less of a quality improvement than we expected. (Also scrapped from the working implementation.)

Finally, our algorithm handles some basic housekeeping: a response gets flagged after it’s used, so that conversations won’t repeat themselves. Response scores are also normalized based on length, so that longer lines (with more potential word matches) don’t dominate the conversation. The algorithm also manages the eternal conversational bounce between the two logs: after a response is generated, that response becomes the query to the other log… ad infinitum, until every single line is used.
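The matching and housekeeping described above can be condensed into a short sketch. The synonym table here is a stand-in for the WordNet-style expansion, and every name is hypothetical; the real implementation is in caprica3-presented.py:

```python
def best_response(query, log, synonyms, used):
    """Find the log line that best matches the query, then return the
    line that follows it as the response. Scores are normalized by
    line length so long lines don't dominate; used responses are skipped."""
    # expand the query with synonyms to widen the net
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms.update(synonyms.get(word, []))

    best_i, best_score = None, 0.0
    for i, line in enumerate(log[:-1]):   # the last line has no follow-up
        if i + 1 in used:
            continue
        words = line.lower().split()
        if not words:
            continue
        score = sum(1 for w in words if w in terms) / float(len(words))
        if score > best_score:
            best_i, best_score = i, score

    if best_i is None:
        return None
    used.add(best_i + 1)                  # flag the response so it isn't reused
    return log[best_i + 1]

log = ["how is the weather", "pretty gloomy today", "gotta run"]
synonyms = {"forecast": ["weather", "conditions"]}
print(best_response("what about the forecast", log, synonyms, set()))
# prints "pretty gloomy today"
```

Feeding each returned response back in as the next query produces the endless back-and-forth between the two logs.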

The source code is available on GitHub. The caprica3-presented.py file represents the most recent working implementation.


Here’s an excerpt of a hypothetical conversation between my adolescent self and Michael:

Edgwired: what are we lying about?
obrigado: the royal you
Edgwired: we had to transfer them as files rather than as music
obrigado: hah
Edgwired: heh
obrigado: wtf?
Edgwired: music is portable
obrigado: J.R. Rodale
Edgwired: plus
obrigado: additionaly
Edgwired: cool to hang out
obrigado: all this time coco
Edgwired: this is what i’m leaning towards
obrigado: i have assumed
Edgwired: LOL
obrigado: haha
Edgwired: what monitor?
obrigado: right
Edgwired: that one is pretty good
obrigado: that the version of remind me
Edgwired: fuck it
obrigado: actually it is
Edgwired: serious

The full text is also available.

Even with our crude implementation, the generated conversations are at least moderately interesting. Humans are quite good at finding patterns and extrapolating meaning where there is actually very little of either, and I think this helps mask the mediocrity of the algorithm.

Future Improvements

We have a number of ideas for improvements that didn’t make it into the first cut.

We considered stemming the logs to increase the number of matches. However, the search code we’re using at the moment allows for partial word matches, so I’m not sure how much we would gain from this step.

Another major issue is that the log data requires a massive amount of clean-up before it’s ready for use. Ideally, we would have a program that would automatically aggregate a user’s chat (or email, or twitter, etc.) data without them needing to dig up their logs from the depths of the file system and run a bunch of finicky clean-up routines to get the data ready for use. Michael and I spent a huge amount of time dealing with character encoding issues and generally restructuring the log data so that it was consistent for both of us. Writing a reliable, hands-off parser would be a lot of work, but it would be consistent with the goals of the project: to provide access to an interactive, digital representation of oneself.

Python starts to show its slowness when you’re handling many thousands of lines of strings… for efficiency’s sake, the logs would benefit from migration to a database system.

And most importantly, the sentence similarity approach is deeply naïve. There’s a lot more to the reconstruction process than finding word matches, and to improve the results we will really need a way to extract and tag actual data from the logs. We will need some way to identify major themes and then weave them together into more convincing conversation.

March 22 2010 at 10 AM

You Mean



Google’s automatic search completion gives an instant zeitgeist from just a few words of input. Here’s an example of it at work:

A universal auto-complete function would be a useless and wonderful thing to have, and right now I think Google’s search completion is as close as we can get. I’m interested in what would happen if a piece of text were forced to conform to Google’s platonic search query, essentially handing over final editorial authority to their algorithm — which in itself is just a representation of grooves worn into the search engine by millions of people searching for exactly the same thing.

Google sometimes imposes their assistance by placing a link at the top of search results suggesting “did you mean something?” This officious interjection is often creepily right — why yes, I did mean something.

Hence my proposed poetic form: You Mean. This form takes Google’s help a step further by forcing a given string through the suggestion algorithm and reconstructing output consisting entirely of suggestions.

For example, the paragraph above becomes the following:

Henceforth myspace proposed health care bill poetic forms you mean the world to me this form is submitted in connection with an application for takeshi kaneshiro google scholar help a reporter out step further or farther byu forcing amaryllis longest palindrome in a given string through the looking glass suggestion algorithms andkon reconstructing history output devices consisting essentially of entirely pets office depot suggestions lyrics.


First, I needed programmatic access to Google’s suggestions. Google itself was helpful enough to point me to this gentle hack of their toolbar code — a URL that you can hit for an XML list of suggestions for a given query. Handy.

Next, there was the issue of how to atomize the input text. This proved a bit trickier, since judgments had to be made as to how much of a line should be fed through the algorithm at a time. Initially, I tried sending words in one at a time. This was helpful in creating repetitive structures in the output, but I thought it was losing too much of the source text’s content.

So I implemented a recursive algorithm that takes the full text of a line and tests to see if there are suggestions for it. If there are suggestions, it declares success. If not, it pops a word off the end of the sentence and tries to find a suggestion for the new, shorter line. It continues to pop words until it finds a suggestion, then returns to the rest of the sentence and goes through the same process of shortening until a suggestion is found. Eventually, a whole line is digested this way. This unfairly weights the beginning of the line (since it’s tested first), but it seemed like a reasonable compromise between performance (the HTTP queries take some time) and content retention.

With some extra print statements, processing looks like this — showing the recursive approach to suggested-sentence generation:

You say: showing the recursive approach
trying: showing the recursive approach
no suggestions
trying: showing the recursive
no suggestions
trying: showing the
suggestion: shown thesaurus
trying: recursive approach
no suggestions
trying: recursive
suggestion: recursive formula
trying: approach
suggestion: approach plates
You mean: shown thesaurus recursive formula approach plates
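With the network call stubbed out by a lookup table, the shortening strategy that produced the trace above can be sketched like so; this is a simplification for illustration, not the actual implementation:

```python
def suggestify(line, fetch):
    """Replace a line with suggestions, preferring the longest prefix
    that yields any suggestion; unmatched single words pass through."""
    words = line.lower().split()
    out = []
    while words:
        # shrink from the right until some prefix has a suggestion
        for end in range(len(words), 0, -1):
            query = " ".join(words[:end])
            suggestions = fetch(query)
            if suggestions:
                out.append(suggestions[0])
                break
        else:
            end = 1
            out.append(words[0])   # no suggestion even for one word: keep it
        words = words[end:]        # continue with the rest of the sentence
    return " ".join(out)

# a toy suggestion table standing in for the Google call
table = {"showing the": ["shown thesaurus"],
         "recursive": ["recursive formula"],
         "approach": ["approach plates"]}
print(suggestify("showing the recursive approach", table.get))
# prints "shown thesaurus recursive formula approach plates"
```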

Occasionally, Google gets stumped on a single word and runs out of suggestions. (“Pluckest”, for example.) In these cases, the algorithm relents and lets the original word through. It’s conceivable that an entire body of text could elude suggestions in this way, if the words were far enough from the online vernacular.


An interesting behavior emerges in canonical texts. Partial lines will be automatically completed with the original text, which gives the text a tendency to repeat itself.

For example, here’s Frost:

whose woods these are i think i know his house is in the village though
his house is in the village though thought for the day
he will not see me stopping here to watch his woods fill up with snow
to watch his woods fill up with snow snowshoe mountain
my little horse must think it queer to stop without a farmhouse near
to stop without a farmhouse near near death experiences
between the woods and frozen lake the darkest evening of the year
the darkest evening of the year by dean koontz
he gives his harness bells a shake to ask if there is some mistake
to ask in spanish if there is something lyrics mistake quotes
the only other sound’s the sweep sounds the same spelled differently sweepstakes
of easy wind and downy flake flake lyrics
the woods are lovely dark and deep poem
but i have promises to keep and miles to go before i sleep
and miles to go before i sleep meaning
and miles to go before i sleep meaning

Source Code

The code is designed to work in two possible configurations. You can either pass it text via standard input, which it will suggestify and spit back out. Or, you can run it with the argument “interactive”, which will bring up a prompt for you to experiment quickly with different suggested text transformations.

  import sys
  import urllib
  from xml.dom import minidom
  import string

  # set to true for more output
  debug = 0

  def strip_punctuation(s):
      return s.translate(string.maketrans("", ""), string.punctuation)

  # returns a list of google suggestions
  # store them in a dictionary for basic caching… then when parsing the text
  # fetch the suggestion from google only if we need to
  suggestion_cache = dict()

  def fetch_suggestions(query):
      if query in suggestion_cache:
          return suggestion_cache[query]

      # here’s the suggestion "API"
      # google.com/complete/search?output=toolbar&q=microsoft
      # adding a trailing space prevents partial matches
      # how to handle multi-word? find the largest possible suggestions
      query_string = urllib.urlencode({"output": "toolbar", "q": query})

      # returns some xml
      suggestion_request = urllib.urlopen("http://www.google.com/complete/search?" + query_string)

      suggestions = list()

      # handle the odd xml glitch from google
      try:
          suggestion_xml = minidom.parse(suggestion_request)
          # let’s extract the suggestions (throw them in a list)
          for suggestion in suggestion_xml.getElementsByTagName("suggestion"):
              suggestions.append(suggestion.attributes["data"].value)
          suggestion_cache[query] = suggestions
      except:
          pass

      suggestion_request.close()
      return suggestions

  # glues together a list of words into a sentence based on start and end indexes
  def partial_sentence(word_list, start, end):
      if len(word_list) >= end:
          sentence = str()
          for i in range(start, end):
              sentence = sentence + word_list[i] + " "
          return sentence.strip()
      else:
          return "partial sentence length error"

  # takes a line and iteratively builds google’s suggested version of it
  def suggestify_line(line):
      output_text = ""
      words = line.lower().strip().split(" ")

      if len(words) > 1:
          end_index = len(words)
          start_index = 0
          remaining_words = len(words)

          # try to suggest based on as much of the original line as possible, then
          # walk left to try for matches on increasingly atomic fragments
          while remaining_words > 0:
              query = partial_sentence(words, start_index, end_index)
              suggestions = fetch_suggestions(query)

              if debug: print "trying: " + query

              if suggestions:
                  if debug: print "suggestion: " + suggestions[0]
                  output_text += suggestions[0] + " "
                  remaining_words = len(words) - end_index
                  start_index = end_index
                  end_index = len(words)
              else:
                  # no match, so try a shorter query
                  if debug: print "no suggestions"

                  # if we’re down to a single word, relent and use the original
                  if (end_index - start_index) == 1:
                      if debug: print "no suggestions, using: " + words[start_index]
                      output_text += words[start_index] + " "
                      remaining_words = len(words) - end_index
                      start_index = end_index
                      end_index = len(words)
                  else:
                      end_index -= 1

      # handle single word lines
      elif len(words) == 1:
          if debug: print "trying: " + words[0]
          suggestions = fetch_suggestions(words[0])
          if suggestions:
              if debug: print "suggestion: " + suggestions[0]
              output_text += suggestions[0] + " "
          else:
              # defeat, you get to use the word you wanted
              if debug: print "no suggestions, using: " + words[0]
              output_text += words[0] + " "

      return output_text.strip()

  # are we in interactive mode?
  if len(sys.argv) <= 1:
      # grab text from standard input, suggestify it line by line
      source_text = sys.stdin.readlines()
      # or read from a file instead:
      # source_text = open("frost.txt").readlines()

      output_text = ""
      for line in source_text:
          output_text += suggestify_line(strip_punctuation(line))
          output_text += "\n"

      print output_text

  elif sys.argv[1] == "interactive":
      while 1:
          resp = raw_input("You say: ")
          if resp == "exit":
              break
          print "You mean: " + suggestify_line(strip_punctuation(resp)) + "\n"
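Assuming the script is saved as suggestify.py (the post doesn’t give a filename), the two configurations might be invoked like so:

```shell
# batch mode: suggestify text piped in on standard input
python suggestify.py < frost.txt

# interactive mode: experiment with lines at a prompt
python suggestify.py interactive
```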

March 12 2010 at 1 PM

Phys Pix

The (slightly anemic) start of an homage to the most beloved software of my youth, the original KidPix.

Based heavily on the PBox2D examples, this sketch lets you draw objects with physical properties onto a canvas.


March 2 2010 at 12 PM

Mechanisms Midterm Concept: TagBot

Some of the most interesting mechanical solutions seem to have emerged from the transition period between manual and automated means of production — the process of adapting tasks long performed by hand to operate under machine power for the first time. The initial iterations of the adaptation process usually result in endearingly anthropomorphic machines, since the process of abstracting human motions out of the process isn’t yet complete. (Examples include electric typewriters, sewing machines, automated assembly lines, etc.)

I’m interested in working through this process of converting hand power to mechanical power myself. A Dymo tapewriter represents an unlikely but possibly satisfying platform to turn into an automatic, electronic device.

I’m also interested in unintended and unknown physical consequences for actions taken online. The stream of new tag data on sites like Flickr could provide interesting source text, and would force the idea of a tag into the physical world – e.g. here’s a machine that involuntarily spits out sticky pieces of tape with letters on them that could, conceivably, tag real-world objects.

Thus, the TagBot. A mechanized, automatic Dymo tapewriter which scrapes new tag data from Flickr in real time, and generates labels accordingly.

A slightly more ambitious variant could be built with mobility in mind — you could position it somewhere in the city, and it would spit out tags from photographs taken in the vicinity.

Mechanically, several factors need to be accounted for:

  • Rotation and indexing of the character wheel — a stepper motor would probably suffice.
  • The space — a light squeeze on the handle advances the tape without printing a letter. A strong motor or solenoid could manage this.
  • Character printing — a harder squeeze on the handle.
  • Cut — a hard squeeze on another lever.
  • Pull — a motor to pull the finished tag out of the printer.
  • Tape reloading — Dymo tape rolls are pretty short; some kind of automated reloading system would be great, but it’s probably beyond the scope / time available for the midterm.
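As a sketch of the character-wheel indexing, the stepper could always take the shortest rotation from its current position to the target letter. The wheel layout and steps-per-position values below are assumptions for illustration, not measurements from an actual Dymo wheel:

```python
# hypothetical wheel layout and gearing -- real values would come from
# measuring the Dymo's character wheel
WHEEL = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
STEPS_PER_POSITION = 4  # assumed stepper steps between adjacent characters

def steps_to_char(current_index, target_char):
    # signed step count for the shortest rotation to the target character
    target_index = WHEEL.index(target_char)
    delta = (target_index - current_index) % len(WHEEL)
    if delta > len(WHEEL) // 2:
        delta -= len(WHEEL)  # shorter to rotate backwards
    return delta * STEPS_PER_POSITION

print(steps_to_char(0, "C"))  # 8: two positions forward from "A"
```

The Arduino would then step the motor by the returned amount (negative meaning reverse) and remember the new index for the next character.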

Code and control will require the following:

  • An Arduino to coordinate the motions of each mechanical element.
  • An interface with the Flickr API to fetch the latest tags. (Either serially from a laptop, or directly over a WiShield.)
  • Code to reduce the character set to those present on the character wheel.
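The character-set reduction could be a simple filter. The printable set below is an assumption about what a typical Dymo wheel carries, not taken from the actual hardware:

```python
# assumed printable set for the wheel: uppercase letters, digits, a few marks
WHEEL_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-&.")

def reduce_to_wheel(tag):
    # uppercase the incoming tag, then drop anything the wheel can't print
    # (spaces are dropped here too -- on the tapewriter a space is just a
    # light squeeze that advances the tape without printing)
    return "".join(c for c in tag.upper() if c in WHEEL_CHARS)

print(reduce_to_wheel("golden gate bridge!"))  # GOLDENGATEBRIDGE
```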
February 25 2010 at 5 PM