a digital scan of a 35mm film image of a processing sketch running on an LCD
Skip to Content

You Mean

Demo

Concept

Google’s automatic search completion give an instant zeitgeist from just a few words of input. Here’s an example of it at work:

A universal auto-complete function would be a useless and wonderful thing to have, and right now I think Google’s search completion is as close as we can get. I’m interested in what would happen if a piece of text was forced to conform with Google’s platonic search query, essentially handing over final editorial authority the their algorithm — which in itself is just a representation of grooves worn into the search engine by millions of people searching for exactly the same thing.

Google sometimes imposes their assistance by placing a link at the top of search results suggesting “did you mean something?” This officious interjection is often creepily right — why yes, I did mean something.

Hence my proposed poetic form: You Mean. This form takes Google’s help a step further by forcing a given string through the suggestion algorithm and reconstructing output consisting entirely of suggestions.

For example, the paragraph above becomes the following:

Henceforth myspace proposed health care bill poetic forms you mean the world to me this form is submitted in connection with an application for takeshi kaneshiro google scholar help a reporter out step further or farther byu forcing amaryllis longest palindrome in a given string through the looking glass suggestion algorithms andkon reconstructing history output devices consisting essentially of entirely pets office depot suggestions lyrics.

Algorithm

First, I needed programmatic access to Google’s suggestions. Google itself was helpful enough to point me to this gentle hack of their toolbar code — a URL that you can hit for an XML list of suggestions for a given query. Handy.

Next, there was the issue of how to atomize input text. This proved a bit trickier, since judgments would have to be made as to how much of a line should be fed through the algorithm at a time. Initially, I tried sending words in individually. This was helpful in creating repetitive structures in the output, but I thought it was loosing to much of the source text’s content.

So I implemented a recursive algorithm that takes the full text of a line, and then tests to see if there are suggestions for it. If there are suggestions, it declares success. If not, it pops a word off the end up the sentence, and tries to find a suggestion for the new, shorter line. It continues to pop words until it finds a suggestion, and then will return to the rest of the sentence and go through the same process of shortening until a suggestion is found. Eventually, a whole line is digested this way. It unfairly weights the beginning of the line (since it’s tested first) but it seemed like a reasonable compromise between performance (the http queries take some time) and content retention.

With some extra print statements, processing looks like this — showing the recursive approach to suggested-sentence generation:

You say: showing the recursive approach
trying: showing the recursive approach
no suggestions
trying: showing the recursive
no suggestions
trying: showing the
suggestion: shown thesaurus
trying: recursive approach
no suggestions
trying: recursive
suggestion: recursive formula
trying: approach
suggestion: approach plates
You mean: shown thesaurus recursive formula approach plates

Occasionally, Google gets stumped on a single word and runs out of suggestions. (“Pluckest”, for example.) In these cases, the algorithm relents and lets the original word through. It’s conceivable that an entire body of text could elude suggestions in this way, if the words were far enough from the online vernacular.

Output

An interesting behavior emerges in canonical texts. Partial lines will be automatically completed with the original text, which gives the text a tendency to repeat itself.

For example, here’s Frost:

whose woods these are i think i know his house is in the village though
his house is in the village though thought for the day
he will not see me stopping here to watch his woods fill up with snow
to watch his woods fill up with snow snowshoe mountain
my little horse must think it queer to stop without a farmhouse near
to stop without a farmhouse near near death experiences
between the woods and frozen lake the darkest evening of the year
the darkest evening of the year by dean koontz
he gives his harness bells a shake to ask if there is some mistake
to ask in spanish if there is something lyrics mistake quotes
the only other sound’s the sweep sounds the same spelled differently sweepstakes
of easy wind and downy flake flake lyrics
the woods are lovely dark and deep poem
but i have promises to keep and miles to go before i sleep
and miles to go before i sleep meaning
and miles to go before i sleep meaning

Source Code

The code is designed to work in two possible configurations. You can either pass it text via standard input, which it will suggestify and spit back out. Or, you can run it with the argument “interactive”, which will bring up a prompt for you to experiment quickly with different suggested text transformations.

  1. import sys
  2. import urllib
  3. from xml.dom import minidom
  4. import string
  5.  
  6. # set to true for more output
  7. debug = 0
  8.  
  9. def strip_punctuation(s):
  10.         return s.translate(string.maketrans("",""), string.punctuation)
  11.  
  12. # returns a list of google suggestions
  13.  
  14. # store them in a dictionary for basic caching… then when parsing the text
  15. # fetch the suggestion from google only if we need to
  16. suggestion_cache = dict();
  17.  
  18. def fetch_suggestions(query):
  19.         if query in suggestion_cache:
  20.                 return suggestion_cache[query]
  21.  
  22.         # here’s the suggestion "API"
  23.         # google.com/complete/search?output=toolbar&q=microsoft
  24.         # adding a trailing space prevents partial matches
  25.         # how to handle multi-word? find the largest possible suggestions
  26.         query_string = urllib.urlencode({"output" : "toolbar", "q" : query})   
  27.        
  28.         # returns some xml
  29.         suggestion_request = urllib.urlopen("http://www.google.com/complete/search?" + query_string)
  30.        
  31.         suggestions = list();  
  32.                
  33.         # handle the odd xml glitch from google
  34.         try:
  35.                 suggestion_xml = minidom.parse(suggestion_request)
  36.                 # let’s extract the suggestions (throw them in a list)
  37.                 for suggestion in suggestion_xml.getElementsByTagName("suggestion"):
  38.                         suggestions.append(suggestion.attributes["data"].value)
  39.  
  40.                 suggestion_cache[query] = suggestions;
  41.         except:
  42.                 pass
  43.        
  44.         suggestion_request.close()
  45.        
  46.         return suggestions
  47.  
  48.  
  49. # glues together a list of words into a sentence based on start and end indexes
  50. def partial_sentence(word_list, start, end):
  51.         if len(word_list) >= end:      
  52.                 sentence = str()
  53.                 for i in range(start, end):
  54.                         sentence = sentence + word_list[i] + " "
  55.        
  56.                 return sentence.strip()
  57.         else:
  58.                 return "partial sentence length error"
  59.  
  60.  
  61. # takes a line and recursively returns google’s suggestion
  62. def suggestify_line(line):
  63.         output_text = ""       
  64.         words = line.lower().strip().split(" ")
  65.  
  66.         if len(words) > 1:
  67.  
  68.                 end_index = len(words)
  69.                 start_index = 0
  70.                 suggested_line = ""
  71.                 remaining_words = len(words)
  72.        
  73.                 # try to suggest based on as much of the original line as possible, then
  74.                 # walk left to try for matches on increasingly atomic fragments
  75.                 while remaining_words > 0:
  76.                         query = partial_sentence(words, start_index, end_index)
  77.                         suggestions = fetch_suggestions(query)
  78.        
  79.                         if debug: print "trying: " + query
  80.        
  81.                         if suggestions:
  82.                                 if debug: print "suggestion: " + suggestions[0]
  83.                                 output_text += suggestions[0] + " "
  84.                                
  85.                                 remaining_words = len(words) - end_index
  86.                                 start_index = end_index;
  87.                                 end_index = len(words)
  88.                
  89.                         else:
  90.                                 # else try a shorter query length              
  91.                                 if debug: print "no suggestions"
  92.                                
  93.                                 # if we’re at the end, relent and return original word
  94.                                 if (end_index - start_index) == 1:
  95.                                         if debug: print "no suggestions, using: " + words[start_index]
  96.                                         output_text += words[start_index] + " "
  97.                                         remaining_words = len(words) - end_index
  98.                                         start_index = end_index;
  99.                                         end_index = len(words)                                 
  100.                                 else:
  101.                                         end_index -= 1
  102.  
  103.         # handle single word lines
  104.         elif len(words) == 1:
  105.                 if debug: print "trying: " + words[0]          
  106.                 suggestions = fetch_suggestions(words[0])
  107.                 if suggestions:
  108.                         if debug: print "suggestion: " + suggestions[0]
  109.                         output_text += suggestions[0] + " ";                   
  110.                 else:
  111.                         if debug: print "defeat"
  112.                         # defeat, you get to use the word you wanted
  113.                         if debug: print words[0]
  114.                         output_text += words[0] + " ";                 
  115.        
  116.         output_text.strip()
  117.         return output_text
  118.  
  119.  
  120.  
  121. # are we in interactive mode?
  122.  
  123. if len(sys.argv) <= 1:
  124.         # Grab a file from standard input, dump it in a string.
  125.         # source_text = sys.stdin.readlines()
  126.         source_text = open("frost.txt").readlines()
  127.         #source_text = "His house is in the village though"
  128.  
  129.         output_text = ""
  130.  
  131.         for line in source_text:
  132.                 output_text += suggestify_line(strip_punctuation(line))
  133.                 output_text += "\n"
  134.  
  135.         print output_text
  136.  
  137.  
  138. elif sys.argv[1] == "interactive":
  139.         while 1:
  140.                 resp = raw_input("You say: ")
  141.                 print "You mean: " + suggestify_line(strip_punctuation(resp)) + "\n"
  142.                 if resp == "exit":
  143.                         break

March 12 2010 at 1 PM

Interesting article.
I did some research into Google Auto Complete a few months back and listed the results for the whole alphabet.

July 7 2012 at 7 AM