
Line Weight

NOC Midterm Concept: A physics-based drawing tool.

February 23 2010 at 12 PM

POS Shuffler

Homework #2: The digital cut-up. Write a program that reads in and creatively re-arranges the content of one or more source texts. What is the unit of your cut-up technique? (the word, the line, the character? something else?) How does your procedure relate (if at all) to your choice of source text? Feel free to build on your assignment from last week.

I wanted to build a cut-up machine that was as grammatically and syntactically non-invasive as possible, while still significantly munging the source text. So I decided to shuffle the text in a way that treated parts of speech as sacred — the words could fall where they may, but if the first word was a noun in the source text, it had better be replaced with a noun in the output. If the second word was a verb, nothing but a verb from elsewhere in the text should take its place, and so on.

Programmatically deriving a word’s part of speech turns out to be a major issue, so I leaned on NLTK to take care of this. It actually does a pretty decent job. From there it’s just a matter of storing lists of the words in a dictionary keyed to each part of speech, shuffling them, and then reconstituting the text.
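
For a sense of what the tagger hands back, here's a minimal sketch (the exact tags will vary with your NLTK version and its default tagger):

import nltk

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog.")
print nltk.pos_tag(tokens)
# Prints a list of (word, tag) pairs, something like:
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ...]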

I ran Obama’s most recent state of the union address through my algorithm. It’s formal and carefully constructed enough not to pose a significant challenge to NLTK’s Brown Corpus-trained part-of-speech tagger. I was also struck by how much the output resembles the famous Bushisms, albeit with a lingering Obama-esque tinge.

Here’s some output:

Bloody Biden, succeed. Run Constitution, convictions from America, were moments, But great Allies: tested. union. declares during on time to look, the prosperity must time to union. history of the struggle at our For Sunday one leaders, our people have fulfilled all Bull Union chose done so of hesitations despite progress And America and we are turned again on the war that call and midst; in marchers as inevitable anything and fellow state. they’s tempting to move very of the guests and president at our victory distinguished civil, that President were back done to Tuesday and when the Beach was tested back about Our Omaha but these Americans great beaten that duty. Congress, nation crashed forward so of These Madame the future was of certain. tranquility. And first years periods was landed because doubt. Again, the depression was assume and Congress Speaker was rights on destined the market of our moments and the courage in our Black and at this our fears and divisions, our disagreements and our members, Vice prevailed that we have to answer always of one strife And 220 times.

They, It have When and much, we shall give information’s strength.

Sample output from the full text of Obama’s 2010 state of the union address is available here. The original text is also available for comparison.

The source code follows. It’s pretty simple; NLTK does most of the heavy lifting.

import sys
import nltk
import random
import re

# Grab a file from standard input, dump it in a string.
source_text = sys.stdin.read()

# Use NLTK to make some guesses about each word's part of speech.
token_text = nltk.word_tokenize(source_text)
pos_text = nltk.pos_tag(token_text)

# Set up a dictionary where each key is a POS holding a list
# of each word of that type from the text.
pos_table = dict()

for tagged_word in pos_text:
  # Create the list, if it doesn't exist already.
  if tagged_word[1] not in pos_table:
    pos_table[tagged_word[1]] = list()

  pos_table[tagged_word[1]].append(tagged_word[0])

# Scramble the word lists.
for pos_key in pos_table:
  random.shuffle(pos_table[pos_key])

# Rebuild the text.
output = str()

for tagged_word in pos_text:
  # Take the last word from the scrambled list.
  word = pos_table[tagged_word[1]].pop()

  # Leave out the space if it's punctuation.
  if not re.match("[.,;:'!?]", word):
    output += " "

  # Accumulate the words.
  output += word

# Remove leading white space.
output = output.strip()

print output

February 19 2010 at 1 PM

Vinyl as Visualization

Vinyl under an electron microscope.

A vinyl record, magnified. From Chris Supranowitz’s OPT 307 Final Project.

One of Arthur C. Clarke’s laws of prediction states that “any sufficiently advanced technology is indistinguishable from magic.” There’s something bootstrappy about one sufficiently advanced technology (the SEM) laying bare the magic of a formerly advanced technology (vinyl). In this case, to see the waveform etched in the vinyl is to understand how the medium works in a more-than-conceptual way. No magic required.

Yet the magnifier doesn’t shed the same revelatory light on a compact disc. There’s another layer of abstraction — and it’s arguably beyond visualization. (Still, it’s an unusual treat to see the atoms behind those ethereal bits… given our tendency to segregate the two.)

Via Noise for Airports.

February 18 2010 at 3 AM

Weight of Your Words

Assignment: Use a physics library.

Physics libraries like Box2D tend to use extremely rational language in extremely literal ways (mass, friction, gravity, etc.) — I wanted to build on this language by overloading its meaning and taking it in an absurdist direction. Electrons, in the quantities pushed around by our machines, certainly don’t carry much physical weight… how, then, can we weigh a string of characters?

Google seems to have this sorted out… just about any conceivable string of text can be quantified, weighed, and perhaps valued by the number of results it dredges up.

So I whipped up an app that literally uses Google’s search result count to determine how game elements behave — with the intention of pressuring the player to test their own judgment of a word’s worth against Google’s. It looks like this:

How does our understanding of how much a word weighs depart from Google’s absolutism? How much weight can you balance?

The game mechanics are pretty basic… Box2D manages the interactions between the words and the tilting ledge below. The ledge is attached to a joint in the middle of the screen, and if words are distributed unevenly it will tilt and send terms sliding into the abyss.
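
For illustration, here’s roughly how a ledge like that might be modeled. The game itself is a Processing sketch, so this Python version using pybox2d is only a sketch of the idea, and every number in it is a placeholder:

from Box2D import b2FixtureDef, b2PolygonShape, b2World

world = b2World(gravity=(0, -10))

# A static anchor body at the middle of the screen.
anchor = world.CreateStaticBody(position=(0, 10))

# The ledge: a long, thin dynamic box.
ledge = world.CreateDynamicBody(
  position=(0, 10),
  fixtures=b2FixtureDef(shape=b2PolygonShape(box=(8, 0.25)), density=1.0),
)

# Pin the ledge to the anchor so it can rotate freely, tilting
# whenever the words piled on it are unevenly distributed.
world.CreateRevoluteJoint(bodyA=anchor, bodyB=ledge, anchor=ledge.worldCenter)

# A "word" about to land off-center: a small box whose density
# would come from its search result count.
word = world.CreateDynamicBody(
  position=(-4, 16),
  fixtures=b2FixtureDef(shape=b2PolygonShape(box=(1, 0.5)), density=3.0),
)

# Step the simulation; an unbalanced ledge rotates about the joint.
for _ in range(60):
  world.Step(1.0 / 60, 10, 10)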

The cloud floating above has a text input box which sends search queries off to Google. A bit of code scrapes the resulting HTML to figure out how many results exist for the query. This runs in its own thread so as not to block the rendering / physics work. After the query count comes back from Google, the term you entered drops from the cloud onto the ledge. (You can influence where it will land by positioning the cloud with the mouse beforehand.) The more results a term has, the higher its density — this means that major search terms will help you load extra weight on the ledge, but their extra mass also means they’re more likely to tilt the ledge past the point of recovery. This, I hope, forces the player to estimate the weight of a term before they drop it from the cloud.
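
A hypothetical sketch of that query-to-density step follows. The function names, URL, and regex here are my own inventions for illustration — Google’s markup changes constantly, so don’t mistake this for the exact scrape the game performs:

import math
import re
import threading
import urllib
import urllib2

def result_count(term):
  # Fetch a results page and scrape out the total count. The pattern
  # below is a guess at the markup; it will break when Google changes.
  url = 'http://www.google.com/search?q=' + urllib.quote(term)
  request = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
  html = urllib2.urlopen(request).read()
  match = re.search(r'About ([\d,]+) results', html)
  return int(match.group(1).replace(',', '')) if match else 0

def density_for(term):
  # Log-scale the count so ubiquitous words aren't unplayably heavy;
  # the constants are arbitrary placeholders.
  return 1.0 + math.log10(result_count(term) + 1)

# Run the slow network call in its own thread so the rendering and
# physics work never blocks while waiting on Google.
thread = threading.Thread(target=density_for, args=('ontology',))
thread.start()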

Here are a few more screenshots from development and testing:

Weight of Your Words help screen
Weight of Your Words tilting overboard

The game doesn’t work very well as an Applet (it uses the net library, which requires code signing), so it’s probably easiest to download the zip below if you’d like to give it a try.

The source is a bit long to embed here, so I’ve attached it below.

February 16 2010 at 8 AM

Blender != CAD

Assignment: Spend a handful of hours evaluating your assigned 3D modeling software (Blender)

Blender doesn’t seem like a reasonable substitute for Alibre or SolidWorks… I’m not even sure CAD suites are on the Blender dev team’s radar as competing products. Instead, they’re positioning Blender as an alternative to Maya or 3ds Max and extending functionality into game design, while basic CAD functionality (measurements, assemblies, alignment) goes neglected.

Still, it represents a huge amount of open-source work towards the basic infrastructure required by both 3D Modeling / Animation tools and 3D Drafting / Design tools. Several efforts have been made to implement CAD functionality within Blender, but none of them seem mature enough for production work (or even sustained dabbling).

Of all the small side-projects and forks attempting to take Blender in a more CAD-like direction, the only one with any enduring momentum seems to be BlenderCAD — which proved too unstable for completing even basic work.

This pretty much describes the experience:

I wish the developers luck, though, since the lack of serious open-source CAD tools is a shame. Something Blender-based is probably the last great hope.

February 11 2010 at 2 PM

And Computer


Artist and Computer, some thoughts from 1976.

Via Kevin O’Neill.

February 8 2010 at 11 PM

Interstitial Wasteland

Homework #1. Create a program that behaves like a UNIX text processing program (such as cat, grep, tr, etc.). Your program should take text as input (any text, or a particular text of your choosing) and output a version of the text that has been filtered and/or munged. Be creative, insightful, or intentionally banal.

Choose one text that you created with your program to read in class.

Bonus: Use the program that you created in tandem with another UNIX command line utility.

I try to avoid destroying data. I draw upon the usual set of justifications: Storage is only getting cheaper, an empty HD occupies the same physical space as a full HD, yadda yadda.

Whether or not this policy is for the best, it’s left me with over a million lines of instant messenger conversation logs from my earlier years — mundane, personal conversations between myself and a handful of friends. (Running from about Jr. High to the end of High School.) If not of much general interest, the contents of these (often painfully angst-ridden) logs are a personally surreal thing to revisit.

In response to the first assignment, I wanted to draw from this well of text in some way. I’m particularly interested in the idea of accidental art — the daily, unintended collisions with things that might be formally construed as art.

I wrote a quick algorithm to put my variation of the Infinite monkey theorem to the test. Can enough angsty teens, given enough time to type, eventually produce something resembling a traditionally recognized “great work”?

I decided to pit my adolescent conversations against T.S. Eliot’s The Waste Land. I wasn’t interested in simply recreating the poem verbatim; instead, I used the first and last words of each line as anchors between the poem and my logs, with everything in the middle filled in from my conversations based on nearby words.

So, the algorithm takes the first and last word from each line of the poem, and then looks for matches in my conversation logs. If it finds a match for both words in my logs, then it will walk forward from the first word, and backward from the last word, to create a line of text which starts and ends identically to a line in The Waste Land.

Finally, if the resulting line is too long, it will cut words out of the middle until the length of the output line matches the length of the line in The Waste Land. Currently, lines that are shorter than their equivalents in the original poem are just printed as-is. (It would be nice to find a reasonable way to beef these up to size.)

When matches aren’t found, the line is dropped. Only about 60% of the poem could be reconstructed from my million lines of conversation text. (e.g., words like “abolie” never turned up in my logs, and therefore were not available to reconstruct that line of The Waste Land.)

The code is happy to work with any plain text files. Supply it with a model text (in this case, Eliot’s poem) and a source text (in this case, my conversation logs), and it will do its best to shape the source text into the model text.

The first argument to the program is the source text, and the second is the model text. For example, from the command line, this would use aim.txt as the source and wasteland.txt as the model, appending the results to a text file named aim-wasteland.txt:
python interstitial-wasteland.py aim.txt wasteland.txt >> aim-wasteland.txt

It takes a while to run, and you won’t get decent results unless the source text is huge.

Here’s the full output: interstitial-wasteland-output.txt

And the output with the original poem in parallel: interstitial-wasteland-output-parallel.txt

A small excerpt of the raw output:

I… wasn’t going us, he’s DEAD!

April one is designed for inbreeding,
Memory Lane and stirring
Winter Olympic Games. of recovering
Earth, yet we and feeding
And let me tell for like an hour..by
Individuals can be dangerous though
I”m confused http://winter.squaw.com/html/squawcambig.html

-What are the current problems problems?- it grows
Out man,

The same excerpt in parallel with the model text:

I… wasn’t going us, he’s DEAD!

April is the cruellest month, breeding
April one is designed for inbreeding,

Memory and desire, stirring
Memory Lane and stirring

Winter kept us warm, covering
Winter Olympic Games. of recovering

Earth in forgetful snow, feeding
Earth, yet we and feeding

And drank coffee, and talked for an hour.
And let me tell for like an hour..by

In the mountains, there you feel free.
Individuals can be dangerous though

I read, much of the night, and go south in the winter.
I”m confused http://winter.squaw.com/html/squawcambig.html

What are the roots that clutch, what branches grow
-What are the current problems problems?- it grows

Out of this stony rubbish? Son of man,
Out man,

And the source code:

import sys

args = sys.argv

# I hard coded these for my local testing.
#args = ['self', 'aim.txt', 'wasteland.txt']

# Set to true if you want extra output for debugging.
# TK turn this into a command line parameter.
verbose = 0

# Set to true if you want to show the original line above the munged one.
# TK turn this into a command line parameter.
print_original = 0

if verbose: print args
if verbose: print 'Take the text from ' + args[1] + ' and model it after ' + args[2]

# Pull the filenames from the command line arguments.
source_file_name = args[1]
model_file_name = args[2]

# Open each file. (Error handling would be good here...)
source_file = open(source_file_name, 'r')
model_file = open(model_file_name, 'r')

# Read each line of each file into a list.
source_lines = source_file.readlines()
model_lines = model_file.readlines()


# Removes usernames from the start of a line, e.g. removes "OBRIGADO:"
def anonymize(line):
  if ':' in line:
    colon_index = line.index(':') + 1
    anonymous_line = line[colon_index:len(line)]
    return anonymous_line.strip()

  return line

# Clean up line breaks.
def remove_breaks(line):
  line = line.replace('\n', '')
  line = line.replace('\r', '')
  return line


# Gives index of element containing word.
# Less strict than .index(string) since it finds partial matches.
def word_at(string, list):
  index = 0
  for item in list:
    if string in item:
      return index
    index += 1

  return -1


# Go through the model and look for matches to the first and last words.
index = 0
for line in model_lines:
  # Make sure it's not a blank line.
  line = line.strip()

  # Put in line breaks if it is blank.
  if len(line) == 0:
    print

  # Otherwise, start processing.
  if len(line) > 1:
    # Place each word in a list.
    line_list = line.split(' ')
    first_word = line_list[0]
    last_word = line_list[-1]

    if verbose: print '------------'
    if verbose: print 'Line ' + str(index) + ' starts with "' + first_word + '" ends with "' + last_word + '"'

    # Find first instance of first word in source file.
    for first_word_line in source_lines:
      if first_word in first_word_line:

        # We found the starting word, now find the ending word.
        for last_word_line in source_lines:
          if last_word in last_word_line:

            # We have both a starting and an ending word match!

            # Clean up, remove line breaks and attribution.
            # TK problem if match was in name?
            first_word_line = anonymize(remove_breaks(first_word_line))
            last_word_line = anonymize(remove_breaks(last_word_line))

            # For the first line, save from the word forward.
            first_line_list = first_word_line.split(' ')
            first_word_index = word_at(first_word, first_line_list)
            first_line_list = first_line_list[first_word_index:len(first_line_list)]

            # For the last line, save from the word backward.
            last_line_list = last_word_line.split(' ')
            last_word_index = word_at(last_word, last_line_list)
            last_line_list = last_line_list[0:last_word_index + 1]

            # TK remove blank stuff.
            complete_line_list = first_line_list + last_line_list

            if verbose: print complete_line_list

            # Construct a sentence as close to the original length as possible.
            model_line_length = len(line_list)

            # Remove words until we have the desired length.
            # TK single word line problems?
            while len(complete_line_list) > model_line_length:
              # Pop from the middle.
              complete_line_list.pop(int(len(complete_line_list) / 2))

            complete_line = ' '.join(complete_line_list)

            # Print the original above the munged line.
            if print_original: print line

            print complete_line

            # Add a line break for readability.
            if print_original: print

            break

        break

  index += 1

February 5 2010 at 3 PM