a digital scan of a 35mm film image of a processing sketch running on an LCD

It Talks: Text to Speech in Processing

The Mac has a really great text-to-speech (TTS) engine built right in, but at first glance it’s only available at Apple’s whim in specific contexts, e.g. via a menu command in TextEdit, or system-wide through the accessibility settings. Seems grim, but we’re in luck: Apple, in their infinite generosity, have given us a command line program called “say”, which lets us invoke the TTS engine through the terminal. It’s super simple to use: just type the command and then the text you want, e.g. say cosmic manifold.

So that’s great, but what if we wanted to make a Processing sketch talk to us? In Java, as in most languages, there are ways to run command line programs programmatically. By calling Runtime.getRuntime().exec("some command"); we can launch any command line program from within Processing. So to invoke the TTS engine from a Processing sketch, we can build the say ... command as a string (or an array of strings) and pass it to Runtime.exec(), which launches say, which in turn handles the TTS conversion.
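To make the exec call concrete, here is a minimal sketch in plain Java. It uses the array form of exec, so the text reaches say as a single argument no matter how many spaces it contains (the single-string form simply splits on whitespace and ignores quotes). The class name SayCommand and the helper build are made up for illustration, and the example assumes a Mac with say on the path; elsewhere it just prints an error.

```java
import java.io.IOException;
import java.util.Arrays;

public class SayCommand {
    // Build the argument array for say (hypothetical helper).
    // Each array element is handed to the program as exactly one argument.
    static String[] build(String text) {
        return new String[] {"say", text};
    }

    public static void main(String[] args) {
        String[] command = build("cosmic manifold");
        System.out.println(Arrays.toString(command));
        try {
            // Launch say as a separate process and wait for it to exit.
            Process p = Runtime.getRuntime().exec(command);
            p.waitFor();
        } catch (IOException e) {
            // Thrown when the program cannot be started, e.g. on a non-Mac system.
            System.err.println("Could not run say: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Inside a Processing sketch the same exec call can go straight into mousePressed() or a similar handler, wrapped in the try/catch.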

I’ve put together a small Processing class that makes it easy to add speech to your Processing sketches. It only works on Mac OS, won’t work in a web applet, and has only been tested in Mac OS 10.6. (I think the list of voices has changed since 10.5.)

Note that since the class is quite simple and really just wraps up a few functions, I’ve set it up for static access. This means that you should never need to instantiate the class by calling something like TextToSpeech tts = new TextToSpeech(), and in fact that would be a Bad Idea. Instead, you can access the methods any time without any prior instantiation using static-style syntax, e.g. TextToSpeech.say("cosmic manifold");.

Here’s the class and a sample sketch:

// Processing Text to Speech
// Eric Mika, Winter 2010
// Tested on Mac OS 10.6 only, possibly compatible with 10.5 (with modification)
// Adapted from code by Denis Meyer (CallToPower)
// Thanks to Mark Triant for the inspiring sample text

String script = "cosmic manifold";
int voiceIndex;
int voiceSpeed;

void setup() {
  size(500, 500);
}

void draw() {
  background(0);

  // set the voice based on mouse y
  voiceIndex = round(map(mouseY, 0, height, 0, TextToSpeech.voices.length - 1));

  // set the voice speed based on mouse x
  voiceSpeed = mouseX;

  // help text
  fill(255);
  text("Click to hear " + TextToSpeech.voices[voiceIndex] + "\nsay \"" + script + "\"\nat speed " + mouseX, 10, 20);

  fill(128);
  text("Mouse X sets voice speed.\nMouse Y sets voice.", 10, 65);
}

void mousePressed() {
  // say something
  TextToSpeech.say(script, TextToSpeech.voices[voiceIndex], voiceSpeed);
}


// the text to speech class
import java.io.IOException;

static class TextToSpeech extends Object {

  // Store the voices, makes for nice auto-complete in Eclipse

  // male voices
  static final String ALEX = "Alex";
  static final String BRUCE = "Bruce";
  static final String FRED = "Fred";
  static final String JUNIOR = "Junior";
  static final String RALPH = "Ralph";

  // female voices
  static final String AGNES = "Agnes";
  static final String KATHY = "Kathy";
  static final String PRINCESS = "Princess";
  static final String VICKI = "Vicki";
  static final String VICTORIA = "Victoria";

  // novelty voices
  static final String ALBERT = "Albert";
  static final String BAD_NEWS = "Bad News";
  static final String BAHH = "Bahh";
  static final String BELLS = "Bells";
  static final String BOING = "Boing";
  static final String BUBBLES = "Bubbles";
  static final String CELLOS = "Cellos";
  static final String DERANGED = "Deranged";
  static final String GOOD_NEWS = "Good News";
  static final String HYSTERICAL = "Hysterical";
  static final String PIPE_ORGAN = "Pipe Organ";
  static final String TRINOIDS = "Trinoids";
  static final String WHISPER = "Whisper";
  static final String ZARVOX = "Zarvox";

  // throw them in an array so we can iterate over them / pick at random
  static String[] voices = {
    ALEX, BRUCE, FRED, JUNIOR, RALPH, AGNES, KATHY,
    PRINCESS, VICKI, VICTORIA, ALBERT, BAD_NEWS, BAHH,
    BELLS, BOING, BUBBLES, CELLOS, DERANGED, GOOD_NEWS,
    HYSTERICAL, PIPE_ORGAN, TRINOIDS, WHISPER, ZARVOX
  };

  // this sends the "say" command to the terminal with the appropriate args
  // (the speed is passed as an embedded [[rate N]] command in front of the text)
  static void say(String script, String voice, int speed) {
    try {
      Runtime.getRuntime().exec(new String[] {"say", "-v", voice, "[[rate " + speed + "]]" + script});
    }
    catch (IOException e) {
      System.err.println("IOException");
    }
  }

  // Overload the say method so we can call it with fewer arguments and basic defaults
  static void say(String script) {
    // 200 seems like a reasonable default speed
    say(script, ALEX, 200);
  }

}

December 6 2010 at 11 AM

Hey, works great for me! One question: Is there a function to stop the speech? When I quit my application, my Mac is still speaking. :) And another question: How can I check whether the speaking has finished? I have an array with several Strings. I would like to automatically switch to the next element and "say" it after it has finished saying the current sentence.

Thanks!

January 21 2011 at 3 AM

Eric Mika:

Kamil, I’m glad it worked for you.

The control issues are tricky, because you’re dealing with what amounts to an external program (Apple’s TTS engine) to which we have relatively crude access through the say command.

Everything you can do with the command is documented: enter man say in a terminal window to see Apple’s docs. I don’t see any commands related to timing information. (That’s not to say that they don’t exist! For example, the speed parameter I use in the Processing example isn’t listed in say’s man pages.)

Given that, I can think of four reasonable approaches to your problem:

  1. Just concatenate your array into a single string before sending it to say.

  2. Use a native Java speech synthesizer (like FreeTTS), which will give you more control. Quality is mediocre compared to Apple’s engine.

  3. Get clever with the say command. The docs say that it can write sound files to disk instead of speaking them. I’m not sure how fast this happens, but it’s possible that you could “render” your speech to disk and then re-open it in Processing with the Minim library. (This would give you duration info, and the option to start / stop playback arbitrarily.)

  4. Use an industrial-strength API to Apple’s TTS engine. They call it the Speech Synthesis Manager. Getting access to this in a sane way from Processing is probably going to be more grief than writing the app you want in C++ or Objective-C. However, these docs could be a great way to find hidden functionality in the say command. (Let me know if you find any!)
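As a rough sketch of ideas 1 and 3 in plain Java: the first helper joins the array into one string for a single say call, and the second builds a say command that renders to a file instead of speaking. The class and method names (SayOptions, joinScript, renderCommand) and the file name speech.aiff are made up for illustration, and the -o flag is taken from my reading of man say, so check it on your own system before relying on it.

```java
import java.util.Arrays;

public class SayOptions {
    // Idea 1: merge all sentences into one string so that a single
    // say call speaks them back to back, in order.
    static String joinScript(String[] sentences) {
        return String.join(" ", sentences);
    }

    // Idea 3: ask say to write a sound file instead of speaking.
    // Assumes the -o (output file) flag described in man say.
    static String[] renderCommand(String text, String outputPath) {
        return new String[] {"say", "-o", outputPath, text};
    }

    public static void main(String[] args) {
        String[] sentences = {"First sentence.", "Second sentence."};
        String script = joinScript(sentences);
        System.out.println(script);
        // Pass this array to Runtime.getRuntime().exec(...) to do the rendering.
        System.out.println(Arrays.toString(renderCommand(script, "speech.aiff")));
    }
}
```

Joining is the least work but gives up any control between sentences. Rendering to disk adds a delay up front, in exchange for a file you can load, time and stop from inside the sketch.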

Update: Here’s a bonus 5th idea. When you execute say in Java, you get a unique Process object back. This object has a destroy() method which effectively silences say. Since each execution of say runs in its own process, you could keep track of them in your Processing app and kill them at your discretion. This would require a slight reworking of the code above, from a static approach to a more traditional instance-based approach that would return a Process object for every invocation of say. This is probably a better bet than any of the other options above if the only extra thing you need the code to do is stop speaking on command.
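The Process object also offers a way to tell when a sentence is done: waitFor() blocks until the process exits, which for say should be when it stops talking (an assumption worth testing on your machine). Below is a minimal plain-Java sketch that speaks an array one element at a time and can be cut off from another thread. SpeechQueue, speakAll and stop are hypothetical names, and the command is passed in so the same class works with extra arguments such as a voice.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpeechQueue {
    private final List<String> baseCommand;   // e.g. "say", or "say", "-v", "Alex"
    private volatile Process current;         // the process speaking right now, if any
    private volatile boolean stopped = false;

    SpeechQueue(String... baseCommand) {
        this.baseCommand = Arrays.asList(baseCommand);
    }

    // Speak each line in order; returns how many lines ran to completion.
    int speakAll(String[] lines) {
        int finished = 0;
        for (String line : lines) {
            if (stopped) break;
            List<String> command = new ArrayList<>(baseCommand);
            command.add(line);
            try {
                current = Runtime.getRuntime().exec(command.toArray(new String[0]));
                current.waitFor();   // blocks until this process has exited
            } catch (IOException e) {
                System.err.println("Could not start process: " + e.getMessage());
                break;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (!stopped) finished++;
        }
        return finished;
    }

    // Call from another thread to silence the current line and skip the rest.
    void stop() {
        stopped = true;
        Process p = current;
        if (p != null) p.destroy();
    }

    public static void main(String[] args) {
        SpeechQueue queue = new SpeechQueue("say");
        int n = queue.speakAll(new String[] {"First sentence.", "Second sentence."});
        System.out.println(n + " lines spoken");
    }
}
```

Because speakAll blocks, calling it from draw() or mousePressed() would freeze the sketch until the last sentence ends, so it is better run on a background thread.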

January 21 2011 at 4 AM

Thanks a lot for your answer! I am not an absolute beginner, but I have no idea how to keep track of those process objects. For now I will try out FreeTTS, which seems easier to handle (for me). :) However, I will also try to learn more about your other ideas, especially no. 5.

I am quite sure I'll annoy you again with a lot of questions, problems... soon. :)

January 27 2011 at 1 PM

Yeah, actually the FreeTTS sound output doesn't really satisfy me. So now, let's have a look at idea 5.

January 27 2011 at 9 PM

Hey there,

thanks for the code Eric, I used it in a recent project and added a few things (idea #5 for example).

To stop the voice you have to change the TTS functions like this:

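  /**
   * Sends the "say" command to the terminal with the given voice and speed.
   *
   * @return The Process running "say", or null if it could not be started
   */
  public static Process say(String text, String voice, int speed) {
    try {
      return Runtime.getRuntime().exec(
        new String[] { "say", "-v", voice, "[[rate " + speed + "]]" + text });
    } catch (IOException e) {
      System.err.println("Could not start program \"say\" in terminal");
      return null;
    }
  }

  /**
   * Sends the "say" command to the terminal. Will use the default voice and
   * speed 200.
   *
   * @param text
   *          The text to be spoken
   * @return The Process running "say", or null if it could not be started
   */
  public static Process say(String text) {
    // DEFAULT_VOICE_INDEX and DEFAULT_SPEED are constants of the class that are not shown here
    return say(text, voices[DEFAULT_VOICE_INDEX], DEFAULT_SPEED);
  }
} // end of the TextToSpeech class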

So change "public static void" to "public static Process" and add a return in front of the say command.

Then, you should add a global variable:

Process lastVoiceProcess;

I did it like this - when you call the say command again, first check if the last one is still running:

// If the last text-to-speech process is still running, kill it
if (lastVoiceProcess != null) lastVoiceProcess.destroy();
// remember the newly started voice process
lastVoiceProcess = TextToSpeech.say("Fourty Two");
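The same bookkeeping can be wrapped up in a small instance-based class, roughly like the plain-Java sketch below. It also registers a JVM shutdown hook, which might help with the speech carrying on after the application quits; I have not verified that a Processing sketch always runs shutdown hooks when it closes, so treat that part as an assumption. Speaker and its methods are hypothetical names, and the program name is passed in rather than hard-coded.

```java
import java.io.IOException;

public class Speaker {
    private final String program;   // normally "say"
    private Process last;           // most recently started process, if any

    Speaker(String program) {
        this.program = program;
        // Try to silence any speech still running when the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(this::stop));
    }

    // Stop whatever is still playing, then start speaking the new text.
    // Returns false if the program could not be started.
    synchronized boolean say(String text) {
        stop();
        try {
            last = Runtime.getRuntime().exec(new String[] {program, text});
            return true;
        } catch (IOException e) {
            System.err.println("Could not start " + program + ": " + e.getMessage());
            return false;
        }
    }

    // Kill the current process, if there is one.
    synchronized void stop() {
        if (last != null) {
            last.destroy();
            last = null;
        }
    }

    // True while the most recent process is still running (needs Java 8 or later).
    synchronized boolean isSpeaking() {
        return last != null && last.isAlive();
    }

    public static void main(String[] args) {
        Speaker speaker = new Speaker("say");
        speaker.say("cosmic manifold");
        speaker.say("Forty two");   // cuts off the first phrase
    }
}
```

In a sketch you would create one Speaker as a global variable and call speaker.say(...) wherever the static TextToSpeech.say(...) was used before.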

September 19 2012 at 10 AM

Still works with OS 10.9. Thank you!!

July 24 2014 at 3 PM

Yak:

ha, this is amazing, still works in Yosemite. Thanks a lot!

April 28 2015 at 8 PM

WhiteLotus:

Does anyone know how to invoke the Mac's built-in TTS functionality in Unity3D?
This doesn't work for me:
function SpeakString(s : String)
{
    System.Diagnostics.Process.Start("say", s);
}

Thanks

August 11 2015 at 6 AM