Text to Speech Clips/Robot Voices!

My Friend the Robot

Need a goofy sidekick? What about a recorded warning message? Or, maybe you just aren’t comfortable with using your own voice in videos…

Well maybe Alex can help! Or Victoria, Vicky, Bruce, Fred and Kathy! Who are these helpful people? They are our friends built into ScreenFlow!

Insert Speech Clip

Under the Insert menu in ScreenFlow there is an often-overlooked option to add a speech clip to your ScreenFlow project. Type a phrase or two into the resulting box, choose your character, and ScreenFlow will turn your text into speech! Now, before we get too deep into this, I want to point out that the human voice is incredibly complex. A computer-generated voice is far from perfect at mimicking such an intricate process. That being said, this is an awesome feature! I like the “Alex” voice, but that is just my preference.

Here is a little background info on some of the characters built into ScreenFlow!

From a practical standpoint, this feature can be quite helpful. I often create recordings in places where I don’t feel comfortable using my own voice. It is always awkward sitting at a café and speaking crisply and clearly to your computer! It can also help when you don’t have access to a microphone, or when the location you are recording in is not suited to audio recording…like Guitar Center.

While this can be an acceptable substitute for a human voice, I would much rather use it for flavor!  Adding in some goofy, robotic voice tracks over your video can really help to spice things up.

Have you ever used this feature? Will you ever use it? It is definitely tons of fun.  Experiment, and show us what you can come up with!



  1. This is a great demo of the Insert Speech Clip feature in ScreenFlow. Nice and simple.
    Speech synthesis, called MacinTalk for many years, has been a part of the Mac experience ever since Steve Jobs pulled the first Mac out of its carrying bag on stage. I believe that Mac said, “It sure is great to get out of that bag.”
    Recently (since MacOS X 10.7 – Lion), Apple has been improving the many ways that MacOS X and iOS handle speech, including text-to-speech (TTS) as demonstrated here and speech-to-text (STT) which is what Siri uses to understand our questions and commands.
    On my ePublishing blog, I’ve been exploring how screencasts can add significantly to the value of eBooks and, because I am an educator, eTextbooks as well. As you have demonstrated here, speech synthesis can be a very useful part of screencasting.
    Here are a few of my recent blog posts on various aspects of speech synthesis that can be related to screencasting and ePublishing.
    1) I am also interested in video accessibility, particularly subtitles/captioning so I began with trying to use speech-to-text in order to produce transcripts of narrated video that might subsequently be turned into subtitles. See “Enhanced Dictation in MacOS X 10.9 as an STT Engine for Extant Audio Files” at: http://frank-lowney.blogspot.com/2013/11/dictation-audio-file-to-text-how-to.html
    2) Next, I tried to build a workflow for creating narration from text to address two needs: having more than one voice in a screencast to provide point/counterpoint narration, and providing narration in languages other than English with a budget of zero. See “Free Voice Talent for Your Next ScreenFlow Screencast” at: http://frank-lowney.blogspot.com/2014/05/free-voice-talent-for-your-next.html
    3) That last one was a little complicated so I was really happy to discover new developments in MacOS X 10.10 Yosemite that enabled a simpler workflow in automating narrated screencast production. Using Keynote, now free, with ScreenFlow and the new Keynote Automation Suite enables rapid screencasting via TTS. See “Using Keynote and Automator for Rapid Screencasting” at: http://frank-lowney.blogspot.com/2014/11/using-keynote-and-automator-for-rapid.html
    In all of this poking around I discovered that there are a number of tricks that one can use to make synthesized speech sound more human. They are woven through these blog posts, so be on the lookout for mentions of phonetic misspelling, the effect of punctuation marks on the speech engine, and embedded speech commands such as [[slnc nnn]], which inserts a silence lasting nnn milliseconds.
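
As a rough illustration of the [[slnc nnn]] trick mentioned above, here is a minimal Python sketch that drops a pause between sentences before the text is pasted into a speech clip. The function name `add_pauses` and the 400 ms default are made up for this example, and it assumes the speech engine honors the embedded command as described in the comment.

```python
import re


def add_pauses(text: str, pause_ms: int = 400) -> str:
    """Insert a [[slnc N]] silence command between sentences.

    Hypothetical helper: the name and the 400 ms default are
    illustrative choices, not part of ScreenFlow or macOS.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Join the sentences with the embedded silence command.
    return f" [[slnc {pause_ms}]] ".join(sentences)


if __name__ == "__main__":
    print(add_pauses("Hello there. I am Alex! Nice to meet you."))
```

The sentence splitting is deliberately naive (abbreviations such as "Dr." will also trigger a pause), so treat the result as a starting point and adjust the pauses by ear.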
