I have always been a fan of the TTS256 – a tiny but great English text-to-speech IC based on a 8-bit microprocessor for embedded voice applications. Unfortunately, the TTS256 has been out of production for a long time and despite better technology being developed over the years, chip manufacturers do not seem to be interested in developing a similar or better text to speech IC, leaving the average electronics hobbyists searching eBay for second-hand TTS256 ICs, often listed at unreasonable price.
Nowadays, SpeakJet by Sparkfun and RoboVoice by Speechchips are among the few available text-to-speech modules for embedded projects. Both are priced at 20-30 USD and have pinout and interface commands similar to the TTS256. Although these speech modules come in handy, their price range seems a bit high for many projects. Hence, I decided to search for free alternatives.
Syntho and PICTalker
There are several open source text-to-speech projects for 8-bit microcontrollers such as Syntho and PICTalker, built for the PIC16F616 and PIC16F628 respectively. In both projects, one or more EEPROMs are used to store the phoneme database. The EEPROM size is around 64K for the PICTalker project and is made smaller by using innovative compression techniques in the Syntho project. Both projects require phonemes (and not English text) to be sent before they can be pronounced. This is due to the lack of a rule database to convert text to phonemes, presumably due to the limited amount of memory available. These solutions are somewhat closer to the SPO256, which requires phonemes as input, rather than the TTS256, which accepts English text.
If you don’t know what phonemes are, read this on Wikipedia. They are simply the phonetic representations of a word’s pronunciation. There are approximately 44 phonemes in English to represent both vowels and consonants.
Below is a voice sample of the Syntho project, which is trying to say “I am a really cheap computer”:
As expected, the voice sounds too mechanical and can hardly be understood.
Arduino TTS library
Next I came across another TTS library made for the Arduino and decided to give it a quick try to test the speech quality. As I do not have an Arduino board available, I ported them to a Visual Studio 2012 C++ application which accepts English text as input and saves the resulting speech as a wave file. The ported code can be downloaded here. If you intend to use this code, take note that it only writes the wave data for the generated speech and ignores the wave file header. You will probably need a professional sound editor software such as Goldwave to play and examine the generated file. This is because calculating the exact total duration of the generated speech (required for creating the wave file header) is complicated and there is no need for me to attempt that since the code is only for testing.
This is the generated voice sample. It is trying to say “Hello Master, how are you doing? I am fine, thank you.”:
Although the quality is obviously better than the PICTalker, it still sounds robotic and difficult to understand.
SAM (Software Automatic Mouth) project
My next attempt is to see if the same can be done on a PIC, with better speech quality. By chance, I came across SAM (Software Automatic Mouth), a tiny (less than 39KB) text-to-speech C program. The project website contains a tool to generate a demo voice from the text entered.
After getting the Windows source code to compile and run without issues in Visual Studio (download the project here), I decided to port the code to the PIC24FJ64GA002, which is surprisingly rather straightforward. The only challenge was to get all the 32-bit data types in the original source code ported properly to the 16-bit architecture of the PIC24 micro-controller, and to get the rule and phoneme database fit nicely into the PIC24FJ64GA002 available memory. Fortunately, the entire project when compiled uses just around 50% of the total program and data memory on the PIC24FJ64GA002, leaving available space for other codes.
You may be able to fit the project into the smaller PIC24FJ32GA002 by changing the project build options to use the large memory model during compilation:
However, in my experiment, the code compiled but ran erratically when using large memory model, perhaps due to pointer behavior differences. It is therefore better to compile with the default settings and use the PIC24FJ64GA002 (or one with more memory) to save the trouble and have more code space for other purposes.
The following is a recording of the generated speech for the sentence “This is SAM text to speech. I am so small that I will work also on embedded computers”, when running on the PIC24 using PWM for audio output.
Below is a longer demo speech. Can you understand what it is trying to say?
As can be seen, the quality of the generated speech is much better – less mechanical, clearer pronunciation and easier to understand. Although the voice still sounds robotic and there are some mispronounced words, the overall quality should be good enough to be used in embedded projects, as a free alternative to current commercial text to speech solutions.
With some pitch adjustments, the PIC24 can also sing “The Star-Spangled Banner”, the national anthem of the United States of America:
The complete ported code, as a MPLAB 8 PIC24FJ64GA002 project, can be downloaded here. The project also contains example codes for the SYN6288, a Chinese text-to-speech module.