Text to speech on iOS

Jun 30, 2014

category: programming

Starting with iOS 7, the SDK includes a set of text-to-speech APIs as part of the AVFoundation framework.

Performing text-to-speech on iOS requires three components: a voice, an utterance, and a synthesizer.

Voice

This determines the voice and language used for speech synthesis. If no voice is set, the system default voice is used (English on an English-language device).

For example, consider the Russian letter "Ж": an English voice reads it out as "Cyrillic Za", while the Russian voice pronounces the letter the way it is actually named in Russian.

 // Instantiate a Russian voice/pronunciation
 AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"ru-RU"];
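
`voiceWithLanguage:` returns `nil` when no voice is available for the requested language code, so it can be worth checking what the device offers before relying on a specific voice. Below is a minimal sketch, assuming it runs inside an iOS app linked against AVFoundation. The helper names `VoiceForLanguageOrDefault` and `LogAvailableVoices` are hypothetical and not part of the SDK.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: returns a voice for the given language code,
// falling back to the voice for the device's current language.
static AVSpeechSynthesisVoice *VoiceForLanguageOrDefault(NSString *languageCode) {
    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:languageCode];
    if (voice == nil) {
        NSString *fallback = [AVSpeechSynthesisVoice currentLanguageCode];
        voice = [AVSpeechSynthesisVoice voiceWithLanguage:fallback];
    }
    return voice;
}

// Hypothetical helper: logs the language code of every available voice.
static void LogAvailableVoices(void) {
    for (AVSpeechSynthesisVoice *v in [AVSpeechSynthesisVoice speechVoices]) {
        NSLog(@"%@", v.language);
    }
}
```

Calling `VoiceForLanguageOrDefault(@"ru-RU")` should behave like the line above when a Russian voice is available, and otherwise fall back to the device's current language instead of leaving the voice unset.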

Utterance

Here is the definition from the Apple docs:

An AVSpeechUtterance is the basic unit of speech synthesis. An utterance encapsulates some amount of text to be spoken and a set of parameters affecting its speech: voice, pitch, rate, and delay.

 :::objc
 AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:strToSpeak];
 utterance.rate = AVSpeechUtteranceMinimumSpeechRate;
 utterance.voice = voice;
 utterance.pitchMultiplier = 0.5;

Attributes

1. Rate

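The speed at which the utterance is spoken. It ranges from AVSpeechUtteranceMinimumSpeechRate (0.0, slowest) to AVSpeechUtteranceMaximumSpeechRate (1.0, fastest); the default is AVSpeechUtteranceDefaultSpeechRate (0.5).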


2. Voice

The voice in which the utterance should be spoken.


3. Pitch

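Pitch is the only characteristic of the voice itself that we have real control over, since gender and other attributes cannot yet be modified. It is set through pitchMultiplier, which ranges from 0.5 (lower pitch) to 2.0 (higher pitch) and defaults to 1.0.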
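
Since rate and pitch each have a limited valid range, one option is to clamp incoming values before assigning them. The sketch below assumes the documented pitch range of 0.5 to 2.0 and uses the SDK's rate constants for the rate bounds. `Clamp` and `MakeUtterance` are hypothetical helper names, and the 0.25 second pause is an arbitrary illustrative value for the delay parameter mentioned in the definition above.

```objc
#import <AVFoundation/AVFoundation.h>
#include <math.h>

// Hypothetical helper: keeps a value inside [lower, upper].
static float Clamp(float value, float lower, float upper) {
    return fmaxf(lower, fminf(value, upper));
}

// Hypothetical helper: builds an utterance with rate and pitch forced into range.
static AVSpeechUtterance *MakeUtterance(NSString *text,
                                        AVSpeechSynthesisVoice *voice,
                                        float rate,
                                        float pitch) {
    AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:text];
    utterance.voice = voice; // nil falls back to the system default voice
    utterance.rate = Clamp(rate,
                           AVSpeechUtteranceMinimumSpeechRate,
                           AVSpeechUtteranceMaximumSpeechRate);
    utterance.pitchMultiplier = Clamp(pitch, 0.5f, 2.0f);
    utterance.postUtteranceDelay = 0.25; // assumed value: seconds of silence after speaking
    return utterance;
}
```

With this helper, a request such as `MakeUtterance(@"Ж", voice, 5.0f, 0.1f)` would end up with the maximum rate and a pitch multiplier of 0.5 rather than out-of-range values.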

Synthesizer

AVSpeechSynthesizer produces the synthesized speech and provides methods for controlling and monitoring the progress of an utterance.

The synthesizer also accepts a delegate, which lets custom classes handle events such as speech being started, paused, cancelled, or completed.
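
A delegate might look roughly like the following sketch. The class name `SpeechController` and its `speak:` and `pause` methods are hypothetical; the delegate callbacks come from the `AVSpeechSynthesizerDelegate` protocol. Holding the synthesizer in a strong property is a precaution assumed here, so that it is not deallocated while it still has speech queued.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical class: owns a synthesizer and logs its lifecycle events.
@interface SpeechController : NSObject <AVSpeechSynthesizerDelegate>
@property (nonatomic, strong) AVSpeechSynthesizer *synthesizer;
- (void)speak:(NSString *)text;
- (void)pause;
@end

@implementation SpeechController

- (instancetype)init {
    self = [super init];
    if (self) {
        _synthesizer = [[AVSpeechSynthesizer alloc] init];
        _synthesizer.delegate = self;
    }
    return self;
}

- (void)speak:(NSString *)text {
    AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:text];
    [self.synthesizer speakUtterance:utterance];
}

- (void)pause {
    // Pause once the current word has finished rather than mid-word.
    [self.synthesizer pauseSpeakingAtBoundary:AVSpeechBoundaryWord];
}

#pragma mark - AVSpeechSynthesizerDelegate

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
  didStartSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Started: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
  didPauseSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Paused: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Finished: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didCancelSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Cancelled: %@", utterance.speechString);
}

@end
```

All of the delegate methods are optional, so a class only needs to implement the events it cares about.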

Example

Here is a simple example that speaks every letter of the Russian alphabet:

 :::objc
 NSString *charSentence = @"А- Б- В- Г- Д- Е- Ё- Ж- З- И- Й- К- Л- М- Н- О- П- Р- С- Т- У- Ф- Х- Ц- Ч- Ш- Щ- Ъ- Ы- Ь- Э- Ю- Я-";

 // Instantiate a Russian voice/pronunciation
 AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"ru-RU"];

 // Create the utterance from the sentence defined above
 AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:charSentence];
 utterance.rate = AVSpeechUtteranceMinimumSpeechRate;
 utterance.voice = voice;
 utterance.pitchMultiplier = 0.5;

 // Create the synthesizer and speak the utterance
 AVSpeechSynthesizer *synth = [[AVSpeechSynthesizer alloc] init];
 [synth speakUtterance:utterance];

Note:

The hyphen after each character is required so that the synthesizer identifies and speaks it as an individual character rather than as a word in a sentence.
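
Rather than typing the hyphens by hand, the sentence can be assembled from a list of letters. This is a minimal sketch using only Foundation; `HyphenatedSentenceFromLetters` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: joins letters with "- " and appends a trailing hyphen,
// so that every letter, including the last one, is followed by a hyphen.
static NSString *HyphenatedSentenceFromLetters(NSArray<NSString *> *letters) {
    if (letters.count == 0) {
        return @"";
    }
    NSString *joined = [letters componentsJoinedByString:@"- "];
    return [joined stringByAppendingString:@"-"];
}
```

For example, `HyphenatedSentenceFromLetters(@[@"А", @"Б", @"В"])` produces `@"А- Б- В-"`, which matches the format of the string used in the example above.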
