KalyanChakravarthy.net


Text to speech on iOS

Mon 30 June 2014

Starting with iOS 7, the SDK includes a new set of text-to-speech APIs as part of the AVFoundation framework.

Three components are required to perform text-to-speech on iOS:

  1. Voice

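    This determines the speech synthesis voice and language. If no voice is set, the system default voice is used, which follows the device's language settings (English on an English-language device).

    For example, consider the Russian letter "Ж": the English voice reads it out as something like "Cyrillic Za", while the Russian voice pronounces it as the actual letter of the alphabet.

    // Instantiate the Russian voice/pronunciation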
    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"ru-RU"];
    
  2. Utterance

    Here is the definition from the Apple docs:

    An AVSpeechUtterance is the basic unit of speech synthesis. An utterance encapsulates some amount of text to be spoken and a set of parameters affecting its speech: voice, pitch, rate, and delay.

    // strToSpeak is any NSString holding the text to be spoken
    AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:strToSpeak];
    utterance.rate = AVSpeechUtteranceMinimumSpeechRate;
    utterance.voice = voice;
    utterance.pitchMultiplier = 0.5;
    

    Attributes

    1. Rate

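      The speed at which the utterance is spoken. It ranges from AVSpeechUtteranceMinimumSpeechRate (0.0, slowest) to AVSpeechUtteranceMaximumSpeechRate (1.0, fastest); normal speed is AVSpeechUtteranceDefaultSpeechRate (0.5).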

    2. Voice

      The voice in which the utterance should be spoken.

    3. Pitch

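      The pitch of the voice, set through pitchMultiplier, which ranges from 0.5 (lowest) to 2.0 (highest) with a default of 1.0. This is the only aspect of the voice itself that we have real control over, as we cannot yet modify its gender or other attributes.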

  3. Synthesizer

    AVSpeechSynthesizer produces the synthesized speech and also provides methods for controlling and monitoring the progress of an utterance.

    The speech synthesizer also accepts a delegate, so that custom classes can handle events such as paused, stopped, and completed.
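
The delegate mechanism described above can be sketched roughly as follows. SpeechLogger and its speak: method are hypothetical names made up for this illustration; only the AVSpeechSynthesizerDelegate callbacks come from AVFoundation. Treat it as a minimal sketch, not a complete implementation.

```objective-c
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper class (not part of AVFoundation) that logs synthesizer events.
@interface SpeechLogger : NSObject <AVSpeechSynthesizerDelegate>
// Strong reference so the synthesizer is not released while it is speaking.
@property (nonatomic, strong) AVSpeechSynthesizer *synth;
- (void)speak:(NSString *)text;
@end

@implementation SpeechLogger

- (instancetype)init {
    self = [super init];
    if (self) {
        _synth = [[AVSpeechSynthesizer alloc] init];
        _synth.delegate = self; // receive the callbacks below
    }
    return self;
}

- (void)speak:(NSString *)text {
    // Uses the default voice, rate and pitch.
    AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:text];
    [self.synth speakUtterance:utterance];
}

#pragma mark - AVSpeechSynthesizerDelegate

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didStartSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Started: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didPauseSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Paused: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didContinueSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Continued: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didCancelSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Cancelled: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"Finished: %@", utterance.speechString);
}

@end
```

With such an object in place, speech can be controlled through the synthesizer itself, for example [logger.synth pauseSpeakingAtBoundary:AVSpeechBoundaryWord], [logger.synth continueSpeaking] or [logger.synth stopSpeakingAtBoundary:AVSpeechBoundaryImmediate]; each call should trigger the matching callback.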

Example

Here is a simple example that speaks every letter of the Russian alphabet:

NSString *charSentence = @"А- Б- В- Г- Д- Е- Ё- Ж- З- И- Й- К- Л- М- Н- О- П- Р- С- Т- У- Ф- Х- Ц- Ч- Ш- Щ- Ъ- Ы- Ь- Э- Ю- Я-";

// Instantiate the Russian voice/pronunciation
AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"ru-RU"];

// Create the utterance from the sentence defined above
AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:charSentence];
utterance.rate = AVSpeechUtteranceMinimumSpeechRate;
utterance.voice = voice;
utterance.pitchMultiplier = 0.5;

// Create the synthesizer and speak the utterance
// (in a real app, keep the synthesizer in a strong property so it is not released mid-speech)
AVSpeechSynthesizer *synth = [[AVSpeechSynthesizer alloc] init];
[synth speakUtterance:utterance];
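
The example takes for granted that a ru-RU voice is available on the device. As a hedged sketch, the availability can be checked first, since voiceWithLanguage: returns nil when no matching voice exists. VoiceForLanguage and LogAvailableVoices are hypothetical helper names introduced here for illustration.

```objective-c
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: returns a voice for languageCode, falling back to the
// voice for the device's current language when no matching voice is found.
static AVSpeechSynthesisVoice *VoiceForLanguage(NSString *languageCode) {
    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:languageCode];
    if (voice == nil) {
        NSString *fallback = [AVSpeechSynthesisVoice currentLanguageCode];
        NSLog(@"No voice for %@, falling back to %@", languageCode, fallback);
        voice = [AVSpeechSynthesisVoice voiceWithLanguage:fallback];
    }
    return voice;
}

// Hypothetical helper: logs the language code of every voice the system offers.
static void LogAvailableVoices(void) {
    for (AVSpeechSynthesisVoice *voice in [AVSpeechSynthesisVoice speechVoices]) {
        NSLog(@"Available voice: %@", voice.language);
    }
}
```

In the example above, the voice line could then read AVSpeechSynthesisVoice *voice = VoiceForLanguage(@"ru-RU"); so that something is still spoken, in the fallback voice, when Russian is unavailable.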

Note:

  • The hyphen after each character is required for the synthesizer to identify and speak it as an individual character, and not as a word in a sentence.
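
Rather than typing the hyphens by hand, the sentence can be assembled from a list of letters. This is only a sketch of one way to do it; HyphenatedSentence is a hypothetical helper name, not an AVFoundation API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: joins letters as "А- Б- В-" so that each one is
// followed by a hyphen and gets spoken as an individual character.
static NSString *HyphenatedSentence(NSArray *letters) {
    if (letters.count == 0) {
        return @"";
    }
    // "- " goes between the letters; the last letter gets its own trailing hyphen.
    return [[letters componentsJoinedByString:@"- "] stringByAppendingString:@"-"];
}
```

For instance, HyphenatedSentence(@[@"А", @"Б", @"В"]) yields @"А- Б- В-", which can be passed to speechUtteranceWithString: in place of the hand-written charSentence.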