The <Speak/> Element
This element reads its text to the caller as speech. It is useful for dynamic text that cannot be pre-recorded. Variables such as {{Digits}} can also be interpolated in the Speak text.
Element Attributes
Attribute Name | Description | Allowed Values | Default Value |
---|---|---|---|
language | Language to be used for output | ja-JP, en-US, etc. Not required if the voice attribute is passed | ja-JP |
voice | Voice to be used for output | Any Standard or Wavenet voice supported by Google TTS, such as en-US-Standard-C or ja-JP-Wavenet-B. Note: Neural2 and Studio voices are not available | none |
loop | Number of times to repeat the output | Integer between 1 and 5 | 1 |
purpose | Indicates what this element is used for. Only relevant when the element is inside GetDigits or GetInput: elements with purpose="prompt" are used to compose the GetDigits/GetInput prompt, and elements with purpose="alert" are used to compose the 'input error' prompt | 'prompt' or 'alert' | 'prompt' |
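
As a minimal sketch of the purpose attribute, the fragment below assumes that GetDigits accepts more than one Speak child, as the description in the table implies. The message wording is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <GetDigits>
    <!-- purpose defaults to "prompt": this text composes the input prompt -->
    <Speak voice="en-US-Standard-C">Please enter your customer number.</Speak>
    <!-- purpose="alert": this text composes the 'input error' prompt -->
    <Speak voice="en-US-Standard-C" purpose="alert">Sorry, that input was not valid.</Speak>
  </GetDigits>
</IVR>
```

Outside GetDigits or GetInput the purpose attribute has no effect, so it can be left out there.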
Note: Previously, we used another TTS provider (AI-Talk), which had its own set of voices. Since the transition to Google TTS, old voice names are converted as follows when they are specified:
1) The voices nozomi, sumire, kaho, maki, nanako, araki, and anzu are converted to ja-JP-Wavenet-B (female).
2) The voices seiji, osamu, hiroshi, and koutaro are converted to ja-JP-Wavenet-D (male).
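
To illustrate the conversion, the following sketch pairs each legacy voice name with the Google TTS voice it maps to according to the list above. Under that assumption, each pair of Speak elements should produce the same output.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <!-- Legacy AI-Talk female voice, converted to ja-JP-Wavenet-B -->
  <Speak voice="nozomi">おはようございます</Speak>
  <Speak voice="ja-JP-Wavenet-B">おはようございます</Speak>
  <!-- Legacy AI-Talk male voice, converted to ja-JP-Wavenet-D -->
  <Speak voice="seiji">おはようございます</Speak>
  <Speak voice="ja-JP-Wavenet-D">おはようございます</Speak>
</IVR>
```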
Examples
Example 1: Hi this is Basix
When a call is directed to the following XML document, the caller will hear "Hi this is Basix" spoken once.
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <Speak voice="en-US-Standard-C">Hi this is Basix.</Speak>
</IVR>
```
Example 2: Hey, Hey, Hey
This XML document instructs Basix to say "Hey" thrice in a row.
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <Speak voice="en-US-Standard-C" loop="3">Hey</Speak>
</IVR>
```
Example 3: Interpolation of text to be played
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <GetDigits>
    <Speak voice="en-US-Standard-C">Please input some digits</Speak>
  </GetDigits>
  <Speak voice="en-US-Standard-C" loop="3">You dialed {{Digits}}</Speak>
</IVR>
```
Example 4: Interpolation with data from a JSON document
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <GetJSON var="data" url="https://somewhere.com/info?calling_number={{CallingNumber}}"/>
  <Speak voice="{{data.voice}}">{{data.msg}}</Speak>
</IVR>
```
Example 5: Japanese
This XML document tells Basix to speak a plain text and then a more complex text that uses SSML.
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <Speak voice="ja-JP-Wavenet-B">おはようございます</Speak>
  <Speak voice="ja-JP-Wavenet-B">
    <break time="100ms"/>お電話、ありがとうございます、
    <break time="500ms"/>
    <emphasis level="strong">市役所です。</emphasis>
  </Speak>
</IVR>
```
ATTENTION:
Google Speech Synthesis has difficulty reading telephone numbers, so we apply the following adjustments:
- If the text contains SSML, do nothing. If it is plain text, proceed with the next steps.
- Separate digits with spaces. Ex: 08012341234 => 0 8 0 1 2 3 4 1 2 3 4
- Convert '-' to an SSML break element with time="0.1s" (and put the whole result inside an SSML speak tag).
For example, this XML:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <Speak voice="ja-JP-Wavenet-B">ご利用いただけるサービスがございません。050-0000-0000からおかけ直しください</Speak>
</IVR>
```
is equivalent to:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <Speak voice="ja-JP-Wavenet-B">ご利用いただけるサービスがございません。0 5 0
    <break time="0.1s"/>0 0 0 0
    <break time="0.1s"/>0 0 0 0からおかけ直しください
  </Speak>
</IVR>
```
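
Two further cases follow from the rules above and are sketched here. The first Speak element is plain text, so its digits should be separated automatically. The second already contains SSML, so no adjustment is expected and the digits are spaced by hand. The phone number is the sample number from the rule list, and the exact handling is an assumption based on those rules.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
  <!-- Plain text: adjusted automatically, read as 0 8 0 1 2 3 4 1 2 3 4 -->
  <Speak voice="ja-JP-Wavenet-B">08012341234</Speak>
  <!-- Contains SSML: left untouched, so the digits are separated by the author -->
  <Speak voice="ja-JP-Wavenet-B"><break time="100ms"/>0 8 0 1 2 3 4 1 2 3 4</Speak>
</IVR>
```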