The <Speak/> Element

This element converts its text to speech and plays it to the caller. It is especially useful for dynamic text that cannot be pre-recorded. Variables can be interpolated into the Speak text using double-brace placeholders such as {{Digits}}.

Element Attributes

| Attribute | Description | Allowed values | Default |
| --------- | ----------- | -------------- | ------- |
| language | Language to be used for the output. Not required if the voice attribute is set. | ja-JP, en-US, etc. | ja-JP |
| voice | Voice to be used for the output. | Any Standard or Wavenet voice supported by Google TTS, such as en-US-Standard-C or ja-JP-Wavenet-B. Note: Neural2 and Studio voices are not available. | none |
| loop | Number of times to play the output. | Integer between 1 and 5 | 1 |
| purpose | Indicates what the element is used for. Only relevant when the element is inside GetDigits or GetInput: elements with purpose="prompt" compose the GetDigits/GetInput prompt, and elements with purpose="alert" compose the 'input error' prompt. | 'prompt' or 'alert' | 'prompt' |
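The table above allows two ways to choose the output language. The following minimal sketch shows both. The first Speak sets only language. The second sets only voice, so language can be left out. The message texts are illustrative, and this page does not specify which voice is used when only language is given.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<!-- No voice attribute: language selects the output language (default: ja-JP) -->
	<Speak language="en-US">Thank you for calling.</Speak>
	<!-- voice attribute set: language is not required -->
	<Speak voice="ja-JP-Wavenet-B">お電話ありがとうございます。</Speak>
</IVR>
```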

Note: Previously, we used another TTS provider (AI-Talk), which had its own set of voices. With the transition to Google TTS, we apply the following conversions when an old voice is specified:

1) The voices nozomi, sumire, kaho, maki, nanako, araki, anzu will be converted to ja-JP-Wavenet-B (female).

2) The voices seiji, osamu, hiroshi, koutaro will be converted to ja-JP-Wavenet-D (male).
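Under the conversions listed above, the two Speak elements below should produce the same output, because the legacy name nozomi is mapped to ja-JP-Wavenet-B. This is a sketch for illustration. New documents can simply use the Google TTS voice name directly.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<!-- Legacy AI-Talk voice name, converted to ja-JP-Wavenet-B -->
	<Speak voice="nozomi">おはようございます</Speak>
	<!-- Equivalent Google TTS voice name -->
	<Speak voice="ja-JP-Wavenet-B">おはようございます</Speak>
</IVR>
```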

Examples

Example 1: Hi this is Basix

When a call is directed to the following XML document, the caller will hear "Hi this is Basix." spoken once.

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<Speak voice="en-US-Standard-C">Hi this is Basix.</Speak>
</IVR>

Example 2: Hey, Hey, Hey

This XML document instructs Basix to say "Hey" three times in a row.

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<Speak voice="en-US-Standard-C" loop="3">Hey</Speak>
</IVR>

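Example 3: Interpolation of the text to be played

This XML document asks the caller to enter digits with GetDigits and then reads them back three times, with {{Digits}} replaced by the digits the caller entered.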

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<GetDigits>
		<Speak voice="en-US-Standard-C">Please input some digits</Speak>
	</GetDigits>
	<Speak voice="en-US-Standard-C" loop="3">You dialed {{Digits}}</Speak>
</IVR>
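Example 3 relies on the default purpose, prompt. The following sketch also defines an input-error message with purpose="alert", as described in the attribute table. The alert text is illustrative. When GetDigits treats input as invalid is decided by GetDigits itself and is not covered on this page.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<GetDigits>
		<!-- purpose="prompt" (the default): composes the GetDigits prompt -->
		<Speak voice="en-US-Standard-C" purpose="prompt">Please input some digits</Speak>
		<!-- purpose="alert": composes the 'input error' prompt -->
		<Speak voice="en-US-Standard-C" purpose="alert">Sorry, that input was not valid.</Speak>
	</GetDigits>
	<Speak voice="en-US-Standard-C">You dialed {{Digits}}</Speak>
</IVR>
```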

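Example 4: Interpolation with data from a JSON document

GetJSON fetches a JSON document into the variable data, passing the caller's number in the URL through {{CallingNumber}}. Speak then uses the document's voice field as the voice and its msg field as the text, so interpolation works in attribute values as well as in the element text.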

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<GetJSON var="data" url="https://somewhere.com/info?calling_number={{CallingNumber}}"/>
	<Speak voice="{{data.voice}}">{{data.msg}}</Speak>
</IVR>

Example 5: Japanese

This XML document tells Basix to speak a plain-text greeting and then a more complex message marked up with SSML (break and emphasis elements).

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<Speak voice="ja-JP-Wavenet-B">おはようございます</Speak>
	<Speak voice="ja-JP-Wavenet-B">
		<break time="100ms"/>お電話、ありがとうございます、
		<break time="500ms"/>
		<emphasis level="strong">市役所です。</emphasis>
	</Speak>
</IVR>

ATTENTION:

Google Speech Synthesis has difficulty phrasing telephone numbers, so we apply the following adjustments:

  • If the text contains SSML, nothing is changed. If it is plain text, the following steps are applied.
  • The digits of each number are separated with spaces. Ex: 08012341234 => 0 8 0 1 2 3 4 1 2 3 4
  • Each '-' is converted to an SSML break element with time="0.1s", and the whole result is wrapped in an SSML speak element.

For example, this XML:

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<Speak voice="ja-JP-Wavenet-B">ご利用いただけるサービスがございません。050-0000-0000からおかけ直しください</Speak>
</IVR>

is equivalent to:

<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<Speak voice="ja-JP-Wavenet-B">ご利用いただけるサービスがございません。0 5 0
		<break time="0.1s"/>0 0 0 0
		<break time="0.1s"/>0 0 0 0からおかけ直しください
	</Speak>
</IVR>
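The adjustment is skipped whenever the text already contains SSML. A message that combines SSML markup with a telephone number therefore has to space the digits itself. The following sketch assumes that any SSML element, such as a single break, is enough for the text to count as SSML. The pause lengths are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<IVR>
	<!-- The text contains SSML, so no automatic adjustment is applied: digits and pauses are written by hand -->
	<Speak voice="ja-JP-Wavenet-B">
		ご利用いただけるサービスがございません。
		<break time="500ms"/>
		0 5 0<break time="0.1s"/>0 0 0 0<break time="0.1s"/>0 0 0 0からおかけ直しください
	</Speak>
</IVR>
```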