Google STT

This activity transcribes recorded audio with Google's Speech-to-Text service.

Its configuration takes the following parameters:

lang

Language code used for speech recognition, for example "en-US" for US English or "es-ES" for Spanish as spoken in Spain. The audio is interpreted in this language.

silence

Duration of silence, in seconds, that marks the end of the audio capture. For example, with a low value such as 2, recording stops after 2 consecutive seconds without speech, and speech recognition then runs on the captured audio.

interruptKey (optional)

Key or keys that, when pressed, interrupt the recording before silence is detected. The default is typically "#", but it can be set to any digit or any combination of digits and the # or * symbols. The special value "any" means that any digit or key interrupts the recording.

beep

Determines whether a tone (beep) is played before the recording starts. If set to "NOBEEP", no tone is played. If the parameter is omitted or left empty, a beep is played at the start.

timeout

Absolute maximum recording time, in seconds. Once this time has elapsed, recording stops automatically, whether or not silence was detected or an interrupt key was pressed. For example, with a value of 10, recording ends 10 seconds after it started.

speechContexts

List of suggested words or phrases that helps the speech recognition API interpret specific terms more accurately. It is generally specified as a single string with the words separated by commas (e.g., "Agamemnon,Midas"). These hints improve recognition when uncommon terms, proper nouns, or specialized vocabulary are expected.
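
The way the parameters interact can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class SttActivityConfig, its field names, and the function stop_reason are hypothetical and are not part of the product. The recognition_config method uses the field names of Google's v1 REST RecognitionConfig (languageCode, speechContexts with a phrases list); how the activity actually builds its request is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_KEYS = set("0123456789#*")


@dataclass
class SttActivityConfig:
    """Hypothetical container for the activity's six parameters."""
    lang: str
    silence: int                 # seconds of silence that end the capture
    timeout: int                 # absolute maximum recording time, in seconds
    interrupt_key: str = "#"     # digits, '#', '*', or the word "any"
    beep: str = ""               # "" plays a beep, "NOBEEP" suppresses it
    speech_contexts: str = ""    # comma-separated hint phrases

    def validate(self) -> None:
        """Raise ValueError if a parameter falls outside the documented values."""
        if not self.lang:
            raise ValueError("lang is required, e.g. 'en-US'")
        if self.silence <= 0 or self.timeout <= 0:
            raise ValueError("silence and timeout must be positive")
        if self.interrupt_key != "any" and not set(self.interrupt_key) <= ALLOWED_KEYS:
            raise ValueError("interruptKey accepts digits, '#', '*', or 'any'")
        if self.beep not in ("", "NOBEEP"):
            raise ValueError("beep must be empty or 'NOBEEP'")

    def phrases(self) -> List[str]:
        """Split the comma-separated speechContexts string into phrases."""
        return [p.strip() for p in self.speech_contexts.split(",") if p.strip()]

    def recognition_config(self) -> dict:
        """Build a dict using Google's v1 REST RecognitionConfig field names."""
        config: dict = {"languageCode": self.lang}
        if self.phrases():
            config["speechContexts"] = [{"phrases": self.phrases()}]
        return config


def stop_reason(cfg: SttActivityConfig, elapsed: float, silent_for: float,
                key_pressed: Optional[str] = None) -> Optional[str]:
    """Return why recording should stop now, or None to keep recording."""
    if elapsed >= cfg.timeout:
        return "timeout"      # absolute limit, applies regardless of the rest
    if key_pressed is not None and (
        cfg.interrupt_key == "any" or key_pressed in cfg.interrupt_key
    ):
        return "interrupt"
    if silent_for >= cfg.silence:
        return "silence"
    return None
```

With lang "en-US", silence 2 and timeout 10, stop_reason reports "silence" once 2 silent seconds have accumulated, "interrupt" when "#" is pressed, and "timeout" at 10 seconds no matter what else happened.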

The result of the speech recognition is stored in a variable named “res”, which can be used anywhere in the flow by referencing it like any other variable (e.g., ${res}).

The speech recognition service must be obtained from Google as a separate service and linked to a token. The activity uses this token, so it has to be set in the service configuration.

The service configuration is changed in the file at the following server path: /var/lib/asterisk/agi-bin/speech-recog.agi

In that file, assign the token to this variable: my $key = "";
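
A quick way to confirm that the token has been filled in is to look for a non-empty assignment to $key in the script text. The following Python sketch does that; the function name token_is_configured and the regular expression are assumptions made for illustration, and the check only verifies that some value is present, not that Google accepts it.

```python
import re

# Matches the Perl assignment named above: my $key = "...";
KEY_PATTERN = re.compile(r'^\s*my\s+\$key\s*=\s*"([^"]*)"\s*;', re.MULTILINE)


def token_is_configured(script_text: str) -> bool:
    """Return True if the script assigns a non-empty string to $key."""
    match = KEY_PATTERN.search(script_text)
    return bool(match and match.group(1).strip())
```

To run it against the real file, read /var/lib/asterisk/agi-bin/speech-recog.agi as text on the server and pass its contents to token_is_configured.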