Vernier STT Engine
Converse uses a high-accuracy speech recognition service optimized for real-time telephony environments, including noisy backgrounds, strong accents, and multilingual conversations.
Language Support
Set the language in the Flow's Start node (for flows) or in the Agent's Language setting. This configures both speech recognition and voice synthesis for that call.
Supported languages include English, Hindi, Tamil, Telugu, Malayalam, Kannada, Marathi, Spanish, French, German, and Arabic. The language setting must match the expected caller language for best accuracy.
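As a rough illustration of checking a language choice before a call starts, the sketch below maps the languages listed above to their ISO 639-1 codes. The `resolve_language` helper and the use of two-letter codes are assumptions made for this example; the source does not say how Converse stores the language setting internally.

```python
# Hypothetical pre-flight check: names and codes here are illustrative,
# not the product's actual configuration schema.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "hi": "Hindi",
    "ta": "Tamil",
    "te": "Telugu",
    "ml": "Malayalam",
    "kn": "Kannada",
    "mr": "Marathi",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "ar": "Arabic",
}


def resolve_language(setting: str) -> str:
    """Return the ISO 639-1 code for a language name or code, or raise ValueError."""
    key = setting.strip().lower()
    if key in SUPPORTED_LANGUAGES:
        return key
    for code, name in SUPPORTED_LANGUAGES.items():
        if name.lower() == key:
            return code
    raise ValueError(f"Unsupported language setting: {setting!r}")


print(resolve_language("Tamil"))  # → ta
```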
Multilingual callers
How it works in a call
Speech recognition runs as a live stream during the call:
- Streaming: Partial transcripts are generated in real time as the caller speaks, without waiting for a pause.
- Voice activity detection (VAD): The system detects when the caller has finished speaking and triggers processing. The end-of-speech wait trades responsiveness against accuracy: a shorter wait responds faster but is more likely to cut in during a natural pause.
- Turn detection: Turn detection avoids cutting off the caller mid-sentence while keeping response times low.
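The interaction between VAD and turn detection can be sketched with a simple energy-based detector. This is a generic textbook approach, not the engine's actual algorithm, and the frame length, energy threshold, and end-of-turn silence values are all assumed for illustration.

```python
import math
from typing import List, Sequence, Tuple

FRAME_MS = 20              # assumed frame length in milliseconds
ENERGY_THRESHOLD = 500.0   # assumed RMS threshold for 16-bit PCM samples
END_OF_TURN_MS = 600       # assumed trailing silence that ends a turn


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_turns(
    frames: Sequence[Sequence[int]],
    threshold: float = ENERGY_THRESHOLD,
    end_silence_ms: int = END_OF_TURN_MS,
    frame_ms: int = FRAME_MS,
) -> List[Tuple[int, int]]:
    """Return (first_voiced_frame, last_voiced_frame) index pairs, one per turn."""
    turns: List[Tuple[int, int]] = []
    silence_needed = end_silence_ms // frame_ms
    start = None
    last_voiced = None
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i          # caller started speaking
            last_voiced = i
        elif start is not None and i - last_voiced >= silence_needed:
            # Enough trailing silence: treat the turn as finished.
            turns.append((start, last_voiced))
            start = None
    if start is not None:
        turns.append((start, last_voiced))  # audio ended mid-turn
    return turns
```

With a 600 ms end-of-turn wait, a 60 ms pause inside a sentence stays within one turn; dropping the wait to 40 ms splits the same audio into two turns, which is the responsiveness and accuracy trade-off described above.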
Accuracy considerations
- Phone audio quality: Transcription is optimized for telephony audio (8 kHz or 16 kHz), including common background noise.
- Proper nouns and brand names: If your agent deals with specific product names, account codes, or unusual terms, include them in the system prompt so the LLM can interpret near-matches correctly.
- Numbers and dates: Spoken numbers, dates, and amounts are formatted correctly in the transcript (e.g., "twenty-five thousand rupees" → "₹25,000").
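To make the number formatting concrete, here is a minimal sketch of the kind of conversion involved, turning spoken English number words into a formatted rupee amount. The `words_to_int` and `format_rupees` helpers are hypothetical and cover only simple English cardinals; the engine's own normalization is not documented here and is likely far more thorough.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "lakh": 100_000, "million": 1_000_000, "crore": 10_000_000}


def words_to_int(phrase: str) -> int:
    """Convert a spoken cardinal such as 'twenty-five thousand' to an int."""
    total = 0    # sum of completed scale groups
    current = 0  # value of the group being built
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"Unrecognized number word: {word!r}")
    return total + current


def format_rupees(phrase: str) -> str:
    """Format a spoken amount ending in 'rupee(s)' as a rupee string."""
    tokens = phrase.lower().replace("-", " ").split()
    if tokens and tokens[-1] in ("rupee", "rupees"):
        tokens = tokens[:-1]
    return f"₹{words_to_int(' '.join(tokens)):,}"


print(format_rupees("twenty-five thousand rupees"))  # → ₹25,000
```

This sketch uses Python's default three-digit grouping, so "one lakh rupees" comes out as ₹100,000 rather than the Indian-style ₹1,00,000.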
Barge-in
When a caller speaks while the agent is talking, barge-in detection stops the current TTS output and starts processing the new input immediately. This lets callers interrupt naturally instead of waiting for the agent to finish each response. Barge-in sensitivity is configured per agent in the Advanced settings.
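The barge-in behavior can be pictured as a small state machine. The sketch below is illustrative only: the `BargeInController` class is hypothetical, and modeling sensitivity as a minimum duration of continuous caller speech (`min_speech_ms`) is an assumption, since the source does not say what the sensitivity setting controls internally.

```python
from enum import Enum
from typing import List


class AgentState(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class BargeInController:
    """Stops agent speech once the caller has spoken continuously long enough."""

    def __init__(self, min_speech_ms: int = 200) -> None:
        # Assumed sensitivity knob: lower values interrupt the agent sooner.
        self.min_speech_ms = min_speech_ms
        self.state = AgentState.LISTENING
        self.events: List[str] = []
        self._speech_ms = 0

    def start_tts(self) -> None:
        """Mark the agent as speaking and reset the caller-speech timer."""
        self.state = AgentState.SPEAKING
        self._speech_ms = 0
        self.events.append("tts_started")

    def on_caller_audio(self, voiced: bool, frame_ms: int = 20) -> bool:
        """Feed one frame of caller audio; return True if barge-in fired."""
        if self.state is not AgentState.SPEAKING:
            return False
        if not voiced:
            self._speech_ms = 0  # require continuous speech, ignore blips
            return False
        self._speech_ms += frame_ms
        if self._speech_ms >= self.min_speech_ms:
            self.state = AgentState.LISTENING
            self.events.append("tts_stopped")
            return True
        return False
```

Requiring a short run of continuous speech before interrupting is one common way to keep coughs or line noise from cutting the agent off.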
Testing STT
Go to Playground → STT tab to upload an audio file and see the raw transcript. Use the Flow tab with voice mode enabled to test speech recognition end-to-end in your actual flow.
