Inbound / outbound voice via LiveKit + SIP, ASR/TTS streaming, barge-in, multi-language.

Real-time telephony

The telephony channel lets a betool pipeline answer or place calls. Real-time audio, streaming transcription, low-latency speech synthesis, and barge-in interruption are all supported.

Architecture

Under the hood:

LiveKit handles real-time audio transport.
LiveKit-SIP connects LiveKit to your carrier trunk (SIP).
A dedicated worker orchestrates the call: ASR (Deepgram, OpenAI Whisper), LLM (Claude, GPT-4o, private model), TTS (ElevenLabs, OpenAI TTS, Azure).

This stack runs as a separate process from the main backend. You do not configure it directly — the operator of your instance sets up the SIP bridge.
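The way the worker chains the three services can be sketched as one pass per caller turn. This is a minimal illustration only, not betool's implementation: `VoiceWorker`, `handle_call`, and the three callables are hypothetical names, and the providers are replaced by local stubs so the sketch runs without network access or real audio streaming.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class VoiceWorker:
    """Hypothetical stand-in for the dedicated call worker."""
    transcribe: Callable[[bytes], str]   # ASR: audio chunk -> text
    respond: Callable[[str], str]        # LLM: caller text -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def handle_call(self, audio_turns: Iterable[bytes]) -> List[bytes]:
        """Run one ASR -> LLM -> TTS pass per caller turn."""
        spoken: List[bytes] = []
        for chunk in audio_turns:
            text = self.transcribe(chunk)
            if not text.strip():
                continue  # silence or noise: nothing to answer
            reply = self.respond(text)
            spoken.append(self.synthesize(reply))
        return spoken


# Stub providers (assumptions): text stands in for audio in both directions.
worker = VoiceWorker(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
replies = worker.handle_call([b"hello", b"  ", b"opening hours?"])
```

A real worker streams audio in both directions instead of processing whole turns, but the ordering of the three stages is the same.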

Prerequisites

A SIP trunk from a carrier (Twilio, Voxbone, OVH, Sewan, or a national operator).
An inbound number and / or the ability to place outbound calls.
An API key from an ASR provider and from a TTS provider, or a private model on the Enterprise plan.

On the Enterprise plan, betool can provision the SIP trunk and voice providers for you. Otherwise, enter the credentials in the admin panel.

Admin setup

Administration → Telephony → Trunks — enter your carrier's SIP credentials.
Administration → Telephony → Numbers — associate a number with a trunk, then with a target pipeline.
Administration → Voice models — choose the ASR (input) and TTS (output). Unit usage counters are displayed.

Designing a voice pipeline

A voice pipeline always begins with a Start node whose receiver is set to phone_gateway. From there, the pipeline receives:

exchange.user_message — each transcribed turn of speech
exchange.intent — the detected intent (present only if you enable a classifier agent)
exchange.channel.source_type — set to phone_gateway

Downstream nodes can return text that the TTS will read aloud. Specialized voice tools (barge-in, hangup, transfer, hold music) are automatically available to agents when the pipeline has phone_gateway upstream.
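To make the shape of these fields concrete, here is a minimal sketch of how a downstream node might read them and return text for the TTS. The `Exchange` and `Channel` classes, the `reply_for` function, and the `opening_hours` intent label are all assumptions made for illustration; only the three field names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Channel:
    source_type: str = "phone_gateway"


@dataclass
class Exchange:
    user_message: str                  # one transcribed turn of speech
    intent: Optional[str] = None       # set only when a classifier agent runs
    channel: Channel = field(default_factory=Channel)


def reply_for(exchange: Exchange) -> str:
    """Return the text a downstream node would hand to the TTS."""
    if exchange.channel.source_type != "phone_gateway":
        raise ValueError("this node expects a phone call")
    if exchange.intent == "opening_hours":  # hypothetical intent label
        return "We are open from nine to five, Monday to Friday."
    # No usable intent: ask a short clarifying question.
    return "Could you tell me a bit more about what you need?"
```

For example, `reply_for(Exchange("when are you open", intent="opening_hours"))` returns the opening-hours sentence, while an exchange without an intent falls through to the clarifying question.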

Best practices

Keep missions concise. Response latency matters: an agent that hesitates for 4 seconds sounds frozen on a call. Prefer fast models (Haiku, GPT-4o-mini) except for decision-critical turns.
Enable barge-in. Callers must be able to interrupt the agent. This is on by default.
Limit loops. A pipeline that iterates more than 3 times on the same turn creates unsettling silence for the caller. Monitor the iteration counter.
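The loop and latency guidance above can be enforced with a small guard around each turn. The sketch below is one possible approach, not a betool API: `answer_turn`, `step`, and the fallback sentence are hypothetical, and the two constants simply reuse the figures quoted above (3 iterations, 4 seconds).

```python
import time
from typing import Callable, Optional

MAX_ITERATIONS = 3       # iteration cap from the guidance above
LATENCY_BUDGET_S = 4.0   # delay after which the agent "sounds frozen"
FALLBACK = "One moment, let me check that for you."


def answer_turn(step: Callable[[int], Optional[str]],
                clock: Callable[[], float] = time.monotonic) -> str:
    """Call `step` until it yields a reply, the cap is hit, or time runs out.

    `step` receives the 1-based iteration number and returns a reply,
    or None when the pipeline needs another pass.
    """
    started = clock()
    for iteration in range(1, MAX_ITERATIONS + 1):
        reply = step(iteration)
        if reply is not None:
            return reply
        if clock() - started >= LATENCY_BUDGET_S:
            break  # over budget: stop iterating and speak the fallback
    return FALLBACK
```

Speaking a short holding phrase when the cap or the budget is reached fills the silence while the pipeline continues or hands off.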

Costs

See Pricing. As an indication, a call costs 200 credits per minute, plus ASR, TTS, and LLM usage. A 5-minute call typically costs $0.20 to $0.80, depending on the LLM chosen.
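As a worked example of the per-minute figure, the base charge for a call can be estimated as follows. This is only a sketch: `base_call_credits` is a hypothetical helper, it covers the 200-credit telephony component alone, and it deliberately leaves out ASR, TTS, and LLM usage and any credit-to-dollar conversion, since neither is specified here.

```python
BASE_CREDITS_PER_MINUTE = 200  # indicative figure quoted above


def base_call_credits(minutes: float) -> float:
    """Base telephony credits for a call, excluding ASR, TTS, and LLM usage."""
    if minutes < 0:
        raise ValueError("call duration cannot be negative")
    return BASE_CREDITS_PER_MINUTE * minutes


five_minute_call = base_call_credits(5)  # 200 credits x 5 minutes
```

Under these assumptions a 5-minute call consumes 1,000 base credits before provider usage is added.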

Known limitations

No video support (yet).
Transfer to a human requires a SIP trunk that supports REFER (Twilio is compatible).
The agent cannot (yet) identify the caller without a CRM integration.