Real-time telephony
The telephony channel lets a betool pipeline answer or place calls. Real-time audio, streaming transcription, low-latency speech synthesis, and barge-in interruption are all supported.
Architecture
Under the hood:
- LiveKit handles real-time audio transport.
- LiveKit-SIP connects LiveKit to your carrier trunk (SIP).
- A dedicated worker orchestrates the call: ASR (Deepgram, OpenAI Whisper), LLM (Claude, GPT-4o, private model), TTS (ElevenLabs, OpenAI TTS, Azure).
This stack runs as a separate process from the main backend. You do not configure it directly — the operator of your instance sets up the SIP bridge.
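To make the worker's role concrete, here is a minimal sketch of a single call turn passing through the three stages listed above (ASR, then LLM, then TTS). It is not betool's implementation: the class name `VoiceTurnPipeline`, its method names, and the stand-in functions are hypothetical. A real worker would stream audio and call the configured providers instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurnPipeline:
    """Hypothetical model of one worker turn: ASR -> LLM -> TTS."""
    transcribe: Callable[[bytes], str]   # ASR stage: caller audio to text
    respond: Callable[[str], str]        # LLM stage: text to reply text
    synthesize: Callable[[str], bytes]   # TTS stage: reply text to audio

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.transcribe(caller_audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-ins for real providers (assumptions, for illustration only).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


worker = VoiceTurnPipeline(fake_asr, fake_llm, fake_tts)
reply_audio = worker.handle_turn(b"hello")
```

Passing the three stages in as plain callables mirrors the idea that each provider can be swapped independently of the others.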
Prerequisites
- A SIP trunk from a carrier (Twilio, Voxbone, OVH, Sewan, or a national operator).
- An inbound number and / or the ability to place outbound calls.
- An API key from an ASR provider and a TTS provider, or a private model on Enterprise.
On the Enterprise plan, betool can provision the SIP trunk and voice providers for you. Otherwise, enter the credentials in the admin panel.
Admin setup
- Administration → Telephony → Trunks — enter your carrier's SIP credentials.
- Administration → Telephony → Numbers — associate a number with a trunk, then with a target pipeline.
- Administration → Voice models — choose the ASR (input) and TTS (output). Unit usage counters are displayed.
Designing a voice pipeline
A voice pipeline always starts with a Start node whose receiver is set to phone_gateway. From there, the pipeline receives:
- exchange.user_message: each transcribed turn of speech
- exchange.intent: detected intent (if you activate a classifier agent)
- exchange.channel.source_type: set to phone_gateway
Downstream nodes can return text that the TTS will read aloud. Specialized voice tools (barge-in, hangup, transfer, hold music) are automatically available to agents when the pipeline has phone_gateway upstream.
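As an illustration of the fields a voice pipeline receives, the sketch below models them as Python dataclasses and shows a downstream step returning text for the TTS to read. Only the field names (`user_message`, `intent`, `channel.source_type`) and the value `phone_gateway` come from this page. The classes, the `spoken_reply` function, and the `opening_hours` intent label are hypothetical and do not reflect betool's actual node API.

```python
from dataclasses import dataclass
from typing import Optional

PHONE_GATEWAY = "phone_gateway"


@dataclass
class Channel:
    source_type: str  # "phone_gateway" for voice pipelines


@dataclass
class Exchange:
    user_message: str             # one transcribed turn of speech
    channel: Channel
    intent: Optional[str] = None  # only set if a classifier agent is active


def spoken_reply(exchange: Exchange) -> str:
    """Hypothetical downstream step: return the text the TTS should read."""
    if exchange.channel.source_type != PHONE_GATEWAY:
        raise ValueError("not a voice turn")
    if exchange.intent == "opening_hours":  # illustrative intent label
        return "We are open from nine to five, Monday to Friday."
    return f"I heard: {exchange.user_message}"


turn = Exchange(
    user_message="When are you open?",
    channel=Channel(PHONE_GATEWAY),
    intent="opening_hours",
)
reply = spoken_reply(turn)
```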
Best practices
- Keep missions concise. Response latency matters: an agent that hesitates for 4 seconds sounds frozen on a call. Prefer fast models (Haiku, GPT-4o-mini) except for decision-critical turns.
- Enable barge-in. Callers must be able to interrupt the agent. This is on by default.
- Limit loops. A pipeline that iterates more than 3 times on the same turn creates unsettling silence for the caller. Monitor the iteration counter.
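The loop limit can be pictured as a simple iteration cap with a spoken fallback. This is a sketch only: `answer_with_cap`, the `attempt` callback, and the fallback sentence are hypothetical names, and the cap of 3 is taken from the guideline above rather than from any betool setting.

```python
from typing import Callable, Optional, Tuple

MAX_ITERATIONS = 3  # from the guideline above, not a betool constant
FALLBACK = "Sorry, could you say that again?"


def answer_with_cap(
    attempt: Callable[[int], Optional[str]],
    max_iterations: int = MAX_ITERATIONS,
    fallback: str = FALLBACK,
) -> Tuple[str, int]:
    """Call attempt(i) until it returns text or the cap is reached.

    Returns the text to speak and the number of iterations used.
    """
    for iteration in range(1, max_iterations + 1):
        result = attempt(iteration)
        if result is not None:
            return result, iteration
    # Cap reached: speak a fallback instead of leaving the caller in silence.
    return fallback, max_iterations


# Succeeds on the second try.
text, used = answer_with_cap(lambda i: "Done." if i == 2 else None)
```

Returning the iteration count alongside the text makes it easy to log or alert when turns regularly hit the cap.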
Costs
See Pricing. As a guide: 200 credits per minute of call, plus ASR, TTS, and LLM usage. A 5-minute call typically costs $0.20 to $0.80 depending on the LLM chosen.
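For a quick estimate of the platform portion alone, the 200-credits-per-minute figure can be applied as below. The sketch assumes linear, per-second pro-rating, which this page does not confirm, and it leaves out ASR, TTS, and LLM usage as well as any conversion from credits to currency. The function name is hypothetical.

```python
BASE_CREDITS_PER_MINUTE = 200  # indicative figure from this page


def base_call_credits(duration_seconds: float) -> float:
    """Platform credits for a call, assuming linear pro-rating (assumption)."""
    return BASE_CREDITS_PER_MINUTE * duration_seconds / 60


five_minute_call = base_call_credits(5 * 60)  # 1000.0 credits before ASR / TTS / LLM
```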
Known limitations
- No video support (yet).
- Transfer to a human requires a SIP trunk that supports REFER (Twilio is compatible).
- The agent cannot (yet) identify the caller without a CRM integration.