Real-time telephony

Inbound and outbound voice over LiveKit and SIP, with streaming ASR and TTS, barge-in, and multi-language support.

The telephony channel lets a betool pipeline answer or place calls. Real-time audio, streaming transcription, low-latency speech synthesis, and barge-in interruption are all supported.

Architecture

Under the hood:

  • LiveKit handles real-time audio transport.
  • LiveKit-SIP connects LiveKit to your carrier trunk (SIP).
  • A dedicated worker orchestrates each call by chaining ASR (Deepgram, OpenAI Whisper), an LLM (Claude, GPT-4o, or a private model), and TTS (ElevenLabs, OpenAI TTS, Azure).

This stack runs as a separate process from the main backend. You do not configure it directly — the operator of your instance sets up the SIP bridge.
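To make the worker's role concrete, here is a minimal sketch of one caller turn passing through the three stages in order. It is not betool's worker code: the type aliases, `VoiceTurn`, `run_turn`, and the stub providers are all hypothetical names chosen for illustration, and the stubs stand in for real ASR, LLM, and TTS services.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real worker's interfaces are not documented here.
Transcribe = Callable[[bytes], str]   # ASR: caller audio in, text out
Respond = Callable[[str], str]        # LLM: caller text in, reply text out
Synthesize = Callable[[str], bytes]   # TTS: reply text in, audio out


@dataclass
class VoiceTurn:
    transcript: str
    reply: str
    audio: bytes


def run_turn(audio_in: bytes, asr: Transcribe, llm: Respond, tts: Synthesize) -> VoiceTurn:
    """Run one caller turn through ASR, then the LLM, then TTS."""
    transcript = asr(audio_in)
    reply = llm(transcript)
    return VoiceTurn(transcript=transcript, reply=reply, audio=tts(reply))


# Stub providers (assumptions) so the sketch runs without any external service.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_turn(b"hello", fake_asr, fake_llm, fake_tts)
```

Passing the stages in as plain callables mirrors the idea that the ASR, LLM, and TTS providers are interchangeable. A real worker would stream audio rather than process whole buffers.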

Prerequisites

  • A SIP trunk from a carrier (Twilio, Voxbone, OVH, Sewan, or a national operator).
  • An inbound number and / or the ability to place outbound calls.
  • An API key for an ASR provider and a TTS provider, or a private model on the Enterprise plan.

On the Enterprise plan, betool can provision the SIP trunk and voice providers for you. Otherwise, enter the credentials in the admin panel.

Admin setup

  1. Administration → Telephony → Trunks — enter your carrier's SIP credentials.
  2. Administration → Telephony → Numbers — associate a number with a trunk, then with a target pipeline.
  3. Administration → Voice models — choose the ASR (input) and TTS (output). Unit usage counters are displayed.

Designing a voice pipeline

A voice pipeline always begins with a Start node whose receiver is set to phone_gateway. From there, the pipeline receives:

  • exchange.user_message — each transcribed turn of speech
  • exchange.intent — detected intent (if you activate a classifier agent)
  • exchange.channel.source_type — set to phone_gateway

Downstream nodes can return text that the TTS will read aloud. Specialized voice tools (barge-in, hangup, transfer, hold music) are automatically available to agents when the pipeline has phone_gateway upstream.
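As a rough illustration of the fields listed above, the sketch below models the exchange as plain Python dataclasses and shows a downstream step that returns text for the TTS to read. The class names, the `spoken_reply` function, and the `opening_hours` intent label are assumptions made for this example. Only the field names `user_message`, `intent`, and `channel.source_type` come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Channel:
    source_type: str  # "phone_gateway" for voice pipelines


@dataclass
class Exchange:
    user_message: str             # one transcribed turn of speech
    channel: Channel
    intent: Optional[str] = None  # only set when a classifier agent is active


def is_voice_call(exchange: Exchange) -> bool:
    """True when the exchange arrived through the phone gateway."""
    return exchange.channel.source_type == "phone_gateway"


def spoken_reply(exchange: Exchange) -> str:
    """Return the text a downstream node would hand to the TTS (hypothetical logic)."""
    if exchange.intent == "opening_hours":  # hypothetical intent label
        return "We are open from nine to five on weekdays."
    return f"I heard: {exchange.user_message}"


call = Exchange(
    user_message="When are you open?",
    channel=Channel(source_type="phone_gateway"),
    intent="opening_hours",
)
```

Treating `intent` as optional reflects that it is only populated when a classifier agent is activated.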

Best practices

  • Keep missions concise. Response latency matters: an agent that hesitates for 4 seconds sounds frozen on a call. Prefer fast models (Haiku, GPT-4o-mini) except for decision-critical turns.
  • Enable barge-in. Callers must be able to interrupt the agent. This is on by default.
  • Limit loops. A pipeline that iterates more than 3 times on the same turn creates unsettling silence for the caller. Monitor the iteration counter.
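The loop limit above can be sketched as a simple guard that stops after three passes on the same turn and falls back to a short spoken line instead of leaving the caller in silence. This is only an illustration: `answer_turn`, `step`, and the fallback wording are hypothetical, and betool's own iteration counter is not exposed this way as far as this page states.

```python
from typing import Callable, Optional, Tuple

MAX_ITERATIONS = 3  # threshold suggested in the best practices above
FALLBACK = "Sorry, I did not catch that. Could you repeat?"  # hypothetical wording

# A step receives the caller's message and the attempt number (1-based).
# It returns reply text, or None when it needs another pass.
Step = Callable[[str, int], Optional[str]]


def answer_turn(step: Step, user_message: str,
                max_iterations: int = MAX_ITERATIONS) -> Tuple[str, int]:
    """Retry a step up to max_iterations times, then give up with a fallback line."""
    for attempt in range(1, max_iterations + 1):
        reply = step(user_message, attempt)
        if reply is not None:
            return reply, attempt
    return FALLBACK, max_iterations
```

Returning the attempt count alongside the reply makes it easy to log how many iterations each turn needed.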

Costs

See Pricing. As an indication, a call costs 200 credits per minute, plus ASR, TTS, and LLM usage. A 5-minute call typically costs $0.20 to $0.80, depending on the LLM chosen.
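The indicative figures can be worked through as plain arithmetic. The sketch below uses only the numbers quoted on this page. It assumes cost scales linearly with call duration, which the page does not state, and it does not convert credits to dollars because no conversion rate is given here. The function names are hypothetical.

```python
from typing import Tuple

CREDITS_PER_MINUTE = 200             # indicative platform rate from this page
TYPICAL_5_MIN_COST_CENTS = (20, 80)  # "$0.20 to $0.80" for a 5-minute call


def platform_credits(call_minutes: int) -> int:
    """Platform credits for a call, excluding ASR, TTS, and LLM usage."""
    return CREDITS_PER_MINUTE * call_minutes


def typical_cost_per_minute_cents() -> Tuple[float, float]:
    """Per-minute range implied by the 5-minute figure (assumes linear scaling)."""
    low, high = TYPICAL_5_MIN_COST_CENTS
    return (low / 5, high / 5)


print(platform_credits(5))              # 1000
print(typical_cost_per_minute_cents())  # (4.0, 16.0)
```

Under these assumptions, a 5-minute call uses 1000 platform credits, and the typical range works out to roughly 4 to 16 cents per minute.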

Known limitations

  • No video support (yet).
  • Transfer to a human requires a SIP trunk that supports REFER (Twilio is compatible).
  • The agent cannot (yet) identify the caller without a CRM integration.