BYOK — why it is non-negotiable in regulated industries

AI vendors typically offer two billing models: a shared pool, where you pay a flat fee that covers tokens, or BYOK (bring your own key), where you provide your own API key and pay your provider directly.

In standard SaaS, the shared pool is convenient. In regulated environments, it is untenable.

The problem with the shared pool

When your vendor serves all of its customers through a single OpenAI / Anthropic / Mistral master key that it owns:

You lose the accounting detail. You cannot prove to your auditor exactly how many tokens you consumed at the provider, because you only receive an aggregated invoice from the vendor.
You lose the contractual chain. Your data flows through the vendor's account, so the data processing agreement (DPA) with the provider no longer covers you directly.
You lose control of the model. The vendor can silently switch from model A to model B for cost reasons. Your benchmarks regress and you do not know why.
You lose immediate revocation. In the event of an incident, you cannot revoke the key yourself: it belongs to the vendor, and you have to ask them to act.

What BYOK guarantees

With BYOK:

Your prompts and completions flow directly between your provider account and betool — not through a pooled third-party account.
Your DPA with the provider remains the sole contractual chain. No opaque sub-processing.
Your billing is transparent: the provider invoices you, you see every line item. betool only invoices you for orchestration.
Your revocation is instant: revoke or rotate the key at the provider, and the model becomes inaccessible through the old key, typically within seconds.
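
To make the billing and model-control points concrete, here is a minimal sketch of a per-request usage ledger. It assumes the provider's response reports the model that actually served the request and the prompt and completion token counts, which most provider APIs do. The names `UsageRecord` and `summarise`, and the model identifiers, are hypothetical and are not part of betool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One provider response, reduced to what an auditor needs (hypothetical shape)."""
    request_id: str
    model: str               # model name echoed back by the provider
    prompt_tokens: int
    completion_tokens: int


def summarise(records, pinned_model):
    """Return (total tokens per model, ids of requests not served by pinned_model)."""
    totals = defaultdict(int)
    drifted = []
    for record in records:
        totals[record.model] += record.prompt_tokens + record.completion_tokens
        if record.model != pinned_model:
            drifted.append(record.request_id)
    return dict(totals), drifted


# Illustrative data only: three requests, one served by an unexpected model.
records = [
    UsageRecord("req-1", "model-a", 120, 80),
    UsageRecord("req-2", "model-a", 300, 150),
    UsageRecord("req-3", "model-b", 50, 25),
]
totals, drifted = summarise(records, pinned_model="model-a")
print(totals)   # {'model-a': 650, 'model-b': 75}
print(drifted)  # ['req-3']
```

Because the key is yours, the per-model totals can be compared line by line with the provider's own invoice, and any request id in `drifted` points to a model switch you did not ask for.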

The private-model option (self-hosted)

For organisations with the strictest requirements (banking, defence, healthcare), even BYOK is not enough: sending your prompts to OpenAI or Anthropic can still constitute a transfer to the United States.

The solution is a private model:

Ollama on your GPU, for open-source models (Llama, Qwen, Mistral, DeepSeek).
vLLM on a GPU cluster for high-throughput production.
Azure OpenAI / AWS Bedrock when you have a private cloud contract.

Your prompts never leave your perimeter. Latency is under your control. Compliance is total.
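
As an illustration of the perimeter idea, here is a small sketch that builds a request for a local Ollama server and refuses any endpoint that is not a loopback or private address. It assumes Ollama's default port (11434) and its `/api/generate` endpoint. The helper names, the `llama3` model tag, and the prompt are illustrative, and the IP-only check is a deliberate simplification (hostnames other than `localhost` are rejected rather than resolved).

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse


def is_inside_perimeter(url):
    """True only for localhost or a literal loopback/private IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # other hostnames would need DNS resolution; rejected in this sketch
    return ip.is_private or ip.is_loopback


def build_generate_request(base_url, model, prompt):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    if not is_inside_perimeter(base_url):
        raise ValueError(f"endpoint outside the perimeter: {base_url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("http://localhost:11434", "llama3", "Summarise this clause.")
print(req.full_url)  # http://localhost:11434/api/generate
# Sending it requires a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

A guard like this is no substitute for network-level controls, but it makes an accidental call to a public endpoint fail loudly instead of silently.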

What it costs

The myth is that self-hosting an LLM costs a fortune. In reality, for modern open-source models in the Llama 3 / Qwen 2 class:

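A server with two A100 80 GB GPUs (160 GB in total) can serve Llama 3 70B in production: at 16-bit precision the weights take about 140 GB.
At moderate usage (a few thousand exchanges per day), a single A6000 GPU (48 GB) is sufficient to serve a 32B model with sub-second latency, provided the model is quantised to 8 bits or fewer: at 16-bit precision the weights alone take about 64 GB.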
Monthly amortisation cost: ~$2,000 to $5,000 depending on your procurement strategy (rental vs. purchase).
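
The hardware claims above can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is only that estimate. The 10% reserve for KV cache and activations is an assumption chosen for illustration, and real requirements depend on context length, batch size, and the serving stack.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion, bits_per_param, gpu_memory_gb, reserve=0.10):
    """True if the weights fit while keeping `reserve` of GPU memory free.

    The reserve stands in for KV cache and activations; 10% is an assumed figure.
    """
    usable_gb = gpu_memory_gb * (1 - reserve)
    return weight_memory_gb(params_billion, bits_per_param) <= usable_gb


scenarios = [
    ("70B, 16-bit, 2 x A100 80 GB", 70, 16, 160),
    ("32B, 16-bit, 1 x A6000 48 GB", 32, 16, 48),
    ("32B, 8-bit, 1 x A6000 48 GB", 32, 8, 48),
    ("32B, 4-bit, 1 x A6000 48 GB", 32, 4, 48),
]
for name, params, bits, memory in scenarios:
    need = weight_memory_gb(params, bits)
    print(f"{name}: {need:.0f} GB of weights, fits={fits(params, bits, memory)}")
```

Under these assumptions the 70B model fits on the dual A100 server at 16-bit precision, while the 32B model fits on a 48 GB A6000 only once quantised to 8 bits or fewer.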

Compared with an equivalent OpenAI spend over 12 months, the investment pays for itself within months at serious volumes. If sovereignty constraints make OpenAI unacceptable in the first place, the question is no longer one of ROI: self-hosting is the only option left.
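
To see what "serious volumes" means for your own case, you can compute the monthly token volume at which a fixed GPU cost equals pay-per-token API spend. The sketch below uses the $2,000 to $5,000 monthly range quoted above. The API price of $10 per million tokens is a placeholder assumption, not a quoted price, and the calculation ignores electricity, hosting, and operations staff.

```python
def break_even_tokens_per_month(monthly_gpu_cost_usd, api_price_per_million_usd):
    """Monthly token volume at which API spend equals the fixed GPU cost."""
    return monthly_gpu_cost_usd / api_price_per_million_usd * 1_000_000


# Placeholder assumption: replace with your provider's blended price per million tokens.
ASSUMED_API_PRICE_PER_MILLION_USD = 10.0

for monthly_cost in (2_000, 5_000):
    tokens = break_even_tokens_per_month(monthly_cost, ASSUMED_API_PRICE_PER_MILLION_USD)
    print(f"${monthly_cost:,}/month breaks even at {tokens / 1e6:.0f}M tokens/month")
```

Above the break-even volume every additional token is cheaper on your own hardware, and below it the API remains the cheaper option on cost alone.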

Our choice

betool has been BYOK-native since day one. You cannot use the platform without connecting your own keys. This is intentional:

The shared pool creates compliance debt that we refuse to take on for our regulated customers.
We do not accept a two-tier model in which "large customers" get sovereignty guarantees and "small customers" do not.
The experience we want to build — "I can see exactly what I consume, where, and with whom" — is only possible with BYOK.

It adds one step to onboarding (15 minutes to retrieve a key from your provider). It is an investment that pays off over time.