The interruption-and-resumption problem — solved

Most voice agents lose context the moment a customer cuts in. Veqa was designed from the ground up around this exact failure mode.

Interruption and resumption flow diagram

When the agent is walking a customer through a multi-step procedure — a login flow, a claim filing, a scheduling process — Veqa emits the response as a structured task plan, not a single block of speech. Each step is tracked individually with a status of pending, speaking, or completed.
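As a rough illustration of what a per-step task plan could look like, here is a minimal sketch. The class and method names (`StepStatus`, `Step`, `TaskPlan`, `start`, `complete`, `revert`) are assumptions made for this example and are not Veqa's actual API; only the three status values come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class StepStatus(Enum):
    PENDING = "pending"
    SPEAKING = "speaking"
    COMPLETED = "completed"


@dataclass
class Step:
    text: str
    status: StepStatus = StepStatus.PENDING


@dataclass
class TaskPlan:
    steps: List[Step] = field(default_factory=list)

    def next_unfinished(self) -> Optional[int]:
        """Index of the first step that is not completed, or None if done."""
        for i, step in enumerate(self.steps):
            if step.status is not StepStatus.COMPLETED:
                return i
        return None

    def start(self, i: int) -> None:
        self.steps[i].status = StepStatus.SPEAKING

    def complete(self, i: int) -> None:
        self.steps[i].status = StepStatus.COMPLETED

    def revert(self, i: int) -> None:
        """Put an interrupted step back to pending so it can be spoken again."""
        self.steps[i].status = StepStatus.PENDING
```

Tracking status per step, rather than per response, is what lets an interrupted step be replayed without repeating the steps already completed.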

When voice activity is detected from the caller mid-utterance, the TTS stream is cancelled at the current word boundary, the step is reverted to pending, and the LLM classifies the interruption: question, clarification, pause request, scope change, or end-of-call.

Each interruption type has its own handler. A pause request gets a patient acknowledgement and a frozen task plan. A scope change re-plans the call. A question gets answered inline, then the agent resumes the procedure at the step it was interrupted on.
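The barge-in path described above (revert the interrupted step, then dispatch on the classified type) might be sketched as follows. This is an illustration under assumptions: the `Call` class, the handler functions, and the returned action strings are hypothetical, and the classification is passed in directly where the real system would ask the LLM. The behaviour for clarification and end-of-call is not described in the text, so those two handlers are placeholders.

```python
from enum import Enum


class Interruption(Enum):
    QUESTION = "question"
    CLARIFICATION = "clarification"
    PAUSE_REQUEST = "pause_request"
    SCOPE_CHANGE = "scope_change"
    END_OF_CALL = "end_of_call"


class Call:
    """Hypothetical per-call state: step texts, statuses, and a freeze flag."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.status = ["pending"] * len(self.steps)
        self.current = 0
        self.frozen = False

    def speak_current(self):
        """Unfreeze the plan, mark the current step speaking, return its text."""
        self.frozen = False
        self.status[self.current] = "speaking"
        return self.steps[self.current]

    def on_barge_in(self, kind):
        """Revert the interrupted step, then run the handler for this type."""
        if self.status[self.current] == "speaking":
            self.status[self.current] = "pending"
        return HANDLERS[kind](self)


def handle_pause(call):
    call.frozen = True  # keep the plan as-is until the caller is ready
    return "acknowledge_and_wait"


def handle_question(call):
    return "answer_then_resume"


def handle_scope_change(call):
    return "replan"


def handle_clarification(call):
    return "rephrase_step"  # assumption: not specified in the text


def handle_end_of_call(call):
    return "close_call"  # assumption: not specified in the text


HANDLERS = {
    Interruption.QUESTION: handle_question,
    Interruption.CLARIFICATION: handle_clarification,
    Interruption.PAUSE_REQUEST: handle_pause,
    Interruption.SCOPE_CHANGE: handle_scope_change,
    Interruption.END_OF_CALL: handle_end_of_call,
}
```

A dispatch table keeps the revert logic in one place, so every interruption type starts from the same known state.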

Example call · password reset

Agent: To reset your password, go to bankone-example.com/account, sign in with your username, click on your profile in the top right, then…

↑ TTS speaking · task plan step 1 of 4 in progress

Caller: Wait — let me actually open my laptop first.

↑ VAD interrupt · TTS cancelled · LLM classifies: pause_request

Agent: Of course — take your time. Just let me know when you're ready.

↑ Task plan frozen · step 1 reverted to pending

… (silence) …

Caller: Okay, I'm on the site now.

Agent: Great. So as I was saying — sign in with your username, then click on your profile in the top right…

↑ Resuming task plan · step 1 of 4 restarted

Architecture & Stack

Open-source first, NVIDIA-accelerated, and designed to drop into existing Asterisk-based SIP infrastructure without rebuilding the dialplan.

Telephony ingress

Asterisk 22 LTS (certified build)
AudioSocket protocol (TCP, bidirectional slin16)
asterisk-java ARI client for call control
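
For readers unfamiliar with AudioSocket, the framing is simple: each message is a 1-byte type, a 2-byte big-endian payload length, and the payload. The sketch below splits a TCP byte buffer into frames. The type codes shown follow the published AudioSocket description as best we know it and should be checked against your Asterisk version, in particular the audio type for the sample rate you configure. The function name `parse_frames` is made up for this example.

```python
import struct
import uuid

# Type codes from the AudioSocket protocol description (verify for your build).
KIND_HANGUP = 0x00
KIND_UUID = 0x01   # payload: 16-byte call UUID, sent first
KIND_AUDIO = 0x10  # payload: signed linear PCM audio
KIND_ERROR = 0xFF

HEADER = struct.Struct(">BH")  # 1-byte type, 2-byte big-endian length


def parse_frames(buffer: bytes):
    """Split a byte buffer into (kind, payload) frames.

    Returns (frames, remainder); remainder holds any trailing partial frame
    and should be prepended to the next chunk read from the socket.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((kind, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

Because TCP delivers a byte stream rather than messages, returning the unparsed remainder matters: a read can end in the middle of a frame.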

Orchestration

Async orchestration core — one isolated worker per concurrent call
AudioSocket server (TCP, low-latency, TCP_NODELAY)
Streaming ASR ↔ LLM ↔ TTS pipeline (Pipecat-compatible)
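
One way to get a worker per call is to let each inbound TCP connection run in its own asyncio task. The sketch below is illustrative only: the page does not say whether Veqa isolates calls with tasks, threads, or processes, the names `handle_call` and `serve` are assumptions, and the echo loop stands in for the real ASR, LLM, and TTS pipeline.

```python
import asyncio
import socket


async def handle_call(reader, writer):
    """Runs once per connection; asyncio schedules it as a separate task."""
    sock = writer.get_extra_info("socket")
    if sock is not None:
        # Disable Nagle's algorithm so small audio frames are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        while True:
            data = await reader.read(4096)
            if not data:
                break  # peer closed the connection
            writer.write(data)  # placeholder: echo instead of a real pipeline
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def serve(host="127.0.0.1", port=0):
    """Start listening; port 0 asks the OS for any free port."""
    return await asyncio.start_server(handle_call, host, port)
```

Recent Python versions already enable TCP_NODELAY on asyncio TCP sockets by default; setting it explicitly just makes the intent visible.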

Speech-to-Text (ASR)

NVIDIA Parakeet TDT — streaming, low-latency English
faster-whisper Large-v3 Turbo — Spanish and French
SenseVoice (Alibaba FunASR) — Cantonese specialist
Silero VAD — low-latency barge-in detection
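
The language split above amounts to a routing table. A minimal sketch, assuming the call's language arrives as a BCP 47-style tag; the engine identifiers and the `select_asr` function are hypothetical labels for this example, not real configuration keys.

```python
# Hypothetical engine labels, mapped from the primary language subtag.
ASR_BY_LANGUAGE = {
    "en": "parakeet-tdt",
    "es": "faster-whisper-large-v3-turbo",
    "fr": "faster-whisper-large-v3-turbo",
    "yue": "sensevoice",  # "yue" is the ISO 639-3 code for Cantonese
}


def select_asr(language_tag: str) -> str:
    """Pick an ASR engine from a tag such as 'en-US' or 'yue-HK'."""
    primary = language_tag.lower().replace("_", "-").split("-")[0]
    try:
        return ASR_BY_LANGUAGE[primary]
    except KeyError:
        raise ValueError(
            f"no ASR engine configured for {language_tag!r}"
        ) from None
```

Note that this sketch does not map tags like `zh-HK` to Cantonese; a production router would need an explicit rule for those.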

Language understanding

Llama-3.3-Nemotron-Super-49B-Instruct — primary LLM (NVIDIA-tuned, FP8)
Qwen 2.5 32B Instruct — Chinese-language fallback
NVIDIA Triton Inference Server with TensorRT-LLM optimization

Text-to-Speech (TTS)

F5-TTS — natural voice cloning, English and Chinese
Kokoro — fast, deterministic neural voices (Spanish, French)
CosyVoice 2 (Alibaba) — Cantonese / Mandarin with emotional control
All TTS streams chunked and cancellable mid-utterance
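
To show what "chunked and cancellable" can mean in practice, here is a minimal sketch that emits one word per chunk and stops at the next word boundary once a cancel flag is set. It is a stand-in, not any TTS engine's API: `speak`, the `sink` list, and the one-word-per-chunk granularity are assumptions for illustration.

```python
import asyncio


async def speak(words, cancel: asyncio.Event, sink: list, chunk_seconds=0.0):
    """Emit words one chunk at a time; stop cleanly once cancel is set.

    Returns how many words were emitted, so the caller knows where
    playback stopped.
    """
    spoken = 0
    for word in words:
        if cancel.is_set():
            break  # barge-in detected: stop before starting the next word
        sink.append(word)  # stand-in for writing an audio chunk to the call
        spoken += 1
        await asyncio.sleep(chunk_seconds)  # yield so a VAD task can run
    return spoken
```

Checking the flag between chunks, rather than cancelling the task outright, means playback never stops in the middle of a word.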

GPU compute

NVIDIA RTX PRO 6000 Blackwell — 96 GB GDDR7, 24,064 CUDA cores
Workstation Edition (600W) or Max-Q (300W) for multi-GPU dense deployments
On-premise at customer site — voice audio never leaves the customer network
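
A back-of-envelope check on why a single 96 GB card is plausible for the primary model: at FP8, each parameter takes one byte, so 49 billion parameters need roughly 49 GB for weights. The sketch below does only that arithmetic. It ignores KV cache, activations, runtime overhead, and the ASR, TTS, and fallback models, all of which consume additional memory, so treat the headroom figure as an upper bound rather than a sizing guide.

```python
def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal GB (1e9 params x bytes / 1e9)."""
    return params_billions * bytes_per_param


CARD_GB = 96                            # RTX PRO 6000 Blackwell memory
primary = weight_footprint_gb(49, 1)    # 49B parameters at FP8 (1 byte each)
headroom = CARD_GB - primary            # left for everything else
```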

Early Access

Bring Veqa to your call flow.

We're onboarding a small cohort of early-access partners in healthcare, financial services, and regulated B2B. Tell us about your call volumes and compliance requirements and we'll be in touch within one business day.

Request Early Access

St. Petersburg, Florida