The interruption-and-resumption problem — solved

Most voice agents lose context the moment a customer cuts in. Veqa was designed from the ground up around this exact failure mode.

[Figure: Interruption and resumption flow diagram]

When the agent is walking a customer through a multi-step procedure — a login flow, a claim filing, a scheduling process — Veqa emits the response as a structured task plan, not a single block of speech. Each step is tracked individually with a status of pending, speaking, or completed.
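
To make the idea of a tracked task plan concrete, here is a minimal sketch in Python. The three statuses come from the description above; the class and method names (`TaskPlan`, `Step`, `next_pending`, `revert`) are hypothetical and are not Veqa's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class StepStatus(Enum):
    PENDING = "pending"
    SPEAKING = "speaking"
    COMPLETED = "completed"


@dataclass
class Step:
    text: str
    status: StepStatus = StepStatus.PENDING


@dataclass
class TaskPlan:
    """Hypothetical container: one entry per step of the procedure."""
    steps: List[Step] = field(default_factory=list)

    def next_pending(self) -> Optional[int]:
        """Index of the first step still pending, or None if none remain."""
        for i, step in enumerate(self.steps):
            if step.status is StepStatus.PENDING:
                return i
        return None

    def start(self, i: int) -> None:
        self.steps[i].status = StepStatus.SPEAKING

    def complete(self, i: int) -> None:
        self.steps[i].status = StepStatus.COMPLETED

    def revert(self, i: int) -> None:
        # Called when the caller interrupts while step i is being spoken.
        self.steps[i].status = StepStatus.PENDING


plan = TaskPlan([
    Step("Go to the account page"),
    Step("Sign in with your username"),
    Step("Click your profile in the top right"),
])
plan.start(0)
plan.revert(0)               # caller cut in during step 1
print(plan.next_pending())   # the interrupted step is the one to resume
```

Because the interrupted step goes back to pending rather than being marked completed, resuming is just a matter of asking the plan for its next pending step.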

When voice activity is detected from the caller mid-utterance, the TTS stream is cancelled at the current word boundary, the step is reverted to pending, and the LLM classifies the interruption: question, clarification, pause request, scope change, or end-of-call.

The handler for each type is different. A pause request gets a patient acknowledgement and a frozen task plan. A scope change re-plans the call. A question gets answered inline, then the agent resumes the procedure exactly where it stopped.
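
The interrupt path described above (revert the step, classify, dispatch) could be sketched roughly as follows. The five labels are the ones listed in the text; the action names are hypothetical, and the actions for `clarification` and `end_of_call` are assumptions, since the text only describes handling for pause requests, scope changes, and questions. The LLM classifier itself is out of scope here and is represented by a plain string label.

```python
from enum import Enum
from typing import List


class Interruption(Enum):
    QUESTION = "question"
    CLARIFICATION = "clarification"
    PAUSE_REQUEST = "pause_request"
    SCOPE_CHANGE = "scope_change"
    END_OF_CALL = "end_of_call"


# Hypothetical action names, one per interruption type.
ACTIONS = {
    Interruption.PAUSE_REQUEST: "acknowledge_and_freeze",
    Interruption.SCOPE_CHANGE: "replan",
    Interruption.QUESTION: "answer_then_resume",
    # Assumed: not described in the text above.
    Interruption.CLARIFICATION: "answer_then_resume",
    Interruption.END_OF_CALL: "close_call",
}


def on_barge_in(statuses: List[str], current: int, label: str) -> str:
    """Revert the interrupted step, then pick the handler for this label.

    `statuses` holds one of "pending", "speaking", "completed" per step.
    `label` is whatever the classifier returned; unknown labels raise
    ValueError rather than being silently ignored.
    """
    statuses[current] = "pending"
    return ACTIONS[Interruption(label)]


statuses = ["speaking", "pending", "pending", "pending"]
action = on_barge_in(statuses, 0, "pause_request")
print(action, statuses[0])
```

Reverting the step before dispatching keeps the plan consistent no matter which handler runs next.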

Example call · password reset
Agent: To reset your password, go to bankone-example.com/account, sign in with your username, click on your profile in the top right, then…
↑ TTS speaking · task plan step 1 of 4 in progress
Caller: Wait — let me actually open my laptop first.
↑ VAD interrupt · TTS cancelled · LLM classifies: pause_request
Agent: Of course — take your time. Just let me know when you're ready.
↑ Task plan frozen · step 1 reverted to pending
… (silence) …
Caller: Okay, I'm on the site now.
Agent: Great. So as I was saying — sign in with your username, then click on your profile in the top right…
↑ Resuming task plan · step 1 of 4 restarted

Architecture & Stack

Open-source first, NVIDIA-accelerated, and designed to drop into existing Asterisk-based SIP infrastructure without rebuilding the dialplan.

[Figure: Veqa end-to-end architecture diagram]

Telephony ingress

  • Asterisk 22 LTS (certified build)
  • AudioSocket protocol (TCP, bidirectional slin16)
  • asterisk-java ARI client for call control
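
For readers unfamiliar with AudioSocket, the sketch below shows how its framing can be parsed from a TCP byte stream. It assumes the publicly documented layout of a 1-byte message kind followed by a 2-byte big-endian payload length. The kind values shown are the commonly documented ones, and the audio kind used for a given sample rate should be checked against your Asterisk version. `parse_frames` is a hypothetical helper name.

```python
import struct
from typing import List, Tuple

# Header: kind (1 byte) + payload length (unsigned 16-bit, big-endian).
HEADER = struct.Struct("!BH")

# Commonly documented kinds; verify against your Asterisk build.
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_ERROR = 0xFF


def parse_frames(buf: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a receive buffer into complete (kind, payload) frames.

    Returns the frames plus any trailing bytes that do not yet form a
    complete frame, so the caller can prepend them to the next read.
    """
    frames = []
    offset = 0
    while len(buf) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if end > len(buf):
            break  # payload not fully received yet
        frames.append((kind, buf[offset + HEADER.size:end]))
        offset = end
    return frames, buf[offset:]


# A UUID frame (16-byte payload) followed by a partial header.
data = bytes([KIND_UUID, 0x00, 0x10]) + bytes(16) + bytes([0x10, 0x01])
frames, rest = parse_frames(data)
print(len(frames), len(rest))
```

Returning the unconsumed remainder matters on TCP, where a single read may end in the middle of a frame.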

Orchestration

  • Async orchestration core — one isolated worker per concurrent call
  • AudioSocket server (TCP, low-latency, TCP_NODELAY)
  • Streaming ASR ↔ LLM ↔ TTS pipeline (Pipecat-compatible)
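
As an illustration of the one-worker-per-call pattern, here is a minimal asyncio sketch. It assumes each "worker" is an asyncio task created per TCP connection; stronger isolation (for example a process per call) would look different. The handler simply echoes bytes where a real pipeline would run ASR, LLM, and TTS, and the names `handle_call` and `serve` are hypothetical.

```python
import asyncio
import socket


async def handle_call(reader: asyncio.StreamReader,
                      writer: asyncio.StreamWriter) -> None:
    """One invocation per connected call; state lives in local variables."""
    sock = writer.get_extra_info("socket")
    if sock is not None:
        # Disable Nagle's algorithm so small audio frames are sent at once.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        while True:
            chunk = await reader.read(4096)
            if not chunk:
                break  # caller side closed the connection
            # Placeholder for the ASR -> LLM -> TTS pipeline: echo the audio.
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def serve(host: str = "127.0.0.1", port: int = 0) -> asyncio.AbstractServer:
    # port=0 lets the OS pick a free port, which is handy for local testing.
    return await asyncio.start_server(handle_call, host, port)
```

`asyncio.start_server` runs `handle_call` as a separate task for every incoming connection, so a slow or failed call does not block the others on the same event loop.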

Speech-to-Text (ASR)

  • NVIDIA Parakeet TDT — streaming, low-latency English
  • faster-whisper Large-v3 Turbo — Spanish and French
  • SenseVoice (Alibaba FunASR) — Cantonese specialist
  • Silero VAD — low-latency barge-in detection
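
Barge-in detection usually needs some debouncing so that a cough or click does not cancel the agent's speech. The sketch below assumes the VAD yields one speech probability per audio frame and fires once a run of consecutive frames crosses a threshold. The class name, the 0.5 threshold, and the 3-frame minimum are illustrative assumptions, not Veqa's or Silero's settings.

```python
class BargeInDetector:
    """Fires once when enough consecutive frames look like speech."""

    def __init__(self, threshold: float = 0.5, min_speech_frames: int = 3):
        self.threshold = threshold
        self.min_speech_frames = min_speech_frames
        self._run = 0  # length of the current run of speech frames

    def update(self, speech_prob: float) -> bool:
        """Feed one frame's speech probability; True means 'interrupt now'."""
        if speech_prob >= self.threshold:
            self._run += 1
        else:
            self._run = 0
        # Trigger exactly once per run, at the moment it becomes long enough.
        return self._run == self.min_speech_frames


detector = BargeInDetector()
probs = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.9]
fired = [detector.update(p) for p in probs]
print(fired.index(True))  # position of the frame that triggered the interrupt
```

A longer minimum run reduces false interrupts at the cost of a slightly later reaction to the caller.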

Language understanding

  • Llama-3.3-Nemotron-Super-49B-Instruct — primary LLM (NVIDIA-tuned, FP8)
  • Qwen 2.5 32B Instruct — Chinese-language fallback
  • NVIDIA Triton Inference Server with TensorRT-LLM optimization

Text-to-Speech (TTS)

  • F5-TTS — natural voice cloning, English and Chinese
  • Kokoro — fast, deterministic neural voices (Spanish, French)
  • CosyVoice 2 (Alibaba) — Cantonese / Mandarin with emotional control
  • All TTS streams chunked and cancellable mid-utterance
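
Chunked, cancellable playback can be illustrated with a small asyncio sketch. It assumes audio is delivered as a sequence of chunks and that an interrupt cancels the playback task. Cancellation here lands at a chunk boundary, and mapping that to a word boundary would need timing data from the TTS engine, which is not modelled. The names `speak` and `demo`, and the 20 ms pacing, are hypothetical.

```python
import asyncio
from typing import Iterable, List


async def speak(chunks: Iterable[bytes], sink: List[bytes],
                chunk_seconds: float = 0.02) -> None:
    """Send chunks one at a time, yielding to the event loop between them."""
    for chunk in chunks:
        sink.append(chunk)                  # stands in for writing to the call
        await asyncio.sleep(chunk_seconds)  # stands in for real-time pacing


async def demo() -> int:
    sink: List[bytes] = []
    task = asyncio.create_task(speak([b"\x00" * 640] * 50, sink))
    await asyncio.sleep(0.05)   # the caller starts talking here
    task.cancel()               # barge-in: stop sending further chunks
    try:
        await task
    except asyncio.CancelledError:
        pass                    # expected: playback was cut short
    return len(sink)


sent = asyncio.run(demo())
print(f"{sent} of 50 chunks sent before cancellation")
```

Because each chunk is followed by an `await`, the cancel request takes effect before the next chunk is sent rather than after the whole utterance.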

GPU compute

  • NVIDIA RTX PRO 6000 Blackwell — 96 GB GDDR7, 24,064 CUDA cores
  • Workstation Edition (600 W), or Max-Q (300 W) for dense multi-GPU deployments
  • On-premise at customer site — voice audio never leaves the customer network

Early Access

Bring Veqa to your call flow.

We're onboarding a small cohort of early-access partners in healthcare, financial services, and regulated B2B. Tell us about your call volumes and compliance requirements and we'll be in touch within one business day.

[email protected] · St. Petersburg, Florida