The interruption-and-resumption problem — solved
Most voice agents lose context the moment a customer cuts in. Veqa was designed from the ground up around this exact failure mode.

When the agent is walking a customer through a multi-step procedure — a login flow, a claim filing, a scheduling process — Veqa emits the response as a structured task plan, not a single block of speech. Each step is tracked individually with a status of pending, speaking, or completed.
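One way to picture the task plan is as a small data structure with per-step status. The sketch below is illustrative only: the class names `Step`, `TaskPlan`, and `StepStatus` and the method `current_step` are assumptions, not Veqa's actual API. Only the three status values come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class StepStatus(Enum):
    # The three statuses named in the text.
    PENDING = "pending"
    SPEAKING = "speaking"
    COMPLETED = "completed"


@dataclass
class Step:
    text: str
    status: StepStatus = StepStatus.PENDING


@dataclass
class TaskPlan:
    # Hypothetical container; the real plan format is not documented here.
    steps: List[Step] = field(default_factory=list)

    def current_step(self) -> Optional[Step]:
        """Return the first step that has not been completed, if any."""
        for step in self.steps:
            if step.status is not StepStatus.COMPLETED:
                return step
        return None


plan = TaskPlan([
    Step("Open the login page."),
    Step("Enter your member ID."),
    Step("Tap 'Send code'."),
])

first = plan.current_step()
first.status = StepStatus.SPEAKING   # the agent starts speaking step 1
first.status = StepStatus.COMPLETED  # step 1 was spoken to the end
```

Because each step carries its own status, the agent can always tell which instruction the customer has actually heard in full.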
When voice activity is detected from the caller mid-utterance, the TTS stream is cancelled at the current word boundary, the step is reverted to pending, and the LLM classifies the interruption: question, clarification, pause request, scope change, or end-of-call.
Each interruption type has its own handler. A pause request gets a patient acknowledgement and a frozen task plan. A scope change re-plans the call. A question gets answered inline, then the agent resumes the procedure exactly where it stopped.
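The barge-in and dispatch logic described above could be sketched roughly as follows. The function names and action strings are hypothetical. The text describes handlers only for pause requests, scope changes, and questions, so the branches for clarification and end-of-call are assumptions made to keep the example complete.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class Interruption(Enum):
    # The five classes named in the text.
    QUESTION = auto()
    CLARIFICATION = auto()
    PAUSE_REQUEST = auto()
    SCOPE_CHANGE = auto()
    END_OF_CALL = auto()


@dataclass
class Step:
    text: str
    status: str = "pending"  # "pending" | "speaking" | "completed"


def on_barge_in(steps: List[Step]) -> None:
    """Revert the step that was being spoken so it is spoken again later."""
    for step in steps:
        if step.status == "speaking":
            step.status = "pending"


def next_actions(kind: Interruption) -> List[str]:
    """Map an interruption type to an ordered list of hypothetical actions."""
    if kind is Interruption.PAUSE_REQUEST:
        return ["acknowledge", "freeze_plan"]
    if kind is Interruption.SCOPE_CHANGE:
        return ["replan"]
    if kind is Interruption.QUESTION:
        return ["answer_inline", "resume_plan"]
    if kind is Interruption.CLARIFICATION:
        # Assumption: not described in the text; treated like a question here.
        return ["answer_inline", "resume_plan"]
    # Assumption: end-of-call handling is not described in the text either.
    return ["end_call"]


steps = [Step("Open the app.", "completed"),
         Step("Enter your PIN.", "speaking"),
         Step("Confirm the amount.")]
on_barge_in(steps)  # the caller cut in during step 2
actions = next_actions(Interruption.QUESTION)
```

Reverting the interrupted step to pending, rather than marking it completed, is what lets the agent repeat the half-heard instruction when it resumes.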

Architecture & Stack
Open-source first, NVIDIA-accelerated, and designed to drop into existing Asterisk-based SIP infrastructure without rebuilding the dialplan.

Telephony ingress
- Asterisk 22 LTS (certified build)
- AudioSocket protocol (TCP, bidirectional slin16)
- asterisk-java ARI client for call control
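For readers new to AudioSocket, the wire format is simple enough to sketch. Each frame is a 1-byte type, a 2-byte big-endian payload length, and then the payload. The type codes below follow the published AudioSocket protocol notes as we understand them, so verify them against your Asterisk version. The helper names `encode_frame` and `iter_frames` are hypothetical.

```python
import struct
import uuid
from typing import Iterator, Tuple

# Frame types as documented for AudioSocket (verify for your Asterisk build).
KIND_HANGUP = 0x00  # terminate the connection
KIND_UUID = 0x01    # payload is the 16-byte call UUID
KIND_AUDIO = 0x10   # payload is signed-linear PCM audio
KIND_ERROR = 0xFF   # payload is an optional error code

HEADER = struct.Struct(">BH")  # 1-byte type + 2-byte big-endian length


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Build one frame: header followed by the payload."""
    return HEADER.pack(kind, len(payload)) + payload


def iter_frames(buffer: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, payload) for every complete frame in the buffer."""
    offset = 0
    while offset + HEADER.size <= len(buffer):
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame: wait for more bytes from the socket
        yield kind, buffer[offset + HEADER.size:end]
        offset = end


call_id = uuid.UUID(int=1)
stream = (encode_frame(KIND_UUID, call_id.bytes)
          + encode_frame(KIND_AUDIO, b"\x00\x00" * 160)
          + b"\x10\x01")  # trailing bytes of a frame that has not fully arrived
frames = list(iter_frames(stream))
```

A real server would keep the unparsed tail of the buffer and prepend it to the next read. The sketch simply stops at the first incomplete frame.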
Orchestration
- Async orchestration core — one isolated worker per concurrent call
- AudioSocket server (TCP, low-latency, TCP_NODELAY)
- Streaming ASR ↔ LLM ↔ TTS pipeline (Pipecat-compatible)
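As a rough illustration of the per-call worker model, the sketch below uses Python's `asyncio`, where each accepted TCP connection runs in its own task. This is an assumption about the shape of the design, not Veqa's implementation: a production system might isolate calls in separate processes instead, and the echo loop is only a stand-in for the ASR, LLM, and TTS stages. The names `handle_call` and `serve` and the port number are hypothetical.

```python
import asyncio
import socket


async def handle_call(reader: asyncio.StreamReader,
                      writer: asyncio.StreamWriter) -> None:
    """Handle one call; asyncio runs this in a separate task per connection."""
    sock = writer.get_extra_info("socket")
    if sock is not None:
        # Disable Nagle's algorithm so small audio frames are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        while True:
            chunk = await reader.read(320)
            if not chunk:
                break  # the peer closed the connection
            # Placeholder for the real pipeline: echo the bytes back.
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def serve(host: str = "127.0.0.1", port: int = 9092) -> None:
    """Accept connections until cancelled (host and port are placeholders)."""
    server = await asyncio.start_server(handle_call, host, port)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(serve())` would start the listener. An unhandled error inside one `handle_call` task ends that call only, which is the property the one-worker-per-call design is after.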
Speech-to-Text (ASR)
- NVIDIA Parakeet TDT — streaming, low-latency English
- faster-whisper Large-v3 Turbo — Spanish and French
- SenseVoice (Alibaba FunASR) — Cantonese specialist
- Silero VAD — low-latency barge-in detection
Language understanding
- Llama-3.3-Nemotron-Super-49B-Instruct — primary LLM (NVIDIA-tuned, FP8)
- Qwen 2.5 32B Instruct — Chinese-language fallback
- NVIDIA Triton Inference Server with TensorRT-LLM optimization
Text-to-Speech (TTS)
- F5-TTS — natural voice cloning, English and Chinese
- Kokoro — fast, deterministic neural voices (Spanish, French)
- CosyVoice 2 (Alibaba) — Cantonese / Mandarin with emotional control
- All TTS streams chunked and cancellable mid-utterance
GPU compute
- NVIDIA RTX PRO 6000 Blackwell — 96 GB GDDR7, 24,064 CUDA cores
- Workstation Edition (600W) or Max-Q (300W) for multi-GPU dense deployments
- On-premise at customer site — voice audio never leaves the customer network

Bring Veqa to your call flow.
We're onboarding a small cohort of early-access partners in healthcare, financial services, and regulated B2B. Tell us about your call volumes and compliance requirements, and we'll be in touch within one business day.
[email protected] · St. Petersburg, Florida