latency · pillar 01
From last syllable
to first token,
in <150ms.
Multi-second lag is the #1 'AI tell' interviewers cite. Mirly runs the entire pipeline an order of magnitude faster than the incumbent — with documented budgets per stage, reproducible benchmarks, and raw data published.
Mirly delivers a first useful token in 127ms p50 and 189ms p95, measured end-to-end from the last syllable of the interviewer’s question to the first pixel rendered on the candidate’s screen. Final Round AI on the same machine measured 1,810ms p50: a 14× gap.
pipeline budget
Every millisecond, accounted for.
benchmark
5–14× faster than the rest.
Same machine, same audio source, same question. Warm p50 in milliseconds — last syllable to first visible token, frame-counted at 60fps.
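Frame counting makes the metric camera-verifiable: at 60fps each captured frame spans roughly 16.7ms, so a latency expressed in frames converts directly to milliseconds. A minimal sketch of that conversion (function name is illustrative, not from the benchmark repo):

```python
FPS = 60  # screen-recording capture rate used for frame counting

def frames_to_ms(frames: int, fps: int = FPS) -> float:
    """Latency implied by a frame count at the given capture rate."""
    return frames * 1000.0 / fps

# One frame at 60fps is ~16.7ms, so any frame-counted reading
# carries about half a frame (~8ms) of quantization noise.
```

This also bounds the measurement error: a 127ms reading is trustworthy to within a single frame.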
methodology
Reproducible. Auditable.
Published benchmarks should be re-runnable by anyone with the same hardware. Ours are. Raw WAV files, screen recordings, and the frame-count spreadsheet are at github.com/mirly/latency-benchmark-2026.
- Hardware: MacBook Air M2 · 16GB · macOS 14.5 · plugged in, single-app foreground
- Network: Gigabit ethernet, London — deliberately worst-case for US-East vendors
- Audio source: Pre-recorded 16kHz mono WAV, played into system audio via BlackHole
- Question: "Tell me about a time you led a contentious technical decision" — same across all tools
- Metric: Last syllable of question → first visible token, 60fps frame count
- Runs: 10 per tool — 1 cold + 9 warm; p50 = median(warm)
- Date: 2026-05-15 · vendor versions tabulated below
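The per-tool aggregation above reduces to a few lines: discard the single cold run, take the median of the nine warm runs for p50, and take the nearest-rank 95th percentile (which, with nine samples, is the slowest warm run) for p95. A minimal sketch, assuming latencies are already in milliseconds (function name is illustrative):

```python
import math
import statistics

def warm_percentiles(runs_ms: list[float]) -> tuple[float, float]:
    """p50/p95 for one tool; the first run is the cold run and is discarded."""
    warm = sorted(runs_ms[1:])
    p50 = statistics.median(warm)
    # Nearest-rank p95: with 9 warm runs this is simply the slowest of them.
    p95 = warm[math.ceil(0.95 * len(warm)) - 1]
    return p50, p95
```

Nearest-rank is chosen here because interpolated percentiles are not meaningful at n=9; the published spreadsheet is the authority on the exact method used.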
Full teardown: /blog/latency-teardown-6-copilots