mirly

how it works

From audio sample
to first token
in milliseconds.

A walkthrough of the pipeline, the OS APIs we use, and the latency budget at each stage. No magic. Just three of the fastest pieces of available infrastructure, glued together carefully.

01

Paste your context once

On first launch, paste your résumé, your target job description, and three to five of your STAR stories. Your profile is stored on your device and never persisted on our servers.

  • résumé text
  • target JD
  • 3–5 STAR stories
  • optional 60-sec voice sample

02

Start a session before you join the call

Press the global hotkey. Mirly opens a small overlay window in the corner of your screen, then pre-warms WebSocket connections to Deepgram, Anthropic, and Groq before the call begins.

  • hotkey ⌘⇧Space
  • window invisible to screen share
  • pre-warmed connections
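
A minimal sketch of the pre-warm step, assuming each provider endpoint can be opened ahead of time over a WebSocket-style connection. The `SocketLike` interface, the `prewarm` helper, and the 3-second timeout are illustrative assumptions, not Mirly's actual code.

```typescript
// Minimal subset of the WebSocket interface this sketch relies on.
interface SocketLike {
  addEventListener(type: "open" | "error", listener: () => void): void;
  close(): void;
}

// Factory that opens a connection; injected so the sketch stays testable.
type Connect = (url: string) => SocketLike;

// Resolve once the socket is open; reject on error or timeout.
function openSocket(url: string, connect: Connect, timeoutMs: number): Promise<SocketLike> {
  return new Promise((resolve, reject) => {
    const socket = connect(url);
    const timer = setTimeout(() => {
      socket.close();
      reject(new Error(`timed out opening ${url}`));
    }, timeoutMs);
    socket.addEventListener("open", () => {
      clearTimeout(timer);
      resolve(socket);
    });
    socket.addEventListener("error", () => {
      clearTimeout(timer);
      reject(new Error(`failed to open ${url}`));
    });
  });
}

// Open all connections in parallel so the slowest handshake bounds the wait.
async function prewarm(
  urls: string[],
  connect: Connect,
  timeoutMs = 3000, // assumed value
): Promise<Map<string, SocketLike>> {
  const entries = await Promise.all(
    urls.map(
      async (url): Promise<[string, SocketLike]> => [url, await openSocket(url, connect, timeoutMs)],
    ),
  );
  return new Map(entries);
}
```

In a browser-like runtime, `connect` could wrap the standard `WebSocket` constructor. Opening the sockets when the hotkey fires moves the TLS and protocol handshakes out of the hot path, so the first question only pays for the request itself.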

03

Speak. Listen. Read.

You hear the interviewer. Mirly transcribes them in real time. The moment they finish a sentence, an answer in your voice — built from your résumé, your stories — streams into the overlay.

  • speech_final triggers LLM
  • sub-150ms first token
  • cached profile prefix
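
The trigger in this step can be sketched as a handler over streaming transcript events. The `is_final` and `speech_final` flags follow the shape of Deepgram's live transcription results. The `looksLikeQuestion` heuristic, its opener list, and the `onQuestion` callback are hypothetical stand-ins, not Mirly's actual detection logic.

```typescript
// Simplified shape of a streaming transcript event (assumed subset of fields).
interface TranscriptEvent {
  is_final: boolean; // this segment's text will not change
  speech_final: boolean; // the speaker reached an endpoint (a pause)
  channel: { alternatives: { transcript: string; confidence: number }[] };
}

// Hypothetical list of phrases that often open an interview question.
const QUESTION_OPENERS = [
  "tell me", "describe", "walk me through", "what", "why", "how",
  "when", "who", "can you", "could you", "have you",
];

// Cheap heuristic: a trailing "?" or a typical interview opener.
function looksLikeQuestion(text: string): boolean {
  const t = text.trim().toLowerCase();
  if (t.length === 0) return false;
  return t.endsWith("?") || QUESTION_OPENERS.some((opener) => t.startsWith(opener));
}

// Accumulates finalized segments and fires at most once per utterance,
// at the moment speech_final arrives.
function createQuestionDetector(onQuestion: (question: string) => void) {
  let segments: string[] = [];
  return (event: TranscriptEvent): void => {
    const text = event.channel.alternatives[0]?.transcript ?? "";
    if (event.is_final && text) segments.push(text);
    if (!event.speech_final) return;
    const utterance = segments.join(" ").trim();
    segments = []; // reset for the next utterance
    if (looksLikeQuestion(utterance)) onQuestion(utterance);
  };
}
```

Interim results are ignored on purpose in this sketch: only finalized segments are joined, so the text handed to the LLM does not change after the call is made.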

latency budget

Every millisecond,
budgeted.

The hot path is <150ms p50, end-to-end on Apple Silicon. Below is the per-stage budget, to the millisecond.

stage                       budget      implementation
Audio capture               ~16ms       getDisplayMedia + getUserMedia
STT partial (whisper.cpp)   60–80ms     on-device, Apple Silicon
Question detection          <5ms        speech_final + heuristics
LLM call (cached)           60–100ms    Groq Llama 3.3 instant draft
Render (DOM paint)          <16ms       CSS opacity transition

stealth

A documented
OS API, not a trick.

When you screen-share on Zoom, Teams, Meet, or Webex, those apps capture the screen through OS APIs such as BitBlt (Windows) or CGWindowListCreateImage (macOS). Apple and Microsoft both ship documented APIs to exclude specific windows from those calls. They are used by 1Password to hide passwords, by Netflix to enforce DRM, and by Apple Pay to hide card details from screen recording.

Mirly uses those APIs. The interviewer never sees the overlay. We do not run a kernel extension, do not hook system calls, do not do anything that would break with the next OS update.

macOS    NSWindow.sharingType = .none
Windows  SetWindowDisplayAffinity(hWnd, WDA_EXCLUDEFROMCAPTURE)
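
A minimal sketch of how a cross-platform shell could reach both APIs through one call. It assumes an Electron-based overlay, which this page does not confirm: Electron documents `BrowserWindow.setContentProtection(true)` as setting the NSWindow sharing type to none on macOS and calling `SetWindowDisplayAffinity` with `WDA_EXCLUDEFROMCAPTURE` on Windows. The `OverlayWindow` interface and `protectOverlay` helper are hypothetical names.

```typescript
// Subset of Electron's BrowserWindow used here (assumption: an Electron shell).
interface OverlayWindow {
  setContentProtection(enable: boolean): void;
}

// Exclude the overlay window from screen capture.
function protectOverlay(win: OverlayWindow): void {
  // macOS:   NSWindow.sharingType = .none
  // Windows: SetWindowDisplayAffinity(hWnd, WDA_EXCLUDEFROMCAPTURE)
  win.setContentProtection(true);
}
```

A native app would set the same flags directly in Swift or through the Win32 API. The wrapper only illustrates that no hooks or drivers are involved.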
mirly · listening
142ms
Q

Tell me about IAM permission sprawl at scale.

A
• At Stripe the merchant team had 1,200+ IAM roles with overlapping policies.
• I built an IAM analyzer in Go that diffed real CloudTrail traffic against role policies.
• It opened minimal-policy PRs automatically — three rollout waves, four months.
• Cut 87% of unused permissions, ended at 312 roles.
• IAM-related Sev2 incidents went to zero for two consecutive quarters.
profile · sam patel
claude sonnet 4.6 · cached prefix

questions

Common questions, answered.

How is mirly invisible to screen share?

On macOS, mirly sets NSWindow.sharingType = .none, the same documented API password managers and DRM playback use. On Windows, it calls SetWindowDisplayAffinity(hWnd, WDA_EXCLUDEFROMCAPTURE). Each excludes the window from the OS capture path that Zoom, Teams, Meet, and Webex use to capture your screen (on Windows, BitBlt and DXGI Desktop Duplication). We do not use a kernel exploit, hide the process, or rely on any trick.

Why is it sub-150ms?

On-device whisper.cpp on Apple Silicon emits transcript partials in 60–80 ms with no network round-trip. We fire a Groq Llama 3.3 draft on the first high-confidence partial (speculative execution), and Claude Sonnet 4.6 streams the refined answer with prompt caching on the system prompt + your profile. The first useful token lands before the interviewer has finished asking the question.
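
The draft-then-refine flow described above can be sketched as follows. The `draft` and `refine` callbacks stand in for the Groq and Claude calls, the 0.9 confidence threshold is an assumed value, and whole strings are used in place of token streams to keep the example short. None of these names come from Mirly's code.

```typescript
type Model = (question: string) => Promise<string>;
type Render = (text: string, stage: "draft" | "refined") => void;

interface PartialTranscript {
  text: string;
  confidence: number;
}

// Hypothetical threshold for a "high-confidence" partial.
const CONFIDENCE_THRESHOLD = 0.9;

function createSpeculativeAnswerer(draft: Model, refine: Model, render: Render) {
  let utterance = 0; // id of the utterance currently being spoken
  let speculatedFor = -1; // last utterance that already has a draft in flight
  let refinedFor = -1; // last utterance whose refined answer was rendered

  return {
    // Called for every partial; fires the fast draft at most once per utterance.
    onPartial(partial: PartialTranscript): void {
      if (speculatedFor === utterance || partial.confidence < CONFIDENCE_THRESHOLD) return;
      const id = utterance;
      speculatedFor = id;
      draft(partial.text)
        .then((text) => {
          // A late draft must never overwrite the refined answer.
          if (refinedFor < id) render(text, "draft");
        })
        .catch(() => {
          // A failed draft is ignored; the refined answer still arrives.
        });
    },
    // Called once the question is complete; the refined answer replaces the draft.
    async onFinal(question: string): Promise<void> {
      const id = utterance;
      utterance += 1;
      const text = await refine(question);
      refinedFor = id;
      render(text, "refined");
    },
  };
}
```

The trade-off in this design is wasted work: a draft built from a partial may answer a question the interviewer did not finish asking, which is why the refined answer always takes precedence.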

Does my résumé leave my device?

It is sent to your chosen LLM provider as part of the per-session system prompt — that is how the personalization works. We do not persist it on our servers; it lives in localStorage on your device. You can also configure the LLM API keys yourself (BYOK) so the model traffic does not pass through our backend at all.
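
A minimal sketch of how a locally stored profile could become the cached system prompt. The request shape follows Anthropic's Messages API, where a `cache_control` marker on a system block makes the prompt prefix up to that block eligible for caching. The storage key, the `Profile` fields, the instruction text, and the `max_tokens` value are assumptions for illustration.

```typescript
// Minimal subset of the Web Storage API (localStorage satisfies it).
interface StorageLike {
  getItem(key: string): string | null;
}

interface Profile {
  resume: string;
  jobDescription: string;
  stories: string[];
}

// Hypothetical storage key; the real key is not documented here.
const PROFILE_KEY = "mirly.profile";

// Read the profile from on-device storage; null if nothing was saved yet.
function loadProfile(storage: StorageLike): Profile | null {
  const raw = storage.getItem(PROFILE_KEY);
  return raw === null ? null : (JSON.parse(raw) as Profile);
}

// Build a Messages API request body whose profile block is marked cacheable.
function buildRequest(profile: Profile, question: string, model: string) {
  const profileText = [
    "RESUME:\n" + profile.resume,
    "TARGET JOB:\n" + profile.jobDescription,
    "STORIES:\n" + profile.stories.join("\n---\n"),
  ].join("\n\n");
  return {
    model, // supplied by the caller; no model id is assumed here
    max_tokens: 512, // assumed value
    stream: true,
    system: [
      { type: "text", text: "Answer interview questions in the candidate's voice, as short bullets." },
      // The prefix up to and including this block is eligible for prompt caching.
      { type: "text", text: profileText, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: question }],
  };
}
```

With BYOK, a body like this would be posted straight to the provider with your own API key, so the profile text travels only between your device and the model provider.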

What if Zoom updates and breaks stealth?

We test the screen-capture matrix every hour on real macOS and real Windows. The status page shows green, amber, or red for each cell, along with a last-tested timestamp. If Zoom ships an update that breaks WDA_EXCLUDEFROMCAPTURE, the relevant cell turns red and we ship a fix in under 72 hours. The public status page is the SLA.

See it in motion.