Essay · July 4, 2026 · 4 min read

Dictation should stay home

Typing is deliberate. You compose, you backspace, you decide. Speech is not like that. When you dictate, the computer hears the half-formed version -- the names of your kids and your doctors, the number you're about to counter-offer with, the sentence you'd never have typed because you deleted it in your head first.

That makes voice the most intimate input surface a computer has. And the industry's default architecture for it -- stream the audio to our servers, we'll take it from there -- has a track record.

The track record

These aren't hypotheticals. Each one is a different way the same architecture fails.

Human review. In 2019, a Guardian investigation revealed that contractors grading Siri responses regularly heard confidential medical details, drug deals, and couples having sex -- much of it from accidental activations. Apple suspended the program and apologized. The lawsuit it spawned ended in a $95 million settlement, with checks going out this January. Apple denied wrongdoing, and the point isn't villainy anyway: the point is that a pipeline with audio in it grows humans who listen to the audio. The same year, Bloomberg reported that thousands of Amazon workers were annotating Alexa recordings, up to a thousand clips per shift.

Retention. In 2023 Amazon paid a $25 million FTC penalty for keeping children's Alexa voice recordings indefinitely -- in some cases after parents asked for them to be deleted. Nobody built that as a feature. Indefinite retention was simply the default, and defaults are what happen.

The database. In 2017, the keyboard app ai.type left a misconfigured database open to the internet: records on 31 million users, and among them millions of entries of text actually typed through the keyboard, search terms and email-password pairs included. An input method had quietly become a copy of what people typed, and the copy leaked.

Your own people. In 2023, Samsung banned generative AI tools on company devices after engineers pasted sensitive source code into ChatGPT. No breach, no contractor -- just words leaving the building because the tool lived outside it.

Human review, retention, misconfiguration, leakage. Four different failure modes, one root cause: the words sat on infrastructure their owner didn't control.

So where should your words go?

[Figure: Where your words go: one path has a landlord. The diagram compares two data paths. Typical cloud dictation: your audio flows to the vendor's servers under their account and terms, then on to a model provider, with a dotted branch to retention, logs, and possible human review. Tonecast BYOK: your Mac talks directly to your provider using your key and your agreement.]

Tonecast's answer has two parts, and I want to state both precisely, because "private" is the most abused word in this industry.

Part one: everything that can stay home, stays home. The microphone is only live while you're holding Fn -- there's no always-on wake-word listener, because the wake word is checked in the transcript, not the audio. What Tonecast learns about your writing style lives in markdown files on your Mac, and so does the small local buffer those files are distilled from. Your API keys are plain files in Application Support. There's no account, no telemetry by default, and no Tonecast server that any of this syncs to -- in BYOK mode there is no Tonecast server in the loop at all. And because trust needs a window, not a promise, the app keeps a live audit view of your recent activations: the exact prompts and responses of every AI call, transcription included, held in memory and cleared when you quit. You can see, word for word, what left the machine.
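For readers who think in code, here is a minimal sketch of those two ideas: a wake word matched against the transcript text rather than the audio stream, and an audit log that exists only in memory. It is an illustration under stated assumptions, not Tonecast's actual implementation: the wake word "tonecast", the rule that it must be the first word, and every name below (`extract_command`, `AuditLog`, `AuditEntry`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WAKE_WORD = "tonecast"  # hypothetical wake word, chosen only for illustration


@dataclass
class AuditEntry:
    """One AI call, recorded exactly as sent and received."""
    step: str      # e.g. "transcription", "polish", "reply"
    prompt: str
    response: str


@dataclass
class AuditLog:
    """In-memory only: nothing in this class touches the disk."""
    entries: List[AuditEntry] = field(default_factory=list)

    def record(self, step: str, prompt: str, response: str) -> None:
        self.entries.append(AuditEntry(step, prompt, response))

    def clear(self) -> None:
        # Would be called on quit; after this, no history remains.
        self.entries.clear()


def extract_command(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text after a leading wake word, or None if it is absent.

    The check runs on text that already exists because the user held the
    push-to-talk key, so no background audio listener is needed.
    Assumption: the wake word must be the first word of the transcript.
    """
    words = transcript.strip().split()
    if not words:
        return None
    first = words[0].strip(",.!?:;").lower()
    if first != wake_word:
        return None
    return " ".join(words[1:])
```

In this sketch, `extract_command("Tonecast, reply that Tuesday works")` yields the command text, while a transcript without the leading wake word yields `None` and would be treated as plain dictation.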

Part two, the honest one: BYOK still means network calls. Speech recognition runs on Groq's Whisper models, so while you hold Fn, your audio does go to Groq -- under your API key. The polish and reply steps go to whichever LLM provider you picked, on the same terms. That's the entire point of bring-your-own-key. It's not "no cloud," it's no middleman: your data moves under an agreement between you and your provider, on an API key you can revoke tonight, with no extra company aggregating everyone's dictation under one vendor account and one set of terms. And if you want the text step fully offline, Tonecast runs local models through Ollama -- no key, nothing leaves the machine for that call.
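The difference between the two paths can be sketched as data: which host each call targets and whose credential it carries. This is a rough illustration, not a description of how Tonecast builds its requests. The two URLs are assumptions on my part (Groq's OpenAI-compatible transcription path and Ollama's default local port) and should be checked against each provider's documentation; `PlannedCall`, `plan_transcription`, and `plan_local_polish` are hypothetical names, and nothing here sends any traffic.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlparse

# Assumed endpoints; verify against the providers' own documentation.
GROQ_TRANSCRIBE_URL = "https://api.groq.com/openai/v1/audio/transcriptions"
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1")


@dataclass
class PlannedCall:
    """A description of one model call; building it performs no I/O."""
    step: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)

    @property
    def host(self) -> str:
        return urlparse(self.url).hostname or ""

    @property
    def leaves_machine(self) -> bool:
        return self.host not in LOCAL_HOSTS


def plan_transcription(user_api_key: str) -> PlannedCall:
    # BYOK: the request goes straight to the provider, authenticated
    # with the user's own key. There is no vendor hop in between.
    return PlannedCall(
        step="transcription",
        url=GROQ_TRANSCRIBE_URL,
        headers={"Authorization": f"Bearer {user_api_key}"},
    )


def plan_local_polish() -> PlannedCall:
    # Local model through Ollama: no key, and the host is this machine.
    return PlannedCall(step="polish", url=OLLAMA_GENERATE_URL)
```

The useful property is that the question "whose account does this travel under?" becomes something you can inspect per call: the transcription plan names the provider's host and carries the user's key, while the local plan carries no credential and never leaves the machine.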

Credit where due: honest local options exist elsewhere too. SuperWhisper, for one, can run its transcription models entirely on-device, and that's a legitimate architecture. The distinct thing about Tonecast's position is the combination -- local profiles, local keys, BYOK for every model call, and no vendor server even as an option you have to opt out of.

The rule of thumb

You don't have to take any product's word for this stuff, mine included. Just ask one question: whose account does my data travel under? If the answer is "the vendor's," then every incident above is a version of your future -- their contractors, their retention defaults, their database configs, their acquisition someday. If the answer is "mine," the blast radius shrinks to the one provider you chose, under terms you agreed to, on a key you can revoke.

Your half-formed thoughts deserve the second answer.

If that matches how you think about your own words, join the early-access list.

Tonecast is built by Codefox AI. Questions, feedback, or just want to say hi? Email us at support@tonecast.ai.