Tonecast vs superwhisper
Dictation ends at the cursor. Tonecast starts at the thread.
superwhisper is the dictation tool we respect most — real on-device speech models, a real free tier, a license you can own outright. But everything it writes begins and ends at your cursor. Tonecast begins with the conversation on your screen.
“yes to both, numbers Thursday, and ten works”
The cursor
superwhisper · any text field
Yes to both, numbers Thursday, and ten works.
on-device transcription · lands at the cursor
The thread
Tonecast · the same field
Priya
Can you send final numbers by Thursday? Board moved up.
Priya
Also — does 10am still work for the run-through?
Jordan
I'll grab a room if we're on.
Yes to both, Priya: final numbers by Thursday, and 10am still works. Jordan, grab the room.
reply drafted from the thread · ⌘V ready
three messages read · reply lands at the cursor
The most honest dictation tool
Start with the credit, because it's earned. superwhisper can run its speech models entirely on-device — your audio becomes text without touching a server, and the local models are included even on the free tier. We've called that a legitimate architecture on our own blog, and this page won't walk it back. One Pro license covers macOS, Windows, iPhone, and iPad, and there's a $249.99 lifetime option — you can simply own the thing.
And it goes well past raw transcription. Custom modes run your dictation through an LLM with a prompt you write yourself, and you can feed them real context: the active input field and window title, your selected text, your clipboard. Dictate a rough sentence and get back a formatted one, in the shape you asked for, with a model you chose — cloud or local. As pure dictation goes, it's the most honest tool in the category.
What the cursor can't see
Look closely at what those modes are fed, though, because superwhisper's own docs are precise about it. Application Context means “Text from active input fields, names, and title from your active window.” Selected text is what you highlighted; clipboard is what you copied. Every input is something you handed it — and their docs describe Super Mode as built for “text transformation or formatting rather than content analysis or generation.” Nothing in the product reads the conversation on your screen.
That's the line Tonecast crosses. Press the hotkey over a thread and it resolves the conversation you're looking at — Gmail, Apple Mail, Superhuman, WhatsApp, Slack, iMessage — through per-app integrations. It knows who wrote, what they asked, and what's still unanswered, then drafts three replies, each labeled by intent, and pastes the one you pick at your cursor. The demo above is the whole argument: to a dictation tool, “yes to both” is a finished sentence; to Tonecast it's an instruction that only makes sense once the thread has been read.
Local respect, compared
Both products keep something important on your machine — but not the same thing, and the difference deserves precision rather than point-scoring. superwhisper's local guarantee is about the audio: pick a local model and speech becomes text with no network in the path at all. That guarantee is real, and Tonecast doesn't match it — our dictation goes out through your own Groq Whisper key, and drafting through whichever text provider you bring. A provider still sees the request; it's just your provider, on your key, with no Tonecast server in between.
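To make the BYOK path concrete, here is a minimal sketch of a transcription request addressed directly to Groq's public OpenAI-compatible endpoint. It is an illustration, not Tonecast's actual client code: the function name `build_transcription_request`, the `clip.wav` filename, and the choice of the `whisper-large-v3` model are assumptions made for the example. The request is only built, never sent.

```python
import uuid
import urllib.request

# Groq's public OpenAI-compatible transcription endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(
    audio: bytes, api_key: str, model: str = "whisper-large-v3"
) -> urllib.request.Request:
    """Build (but do not send) a multipart upload aimed straight at Groq.

    Hypothetical helper: the model name and filename are assumptions.
    """
    boundary = uuid.uuid4().hex
    # Two form fields: the model name, then the audio file itself.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="clip.wav"\r\n'
        f"Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        GROQ_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            # The only credential is the key you supply.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    req = build_transcription_request(b"placeholder audio bytes", "YOUR_GROQ_KEY")
    print(req.get_method(), req.full_url)
```

The only host in the request is the provider's, and the only credential is yours, which is the sense in which no Tonecast server sits in between.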
What Tonecast keeps home is the part dictation doesn't have: the identity. Your voice profiles are plain markdown files at ~/Library/Application Support/Tonecast/voices/ — per channel, per contact — that you can open, edit, or delete, and your keys, vocabulary, and logs live beside them. If you want text generation on-device too, Ollama is a supported provider: no key at all, text only. Two different layers of the same value — they keep the audio local; we keep you local, and keep every vendor out of the default path.
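Because the profiles are ordinary files, any script can enumerate them. A minimal sketch, assuming only the directory path given above; the helper name `list_voice_profiles` is hypothetical, and nothing is assumed about how files inside the folder are named or nested.

```python
from pathlib import Path

# Directory named on this page; the layout inside it is not assumed.
VOICES_DIR = Path.home() / "Library" / "Application Support" / "Tonecast" / "voices"


def list_voice_profiles(root: Path = VOICES_DIR) -> list[str]:
    """Return the relative paths of all markdown files under root, sorted."""
    if not root.is_dir():
        # Folder missing (e.g. Tonecast not installed): nothing to list.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))


if __name__ == "__main__":
    for name in list_voice_profiles():
        print(name)
```

The same files can be opened in any text editor or removed with an ordinary delete; the script only reads the directory listing.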
One honest caveat on our side: Tonecast Cloud, the optional managed tier for people who'd rather not handle provider keys, does route prompts through our API. We don't store your text and we don't train on it — but if “nothing leaves the machine” is your bar, superwhisper's local models clear it and neither of our modes does.
The ledger
| | Tonecast | superwhisper |
|---|---|---|
| Voice dictation | Yes | Yes |
| Fully on-device transcription | No (your Groq key) | Yes |
| Reads the thread on your screen | Yes | No |
| Drafts reply options | 3, intent-labeled | No |
| Voice profile per contact | Yes | No |
| Path with no vendor server | BYOK (your keys, no account) | Yes (local models) |
| Platforms | macOS today · iPhone coming soon · Linux & Windows in the works | macOS, Windows, iPhone & iPad |
| Price | free BYOK · Cloud $10/mo | $8.49/mo · $84.99/yr · $249.99 lifetime · free tier |
Sources: superwhisper docs & Pro pricing, verified 2026-07-05 · tonecast.ai/privacy
Where superwhisper wins
Entirely offline. Select a local model and transcription happens with no network at all — not encrypted-in-transit, not anonymized: absent. Tonecast can't say that; even BYOK calls a provider on your key. If you dictate things that should never leave the room, this is the feature that decides.
Own it outright. $249.99 buys a lifetime license — one purchase covering macOS, Windows, iPhone, and iPad, with no subscription underneath it. And below that sits a permanent free tier with unlimited local models. Tonecast has no lifetime option; free-with-your-own-keys is our floor.
Raw-dictation depth — and reach. Custom modes with per-mode model choice, prompts you write yourself, and context toggles make it a deeper pure-dictation tool than anything we ship. It also runs on Windows and iPad today, while Tonecast is macOS today · iPhone coming soon · Linux & Windows in the works.
Every claim about superwhisper on this page comes from superwhisper's own docs, custom-modes guide, and Pro page, verified July 5, 2026. If they change a number, we'll update this page.
The real question
If you want speech-to-text with nothing leaving the machine, get superwhisper. It's the most honest architecture in dictation, and you can own it outright. If the work is replying — reading the thread, deciding what it needs, and sounding like yourself — that's Tonecast. Running both is coherent: their local models for the words that stay private, Tonecast for the conversations that need an answer.
superwhisper Pro is $8.49/mo, $84.99/yr, or $249.99 once, with a permanent free tier underneath. Tonecast is free with your own keys, no account required; Tonecast Cloud is $10/mo if you'd rather not manage them.