Tonecast vs superwhisper

Dictation ends at the cursor. Tonecast starts at the thread.

superwhisper is the dictation tool we respect most — real on-device speech models, a real free tier, a license you can own outright. But everything it writes begins and ends at your cursor. Tonecast begins with the conversation on your screen.

“yes to both numbers Thursday, and ten works”

The cursor

superwhisper · any text field

Yes to both numbers Thursday, and ten works.

on-device transcription · lands at the cursor

The thread

Tonecast · the same field

Priya

Can you send final numbers by Thursday? Board moved up.

Priya

Also — does 10am still work for the run-through?

Jordan

I'll grab a room if we're on.

Yes to both, Priya: final numbers by Thursday, and 10am still works. Jordan, grab the room.

reply drafted from the thread · ⌘V ready

three messages read · reply lands at the cursor

The most honest dictation tool

Start with the credit, because it's earned. superwhisper can run its speech models entirely on-device — your audio becomes text without touching a server, and the local models are included even on the free tier. We've called that a legitimate architecture on our own blog, and this page won't walk it back. One Pro license covers macOS, Windows, iPhone, and iPad, and there's a $249.99 lifetime option — you can simply own the thing.

And it goes well past raw transcription. Custom modes run your dictation through an LLM with a prompt you write yourself, and you can feed them real context: the active input field and window title, your selected text, your clipboard. Dictate a rough sentence and get back a formatted one, in the shape you asked for, with a model you chose — cloud or local. As pure dictation goes, it's the most honest tool in the category.

What the cursor can't see

Look closely at what those modes are fed, though, because superwhisper's own docs are precise about it. Application Context means “Text from active input fields, names, and title from your active window.” Selected text is what you highlighted; clipboard is what you copied. Every input is something you handed it — and their docs describe Super Mode as built for “text transformation or formatting rather than content analysis or generation.” Nothing in the product reads the conversation on your screen.

That's the line Tonecast crosses. Press the hotkey over a thread and it resolves the conversation you're looking at — Gmail, Apple Mail, Superhuman, WhatsApp, Slack, iMessage — through per-app integrations. It knows who wrote, what they asked, and what's still unanswered, then drafts three replies, each labeled by intent, and pastes the one you pick at your cursor. The demo above is the whole argument: to a dictation tool, “yes to both” is a finished sentence; to Tonecast it's an instruction that only makes sense once the thread has been read.
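To make the difference concrete, here is a toy sketch of why “yes to both” needs the thread. It is not Tonecast's implementation: the `Message` type, the `open_questions` helper, and the question-mark heuristic are all illustrative assumptions, applied to the demo thread above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    author: str
    text: str


def open_questions(thread: list[Message], me: str) -> list[Message]:
    """Toy heuristic (an assumption, not the product's logic):
    a message from someone else that contains a question mark
    counts as a question still waiting for an answer."""
    return [m for m in thread if m.author != me and "?" in m.text]


# The demo thread from the top of this page.
thread = [
    Message("Priya", "Can you send final numbers by Thursday? Board moved up."),
    Message("Priya", "Also, does 10am still work for the run-through?"),
    Message("Jordan", "I'll grab a room if we're on."),
]

pending = open_questions(thread, me="You")
for m in pending:
    print(f"{m.author} is waiting on: {m.text}")
```

With the thread in hand, “both” resolves to Priya's two questions. With only the cursor, there is nothing for it to refer to.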

Local respect, compared

Both products keep something important on your machine — but not the same thing, and the difference deserves precision rather than point-scoring. superwhisper's local guarantee is about the audio: pick a local model and speech becomes text with no network in the path at all. That guarantee is real, and Tonecast doesn't match it — our dictation goes out through your own Groq Whisper key, and drafting through whichever text provider you bring. A provider still sees the request; it's just your provider, on your key, with no Tonecast server in between.

What Tonecast keeps home is the part dictation doesn't have: the identity. Your voice profiles are plain markdown files at ~/Library/Application Support/Tonecast/voices/, per channel and per contact, that you can open, edit, or delete, and your keys, vocabulary, and logs live beside them. If you want text generation on-device too, Ollama is a supported provider: no key at all, text generation only. These are two different layers of the same value: they keep the audio local; we keep you local, and keep our own servers out of the default path.
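Because the profiles are ordinary files, you can inspect them with any tool you like. A minimal sketch, assuming the files use a `.md` extension and may sit in subfolders (the page above only says they are markdown, per channel and per contact); the `list_voice_profiles` helper is our own illustrative name:

```python
from pathlib import Path

# Path quoted on this page; the layout beneath it is not documented here.
VOICES_DIR = Path.home() / "Library" / "Application Support" / "Tonecast" / "voices"


def list_voice_profiles(root: Path = VOICES_DIR) -> list[Path]:
    """Return every .md file under root, sorted.

    Returns an empty list if the folder does not exist.
    The .md extension is an assumption.
    """
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.md") if p.is_file())


if __name__ == "__main__":
    for profile in list_voice_profiles():
        # Show paths relative to the voices folder for readability.
        print(profile.relative_to(VOICES_DIR))
```

The script only reads. Editing or deleting a profile is just editing or deleting the file.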

One honest caveat on our side: Tonecast Cloud, the optional managed tier for people who'd rather not handle provider keys, does route prompts through our API. We don't store your text and we don't train on it — but if “nothing leaves the machine” is your bar, superwhisper's local models clear it and neither of our modes does.

The ledger

| | Tonecast | superwhisper |
| --- | --- | --- |
| Voice dictation | Yes | Yes |
| Fully on-device transcription | No (your Groq key) | Yes |
| Reads the thread on your screen | Yes | No |
| Drafts reply options | 3, intent-labeled | No |
| Voice profile per contact | Yes | No |
| Path with no vendor server | BYOK: your keys, no account | Yes (local models) |
| Platforms | macOS today · iPhone coming soon · Linux & Windows in the works | macOS, Windows, iPhone & iPad |
| Price | Free BYOK · Cloud $10/mo | $8.49/mo · $84.99/yr · $249.99 lifetime · free tier |

Sources: superwhisper docs & Pro pricing, verified 2026-07-05 · tonecast.ai/privacy

Where superwhisper wins

Entirely offline. Select a local model and transcription happens with no network at all — not encrypted-in-transit, not anonymized: absent. Tonecast can't say that; even BYOK calls a provider on your key. If you dictate things that should never leave the room, this is the feature that decides.

Own it outright. $249.99 buys a lifetime license — one purchase covering macOS, Windows, iPhone, and iPad, with no subscription underneath it. And below that sits a permanent free tier with unlimited local models. Tonecast has no lifetime option; free-with-your-own-keys is our floor.

Raw-dictation depth, and reach. Custom modes with per-mode model choice, prompts you write yourself, and context toggles make it a deeper pure-dictation tool than anything we ship. It also runs on Windows and iPad today, while Tonecast is macOS-only for now, with iPhone coming soon and Linux and Windows in the works.

Every claim on this page comes from superwhisper's own docs, custom-modes guide, and Pro page, verified July 5, 2026. If they change a number, we'll update this page.

The real question

If you want speech-to-text with nothing leaving the machine, get superwhisper. It's the most honest architecture in dictation, and you can own it outright. If the work is replying — reading the thread, deciding what it needs, and sounding like yourself — that's Tonecast. Running both is coherent: their local models for the words that stay private, Tonecast for the conversations that need an answer.

superwhisper Pro is $8.49/mo, $84.99/yr, or $249.99 once, with a permanent free tier underneath. Tonecast is free with your own keys, no account required; Tonecast Cloud is $10/mo if you'd rather not manage them.
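If you are weighing the lifetime license against the subscriptions, the break-even is simple arithmetic on the prices quoted above. A minimal sketch, assuming list prices stay fixed and ignoring taxes or discounts; the function name is ours and purely illustrative:

```python
import math

# superwhisper Pro prices as quoted on this page (verified 2026-07-05).
MONTHLY = 8.49
YEARLY = 84.99
LIFETIME = 249.99


def payments_to_break_even(recurring_price: float, one_time_price: float) -> int:
    """Smallest number of recurring payments whose total
    reaches or exceeds the one-time price."""
    return math.ceil(one_time_price / recurring_price)


months = payments_to_break_even(MONTHLY, LIFETIME)
years = payments_to_break_even(YEARLY, LIFETIME)

print(f"Monthly plan passes the lifetime price at payment {months}")
print(f"Yearly plan passes the lifetime price at payment {years}")
```

On those numbers, the lifetime license costs less than the monthly plan from the 30th payment and less than the yearly plan from the third.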