Transcribe a WhatsApp voice note in under 3 minutes
Got a voice note from a friend, family member, or colleague that you can't (or don't want to) listen to right now? Forward it as an audio attachment, upload here, get the text. Works in 90+ languages — handy when the note isn't in your strongest language.
Read instead of listen
In a meeting? Public transit with no headphones? Voice note from a relative in a language you only half-speak? Read the transcript in seconds, reply when you have time to actually engage.
90+ languages
WhatsApp voice notes from a Norwegian aunt, a Spanish coworker, an Arabic family group? Auto-detected and transcribed natively — not translated.
$2 per note (or per hour)
Most voice notes are 30 seconds to 5 minutes — so they all hit the $2 minimum. If you want to bulk-process, stitch several notes together first.
Privacy by default
Audio is processed in EU data centres and deleted from our servers 90 days after your last sign-in. Sensitive personal voice notes don't outlive your need for them.
From WhatsApp voice note to text in 3 steps
Forward the voice note as audio
Long-press the voice note → Forward → tap your own number / Saved Messages → Share / Send. The note now exists as a file, not a chat-bound voice note. On iPhone or Android, you can also long-press → Share → Save to Files.
Upload to TranscribeCat
Open transcribecat.com, sign in, and drop the .opus, .ogg, or .m4a file (depending on your phone) into the upload area. We handle all WhatsApp voice-note formats.
Read the transcript
Most voice notes finish in 30 seconds to 2 minutes. Read in-browser, copy to your reply, or save to Notes/Email/Slack.
What format are WhatsApp voice notes (and why it matters)
WhatsApp uses the Opus codec inside an OGG container for voice notes on most platforms. The file extension is sometimes .opus, sometimes .ogg, and on some iPhone configurations you'll see .m4a (AAC) instead. All three are supported here without conversion.
Opus is interesting for voice transcription because it was designed specifically for speech at low bitrates — exactly what WhatsApp voice notes are (they're typically 20–32 kbps to keep messages small over data connections). At those bitrates, MP3 would sound terrible; Opus sounds clearly intelligible. Transcription accuracy on a 24 kbps Opus voice note is roughly identical to a 96 kbps MP3 of the same speech — the codec is doing real work to preserve speech detail.
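To put those bitrates in perspective, here is a rough size estimate. It is a minimal sketch: the function name is ours, and it ignores container overhead, so real files will be slightly larger.

```python
def approx_size_kb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate audio payload size in kilobytes (1 kB = 1000 bytes).

    bitrate_kbps is in kilobits per second; dividing by 8 converts bits to bytes.
    Container overhead (Ogg pages, MP3 frame headers) is ignored.
    """
    return bitrate_kbps * seconds / 8


five_minutes = 5 * 60
opus_note = approx_size_kb(24, five_minutes)   # a 24 kbps Opus voice note
mp3_copy = approx_size_kb(96, five_minutes)    # the same speech as a 96 kbps MP3
print(f"Opus: {opus_note:.0f} kB, MP3: {mp3_copy:.0f} kB")
```

Under these assumptions a 5-minute note at 24 kbps is about 900 kB, a quarter of the size of the 96 kbps MP3.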
One quirk: WhatsApp records voice notes in mono at 16 kHz (wideband speech audio). This is fine for transcription, since Whisper-class models handle 16 kHz input natively, but it does limit accuracy slightly compared to studio-quality recordings. For typical conversational voice notes you can expect 90–95% accuracy on clear speakers in most languages.
Forwarding the voice note as audio is what produces the file. If you tap Share without forwarding, on some platforms WhatsApp will share a link to itself rather than the actual audio. Long-press → Forward → send to yourself / Saved Messages → tap the message → Share / Save reliably gets you the file.
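If you're not sure which format the saved file is in, the first few bytes usually tell you. The sketch below is illustrative only (the function names are ours, and it is not how TranscribeCat detects formats): Ogg files begin with the bytes "OggS", an Opus stream announces itself with "OpusHead" in its first page, and MP4-family files such as .m4a carry "ftyp" at byte offset 4.

```python
def sniff_voice_note(header: bytes) -> str:
    """Guess the container of an audio file from its first bytes."""
    if header[:4] == b"OggS":
        # Ogg container. Opus streams put an "OpusHead" packet in the first page.
        return "ogg-opus" if b"OpusHead" in header[:64] else "ogg-other"
    if header[4:8] == b"ftyp":
        # MP4 family, which is where .m4a (AAC) files belong.
        return "mp4-family"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return sniff_voice_note(f.read(64))
```

Call `sniff_file("note.opus")` on the saved file; a result of "unknown" usually means a link or text file was shared instead of the audio.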
The voice-note tax (and how transcription unwinds it)
Voice notes are great for the sender: faster than typing, more emotion, less friction. They're often worse for the receiver: you can't skim, you can't Ctrl-F, you can't read on mute, and the play time of the note is the time you must spend listening. A 5-minute voice note is a 5-minute commitment.
This asymmetry is real. Some people like it (warmth, intimacy). Many don't (admin burden, attention cost). Reading the transcript lets you get the gist without spending the listening time, and reply when you actually have a moment.
Common scenarios where users transcribe voice notes:
- Family group chats in languages you only half-speak.
- Long voice notes from coworkers that turn out to contain one actionable sentence.
- Voice notes in meetings or public places where you can't play audio out loud.
- Important voice notes you want to archive in writing for searchability.
- Voice notes from elderly parents who use voice instead of text — read for clarity, save for later.
What WhatsApp voice note transcription costs
$2 minimum per file (covers any voice note up to 1 hour):
- One 30-second note: $2
- One 50-min monologue: $2
- 5 separate notes: $10
Tip: if you have many short voice notes from one person, save them all and stitch with QuickTime or any audio app, then upload once. You'll pay $2 instead of $2 × N.
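The arithmetic behind that tip can be sketched in a few lines. The $2 minimum and the $2 per hour rate come from this page; how audio beyond the first hour is billed is an assumption here (pro rata at $2 per hour), so treat the result as an estimate rather than a quote.

```python
PRICE_PER_HOUR = 2.00    # from this page: $2 per hour
MINIMUM_PER_FILE = 2.00  # from this page: $2 minimum per file


def file_cost(duration_seconds: float) -> float:
    """Estimated cost of one upload.

    Assumption: audio beyond the first hour is billed pro rata at $2/hour.
    """
    hours = duration_seconds / 3600
    return max(MINIMUM_PER_FILE, hours * PRICE_PER_HOUR)


def separate_vs_stitched(durations_seconds):
    """Return (cost if uploaded one by one, cost if stitched into one file)."""
    separate = sum(file_cost(d) for d in durations_seconds)
    stitched = file_cost(sum(durations_seconds))
    return separate, stitched


# Five short notes: 30 s, 45 s, 1 min, 2 min, 90 s
separate, stitched = separate_vs_stitched([30, 45, 60, 120, 90])
print(f"Separately: ${separate:.2f}, stitched: ${stitched:.2f}")
```

For these five notes the estimate is $10.00 uploaded separately versus $2.00 stitched, matching the $2 × N comparison above.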
Frequently asked questions
How do I save a WhatsApp voice note as a file?
Long-press the voice note in the chat → Forward → choose your own number, Saved Messages, or any chat you use to send things to yourself. The voice note now appears as a regular audio attachment. From there, tap the attachment → Share → Save to Files (iPhone) or Save (Android) to save it to your phone's storage.
What file format are WhatsApp voice notes?
.opus or .ogg on Android (Opus codec in an Ogg container), and .m4a on iPhone (AAC). All three formats are supported by TranscribeCat directly — no conversion needed.
Can I batch-upload many voice notes at once?
Yes — drop up to 10 files into the upload area. Each is transcribed independently and counts as a $2 minimum, though. To save money on many short notes, stitch them together first using QuickTime, Audacity, or any audio editor; then upload as one file.
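If you'd rather script the stitching than use an audio editor, one option is ffmpeg's concat demuxer. ffmpeg is our suggestion here, not a tool this page requires, and the helper names below are hypothetical. The sketch only builds the list file contents and the command; stream copy (`-c copy`) works when all notes share the same codec and parameters, which is typically true for notes saved from the same phone. Mixed formats need re-encoding instead.

```python
def concat_list(paths) -> str:
    """Build the text of an ffmpeg concat list: one "file '...'" line per note."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_concat_command(list_file: str, output: str) -> list:
    """Command that joins the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


notes = ["note1.opus", "note2.opus", "note3.opus"]
print(concat_list(notes), end="")
print(" ".join(ffmpeg_concat_command("notes.txt", "stitched.ogg")))
```

To run it, write the list text to `notes.txt` and pass the command to `subprocess.run(cmd, check=True)`; then upload `stitched.ogg` as a single file.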
What about WhatsApp voice notes in languages I don't speak?
Our pipeline auto-detects the language and transcribes natively — including Norwegian, Swedish, Danish, Spanish, French, German, Portuguese, Italian, Arabic, Hindi, Mandarin, Japanese, and 80+ other languages. The transcript stays in the original language; we don't auto-translate (that's a different tool).
How accurate is transcription on a low-bitrate voice note?
Surprisingly good. WhatsApp uses the Opus codec, which is designed for low-bitrate speech and preserves intelligibility well. You'll typically see 90–95% accuracy on clear speech, with some drop on heavily accented speech, very fast speakers, or background noise.
Are WhatsApp voice notes encrypted? What happens to my privacy if I upload one?
WhatsApp voice notes are end-to-end encrypted in transit between you and the sender. Once you have the file on your phone, encryption no longer applies to the local copy — it's a regular audio file. When you upload to TranscribeCat, the audio is sent over HTTPS, processed in EU data centres, and deleted 90 days after your last sign-in. We don't share recordings beyond OpenAI for the transcription pass (zero-retention agreement). See /trust for full subprocessor disclosure.
Why doesn't WhatsApp transcribe voice notes itself?
Some WhatsApp clients (newer iPhone versions) include built-in transcription. It's English-first, has no speaker labels, and varies by phone version. For everything else (non-English notes, longer notes, exporting the text, archiving), uploading to TranscribeCat is the consistent answer.
Related voice and language resources
iPhone Voice Memo to text
For voice memos recorded directly in iPhone, not WhatsApp.
M4A to text
iPhone WhatsApp voice notes save as M4A — full format details.
Spanish transcription
For voice notes in Spanish from family or work groups.
Multilingual transcription guide
Per-language tips and accuracy expectations.
Read your WhatsApp voice notes
Forward, upload, read in seconds. $2 per note. 90+ languages.
Start Transcribing ($2/hr)
Free to sign up · Pay only when you transcribe