Transcribe MP3 to text for $2/hour
Drop your MP3 file, get a clean transcript with speaker labels in under 5 minutes. We handle every common bitrate — 64 kbps voice notes through 320 kbps studio podcasts — at the same flat $2 per hour.
Any MP3 bitrate
64, 96, 128, 192, 320 kbps — variable or constant. No re-encoding required, no quality settings to fiddle with. Upload the file as-is.
Speaker labels included
Multiple speakers in your MP3? We separate them automatically. Crucial for podcast episodes, recorded interviews, and meeting recordings.
$2 per hour, flat
Whether your MP3 is a 5-minute voicemail or a 4-hour podcast back-catalog, the rate is $2 per hour of audio. $2 minimum per file.
EU-processed
Audio is processed in EU data centres and deleted from our servers 90 days after your last sign-in. GDPR-compliant by default.
From MP3 to searchable text in 3 steps
Upload your MP3
Drag your .mp3 file into the upload area. Files up to 500 MB and 10 hours work directly — no compression, no format conversion.
We transcribe
Our engine detects the language automatically (90+ supported), separates speakers, and runs the audio through a Whisper-class model. Most MP3s under 2 hours finish in 3–5 minutes.
Download the transcript
Copy as plain text, export as SRT for subtitles, or download as a Word document. The audio plays back alongside the text so you can verify any line.
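For readers who want to know what an SRT export contains, here is a minimal sketch of the format: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the text. The segment tuples, the `to_srt` helper, and the "Speaker N:" prefix inside each cue are illustrative assumptions, not a description of the actual export.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, speaker, text) tuples as numbered SRT cues.

    The tuple layout and the speaker prefix are hypothetical; a real
    export may structure its cues differently.
    """
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hypothetical two-speaker excerpt.
example_segments = [
    (0.0, 2.5, "Speaker 1", "Welcome back to the show."),
    (2.5, 5.0, "Speaker 2", "Thanks for having me."),
]
print(to_srt(example_segments))
```

Most video editors and players accept a file in this shape as a subtitle track.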
Bitrate, mono vs stereo, and why MP3 transcribes well
MP3 is the most common audio format in the world precisely because it makes sensible tradeoffs for human speech. The format compresses by removing frequencies our ears don't reliably detect — which happens to be the same trick that lets transcription models focus on the parts of the signal that matter for words. A 96 kbps spoken-word MP3 transcribes as well as a 320 kbps version; you only see meaningful accuracy gains with WAV or FLAC when the source is music or has very low signal-to-noise.
That said, three MP3 quirks are worth knowing:
- Variable bitrate (VBR) vs constant bitrate (CBR): both work. VBR files report inconsistent bitrates to some tools but our pipeline reads the underlying samples, so accuracy is identical to CBR.
- Mono vs stereo: most podcast and meeting MP3s are mono — fine. If your MP3 is stereo with one speaker hard-panned to each channel (a common Zoom or interview-rig setup), our diarisation model handles speaker separation either way; you don't need to manually split the channels.
- Joint stereo: the default for most encoders. Decodes identically to true stereo for our purposes.
One thing that does hurt accuracy more than bitrate: background music or heavy noise. If your MP3 is a podcast with constant intro music under the voices, expect 1–2% lower accuracy than a clean studio recording. The transcription model is robust, not magic.
Where MP3 files come from (and what to expect)
Most of the MP3s our customers upload come from one of these sources, each with different characteristics:
- Podcast hosts (Anchor, Buzzsprout, Libsyn): usually 96–128 kbps mono. Clean signal, sometimes intro music. Transcribes very accurately.
- Voice recording apps (Easy Voice Recorder, Smart Recorder): 64–96 kbps mono. Good speech accuracy, watch for background noise from where you recorded.
- DAW exports (Audacity, GarageBand, Reaper): often 192–320 kbps stereo. Highest accuracy, larger files — fine, we handle up to 500 MB.
- Phone-call recordings: typically 32–64 kbps mono with some compression artifacts. Still transcribes well; speaker labels work even on phone-quality narrowband audio.
- Audiobook or YouTube rips: variable. If the audio is clean, accuracy is high. If music sits on top of speech (common on lectures with intro stings), the music portions may produce low-confidence text — that's expected.
What MP3 transcription actually costs
$2 per hour of MP3 audio, flat. Real examples:
- 15-minute voice memo: $2 (the per-file minimum applies)
- 2-hour podcast episode: $4
- 5-hour back-catalog: $10
$2 minimum per file. No subscription. You only pay for the MP3s you actually transcribe.
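The pricing rule can be sketched in a few lines. The rate and minimum come from this page; the assumption that partial hours are pro-rated (not rounded up to a whole hour) is ours, and `estimate_cost` is a hypothetical helper, not part of any published API.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_HOUR = Decimal("2.00")     # flat rate stated on this page
MINIMUM_PER_FILE = Decimal("2.00")  # per-file minimum stated on this page


def estimate_cost(duration_minutes: float) -> Decimal:
    """Estimate the charge for one file.

    Assumes partial hours are pro-rated; the actual billing
    granularity is not specified on this page.
    """
    hours = Decimal(str(duration_minutes)) / Decimal(60)
    prorated = (hours * RATE_PER_HOUR).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    # The per-file minimum applies whenever the pro-rated amount is lower.
    return max(prorated, MINIMUM_PER_FILE)


for label, minutes in [
    ("15-minute voice memo", 15),
    ("2-hour podcast episode", 120),
    ("5-hour back-catalog", 300),
]:
    print(f"{label}: ${estimate_cost(minutes)}")
```

Under these assumptions the three examples come out at $2.00, $4.00, and $10.00. The voice memo is the only one where the minimum, not the hourly rate, sets the price.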
Frequently asked questions
What MP3 bitrates are supported?
All of them — 32 kbps through 320 kbps, variable or constant bitrate. Lower bitrates (32–64 kbps) sometimes produce slightly lower accuracy because the source audio itself has less detail, but the format does not limit us. We read the decoded samples directly.
How long does it take to transcribe an MP3?
Most MP3 files under 2 hours finish in 3–5 minutes. A 4-hour MP3 typically takes 8–12 minutes. The page updates automatically — you can leave the tab and come back.
What is the maximum MP3 file size?
500 MB or 10 hours of audio, whichever you hit first. A typical 128 kbps MP3 is about 1 MB per minute, so a 10-hour MP3 at 128 kbps is roughly 576 MB, which is over the size limit; at that bitrate the 500 MB cap works out to about 8.7 hours of audio. 320 kbps stereo MP3s are larger; a 5-hour 320 kbps file is around 720 MB and would need to be split.
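To check a file against both limits before uploading, the size arithmetic can be sketched as bitrate times duration. This is a rough estimate assuming a constant bitrate and 1 MB = 1,000,000 bytes; it ignores ID3 tags and embedded cover art, and the function names are hypothetical.

```python
SIZE_LIMIT_MB = 500        # upload size limit stated on this page
DURATION_LIMIT_HOURS = 10  # duration limit stated on this page


def estimated_size_mb(bitrate_kbps: float, hours: float) -> float:
    """Approximate size of a constant-bitrate MP3 in megabytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * hours * 3600 / 1_000_000


def max_hours_under_limits(bitrate_kbps: float) -> float:
    """Longest duration that stays within both the size and duration limits."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    hours_by_size = SIZE_LIMIT_MB * 1_000_000 / bytes_per_second / 3600
    return min(hours_by_size, DURATION_LIMIT_HOURS)


def fits_limits(bitrate_kbps: float, hours: float) -> bool:
    """True if the estimated file stays within both limits."""
    return (
        hours <= DURATION_LIMIT_HOURS
        and estimated_size_mb(bitrate_kbps, hours) <= SIZE_LIMIT_MB
    )


print(f"5 h at 320 kbps: about {estimated_size_mb(320, 5):.0f} MB")
print(f"Longest 320 kbps file: about {max_hours_under_limits(320):.1f} h")
print(f"Longest 64 kbps file: about {max_hours_under_limits(64):.1f} h")
```

At high bitrates the size limit is reached first; at low bitrates the 10-hour duration limit is the one that applies.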
Can I transcribe a podcast MP3?
Yes, podcasts are among the most common recordings we transcribe. The speaker-label feature is especially useful for interview-format podcasts. Intro and outro music that plays on its own doesn't hurt accuracy on the spoken portions; music that continues underneath the voices can lower it slightly.
Do MP3s with multiple speakers get speaker labels?
Yes. Our pipeline includes automatic speaker diarisation — the transcript will be broken into segments labeled Speaker 1, Speaker 2, and so on. You can rename speakers in the editor after the fact.
Will my MP3 sound quality affect the transcript?
Bitrate has minimal impact for spoken-word audio. What matters more is background noise, simultaneous speakers cross-talking, and accent/dialect. Studio-quality MP3s reach 95%+ accuracy on clear English; phone-quality recordings with background noise typically hit 88–92%.
Can I transcribe a copyrighted MP3 (audiobook, music)?
You should only upload files you have the right to transcribe. We don't police content but we also don't condone copyright infringement. For your own recordings, podcasts, lectures, and meetings — go ahead.
Related MP3 transcription resources
M4A to text
iPhone Voice Memos default to M4A. Same engine, same $2/hr.
WAV to text
Lossless audio for studio-quality recordings.
Audio to text — full guide
How AI transcription works under the hood, accuracy expectations, format support.
Transcription pricing
$2/hr explained, with comparison to subscription and per-minute services.
Drop your MP3 and get a transcript
$2 per hour, $2 minimum, speaker labels included. No subscription.
Start Transcribing ($2/hr)
Free to sign up · Pay only when you transcribe