Transcribe MP4 video to text for $2/hour
Drop your MP4 — we extract the audio track automatically and return a speaker-labeled transcript. Works for Zoom recordings, screen recordings, lectures, conference talks, and downloaded videos. $2 per hour, $2 minimum.
We extract the audio
Don't convert your MP4 to MP3 first — upload the video as-is. We pull the audio track server-side and discard the video. The transcript comes out the same.
Speaker labels for meetings
MP4 is the dominant format for Zoom, Teams, and Meet recordings. Multiple speakers? We separate them and label each segment.
$2 per hour, video or audio
You pay for length, not file format. A 1-hour MP4 lecture is $2, same as a 1-hour MP3 podcast. The video being there doesn't cost extra.
SRT export for subtitles
Need captions for the same video on YouTube or social media? Download the transcript as SRT, then burn it into the video or upload it as a caption file. The timestamps are already aligned.
From MP4 video to text in 3 steps
Upload your MP4
Drop the .mp4 into the upload area. Files up to 500 MB / 10 hours work directly. No need to extract audio first or convert to MP3.
We extract and transcribe
Our pipeline pulls the audio track, then runs it through speaker diarisation and Whisper-class transcription. Most MP4s under 2 hours finish in 4–8 minutes.
Download text or SRT
Copy the transcript, export as SRT for video subtitles (timestamps already aligned), or download as Word. The MP4 plays back alongside the text.
Video formats, codecs, and why MP4 is the safe default
MP4 is technically a container, like ZIP for video. Inside, you usually find:
- Video stream: typically H.264 (AVC), increasingly H.265 (HEVC), occasionally newer codecs like AV1. We don't care — we throw the video away.
- Audio stream: usually AAC, occasionally AC-3 or MP3. This is the only part that matters. We extract it, decode it, transcribe it.
- Subtitle/caption tracks: ignored. We generate our own from the audio.
You don't need to extract the audio yourself. Tools like Audacity or ffmpeg will let you pull a WAV or MP3 from an MP4, but doing so doesn't change the resulting transcript at all. Skip the step and upload the MP4 directly.
One bandwidth tip: if your source video is huge (a 4K screen recording can easily hit 5 GB for a 1-hour file), and you only care about the audio for transcription, exporting audio-only with QuickTime or Audacity will save upload time. But a typical Zoom recording (1080p, 1 hour) lands at 200–400 MB and uploads fine as-is.
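For the oversized-file case, ffmpeg (mentioned above) can do the same audio-only export from the command line. The sketch below only builds the command; the function name and output naming are illustrative assumptions, and the stream-copy path assumes the MP4's audio is AAC, which is the usual case.

```python
import shlex
from pathlib import Path


def audio_only_command(video_path: str, reencode: bool = False) -> list[str]:
    """Build an ffmpeg command that drops the video stream and keeps the audio."""
    source = Path(video_path)
    if reencode:
        # Re-encode to MP3; needs an ffmpeg build that includes libmp3lame.
        target = source.with_suffix(".mp3")
        audio_args = ["-c:a", "libmp3lame", "-q:a", "4"]
    else:
        # Copy the audio stream untouched; assumes AAC audio, which fits .m4a.
        target = source.with_suffix(".m4a")
        audio_args = ["-c:a", "copy"]
    # -vn tells ffmpeg to leave the video stream out of the output.
    return ["ffmpeg", "-i", str(source), "-vn", *audio_args, str(target)]


# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(audio_only_command("meeting.mp4")))
```

Stream copy is the faster choice because nothing is re-encoded. The MP3 path is a fallback for files whose audio codec does not fit an .m4a container.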
Common MP4 sources and what to expect
- Zoom cloud recordings: 1080p H.264 + AAC. Clean transcripts, and speaker labels work well because Zoom records each participant at roughly the same volume. Typical 1-hour file: 250 MB.
- Microsoft Teams recordings: Stored in OneDrive/SharePoint, usually 720p H.264 + AAC. Same accuracy as Zoom. Download from Stream/SharePoint as MP4.
- Google Meet cloud recordings: Workspace-only feature. 720p H.264 + AAC, slightly lower bitrate than Zoom. Speaker accuracy is good but slightly less crisp on cross-talk.
- QuickTime / OBS / iPhone screen recordings: H.264 + AAC. Excellent quality. Common for product walkthroughs, lecture screen-shares, software demos.
- Camera footage (iPhone, GoPro, DSLR): usually H.264 or HEVC + AAC. Field recordings often have wind or background noise, so accuracy depends on mic quality, not the format.
- YouTube downloads: tools like cobalt or yt-dlp give MP4. Make sure you have the right to transcribe (your own video, or with permission).
What MP4 transcription actually costs
$2 per hour of video, regardless of resolution or codec. Real examples:
- 45-min Zoom recording: $2
- 2-hour lecture: $4
- 5-hour conference talks: $10
$2 minimum per file. Resolution doesn't matter — we only transcribe the audio.
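The arithmetic behind those examples can be sketched in a few lines. This assumes the charge scales linearly with duration and that the $2 minimum applies per file; how fractional hours are actually rounded at billing time is not stated here, so treat the function as an estimate only.

```python
RATE_PER_HOUR = 2.00   # dollars per hour of media, from the pricing above
MINIMUM_CHARGE = 2.00  # dollars per file, from the pricing above


def estimate_cost(duration_minutes: float) -> float:
    """Estimate the charge for one file, assuming linear pro-rating by duration."""
    if duration_minutes < 0:
        raise ValueError("duration must be non-negative")
    prorated = RATE_PER_HOUR * duration_minutes / 60
    # The per-file minimum wins whenever the pro-rated amount is lower.
    return round(max(MINIMUM_CHARGE, prorated), 2)


examples = [
    ("45-min Zoom recording", 45),
    ("2-hour lecture", 120),
    ("5-hour conference talks", 300),
]
for label, minutes in examples:
    print(f"{label}: ${estimate_cost(minutes):.2f}")
```

The 45-minute file pro-rates to $1.50, so the $2 minimum is what sets its price.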
Frequently asked questions
Do I need to extract the audio from my MP4 first?
No. Upload the .mp4 directly — we pull the audio track server-side. Pre-extracting to WAV or MP3 doesn't change the transcript and just adds a step.
What if my MP4 is bigger than 500 MB?
Either compress the video first (Handbrake will get a typical 1080p Zoom recording well under 500 MB), or extract the audio to MP3/M4A (QuickTime: File → Export As → Audio Only). Audio-only files are roughly 5–10% the size of the video.
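A quick local check can tell you whether a file needs that treatment before you start uploading. The sketch below uses the 500 MB / 10 hour limits quoted on this page; reading "500 MB" as decimal megabytes is an assumption, and the function name is hypothetical.

```python
MAX_SIZE_BYTES = 500 * 1000 * 1000  # "500 MB"; decimal megabytes is an assumption
MAX_DURATION_HOURS = 10


def exceeded_limits(size_bytes: int, duration_hours: float) -> list[str]:
    """Return the upload limits a file exceeds; an empty list means upload as-is."""
    problems = []
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("size")
    if duration_hours > MAX_DURATION_HOURS:
        problems.append("duration")
    return problems


# A typical 1-hour Zoom recording versus a 1-hour 4K screen recording.
print(exceeded_limits(250_000_000, 1.0))
print(exceeded_limits(5_000_000_000, 1.0))
```

For a real file, `os.path.getsize(path)` gives the size in bytes.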
How long does an MP4 take to transcribe?
Most MP4s under 2 hours finish in 4–8 minutes. The extraction step adds roughly 30 seconds compared with starting from an MP3. A 4-hour MP4 typically takes 12–18 minutes.
Will I get subtitles I can drop into my video?
Yes. The SRT export uses the same timestamps as the source audio, so you can attach the .srt file to your MP4 in any video player or upload it as captions on YouTube/Vimeo. The format is standard SubRip.
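For reference, a SubRip file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the caption text, separated by blank lines. The sketch below shows that layout. The segment tuples and the "Speaker N:" prefix are illustrative assumptions, not necessarily what the export contains.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start, end, speaker, text) tuples as SubRip cues."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{time_range}\n{speaker}: {text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hypothetical segments, in seconds from the start of the recording.
print(to_srt([
    (0.0, 2.5, "Speaker 1", "Welcome, everyone."),
    (2.5, 4.0, "Speaker 2", "Thanks for having me."),
]))
```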
Does codec matter: H.264 vs H.265 vs AV1?
No. We discard the video stream entirely. As long as the file plays in any modern player, our pipeline can read it. AAC, MP3, and AC-3 audio are all supported.
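If you want to confirm what a file contains before uploading, ffprobe (shipped with ffmpeg) can list its streams as JSON. The sketch below filters that output for the audio codecs named above. It is a local convenience check under the assumption that ffprobe is installed, and it says nothing about how the server-side pipeline inspects files.

```python
import json

# ffprobe's codec names for AAC, MP3, and AC-3.
SUPPORTED_AUDIO = {"aac", "mp3", "ac3"}

# Append the file path to this list and run it to get the JSON parsed below.
FFPROBE_COMMAND = [
    "ffprobe", "-v", "error",
    "-show_entries", "stream=codec_type,codec_name",
    "-of", "json",
]


def supported_audio_codecs(ffprobe_json: str) -> list[str]:
    """Return the codec names of audio streams that are in SUPPORTED_AUDIO."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        stream["codec_name"]
        for stream in streams
        if stream.get("codec_type") == "audio"
        and stream.get("codec_name") in SUPPORTED_AUDIO
    ]


# Example ffprobe output for a typical H.264 + AAC recording.
sample = '{"streams": [{"codec_name": "h264", "codec_type": "video"}, {"codec_name": "aac", "codec_type": "audio"}]}'
print(supported_audio_codecs(sample))
```

An empty list means the file has no audio stream in one of the listed codecs, which is worth checking before you upload.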
What about screen recordings with no spoken audio?
Silent video produces an empty transcript and a refund — we automatically refund any file where transcription fails or finds no speech. If your recording has only background music, you'll get song-detection-style output, not a useful transcript.
Does Zoom's built-in transcription work just as well?
Zoom's built-in transcript is decent for free, but it lacks speaker-label accuracy on cross-talk, doesn't support 90+ languages, and isn't available on free Zoom. If you only need English transcripts of clean meetings and you're on a paid Zoom plan, the built-in is fine. For everything else, an MP4 upload is the consistent answer.
Drop your MP4 and get a transcript
Zoom recordings, lectures, screen captures — $2 per hour with speaker labels and SRT export.
Start Transcribing ($2/hr)
Free to sign up · Pay only when you transcribe