Transcribe a YouTube video for $2/hour
Speaker labels, 90+ languages, accurate transcripts in 5–10 minutes. For your own YouTube content, podcasts you host on YouTube, or any video where you have transcription rights. $2 per hour, $2 minimum.
Speaker labels
YouTube's auto-captions don't separate speakers. Our diarisation tags each segment with its speaker, which is useful for interview channels, panel discussions, and podcast video uploads.
90+ languages
YouTube auto-captions are English-first. We auto-detect 90+ languages and transcribe natively — no English-translation step that loses nuance.
$2/hour, flat
A 30-minute episode is $2. A 2-hour interview is $4. No subscription, no per-minute meter, no upgrade tiers.
Verbatim, not summary
Some YouTube tools give you a chapter summary or key points. We give you the full word-for-word transcript with timestamps — useful for editing, repurposing, or accessibility.
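Per-segment speaker labels are what make interview transcripts readable. As an illustration, here is a minimal Python sketch that merges consecutive segments from the same speaker into turns. The `Segment` fields (speaker, start, end, text) are a hypothetical shape chosen for the example, not TranscribeCat's export format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarised chunk of speech (hypothetical shape, not an official export schema)."""
    speaker: str   # e.g. "Speaker 1"
    start: float   # seconds from the start of the video
    end: float
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Join consecutive segments from the same speaker into a single speaker turn."""
    turns: list[Segment] = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Extend the previous turn instead of starting a new one.
            turns[-1] = Segment(last.speaker, last.start, seg.end, f"{last.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


def render(turns: list[Segment]) -> str:
    """Format turns as 'Speaker: text' lines."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)


segments = [
    Segment("Speaker 1", 0.0, 2.5, "Welcome back to the show."),
    Segment("Speaker 1", 2.5, 4.0, "Today I'm joined by Sam."),
    Segment("Speaker 2", 4.0, 5.2, "Thanks for having me."),
]
print(render(merge_turns(segments)))
```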
From YouTube video to transcript in 3 steps
Get the audio (or video)
Your own YouTube channel: download the original from Studio (Content → … → Download). Other content you have the rights to: use yt-dlp, cobalt, or 4K Video Downloader. Save as MP3 or MP4.
Upload to TranscribeCat
Drop the file into the upload area. Audio (MP3) or video (MP4) — both work. Files up to 500 MB / 10 hours upload directly.
Get a clean transcript
Most YouTube videos under 1 hour finish in 4–7 minutes. Copy the text, export as SRT for re-uploading captions to YouTube, or download as Word.
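Before uploading, you can check a file against the direct-upload limits from step 2 (500 MB and 10 hours). This is a minimal Python sketch: counting 500 MB as 500,000,000 bytes is an assumption, and the duration has to come from another tool such as ffprobe.

```python
import os

# Direct-upload limits quoted on this page. Counting "500 MB" as 500,000,000 bytes
# is an assumption; the service may measure megabytes differently.
MAX_BYTES = 500 * 1000 * 1000
MAX_SECONDS = 10 * 60 * 60


def upload_problems(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return reasons a file exceeds the direct-upload limits (an empty list means it fits)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes / 1e6:.0f} MB is over the 500 MB limit")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"{duration_seconds / 3600:.1f} h is over the 10-hour limit")
    return problems


def file_problems(path: str, duration_seconds: float) -> list[str]:
    """Read the size from disk; the duration must come from another tool, e.g. ffprobe."""
    return upload_problems(os.path.getsize(path), duration_seconds)


print(upload_problems(1_000_000_000, 4 * 3600))  # a ~1 GB, 4-hour video
```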
Pulling audio from YouTube — what you can and can't do
Your own YouTube content: easy. YouTube Studio gives channel owners the original upload back: Content → click the video → ⋮ menu → Download. That's the cleanest path for repurposing. You can also export YouTube's own auto-captions (Subtitles tab → Download SRT) and use them as a starting point if you only need to clean up English text. If you want speaker labels, multiple languages, or higher accuracy, transcribing the original file gives much better results.
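If YouTube's own SRT is good enough as an English starting point, you usually want just the words. This minimal Python sketch flattens an SRT file to plain text by dropping each cue's index and timing lines. It assumes a well-formed file (index line, timing line, text, blank line) and does not strip formatting tags.

```python
import re


def srt_to_text(srt: str) -> str:
    """Flatten an SRT file to plain text by keeping only each cue's caption lines.

    A cue is an index line, a timing line ("00:00:01,000 --> 00:00:03,500"),
    then one or more text lines; cues are separated by blank lines.
    """
    # Drop a UTF-8 byte-order mark and normalise Windows line endings.
    normalized = srt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    text_lines = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # lines[0] is the index and lines[1] the timing; the rest is caption text.
        text_lines.extend(line.strip() for line in lines[2:] if line.strip())
    return " ".join(text_lines)


sample = """1
00:00:00,000 --> 00:00:02,000
so today we're talking about

2
00:00:02,000 --> 00:00:04,500
transcription workflows
"""
print(srt_to_text(sample))
```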
Public YouTube videos you have rights to (commissioned content, content under a permissive license, public-domain content, your guest appearances on others' channels with their permission): tools like yt-dlp (CLI, free, open source), cobalt.tools (web UI, no install), and 4K Video Downloader (desktop GUI) extract MP3 or MP4 from any YouTube URL. yt-dlp is the most reliable; for audio only, the syntax is yt-dlp -x --audio-format mp3 <URL>.
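Scripting the audio-only yt-dlp command is handy when you have several files to fetch. This minimal Python sketch shells out to the yt-dlp CLI, which must be installed and on PATH; yt-dlp also needs ffmpeg to convert the extracted audio to MP3. The function names and the VIDEO_ID URL are hypothetical placeholders.

```python
import shutil
import subprocess


def yt_dlp_audio_command(url: str, audio_format: str = "mp3") -> list[str]:
    """Argument list for `yt-dlp -x --audio-format mp3 <URL>` (audio only)."""
    return ["yt-dlp", "-x", "--audio-format", audio_format, url]


def download_audio(url: str) -> None:
    """Run yt-dlp for one URL, failing early if yt-dlp or ffmpeg is missing."""
    for tool in ("yt-dlp", "ffmpeg"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    # No shell is involved, so characters like '&' in the URL are passed through as-is.
    subprocess.run(yt_dlp_audio_command(url), check=True)


print(yt_dlp_audio_command("https://www.youtube.com/watch?v=VIDEO_ID"))
```

Passing the arguments as a list avoids shell interpretation of the URL. If you type the command by hand instead, quote the URL, since an unquoted & (as in playlist or timestamp links) is treated specially by most shells.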
What you shouldn't do: download and transcribe copyrighted content you don't have permission for, even for personal use. YouTube's ToS prohibits downloading without express permission. We don't police what you upload, but the responsibility for what you transcribe is yours.
If you only have the URL, not a file: download with one of the tools above first. We don't fetch from YouTube on our end — partly to avoid being a copyright laundromat, partly because YouTube's tooling for third-party access is unreliable.
YouTube auto-captions vs TranscribeCat — when to use which
YouTube has had auto-captions since 2009 and they're free. Why pay $2?
- Speaker labels: YouTube doesn't separate speakers. Interview-format videos turn into a single block of text. We label each speaker.
- Non-English languages: YouTube's auto-caption accuracy varies wildly by language. For Norwegian, Danish, Swedish, Arabic, Mandarin, Japanese, and many smaller languages, our pipeline (Whisper-class) is meaningfully more accurate.
- Punctuation and capitalisation: YouTube auto-captions are notoriously sparse on punctuation. Our transcripts are properly punctuated.
- Verbatim, not paraphrase: We don't summarise. Word-for-word with timestamps. Useful when you're repurposing a video into a blog post or quoting in print.
- SRT export with proper line breaks: Our SRT files are formatted for use as actual video subtitles. YouTube's SRT export is sometimes timed awkwardly because its cues follow the auto-caption chunks rather than natural sentence boundaries.
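To illustrate what sentence-aware caption formatting means (this is not TranscribeCat's actual algorithm), the sketch below splits text at sentence-ending punctuation and wraps each sentence into cues of at most two lines. The 42-characters-per-line and two-lines-per-cue limits are assumptions based on common subtitle style guides, and the sentence splitter is deliberately naive.

```python
import re
import textwrap

MAX_CHARS_PER_LINE = 42  # assumed limit, common in subtitle style guides
MAX_LINES_PER_CUE = 2    # assumed limit


def caption_chunks(text: str) -> list[str]:
    """Split text at sentence boundaries, then wrap each sentence into short captions."""
    # Naive split after . ! or ? followed by whitespace (abbreviations like "Mr." will split too).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    for sentence in sentences:
        lines = textwrap.wrap(sentence, width=MAX_CHARS_PER_LINE)
        # A long sentence becomes several cues, each at most MAX_LINES_PER_CUE lines.
        for i in range(0, len(lines), MAX_LINES_PER_CUE):
            chunks.append("\n".join(lines[i:i + MAX_LINES_PER_CUE]))
    return chunks


text = ("Welcome back. Today we're looking at how to turn a long interview "
        "into subtitles that are easy to read on a phone.")
for chunk in caption_chunks(text):
    print(chunk, end="\n\n")
```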
Use YouTube auto-captions when: English-only, single speaker, informal context (e.g. checking what someone said in a vlog).
Use TranscribeCat when: speaker labels, non-English, professional output (subtitles for re-upload, blog post repurpose, accessibility for a paying audience).
What YouTube transcription costs
$2 per hour of video. Real examples for content creators:
- 20-min YouTube video: $2
- 90-min interview podcast: $4
- 5-hour back catalog: $10
If you upload weekly, transcribing every video for a year (52 videos of up to about two hours each) costs roughly $100–200, much less than any subscription product targeting creators.
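The examples above follow from the $2/hour rate and $2 minimum. This minimal Python sketch reproduces them and the weekly-upload estimate. It assumes each started hour is billed in full, a rounding rule consistent with every price on this page (90 minutes at $4); with pro-rata billing a 90-minute file would be $3, so treat the output as an estimate rather than a quote.

```python
import math


def transcription_cost(minutes: float, rate_per_hour: float = 2.0, minimum: float = 2.0) -> float:
    """Price of one file, assuming each started hour is billed at the hourly rate."""
    started_hours = math.ceil(minutes / 60)
    return max(minimum, started_hours * rate_per_hour)


def yearly_cost(minutes_per_video: float, videos_per_year: int = 52) -> float:
    """Cost of transcribing one upload a week for a year (52 videos by default)."""
    return videos_per_year * transcription_cost(minutes_per_video)


for minutes in (20, 90, 300):
    print(f"{minutes}-min file: ${transcription_cost(minutes):.0f}")
for minutes in (45, 120):
    print(f"weekly {minutes}-min videos for a year: ${yearly_cost(minutes):.0f}")
```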
Frequently asked questions
Can I transcribe a YouTube video by URL?
Not directly — we don't fetch from YouTube. Download the audio or video first using yt-dlp (CLI) or cobalt.tools (web UI), then upload the file. This keeps responsibility for what you transcribe on you, not us.
Is this legal?
For your own content, yes. For others' content, only if you have explicit permission, the content is under a permissive license, or it's in the public domain. YouTube's ToS technically prohibits downloading without permission — that's between you and YouTube. We don't police uploads.
How is this different from YouTube auto-captions?
YouTube auto-captions are free, but they have no speaker labels, English-first accuracy, and sparse punctuation. TranscribeCat gives you speaker labels, 90+ languages, proper punctuation, and verbatim transcripts with SRT timing aligned to natural sentences.
How long does it take?
Most YouTube videos under 1 hour finish in 4–7 minutes. A 2-hour podcast video typically takes 8–12 minutes. Audio-only files are slightly faster than MP4 because we skip the audio extraction step.
Can I get the transcript back as YouTube SRT to upload as captions?
Yes. Our SRT export uses standard SubRip format with timestamps aligned to natural sentence boundaries — better for caption display than YouTube's own auto-caption export. Upload via YouTube Studio → Subtitles → Add Language → Upload File.
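If you edit a transcript and want to build the SRT yourself before uploading it in YouTube Studio, the SubRip layout is simple: a running index, an HH:MM:SS,mmm --> HH:MM:SS,mmm timing line, the caption text, and a blank line. This is a minimal Python sketch; the (start, end, text) tuples are a hypothetical input shape, not our export API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as SubRip: index, timing line, text, blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves exactly one blank line between cues.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Welcome back to the show."), (2.5, 4.0, "Thanks for having me.")]))
```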
What about YouTube Shorts?
Same workflow: download the .mp4 or audio and upload it to TranscribeCat. Because Shorts are so short, each one is billed at the $2 minimum. If you have many Shorts, batch them first by concatenating them with ffmpeg or QuickTime, then transcribe them as one file.
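Batching with ffmpeg uses its concat demuxer, which reads a text file listing the clips. This is a minimal Python sketch, assuming ffmpeg is on PATH and that all clips share the same codecs (required for -c copy, which joins without re-encoding); the list and output file names are hypothetical.

```python
import subprocess
from pathlib import Path


def concat_list_text(clips: list[str]) -> str:
    """Build the list ffmpeg's concat demuxer reads: one `file '<path>'` line per clip.

    Paths are made absolute because relative entries are resolved against the
    list file's directory, not the current working directory.
    """
    lines = []
    for clip in clips:
        path = str(Path(clip).resolve())
        escaped = path.replace("'", "'\\''")  # ffmpeg quoting for a single quote
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_shorts(clips: list[str], output: str = "shorts_batch.mp4") -> None:
    """Join clips without re-encoding; -safe 0 allows the absolute paths in the list."""
    list_file = Path("shorts_list.txt")
    list_file.write_text(concat_list_text(clips))
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file), "-c", "copy", output],
        check=True,
    )
```

Timestamps in the resulting transcript are relative to the combined file, so note where each Short starts if you need per-Short captions.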
Can you handle a 4-hour podcast video?
Yes, up to 10 hours per file. A 4-hour podcast video at 720p is typically around 1 GB, which is over our 500 MB upload limit. Either compress it with HandBrake or extract the audio only first (yt-dlp -x --audio-format mp3 <URL>). Audio-only MP3 at 192 kbps is roughly 86 MB per hour, so a 4-hour episode comes to about 350 MB.
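MP3 size follows directly from the bitrate: kilobits per second times seconds, divided by 8 to get bytes. This quick Python check ignores container overhead and tags, and shows why audio-only files fit comfortably under the 500 MB limit.

```python
def audio_size_mb(bitrate_kbps: float, hours: float) -> float:
    """Approximate size of a constant-bitrate audio file in MB (1 MB = 1,000,000 bytes)."""
    bytes_total = bitrate_kbps * 1000 * hours * 3600 / 8  # kilobits/s -> bytes
    return bytes_total / 1_000_000


for kbps in (128, 192):
    print(f"{kbps} kbps: {audio_size_mb(kbps, 1):.1f} MB/hour, "
          f"{audio_size_mb(kbps, 4):.1f} MB for 4 hours")
```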
Drop your YouTube video and get a transcript
Speaker-labeled, 90+ languages, SRT-ready. $2 per hour.
Start Transcribing ($2/hr)
Free to sign up · Pay only when you transcribe