Transcribe WAV to text for $2/hour
Studio-quality WAV recordings — DAW exports, multi-mic interviews, depositions, archival audio — transcribed with speaker labels in under 5 minutes for most files. $2 per hour, no subscription.
Lossless audio input
Upload uncompressed PCM WAV at any sample rate or bit depth (16-bit, 24-bit, 32-bit float — all fine). We don't re-compress; the model gets the highest-quality signal.
Speaker diarisation
Multi-track recordings, panel interviews, court depositions — we identify and label each speaker automatically, even with overlapping speech.
$2/hour, flat rate
A 4-hour studio WAV costs the same as a 4-hour MP3: $8, at $2 per hour. Lossless audio doesn't cost more, even though the files are roughly 10× the size of MP3.
Up to 500 MB / 10 hours
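A 4-hour 24-bit/48 kHz stereo WAV is about 4.1 GB and won't fit. Even 10 hours of 16-bit/16 kHz mono WAV is roughly 1.15 GB, so long sessions usually hit the 500 MB limit before the 10-hour one. Convert to FLAC (also lossless) to shrink the file, and split the session if it is still over 500 MB.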
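The size figures follow from simple arithmetic: sample rate × bytes per sample × channels × duration. The sketch below is a rough estimator, not our upload validator; it assumes decimal megabytes (1 MB = 1,000,000 bytes) and ignores the small WAV header. The function names are illustrative.

```python
MAX_BYTES = 500 * 1000**2   # 500 MB per-file limit (decimal MB is an assumption)
MAX_HOURS = 10              # 10-hour per-file limit

def wav_size_bytes(hours, sample_rate_hz, bit_depth, channels):
    """Approximate size of the PCM payload; the WAV header is ignored."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return int(hours * 3600 * bytes_per_second)

def fits_limits(hours, sample_rate_hz, bit_depth, channels):
    """True when both the size limit and the duration limit are met."""
    size = wav_size_bytes(hours, sample_rate_hz, bit_depth, channels)
    return size <= MAX_BYTES and hours <= MAX_HOURS

one_hour_studio = wav_size_bytes(1, 48_000, 24, 2)
ten_hour_mono = wav_size_bytes(10, 16_000, 16, 1)
print(f"{one_hour_studio / 1e9:.2f} GB")  # 1.04 GB
print(f"{ten_hour_mono / 1e9:.2f} GB")    # 1.15 GB
```

Running the same function on your own session settings before uploading shows whether you need to convert or split first.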
From WAV file to clean transcript in 3 steps
Upload your WAV
Drop the .wav file into the upload area. Mono or stereo, any sample rate, any common bit depth. No conversion needed.
Transcribe
WAV files are decoded directly. There is no lossy codec in the chain, so the input carries no encoder artifacts. Most WAVs under 2 hours finish in 4–7 minutes.
Get your verbatim transcript
Lossless input means the highest accuracy our pipeline can produce. Copy as text, export as SRT, or download as a Word document with speaker labels.
Why pros choose WAV (and when MP3 is fine)
WAV is uncompressed audio: the raw waveform stored sample-by-sample in a simple container. A 1-hour stereo WAV at 44.1 kHz / 16-bit (CD quality) is about 635 MB; at 48 kHz / 24-bit (broadcast standard) it's ~1 GB per hour. The files are huge because there's no compression, so every sample is preserved exactly as the recorder digitized it.
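Because the container is so simple, you can check a file's channel count, bit depth, sample rate, and duration yourself before uploading. This is a minimal sketch using Python's standard `wave` module; `describe_wav` is an illustrative name. Note that the standard module reads integer PCM only, so 32-bit float files would need a third-party reader.

```python
import io
import wave

def describe_wav(fileobj):
    """Return (channels, bit_depth, sample_rate_hz, duration_seconds)."""
    with wave.open(fileobj, "rb") as w:
        channels = w.getnchannels()
        bit_depth = w.getsampwidth() * 8
        rate = w.getframerate()
        duration = w.getnframes() / rate
    return channels, bit_depth, rate, duration

# Build one second of silent 16-bit stereo audio in memory to exercise the function.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)          # 2 bytes per sample = 16-bit
    w.setframerate(44_100)
    w.writeframes(b"\x00" * (44_100 * 2 * 2))
buf.seek(0)

print(describe_wav(buf))  # (2, 16, 44100, 1.0)
```

`describe_wav` also accepts a file path, since `wave.open` takes either a path string or a file object.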
For transcription specifically, WAV produces marginally better results than MP3 in three cases:
- Multi-microphone setups where each speaker has their own channel. Channels are currently summed to mono before transcription; using channel separation to improve diarisation is on the roadmap.
- Very low signal levels (whisper-quiet speech, distant mic placement), where MP3 compression discards detail that helps the model.
- Heavy background noise or music, where the extra dynamic range of WAV gives the model more to work with for separation.
For everything else — clean podcasts, single-speaker recordings, normal meeting audio — a 192 kbps MP3 transcribes within 1% of WAV accuracy. If the difference between 96% and 97% accuracy matters, use WAV. Otherwise the extra storage is wasted on a transcription workflow.
Where WAV files come from in real workflows
- Pro DAWs (Pro Tools, Logic, Reaper, Audacity): WAV is the default export for archival masters. Sessions are typically 24-bit / 48 kHz. We accept these as-is.
- Field recorders (Zoom H5/H6, Tascam DR-40, Sound Devices MixPre): record directly to WAV. Multi-mic field recordings often produce 4-track or 8-track WAV files — those are fine; we sum to mono internally for transcription.
- Broadcast and archival systems: BBC, NPR, court reporters, and many legal/medical contexts mandate WAV (or BWF — Broadcast WAV — which we read identically) because lossy compression is considered a chain-of-custody issue.
- Voice acting and ADR: WAV preserves the take exactly as recorded for later editing.
- Older Windows recording apps: still default to WAV. If you have a decade-old recording, it's probably WAV.
Tip for very long sessions: if you have a 5-hour deposition WAV at 24-bit/48 kHz that's 5+ GB, convert to FLAC first. FLAC is also lossless, transcribes identically to WAV, and typically halves the file size. A session that long will still be over 500 MB as FLAC, so split it into parts as well. We accept FLAC directly.
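One way to do the conversion is with ffmpeg, which can encode to FLAC and cut the output into fixed-length parts in a single pass using its segment muxer. The sketch below only builds the command; it assumes ffmpeg is installed, and the file names and the 30-minute segment length are illustrative choices, not requirements of ours. Actual FLAC sizes depend on the audio, so check each part against the 500 MB limit.

```python
import shlex

def flac_split_command(src, out_pattern="part_%03d.flac", segment_seconds=1800):
    """Build an ffmpeg command that encodes to FLAC and splits into fixed-length parts."""
    return [
        "ffmpeg", "-i", src,
        "-c:a", "flac",                          # lossless FLAC encoder
        "-f", "segment",                         # write numbered output segments
        "-segment_time", str(segment_seconds),   # target length of each part, in seconds
        out_pattern,
    ]

cmd = flac_split_command("deposition.wav")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Fixed-length cuts can land mid-sentence; if that matters, split by hand at natural breaks instead.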
What WAV transcription actually costs
$2 per hour, regardless of bit depth or sample rate. Real examples:
- $1: 30-minute interview WAV
- $8: 4-hour studio session
- $20: 10-hour deposition
Lossless audio doesn't cost more. WAV files are larger but transcription is priced by length, not size.
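As a quick illustration of duration-based pricing, the sketch below computes cost from length alone. It assumes simple linear proration; any rounding or minimum charge is not specified here, and `transcription_cost` is an illustrative name.

```python
from decimal import Decimal

RATE_PER_HOUR = Decimal("2.00")  # $2 per hour of audio

def transcription_cost(duration_hours):
    """Cost depends only on duration, never on file size, bit depth, or sample rate."""
    return RATE_PER_HOUR * Decimal(str(duration_hours))

for hours in (4, 10):
    print(f"{hours} h -> ${transcription_cost(hours)}")
```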
Frequently asked questions
What WAV bit depths and sample rates are supported?
All common ones: 16-bit, 24-bit, 32-bit integer, and 32-bit float. Sample rates from 8 kHz (legacy phone) through 192 kHz (high-res audio) all decode. We resample internally for transcription.
Will a WAV transcript be more accurate than the same audio as MP3?
Marginally: typically <1% accuracy difference for clean spoken-word audio. The cases where WAV meaningfully wins are multi-mic recordings, low-level audio, and heavy background noise. For normal meeting/interview audio, MP3 at 192 kbps is essentially identical.
My WAV is 4 GB and won't upload. What now?
Convert to FLAC (also lossless, much smaller) using Audacity, ffmpeg, or any pro audio tool. A typical 4 GB WAV becomes a 1.5–2 GB FLAC with no audio quality loss. That is still above our per-file limit, so also split the session into parts at natural breaks until each part is under 500 MB and 10 hours.
Do you handle multi-track WAV (BWF, multi-channel)?
Yes. Multi-channel WAV files (4-track, 8-track) decode and sum to mono before transcription. We can't use individual channel labels yet for diarisation — speaker identification still runs on the summed audio. Multi-channel diarisation is on the roadmap.
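To make "sum to mono" concrete, here is a minimal sketch of a downmix over interleaved 16-bit samples. It averages each frame so the result cannot clip; that is an assumption for illustration, since the exact mixing used in our pipeline is not described here. `downmix_to_mono` is an illustrative name.

```python
from array import array

def downmix_to_mono(interleaved, channels):
    """Average each frame of interleaved 16-bit samples into one mono sample."""
    if len(interleaved) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    mono = array("h")
    for i in range(0, len(interleaved), channels):
        frame = interleaved[i:i + channels]
        mono.append(sum(frame) // channels)  # averaging keeps the value in 16-bit range
    return mono

# Three stereo frames: (100, 300), (-200, 200), (32767, 32767)
frames = array("h", [100, 300, -200, 200, 32767, 32767])
print(list(downmix_to_mono(frames, 2)))  # [200, 0, 32767]
```

Once the channels are mixed, the per-microphone separation is gone, which is why speaker identification has to work from the combined signal.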
Are BWF (Broadcast WAV) files supported?
Yes. BWF is just WAV with extra metadata in a "bext" chunk. The metadata does not affect decoding: we read the audio as standard WAV. Timecode and recorder metadata are preserved in our backend but not surfaced in the transcript editor yet.
Can I get word-level timestamps from a WAV?
Currently we provide segment-level (sentence-level) timestamps in the transcript and SRT export. Word-level timestamps are on the roadmap. The format you start with — WAV vs MP3 — doesn't change what we output.
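For readers unfamiliar with SRT, each segment becomes a numbered block with a start and end time in `HH:MM:SS,mmm` form. The sketch below shows how segment-level timestamps map onto that layout. The segment tuples and the `Speaker N:` prefix are assumptions for illustration and may not match our export exactly.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"

def to_srt(segments):
    """segments: iterable of (start_seconds, end_seconds, speaker, text)."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    return "\n".join(blocks)  # a blank line separates consecutive blocks

print(to_srt([
    (0.0, 3.2, "Speaker 1", "Please state your name for the record."),
    (3.2, 5.75, "Speaker 2", "Jane Doe."),
]))
```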
For court-reporter or legal-deposition use, is WAV preferred?
Many legal workflows mandate lossless audio for the master archive. Upload the WAV directly — no transcoding step in the chain of custody. Note that TranscribeCat is not court-certified; the AI transcript should be reviewed by a human before official use.
Related WAV and pro audio resources
Drop your WAV and get a clean transcript
Studio audio in, speaker-labeled text out. $2 per hour with no compression artifacts.
Start transcribing
No card needed to sign up.