Transcribe WAV to text for $2/hour

Studio-quality WAV recordings — DAW exports, multi-mic interviews, depositions, archival audio — transcribed with speaker labels in under 5 minutes for most files. $2 per hour, no subscription.

Lossless audio in

Upload uncompressed PCM WAV at any common sample rate or bit depth: 16-bit, 24-bit, 32-bit integer and 32-bit float are all fine. We don't re-compress, so the model gets the highest-quality signal.

Speaker diarisation

Multi-track recordings, panel interviews, court depositions — we identify and label each speaker automatically, even with overlapping speech.

$2 per hour, flat

A 4-hour studio recording costs the same as a 4-hour MP3: $8, at $2 per hour. Lossless audio doesn't cost more, even though the files are roughly 10× the size of an MP3.

Up to 500 MB / 10 hours

A 4-hour 24-bit/48 kHz stereo WAV is about 4.1 GB and won't fit. Convert to FLAC (also lossless), split the session, or both. Even 10 hours of 16-bit/16 kHz mono WAV is roughly 1.1 GB, more than twice the limit.
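For a rough sense of how much audio fits under the 500 MB cap, this sketch computes the longest WAV per file from the PCM data rate (sample rate × bytes per sample × channels). It assumes 1 MB = 1,000,000 bytes and ignores the small WAV header; `max_minutes_per_file` is a hypothetical helper, not part of any TranscribeCat API.

```python
# Limits from this page; treating "MB" as 10**6 bytes is an assumption.
MAX_FILE_BYTES = 500 * 10**6
MAX_FILE_SECONDS = 10 * 3600


def max_minutes_per_file(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Longest uncompressed PCM WAV, in minutes, that stays under both limits."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    seconds = min(MAX_FILE_BYTES / bytes_per_second, MAX_FILE_SECONDS)
    return seconds / 60


for label, fmt in [
    ("24-bit/48 kHz stereo", (48_000, 24, 2)),
    ("16-bit/44.1 kHz stereo", (44_100, 16, 2)),
    ("16-bit/16 kHz mono", (16_000, 16, 1)),
]:
    print(f"{label}: {max_minutes_per_file(*fmt):.1f} min")
```

Under these assumptions a 24-bit/48 kHz stereo WAV reaches 500 MB after about 29 minutes, and even 16-bit/16 kHz mono tops out around 4.3 hours.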

From WAV file to clean transcript in 3 steps

Upload your WAV

Drop the .wav file into the upload area. Mono, stereo or multi-channel, any sample rate, any common bit depth. No conversion needed.

We transcribe

WAV is raw PCM, so there's no lossy codec to decode and no encoder artifacts in the input. Most WAVs under 2 hours finish in 4–7 minutes.

Get a verbatim transcript

Lossless input means the highest accuracy our pipeline can produce. Copy as text, export as SRT, or download as a Word document with speaker labels.

Why pros choose WAV (and when MP3 is fine)

WAV is uncompressed audio: the raw waveform stored sample-by-sample in a simple container. A 1-hour stereo WAV at 44.1 kHz / 16-bit (CD quality) is about 635 MB; at 48 kHz / 24-bit (broadcast standard) it's about 1 GB per hour. The files are huge because there's no compression: every sample is stored exactly as the recorder captured it.
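The sizes above follow directly from the data rate: samples per second × bytes per sample × channels × seconds. A minimal sketch of that arithmetic (the helper name is hypothetical, and the roughly 44-byte WAV header is ignored):

```python
def wav_data_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: float) -> int:
    """Size of the raw PCM sample data in an uncompressed WAV (header not included)."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


HOUR = 3600
cd_quality = wav_data_bytes(44_100, 16, 2, HOUR)  # 635,040,000 bytes
broadcast = wav_data_bytes(48_000, 24, 2, HOUR)   # 1,036,800,000 bytes
print(f"CD quality: {cd_quality / 1e6:.0f} MB per hour")
print(f"Broadcast:  {broadcast / 1e9:.2f} GB per hour")
```

Halve these figures for mono recordings.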

For transcription specifically, WAV gives meaningfully better results than MP3 in three cases:

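Multi-microphone setups where each speaker has their own channel. A WAV keeps every channel intact and free of codec artifacts; we currently sum the channels to mono before transcription, and using individual channels for diarisation is on the roadmap.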
Very low signal levels (whisper-quiet speech, distant mic placement), where MP3 compression discards detail that helps the model.
Heavy background noise or music, where the extra dynamic range of WAV gives the model more to work with for separation.

For everything else — clean podcasts, single-speaker recordings, normal meeting audio — a 192 kbps MP3 transcribes within 1% of WAV accuracy. If the difference between 96% and 97% accuracy matters, use WAV. Otherwise the extra storage is wasted on a transcription workflow.

Where WAV files come from in real workflows

Pro DAWs (Pro Tools, Logic, Reaper, Audacity): WAV is the default export for archival masters. Sessions are typically 24-bit / 48 kHz. We accept these as-is.
Field recorders (Zoom H5/H6, Tascam DR-40, Sound Devices MixPre): record directly to WAV. Multi-mic field recordings often produce 4-track or 8-track WAV files — those are fine; we sum to mono internally for transcription.
Broadcast and archival systems: BBC, NPR, court reporters, and many legal/medical contexts commonly require WAV (or BWF, Broadcast WAV, which we read identically); in legal work, lossy compression is considered a chain-of-custody issue.
Voice acting and ADR: WAV preserves the take exactly as recorded for later editing.
Older Windows recording apps: many saved to WAV by default, so an old recording from a Windows PC may well be WAV.

Tip for very long sessions: a 5-hour stereo deposition WAV at 24-bit/48 kHz is over 5 GB. Convert it to FLAC first: FLAC is also lossless, transcribes identically to WAV, and typically halves the file size. We accept FLAC directly. If the FLAC is still over 500 MB, split it into parts that each fit the limit.
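If ffmpeg is installed, the conversion is a single command. This sketch wraps it in Python with `subprocess`; the helper names are hypothetical, `-c:a flac` selects ffmpeg's built-in FLAC encoder, and `-n` stops ffmpeg from overwriting an existing file. One caveat: FLAC stores integer samples only, so 32-bit float WAVs aren't a lossless fit; keep those as WAV or check what your converter does.

```python
import shutil
import subprocess
from pathlib import Path


def flac_command(wav_path: str) -> list[str]:
    """Build an ffmpeg command that writes a lossless FLAC copy next to the WAV."""
    flac_path = Path(wav_path).with_suffix(".flac")
    # -n: never overwrite an existing output; -c:a flac: ffmpeg's FLAC encoder.
    return ["ffmpeg", "-n", "-i", wav_path, "-c:a", "flac", str(flac_path)]


def convert_to_flac(wav_path: str) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    cmd = flac_command(wav_path)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

For example, `convert_to_flac("deposition.wav")` would write `deposition.flac` alongside the original.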

What WAV transcription actually costs

$2 per hour, regardless of bit depth or sample rate. Real examples:

30-min interview WAV: $1
4-hour studio session: $8
10-hour deposition: $20

Lossless audio doesn't cost more. WAV files are larger but transcription is priced by length, not size.
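Because the price depends only on duration, the arithmetic is simple. A minimal sketch in integer cents, assuming per-second pro-rata billing rounded up to the next cent (the page doesn't say how partial minutes are rounded, so treat that as an assumption):

```python
RATE_CENTS_PER_HOUR = 200  # $2 per hour, flat, from the pricing above
SECONDS_PER_HOUR = 3600


def price_cents(duration_seconds: int) -> int:
    """Price in cents for a recording of the given length.

    Assumption: billed pro rata per second, rounded up to a whole cent.
    Bit depth, sample rate and file size deliberately play no part.
    """
    # Negated floor division of a negated value is ceiling division.
    return -(-duration_seconds * RATE_CENTS_PER_HOUR // SECONDS_PER_HOUR)


for label, seconds in [
    ("30-min interview", 30 * 60),
    ("4-hour session", 4 * 3600),
    ("10-hour deposition", 10 * 3600),
]:
    print(f"{label}: ${price_cents(seconds) / 100:.2f}")
```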

Frequently asked questions

What WAV bit depths and sample rates are supported?

All common ones: 16-bit, 24-bit, 32-bit integer, and 32-bit float. Sample rates from 8 kHz (legacy phone) through 192 kHz (high-res audio) all decode. We resample internally for transcription.
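To check a file's bit depth and sample rate before uploading, read its `fmt ` chunk. A minimal standard-library sketch that walks the RIFF chunks and decodes that header. It handles plain WAV and BWF but not RF64 (the 64-bit variant used for files over 4 GB), and for WAVE_FORMAT_EXTENSIBLE files it reports the container tag rather than decoding the sub-format GUID; the function and type names are hypothetical.

```python
import struct
from typing import BinaryIO, NamedTuple

FORMAT_NAMES = {1: "integer PCM", 3: "IEEE float", 0xFFFE: "extensible"}


class WavFormat(NamedTuple):
    format_tag: int
    channels: int
    sample_rate: int
    bits_per_sample: int


def read_wav_format(f: BinaryIO) -> WavFormat:
    """Return the fields of the first 'fmt ' chunk (plain WAV/BWF only, not RF64)."""
    riff, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _block_align, bits = struct.unpack(
                "<HHIIHH", f.read(16)
            )
            return WavFormat(tag, channels, rate, bits)
        # Chunks are word-aligned: an odd-sized body is followed by one pad byte.
        f.seek(chunk_size + (chunk_size % 2), 1)
```

Open the file in binary mode (`open(path, "rb")`), pass the handle in, and look up `FORMAT_NAMES.get(fmt.format_tag)` for a readable label.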

Will a WAV transcript be more accurate than the same audio as MP3?

Marginally — typically <1% accuracy difference for clean spoken-word audio. The cases where WAV meaningfully wins: multi-mic recordings, low-level audio, and heavy background noise. For normal meeting/interview audio, MP3 at 192 kbps is essentially identical.

My WAV is 4 GB and won't upload. What now?

Convert to FLAC (also lossless, much smaller) using Audacity, ffmpeg, or any pro audio tool. A typical 4 GB WAV becomes a 1.5–2 GB FLAC with no audio quality loss. That's still over our 500 MB per-file limit (files are also capped at 10 hours), so split the session at natural breaks into parts under 500 MB each, in either WAV or FLAC.
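To work out how many pieces a long session needs, divide by each limit and round up. A sketch assuming both limits apply per file, 1 MB = 10**6 bytes, and a roughly constant bitrate (exactly true for PCM WAV, approximately true for FLAC). The 1% headroom is an arbitrary safety margin for headers, and `split_plan` is a hypothetical helper.

```python
MAX_FILE_BYTES = 500 * 10**6  # assumption: decimal megabytes
MAX_FILE_SECONDS = 10 * 3600
HEADROOM_PERCENT = 99         # keep each part about 1% under the byte limit


def split_plan(total_bytes: int, total_seconds: int) -> tuple[int, float]:
    """Return (parts, seconds_per_part) for equal-length parts that fit both limits."""
    byte_budget = MAX_FILE_BYTES * HEADROOM_PERCENT // 100
    parts_by_size = -(-total_bytes // byte_budget)  # ceiling division
    parts_by_length = -(-total_seconds // MAX_FILE_SECONDS)
    parts = max(parts_by_size, parts_by_length, 1)
    return parts, total_seconds / parts


# A 4 GB, 3 h 51 min WAV (about what 24-bit/48 kHz stereo produces):
parts, seconds = split_plan(4 * 10**9, 3 * 3600 + 51 * 60)
print(f"{parts} parts of about {seconds / 60:.0f} minutes each")
```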

Do you handle multi-track WAV (BWF, multi-channel)?

Yes. Multi-channel WAV files (4-track, 8-track) decode and sum to mono before transcription. We can't use individual channel labels yet for diarisation — speaker identification still runs on the summed audio. Multi-channel diarisation is on the roadmap.
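Summing to mono means combining each frame's channel samples into one value. The sketch below averages (a scaled sum, which can't clip) interleaved 16-bit little-endian PCM using only the standard library. It illustrates the idea and is not necessarily how TranscribeCat's pipeline does it.

```python
import array
import sys


def downmix_to_mono(pcm16_le: bytes, channels: int) -> array.array:
    """Average interleaved 16-bit little-endian PCM frames down to one channel.

    Illustrative only: the service's actual downmix method may differ.
    """
    samples = array.array("h")
    samples.frombytes(pcm16_le)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    mono = array.array("h")
    for start in range(0, len(samples), channels):
        frame = samples[start:start + channels]
        mono.append(sum(frame) // channels)  # floor of the mean stays within int16
    return mono
```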

Are BWF (Broadcast WAV) files supported?

Yes. BWF is just WAV with extra metadata in a "bext" chunk. Transcription ignores that metadata and decodes the audio as standard WAV; timecode and recorder metadata are kept in our backend but not yet surfaced in the transcript editor.
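If you want the BWF timecode yourself: the `bext` chunk layout is defined in EBU Tech 3285, and its TimeReference field (a sample count since midnight, stored as low and high little-endian 32-bit words) sits 338 bytes into the chunk, after the Description, Originator, OriginatorReference, OriginationDate and OriginationTime fields. A minimal sketch that finds it; it does not handle RF64, and the function name is hypothetical.

```python
import struct
from typing import BinaryIO, Optional

# Byte offset of TimeReference inside the bext chunk body:
# Description (256) + Originator (32) + OriginatorReference (32)
# + OriginationDate (10) + OriginationTime (8)
TIME_REFERENCE_OFFSET = 256 + 32 + 32 + 10 + 8


def read_bext_time_reference(f: BinaryIO) -> Optional[int]:
    """Return the BWF TimeReference (samples since midnight), or None for plain WAV."""
    riff, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = f.read(8)
        if len(header) < 8:
            return None  # reached the end without a bext chunk
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"bext":
            body = f.read(chunk_size)
            # Low word then high word, little-endian: one unsigned 64-bit value.
            (time_reference,) = struct.unpack_from("<Q", body, TIME_REFERENCE_OFFSET)
            return time_reference
        f.seek(chunk_size + (chunk_size % 2), 1)
```

Divide the result by the file's sample rate to get seconds since midnight; for example, 172,800,000 samples at 48 kHz is 3,600 s, i.e. 01:00:00.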

Can I get word-level timestamps from a WAV?

Currently we provide segment-level (sentence-level) timestamps in the transcript and SRT export. Word-level timestamps are on the roadmap. The format you start with — WAV vs MP3 — doesn't change what we output.
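For reference, SRT timestamps use the HH:MM:SS,mmm form (comma before the milliseconds), and each cue is a counter, a time range and the text. A sketch of turning segment-level timestamps into SRT; the `(start, end, text)` tuple shape is a hypothetical stand-in for whatever segment data you have, not TranscribeCat's export format.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render hypothetical (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)  # blank line between cues


print(to_srt([
    (0.0, 2.5, "Speaker 1: Thanks for coming in."),
    (2.5, 4.0, "Speaker 2: Happy to."),
]))
```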

For court-reporter or legal-deposition use, is WAV preferred?

Many legal workflows mandate lossless audio for the master archive. Upload the WAV directly — no transcoding step in the chain of custody. Note that TranscribeCat is not court-certified; the AI transcript should be reviewed by a human before official use.

Related WAV and pro-audio resources

MP3 to text

When MP3 is good enough (most cases) and when WAV wins.

For legal

Deposition transcription and confidentiality.

For journalism

Multi-source interview transcription with speaker labels.

Improve transcription accuracy

Mic placement, noise floors, and what actually moves accuracy.

Drop your WAV and get a clean transcript

Studio audio in, speaker-labeled text out. $2 per hour with no compression artifacts.

Start Transcribing ($2/hr)

Free to sign up · Pay only when you transcribe
