AssemblyAI Universal-3 Pro from the command line.

AssemblyAI's Universal-3 Pro is the first promptable speech model — guide transcription with natural language before it listens, not after. With a 5.6% mean WER, 30% fewer hallucinations than Whisper, and built-in speaker diarization, it's the choice when accuracy and control matter. OSTT connects it to your terminal, hotkey, and shell.

AssemblyAI Universal-3 Pro

Prompt it. Don't post-process it.

Universal-3 Pro accepts a natural language prompt parameter that shapes transcription during inference, not after. Tell it the domain, formatting style, or vocabulary before it listens. It delivers a 5.6% mean WER on English benchmarks, 30% fewer hallucinations than Whisper, and language detection across 99 languages. OSTT surfaces all of this through your terminal with zero extra tooling.

# ~/.config/ostt/ostt.toml
[transcription]
provider = "assemblyai"
model = "universal-3-pro"

[assemblyai.universal-3-pro.params]
punctuate = true
format_text = true
language_detection = true

# Pick interactively
ostt model

# Record with hotkey, transcribe, copy to clipboard
ostt launch -c

Promptable transcription

Universal-3 Pro is the first speech model that accepts natural language instructions. Describe the domain, speaker roles, formatting preferences, or vocabulary before it transcribes. The model applies them at inference — no post-processing LLM call needed.
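To make "prompt before it listens" concrete, here is a minimal sketch of a request body for AssemblyAI's `POST https://api.assemblyai.com/v2/transcript` endpoint that carries a natural-language `prompt`. This is not OSTT's implementation. The `prompt` field name follows AssemblyAI's public API as we understand it, the helper names are hypothetical, and the model-selection field is deliberately left out because its exact value should be taken from AssemblyAI's API reference.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): build a JSON body for AssemblyAI's
# POST /v2/transcript endpoint that includes a natural-language prompt.
# Assumption: the API field is named "prompt". The model-selection field is
# intentionally omitted here.

# Escape backslashes and double quotes so the value is a valid JSON string.
# Newlines and other control characters are not handled in this sketch.
json_escape() {
  local s="$1"
  s="${s//\\/\\\\}"
  s="${s//\"/\\\"}"
  printf '%s' "$s"
}

# Usage: build_prompt_payload AUDIO_URL PROMPT
build_prompt_payload() {
  printf '{"audio_url":"%s","prompt":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Example (not executed here; needs an API key and network access):
#   curl -s https://api.assemblyai.com/v2/transcript \
#     -H "authorization: $ASSEMBLYAI_API_KEY" \
#     -H "content-type: application/json" \
#     -d "$(build_prompt_payload https://example.com/visit.mp3 \
#           'Cardiology consult. Use standard drug names and dosage units.')"
```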

30% fewer hallucinations than Whisper

Whisper's tendency to hallucinate on silent or low-quality audio is well documented. Universal-3 Pro produces 30% fewer fabrications, omissions, and extended hallucinated sequences than Whisper, which matters for medical, legal, and compliance recordings.

5.6% mean WER

AssemblyAI benchmarks Universal-3 Pro at 5.6% mean WER across English test sets (median 4.9%). On specialist domains like financial earnings calls, it has outperformed GPT-4o-transcribe in independent evaluations.
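For readers comparing these figures, word error rate is conventionally computed against a human reference transcript as shown below, where $S$, $D$, and $I$ are the word substitutions, deletions, and insertions in the minimum-edit alignment between the model's output and the reference, and $N$ is the number of words in the reference. A 5.6% WER therefore works out to roughly 5.6 word errors per 100 reference words. Scoring details such as text normalization and how the mean is taken across test sets vary between benchmarks.

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```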

Speaker diarization included

Identify and label up to 30 speakers in a recording without a separate API call. Diarization is particularly strong on short files and single-word responses, the kind of audio that breaks most diarization systems.
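For a sense of what enabling diarization involves at the API level, here is a minimal sketch of an AssemblyAI request body with `speaker_labels` turned on and an optional `speakers_expected` hint. Both field names are taken from AssemblyAI's public API as we understand it, the helper is hypothetical, and OSTT may expose diarization through its own configuration instead. The 1 to 30 range check mirrors the speaker limit stated above.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): request body that enables speaker
# diarization. Assumed AssemblyAI fields: "speaker_labels" and
# "speakers_expected".

# Usage: build_diarization_payload AUDIO_URL [SPEAKERS_EXPECTED]
build_diarization_payload() {
  local url="$1" speakers="${2:-}"
  case "$url" in
    *'"'*|*'\'*) echo "URL needs JSON escaping; not handled here" >&2; return 1 ;;
  esac
  if [ -z "$speakers" ]; then
    # No hint: let the service decide how many speakers there are.
    printf '{"audio_url":"%s","speaker_labels":true}\n' "$url"
  elif [[ $speakers =~ ^[1-9][0-9]*$ ]] && [ "$speakers" -le 30 ]; then
    # Plain decimal 1..30 (no leading zeros, which printf would read as octal).
    printf '{"audio_url":"%s","speaker_labels":true,"speakers_expected":%d}\n' \
      "$url" "$speakers"
  else
    echo "speakers_expected must be an integer from 1 to 30" >&2
    return 1
  fi
}
```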

99 languages with code-switching

Universal-3 Pro handles mixed-language audio natively, with a 19% relative WER improvement on code-switching benchmarks. Set expected_languages in config to restrict detection when you already know which languages the audio contains.
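Here is a minimal sketch of restricting detection at the API level: the request body turns on `language_detection` and limits it with `language_detection_options.expected_languages`. Those field names are assumptions based on AssemblyAI's public API. How OSTT's `expected_languages` config key maps onto them is not shown, and the helper is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): language detection limited to an
# expected set. Assumed AssemblyAI fields: "language_detection" and
# "language_detection_options.expected_languages".

# Usage: build_language_payload AUDIO_URL LANG_CODE [LANG_CODE...]
build_language_payload() {
  local url="$1"; shift
  local re='^[a-z]{2,3}(_[a-z]{2})?$' langs="" code
  case "$url" in
    *'"'*|*'\'*) echo "URL needs JSON escaping; not handled here" >&2; return 1 ;;
  esac
  if [ "$#" -eq 0 ]; then
    echo "give at least one language code, e.g. en es" >&2
    return 1
  fi
  for code in "$@"; do
    # Accept only plain codes such as "en", "es" or "en_us",
    # so they can be embedded in JSON without escaping.
    if ! [[ $code =~ $re ]]; then
      echo "unexpected language code: $code" >&2
      return 1
    fi
    langs+="${langs:+,}\"$code\""   # comma-separate after the first entry
  done
  printf '{"audio_url":"%s","language_detection":true,"language_detection_options":{"expected_languages":[%s]}}\n' \
    "$url" "$langs"
}
```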

Keyword vocabulary boosting

Add technical terms, names, and domain vocabulary via OSTT's keywords feature. OSTT sends them as keyterm hints to Universal-3 Pro, reducing errors on the words that matter most in your workflow.
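At the API level, vocabulary hints like these travel as a list of strings. The sketch below builds an AssemblyAI request body with a `keyterms_prompt` array. That field name is our assumption about what OSTT's keywords map to, and the helper is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): request body carrying vocabulary
# hints. Assumption: AssemblyAI accepts them as a "keyterms_prompt" array.

# Usage: build_keyterms_payload AUDIO_URL TERM [TERM...]
build_keyterms_payload() {
  local url="$1"; shift
  local terms="" t
  case "$url" in
    *'"'*|*'\'*) echo "URL needs JSON escaping; not handled here" >&2; return 1 ;;
  esac
  for t in "$@"; do
    t="${t//\\/\\\\}"   # escape backslashes for JSON
    t="${t//\"/\\\"}"   # escape double quotes for JSON
    terms+="${terms:+,}\"$t\""
  done
  printf '{"audio_url":"%s","keyterms_prompt":[%s]}\n' "$url" "$terms"
}
```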

Workflow

From speech to useful output.

1. Record: Press your global hotkey or run ostt in the terminal.

2. Transcribe: Universal-3 Pro processes the audio via the AssemblyAI API.

3. Process: Optionally run AI prompts or shell commands on the result.

4. Send: Print to stdout, copy to clipboard, write to a file, or pipe onward.
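When the recordings already exist, the Transcribe and Send steps can be scripted. This sketch loops over a directory of `.mp3` files using only the `ostt transcribe FILE -o OUT` form shown in the pipeline examples on this page. The function names and the skip-if-present behavior are our own choices, not OSTT features.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): batch-transcribe existing recordings
# with the `ostt transcribe FILE -o OUT` form shown on this page.

# Map an audio path to its transcript path: talks/intro.mp3 -> talks/intro.md
transcript_name() {
  printf '%s.md\n' "${1%.*}"
}

# Transcribe every .mp3 in DIR (default: current directory), skipping
# recordings that already have a transcript next to them.
transcribe_dir() {
  local dir="${1:-.}" f out
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue        # no .mp3 files: the glob stays unexpanded
    out="$(transcript_name "$f")"
    [ -e "$out" ] && continue      # transcript already exists
    ostt transcribe "$f" -o "$out"
  done
}

# Usage: transcribe_dir ~/recordings
```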

Pipeline

Accurate transcription inside your shell.

OSTT routes Universal-3 Pro output to wherever your workflow needs it. Plain text on stdout, clipboard with -c, file with -o, or pipe through any CLI tool. Combine with OSTT's -p flag to run processing actions on the result without a separate tool.

# Transcribe a meeting recording
ostt transcribe meeting.mp3 -o notes.md

# Record, run "clean" processing action, copy
ostt -p clean -c

# Transcribe and pipe to downstream command
ostt | my-tool.sh
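`my-tool.sh` is a placeholder for whatever comes next in your pipeline. As one hypothetical example of a downstream command, this function reads the transcript from stdin and appends it, with a UTC timestamp, to a notes file. It relies only on `ostt` printing the transcript to stdout, as in the last example above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for my-tool.sh: read a transcript on stdin and
# append it to a notes file with a UTC timestamp.
# Usage: ostt | append_note ~/notes/dictation.log
append_note() {
  local file="$1" text
  text="$(cat)"                    # whole transcript (trailing newlines dropped)
  [ -n "$text" ] || return 0       # nothing transcribed: write nothing
  printf '[%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$text" >> "$file"
}
```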

Accurate, promptable transcription in your terminal.
