GPT-4o Transcribe on Linux.

gpt-4o-transcribe is OpenAI's most accurate transcription model — built on GPT-4o rather than Whisper, with a 4.1% word error rate and substantially fewer hallucinations on long recordings. OSTT is the fastest way to use it from a Linux or macOS terminal, bound to a hotkey and wired into your shell.

GPT-4o Transcribe

More accurate. Fewer hallucinations. Same price.

gpt-4o-transcribe delivers a 4.1% WER, a meaningful improvement over whisper-1's 5.3%, especially on noisy audio, accented speech, and long recordings. Whisper's documented tendency to hallucinate text on silence is greatly reduced. At an estimated $0.006/min, identical to whisper-1, there is no cost reason to stay on the legacy model for standard transcription work.
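Because the quoted price is per minute of audio, budgeting is simple arithmetic. The bash sketch below hard-codes the $0.006/min figure quoted above for both models. That rate is an assumption to re-check against OpenAI's pricing page, since gpt-4o-transcribe is metered in tokens and the per-minute number is an estimate. estimate_cost is a hypothetical helper, not part of OSTT.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate transcription cost in USD for N minutes.
# Rates are the per-minute figures quoted on this page (assumption; verify
# against OpenAI's current pricing before relying on them).
estimate_cost() {
  local model="$1" minutes="$2" rate
  case "$model" in
    gpt-4o-transcribe|whisper-1) rate=0.006 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  # awk does the floating-point multiply; print dollars rounded to the cent.
  awk -v m="$minutes" -v r="$rate" 'BEGIN { printf "%.2f\n", m * r }'
}

# Usage: estimate_cost gpt-4o-transcribe 60   # a one-hour meeting
```

A one-hour recording comes out to 0.36 on either model, which is the "same price" point in concrete terms.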

# ~/.config/ostt/ostt.toml
[transcription]
provider = "openai"
model = "gpt-4o-transcribe"

[openai.gpt-4o-transcribe.params]
language = "en"
prompt = "Technical dictation with project names."
include = ["logprobs"]

# Pick interactively
ostt model

# Record, transcribe with gpt-4o-transcribe, copy result
ostt launch -c
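To provision the same settings on a new machine (for example from a dotfiles repo), a small bash sketch can write the config shown above and refuse to overwrite an existing file. The function name write_ostt_config and its optional directory argument are hypothetical; the default path and the TOML keys are copied from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical bootstrap: write the ostt config from this page without clobbering.
write_ostt_config() {
  # Target directory defaults to the location used in the example above.
  local dir="${1:-$HOME/.config/ostt}"
  local file="$dir/ostt.toml"
  if [ -e "$file" ]; then
    echo "$file already exists; leaving it alone" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Quoted heredoc delimiter: the TOML is written verbatim, with no expansion.
  cat > "$file" <<'EOF'
[transcription]
provider = "openai"
model = "gpt-4o-transcribe"

[openai.gpt-4o-transcribe.params]
language = "en"
prompt = "Technical dictation with project names."
include = ["logprobs"]
EOF
}

# Usage: write_ostt_config && ostt model
```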

Fewer hallucinations

Whisper-1 has a documented tendency to generate text that was never spoken when given silent or low-quality audio. gpt-4o-transcribe greatly reduces this, so long meeting recordings that used to pick up hallucinated filler text come back much cleaner.

Better formatting out of the box

gpt-4o-transcribe automatically inserts punctuation and sentence casing and handles code-switching between languages, producing output that previously required a separate LLM post-processing step.

Promptable transcription

Pass a prompt to gpt-4o-transcribe to guide style, domain vocabulary, and formatting. Whisper only considers the last 224 tokens of a prompt and treats it as preceding context to imitate rather than as instructions; gpt-4o-transcribe follows them.
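One way to feed domain vocabulary into the prompt is to keep project names in a plain-text file and splice them into a per-run --param prompt=... value. In this bash sketch the vocabulary file, its format (one term per line, lines starting with # ignored), and the build_prompt name are all hypothetical; only --param prompt comes from OSTT.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a vocabulary file into a transcription prompt.
# Assumed format: one term per line; blank lines and '#' comment lines ignored.
build_prompt() {
  local vocab_file="$1" terms
  # Drop comment lines, trim whitespace, drop blanks, join with ", ".
  terms="$(sed -e '/^[[:space:]]*#/d' \
               -e 's/^[[:space:]]*//' \
               -e 's/[[:space:]]*$//' "$vocab_file" \
           | grep -v '^$' | paste -sd, - | sed 's/,/, /g')"
  if [ -z "$terms" ]; then
    printf 'Technical dictation.\n'
  else
    printf 'Technical dictation. Spell these names exactly: %s.\n' "$terms"
  fi
}

# Usage (assumes --param works on a recording run, as described on this page):
#   ostt --param prompt="$(build_prompt ~/.config/ostt/vocab.txt)" -c
```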

Logprobs when needed

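OpenAI can return token log probabilities for gpt-4o-transcribe and gpt-4o-mini-transcribe when logprobs is requested through the include parameter (include = ["logprobs"] in the config, or --param include=logprobs per run). OSTT validates the option and still emits clean transcript text for shell workflows.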
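Log probabilities are handy for flagging low-confidence transcripts. Assuming the raw API response carries a logprobs array whose entries have a logprob field (check the API reference), the values can be extracted with jq and reduced to an average per-token probability. The averaging heuristic and the mean_token_prob name are illustrative, not something OSTT or OpenAI defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one natural-log probability per line on stdin and
# print the mean per-token probability, exp(logprob), to three decimals.
# A rough confidence heuristic only; lower values suggest a shakier transcript.
mean_token_prob() {
  awk 'NF { sum += exp($1); n++ }
       END { if (n == 0) exit 1; printf "%.3f\n", sum / n }'
}

# Usage (response shape is an assumption):
#   jq -r '.logprobs[].logprob' response.json | mean_token_prob
```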

Diarization model available

Switch to openai/gpt-4o-transcribe-diarize when you need OpenAI's diarized JSON response. OSTT returns the combined transcript text while preserving the model-specific request options.
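OSTT flattens the diarized response into plain text. If you call the API directly and want speaker turns back, one approach is to emit speaker and text per segment as tab-separated lines, then merge consecutive segments from the same speaker. The jq path in the usage comment assumes a segments array with speaker and text fields; merge_turns is a hypothetical awk-based sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stdin is "speaker<TAB>text" per segment, in time order.
# Consecutive segments from the same speaker are joined into one turn and
# printed as "speaker: text".
merge_turns() {
  awk -F '\t' '
    started && $1 == prev { buf = buf " " $2; next }   # same speaker: extend turn
    { if (started) print prev ": " buf                 # new speaker: flush turn
      prev = $1; buf = $2; started = 1 }
    END { if (started) print prev ": " buf }           # flush the final turn
  '
}

# Usage (response shape is an assumption):
#   jq -r '.segments[] | [.speaker, .text] | @tsv' diarized.json | merge_turns
```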

Any output target

Send the transcription to the clipboard with -c, write it to a file with -o, or print it to stdout and pipe it wherever you like.
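Writing to files with -o also makes batch jobs easy. This bash sketch lists recordings in a directory that do not yet have a matching .md transcript, so a loop only processes new files. pending_recordings and its extension list are assumptions (as is ostt accepting every one of those formats); the loop in the usage comment uses only the ostt transcribe FILE -o FILE form shown on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print audio files in DIR that lack a sibling .md file.
# The extension list is an assumption; adjust it to what you record.
pending_recordings() {
  local dir="${1:-.}" f
  for f in "$dir"/*.mp3 "$dir"/*.wav "$dir"/*.m4a; do
    [ -e "$f" ] || continue                 # skip glob patterns that matched nothing
    [ -e "${f%.*}.md" ] || printf '%s\n' "$f"
  done
}

# Usage: transcribe each pending file next to its recording.
#   pending_recordings ~/recordings | while IFS= read -r f; do
#     ostt transcribe "$f" -o "${f%.*}.md" < /dev/null   # keep the loop's stdin intact
#   done
```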

Switch models anytime

Run ostt model to switch between gpt-4o-transcribe, gpt-4o-mini-transcribe, and whisper-1. One command, no config file editing required.

Workflow

From speech to useful output.

1. Record: press your global hotkey or run ostt in the terminal.
2. Transcribe: gpt-4o-transcribe processes the audio via the OpenAI API.
3. Process: optionally run AI prompts or shell commands on the result.
4. Send: print to stdout, copy to the clipboard, write to a file, or pipe onward.

Pipeline

GPT-4o accuracy in your shell.

OSTT routes gpt-4o-transcribe output to wherever your workflow needs it. Print to stdout, copy to clipboard, write to a file, or pipe through any CLI. Use --param language=en, --param prompt=..., or --param include=logprobs for per-run OpenAI params.

# Transcribe a recording, write to file
ostt transcribe meeting.mp3 -o notes.md

# Record, process with "summary" action, copy
ostt -p summary -c

# Transcribe and pipe to custom script
ostt | ./process.sh
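The last example leaves ./process.sh up to you. Here is one hypothetical version in bash: it collapses whitespace, appends a timestamped line to a notes file, and echoes the cleaned text so it can sit in the middle of a longer pipe. NOTES_FILE and its default path are assumptions, not OSTT settings.

```bash
#!/usr/bin/env bash
# Hypothetical ./process.sh for "ostt | ./process.sh".
set -euo pipefail

process_transcript() {
  local text notes
  # Read the whole transcript, collapse whitespace runs, trim the ends.
  text="$(tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')"
  # Nothing to do for an empty transcript (e.g. a cancelled recording).
  [ -n "$text" ] || return 0
  # Assumed notes location; override with NOTES_FILE=... ./process.sh
  notes="${NOTES_FILE:-$HOME/notes/transcripts.md}"
  mkdir -p "$(dirname "$notes")"
  printf -- '- %s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$text" >> "$notes"
  # Pass the cleaned text downstream.
  printf '%s\n' "$text"
}

# Run only when executed (not sourced) with piped input.
if [[ "${BASH_SOURCE[0]}" == "$0" && ! -t 0 ]]; then
  process_transcript
fi
```

Make it executable with chmod +x process.sh. Echoing the text back out, rather than only writing the file, keeps it composable: ostt | ./process.sh | wc -w still works.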

Run gpt-4o-transcribe from your terminal.