gpt-4o-transcribe
Built on GPT-4o, not just Whisper. Produces lower word error rates on noisy audio, better formatting, and significantly fewer hallucinations than whisper-1 on long recordings. Same price per minute as the legacy model.
OpenAI serves four speech-to-text models through the same transcription API: gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, and the legacy hosted whisper-1. whisper-1 remains useful for timestamp metadata; gpt-4o-transcribe and gpt-4o-mini-transcribe are the newer generation with lower word error rates; gpt-4o-transcribe-diarize adds speaker-segment annotations. OSTT connects them to your terminal with a single config change.
OpenAI Speech Models
OpenAI's transcription API covers the spectrum from faster GPT-4o Mini to higher-accuracy GPT-4o transcription, diarized GPT-4o output, and legacy whisper-1 timestamp metadata. OSTT lets you switch between them through ostt model without touching your workflow.
# ~/.config/ostt/ostt.toml
[transcription]
provider = "openai"
model = "gpt-4o-transcribe"
# Or use Whisper timestamps / GPT-4o diarization
# model = "whisper-1"
# model = "gpt-4o-transcribe-diarize"
[openai.gpt-4o-transcribe.params]
language = "en"
include = ["logprobs"]
# Pick interactively
ostt model
# Record with hotkey, transcribe with OpenAI, copy to clipboard
ostt launch -c
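For scripted setups, the "single config change" can also be made without an editor. The sketch below is a hypothetical helper, not part of OSTT: `set_model` is an invented name, and it assumes the config holds exactly one uncommented `model = "..."` line, as in the example above. `ostt model` remains the documented way to switch.

```shell
# Hypothetical helper (not an OSTT command): rewrite the active model line
# in an OSTT-style TOML config.
# Assumption: there is exactly one uncommented `model = "..."` line.
set_model() {
  config_file="$1"
  new_model="$2"
  # Only lines that start with `model =` are rewritten, so commented-out
  # alternatives such as `# model = "whisper-1"` are left untouched.
  sed "s|^model = \".*\"|model = \"${new_model}\"|" "$config_file" > "${config_file}.tmp" &&
    mv "${config_file}.tmp" "$config_file"
}

# Example (commented out so that sourcing this file changes nothing):
# set_model "$HOME/.config/ostt/ostt.toml" gpt-4o-mini-transcribe
```

Per-model sections such as `[openai.gpt-4o-transcribe.params]` are not modified, so any params for the newly selected model would still need their own section.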
gpt-4o-mini-transcribe costs $0.003/min, half the price of gpt-4o-transcribe, with accuracy that still exceeds whisper-1 on most audio types. The right default for high-volume transcription where cost matters.
whisper-1 is the original OpenAI hosted Whisper model. It supports verbose_json with word-level timestamps and SRT/VTT output formats, which is useful for subtitle generation and pipelines that need precise timing.
Use gpt-4o-transcribe-diarize to get OpenAI's diarized JSON response with speaker-segment annotations. OSTT returns the combined transcript text and keeps model-specific params under openai.gpt-4o-transcribe-diarize.params.
Set OpenAI params per model with --param or persistent openai.<model>.params: language, prompt, temperature, GPT-4o include=logprobs, Whisper timestamp granularities, and diarization fields.
Use -p to run a processing action after transcription. Route OpenAI output directly into OpenCode, Claude Code, Gemini CLI, or any shell command without manual copy-paste.
OSTT saves every recording locally. Run ostt retry to re-transcribe the same audio with a different OpenAI model, or switch to any other provider — no need to speak again.
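To make the price difference concrete, here is a rough cost estimator. The rates come from the figures quoted above: $0.003/min for gpt-4o-mini-transcribe, which implies $0.006/min for gpt-4o-transcribe and, at the same price, whisper-1. The function names are illustrative, and the rates should be checked against OpenAI's current pricing before relying on them.

```shell
# Per-minute prices in USD, as quoted on this page (may be out of date).
price_per_minute() {
  case "$1" in
    gpt-4o-transcribe|whisper-1) echo 0.006 ;;
    gpt-4o-mini-transcribe)      echo 0.003 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# estimate_cost MODEL MINUTES
# Prints the estimated cost in USD, rounded to two decimals.
estimate_cost() {
  rate=$(price_per_minute "$1") || return 1
  awk -v rate="$rate" -v minutes="$2" 'BEGIN { printf "%.2f\n", rate * minutes }'
}
```

For 1,000 minutes of audio, `estimate_cost gpt-4o-transcribe 1000` prints `6.00` and `estimate_cost gpt-4o-mini-transcribe 1000` prints `3.00`.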
Workflow
ostt in the terminal.
Pipeline
OSTT turns OpenAI transcription into a standard Unix primitive. The output is plain text on stdout — pipe it through jq, sed, or any CLI tool. Use -p to chain processing actions. Switch between whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe via ostt model without changing your shell aliases.
# Transcribe a file with gpt-4o-transcribe
ostt transcribe interview.mp3 -o transcript.txt
# Record, clean with AI processing action, copy to clipboard
ostt -p clean -c
# Transcribe and pipe straight to your editor
ostt | nvim -
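Because the transcript arrives as plain text on stdout, ordinary filters can sit between `ostt` and its consumer. The two functions below are a minimal sketch using only `tr` and `sed`; their names are made up for this example and are not OSTT features.

```shell
# Squeeze runs of spaces and trim leading/trailing spaces on each line.
normalize_whitespace() {
  tr -s ' ' | sed 's/^ *//; s/ *$//'
}

# Break the text after ., ?, or ! when followed by a space,
# so each sentence lands on its own line.
one_sentence_per_line() {
  sed 's/\([.?!]\) /\1\
/g'
}

# Example (commented out so that sourcing this file records nothing):
# ostt | normalize_whitespace | one_sentence_per_line | nvim -
```

Keeping each step as a small filter means it can be tested on its own with `printf` input and reused regardless of which OpenAI model produced the transcript.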