OpenAI Whisper from the command line.

OpenAI offers four speech-to-text models through the same transcription API: gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, and the legacy hosted whisper-1. gpt-4o-transcribe and gpt-4o-mini-transcribe are the newer generation, with lower word error rates; gpt-4o-transcribe-diarize adds speaker-segment annotations; whisper-1 remains useful for timestamp metadata. OSTT connects them to your terminal with a single config change.

OpenAI Speech Models

Four models, one API, one config.

OpenAI's transcription API spans the faster, cheaper gpt-4o-mini-transcribe and the more accurate gpt-4o-transcribe, plus gpt-4o-transcribe-diarize for speaker-labelled output and the legacy whisper-1 for timestamp metadata. OSTT lets you switch between them with ostt model without touching your workflow.

# ~/.config/ostt/ostt.toml
[transcription]
provider = "openai"
model = "gpt-4o-transcribe"

# Or use Whisper timestamps / GPT-4o diarization
# model = "whisper-1"
# model = "gpt-4o-transcribe-diarize"

[openai.gpt-4o-transcribe.params]
language = "en"
include = ["logprobs"]

# Pick interactively
ostt model

# Record with hotkey, transcribe with OpenAI, copy to clipboard
ostt launch -c
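For scripts where the interactive picker is inconvenient, here is a minimal sketch of switching models non-interactively. It assumes the config layout shown above, with a single uncommented model = "..." line under [transcription]. The function set_ostt_model and its optional path argument are hypothetical helpers, not part of OSTT; ostt model remains the supported way to change models.

```shell
# Hypothetical helper (not part of OSTT): rewrite the active model line
# inside the [transcription] table of ostt.toml, leaving other tables alone.
# Usage: set_ostt_model MODEL [CONFIG_PATH]
set_ostt_model() {
  new_model=$1
  config=${2:-$HOME/.config/ostt/ostt.toml}
  tmp="$config.tmp.$$"
  awk -v model="$new_model" '
    # Track whether the current line is inside the [transcription] table.
    /^\[/ { in_transcription = ($0 == "[transcription]") }
    # Replace only an uncommented model key in that table.
    in_transcription && /^model[ \t]*=/ { print "model = \"" model "\""; next }
    { print }
  ' "$config" > "$tmp" && mv "$tmp" "$config"
}
```

For example, set_ostt_model whisper-1 followed by ostt launch -c. Per-model tables such as [openai.gpt-4o-transcribe.params] are left untouched, so options like include = ["logprobs"] stay attached to the model they were written for.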

gpt-4o-transcribe

Built on GPT-4o rather than the original Whisper model. It produces lower word error rates on noisy audio, better formatting, and significantly fewer hallucinations than whisper-1 on long recordings, at the same price per minute as the legacy model.

gpt-4o-mini-transcribe

At $0.003/min, half the per-minute price of gpt-4o-transcribe, with accuracy that still exceeds whisper-1 on most audio types. The right default for high-volume transcription where cost matters.
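Per-minute pricing makes cost estimates a single multiplication: minutes × rate. A minimal sketch, using $0.003/min for gpt-4o-mini-transcribe and $0.006/min for gpt-4o-transcribe (twice the mini rate, since mini costs half as much). Check OpenAI's current pricing before relying on these figures. The name ostt_cost is a hypothetical helper.

```shell
# Hypothetical helper: estimate transcription cost in USD.
# Usage: ostt_cost MINUTES RATE_PER_MINUTE
ostt_cost() {
  awk -v minutes="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", minutes * rate }'
}

ostt_cost 60 0.006   # one hour with gpt-4o-transcribe      → 0.36
ostt_cost 60 0.003   # one hour with gpt-4o-mini-transcribe → 0.18
```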

whisper-1 (legacy)

The original OpenAI hosted Whisper model. Supports verbose_json with word-level timestamps and SRT/VTT output formats — useful for subtitle generation and pipelines that need precise timing.
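If you assemble subtitles yourself from verbose_json word timings (seconds as floating-point numbers) instead of requesting srt directly, each cue time must be written as HH:MM:SS,mmm, with a comma before the milliseconds. A minimal sketch of that conversion; srt_time is a hypothetical helper, and rounding to the nearest millisecond is an assumption.

```shell
# Hypothetical helper: convert seconds (e.g. a word's start time)
# into an SRT timestamp of the form HH:MM:SS,mmm.
srt_time() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)            # round to the nearest millisecond
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    s = int(ms / 1000);    ms -= s * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, s, ms
  }'
}

srt_time 3725.5   # → 01:02:05,500
```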

gpt-4o-transcribe-diarize

Returns OpenAI's diarized JSON response, which annotates the transcript with speaker segments. OSTT returns the combined transcript text and keeps model-specific params under openai.gpt-4o-transcribe-diarize.params.

Validated request options

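Set OpenAI request params per model, either for a single run with --param or persistently under openai.<model>.params. Options include language, prompt, and temperature, plus include=logprobs for gpt-4o-transcribe and gpt-4o-mini-transcribe, timestamp granularities for whisper-1, and diarization fields for gpt-4o-transcribe-diarize.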
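Per-model validation amounts to checking each key against the model it is sent to. The sketch below is a simplified, hypothetical table built from the option list above, not OSTT's actual validator. The parameter names include, timestamp_granularities, known_speaker_names, and known_speaker_references come from OpenAI's transcription API, but the exact per-model rules should be taken from OpenAI's API reference.

```shell
# Simplified, hypothetical validation table (not OSTT's implementation):
# succeed (status 0) if KEY may be sent to MODEL, fail (status 1) otherwise.
param_allowed() {
  model=$1
  key=$2
  case $key in
    language|prompt|temperature)
      return 0 ;;
    include)  # e.g. include=logprobs: GPT-4o transcription models only
      case $model in
        gpt-4o-transcribe|gpt-4o-mini-transcribe) return 0 ;;
      esac ;;
    timestamp_granularities)  # word/segment timestamps: whisper-1 only
      [ "$model" = whisper-1 ] && return 0 ;;
    known_speaker_names|known_speaker_references)  # diarization fields
      [ "$model" = gpt-4o-transcribe-diarize ] && return 0 ;;
  esac
  return 1
}

param_allowed whisper-1 timestamp_granularities && echo "ok"
param_allowed whisper-1 include || echo "include is not valid for whisper-1"
```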

Pipe to AI tools

Use -p to run a processing action after transcription. Route OpenAI output directly into OpenCode, Claude Code, Gemini CLI, or any shell command without manual copy-paste.
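Because the transcript arrives as plain text on stdout, any command that reads stdin can sit between OSTT and the next tool. A minimal sketch of one such filter, assuming you want runs of whitespace collapsed and each line trimmed before handing the text on. tidy_transcript is a hypothetical name, used here with a plain pipe rather than as a -p action.

```shell
# Hypothetical filter: collapse runs of spaces/tabs and trim each line.
tidy_transcript() {
  sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//'
}
```

For example: ostt | tidy_transcript | nvim -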

Retry without re-recording

OSTT saves every recording locally. Run ostt retry to re-transcribe the same audio with a different OpenAI model, or switch to any other provider — no need to speak again.

Workflow

From speech to useful output.

1. Record: press your global hotkey or run ostt in the terminal.
2. Transcribe: OpenAI transcribes the audio via the API.
3. Process: optionally run AI prompts or shell commands on the result.
4. Send: print to stdout, copy to clipboard, write to a file, or pipe onward.
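The Send step can be any command that reads stdin. As a minimal sketch, assuming you keep a running notes file, this hypothetical append_note helper adds each transcript under a timestamp heading, so repeated recordings accumulate instead of overwriting one another.

```shell
# Hypothetical helper: append stdin to a notes file under a timestamp heading.
# Usage: ostt | append_note ~/notes/voice.md
append_note() {
  notes=$1
  {
    printf '\n## %s\n\n' "$(date '+%Y-%m-%d %H:%M')"
    cat
  } >> "$notes"
}
```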

Pipeline

OpenAI quality inside your shell.

OSTT turns OpenAI transcription into a standard Unix primitive. The output is plain text on stdout — pipe it through jq, sed, or any CLI tool. Use -p to chain processing actions. Switch between whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe via ostt model without changing your shell aliases.

# Transcribe a file with the configured model (e.g. gpt-4o-transcribe)
ostt transcribe interview.mp3 -o transcript.txt

# Record, clean with AI processing action, copy to clipboard
ostt -p clean -c

# Transcribe and pipe straight to your editor
ostt | nvim -
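The file-transcription command extends naturally to a whole directory. A minimal sketch that uses only the ostt transcribe FILE -o OUTPUT form shown above. transcribe_all is a hypothetical helper, and skipping files that already have a .txt transcript is a design choice so that reruns do not pay for the same audio twice.

```shell
# Hypothetical helper: transcribe every .mp3 in a directory with the
# currently configured model, writing NAME.txt next to each NAME.mp3.
transcribe_all() {
  dir=${1:-.}
  for audio in "$dir"/*.mp3; do
    [ -e "$audio" ] || continue          # no .mp3 files: glob stays literal
    out="${audio%.mp3}.txt"
    [ -e "$out" ] && continue            # already transcribed; skip
    ostt transcribe "$audio" -o "$out" || echo "failed: $audio" >&2
  done
}
```

For example, transcribe_all ~/recordings. Switching models with ostt model first changes which model the whole batch uses.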

Add OpenAI transcription to your terminal.