Linux-first
Global hotkey setup for Omarchy/Hyprland, GNOME, KDE, and other Linux desktops, with macOS support too.
OSTT is a terminal-native speech-to-text tool. Record from a hotkey, transcribe with your chosen cloud or local model, then paste the result into the focused app or route it to your clipboard, a file, stdout, an AI prompt, or any shell command.
Use ostt launch --paste to dictate into editors, browsers, chats, issue trackers, AI apps, and terminals. OSTT closes the popup, waits for focus to return, then sends the configured paste shortcut.
Use OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, ElevenLabs, Mistral, or local Whisper-compatible models. Switch when quality, latency, price, privacy, or offline access matters.
Local models run on your GPU: Metal on macOS, CUDA on NVIDIA, and Vulkan on AMD and Intel. The install script detects your hardware and picks the right build automatically.
Keep a local model loaded in the background between calls. After the first load, transcription starts immediately, with no model-loading delay. Install it as a login service and the model is always ready.
Run AI prompts or shell commands after transcription with ostt -p and ostt process.
Recordings are saved locally, so you can re-transcribe the same audio with a different provider or model.
Launch OSTT from anywhere. Press your global hotkey to open the recorder, speak in any application, then stop and paste the transcription back into the app that regains focus.
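As a rough sketch of what a hotkey setup could look like on Hyprland, the helper below appends a keybind that runs ostt launch --paste. The function name add_ostt_bind, the SUPER+R chord, and the config path you pass in are illustrative assumptions, not OSTT's documented setup. Other desktops (GNOME, KDE, macOS) configure shortcuts differently.

```shell
# Hypothetical helper: add a Hyprland keybind for OSTT to a config file.
# Assumption: SUPER+R is free and "ostt" is on the PATH Hyprland uses.
add_ostt_bind() {
  conf="$1"
  line='bind = SUPER, R, exec, ostt launch --paste'
  # Append only if the exact line is not already present (safe to re-run).
  grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}
```

It could be called as add_ostt_bind ~/.config/hypr/hyprland.conf, after which Hyprland needs to reload its config for the bind to take effect.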
The code is public, the config is local, and the providers are the ones you choose.
Workflow
Pipeline
OSTT's pipeline plugs transcription into anything you already run. Dictate directly into the focused app with --paste. Process audio files from scripts and CI jobs. Wire the output to OpenCode, Claude Code, or any AI tool. Pipe through custom bash commands. A single -p flag adds the processing step to any of these.
# Open popup, record, clean up, paste into focused app
ostt launch --paste -p clean
# Record and transcribe, run "clean" action, copy to clipboard
ostt -p clean -c
# Transcribe mp3, run "summary" action, write to file
ostt transcribe meeting.mp3 -p summary -o notes.md
# Process second latest recording, run "cmd" action, print to stdout
ostt process 2 cmd
Transcribe audio files from scripts, cron jobs, or CI pipelines without a microphone. Process meeting recordings, voicemails, or dictation files with the same provider choice and processing pipeline as live recording.
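A batch job along these lines might look like the sketch below. It reuses only the ostt transcribe ... -p summary -o ... invocation shown earlier. The function name transcribe_dir and the one-notes-file-per-recording layout are assumptions made for illustration, and "summary" stands for whatever action you have configured.

```shell
# Sketch: transcribe every .mp3 in a directory, writing notes next to each file.
# transcribe_dir is a hypothetical helper, not part of OSTT.
transcribe_dir() {
  dir="$1"
  for audio in "$dir"/*.mp3; do
    [ -e "$audio" ] || continue      # glob matched nothing
    notes="${audio%.mp3}.md"
    [ -e "$notes" ] && continue      # already processed on an earlier run
    ostt transcribe "$audio" -p summary -o "$notes"
  done
}
```

Because files that already have notes are skipped, the function can be run repeatedly from cron or a CI step without redoing earlier work.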
Use OSTT with OpenCode, Claude Code, Gemini CLI, Codex CLI, OpenClaw, Hermes, or other agentic harnesses when their built-in voice tools are too limited. Configure providers, models, prompts, and shell actions once, then route flexible transcription into the agents you already use.
Pipe transcription through jq, sed, awk, or any CLI tool. Chain recording, processing, and output into a single shell pipeline. Alias your most common workflow and run it from a hotkey.
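To make the piping idea concrete, here is a small filter built from sed and awk. The name tidy_transcript and the cleanup rules are illustrative assumptions. It reads text on stdin, so it works with any command that prints a transcription to stdout.

```shell
# Hypothetical filter: normalize whitespace in a transcript read from stdin.
tidy_transcript() {
  # Collapse runs of whitespace, trim both ends, then drop empty lines.
  sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' | awk 'NF'
}
```

If ostt transcribe prints to stdout when no -o is given (an assumption worth checking against your version), a pipeline such as ostt transcribe meeting.mp3 | tidy_transcript could then be wrapped in an alias and bound to a hotkey.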