Open source voice-to-text for Linux. And macOS.

OSTT is a terminal-native speech-to-text tool. Record from a hotkey, transcribe with your chosen cloud or local model, then paste the result into the focused app or route it to your clipboard, a file, stdout, an AI prompt, or any shell command.

Linux-first

Global hotkey setup for Omarchy/Hyprland, GNOME, KDE, and other Linux desktops, with macOS support too.

Paste into any focused app

Use ostt launch --paste to dictate into editors, browsers, chats, issue trackers, AI apps, and terminals. OSTT closes the popup, waits for focus to return, then sends the configured paste shortcut.

Cloud or local models

Use OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, ElevenLabs, Mistral, or local Whisper-compatible models. Switch when quality, latency, price, privacy, or offline access matters.

GPU acceleration

Local models run on your GPU: Metal on macOS, CUDA on NVIDIA, and Vulkan on AMD and Intel. The install script detects your hardware and picks the right build automatically.

Daemon mode

Keep a local model loaded in the background between calls. Only the first call pays the model loading cost; later transcriptions start right away. Install the daemon as a login service and the model is always ready.

Process with AI or bash

Run AI prompts or shell commands after transcription with ostt -p and ostt process.

Retry without re-recording

Recordings are saved locally, so you can re-transcribe the same audio with a different provider or model.

Bind it to a hotkey

Launch OSTT from anywhere. Press your global hotkey in any application to open the recorder, speak, then stop and paste the transcription back into the app that regains focus.

Open source, no subscription

The code is public, the config is local, and the providers are the ones you choose.

Workflow

From speech to useful output.

1. Record: Use the terminal or a global hotkey popup.
2. Transcribe: Choose the provider and model that fit the job.
3. Process: Optionally run AI prompts or shell commands.
4. Send: Paste into the focused app, print, copy, write to a file, or pipe onward.

Pipeline

Transcribe once. Route anywhere.

OSTT's pipeline plugs transcription into anything you already run. Dictate directly into the focused app with --paste. Process audio files from scripts and CI jobs. Wire the output to OpenCode, Claude Code, or any AI tool. Pipe it through custom bash commands. Processing is a single -p flag, and the result goes wherever you send it: --paste for the focused app, -c for the clipboard, or -o for a file.

# Open popup, record, clean up, paste into focused app
ostt launch --paste -p clean

# Record and transcribe, run "clean" action, copy to clipboard
ostt -p clean -c

# Transcribe mp3, run "summary" action, write to file
ostt transcribe meeting.mp3 -p summary -o notes.md

# Process the second-most-recent recording, run "cmd" action, print to stdout
ostt process 2 cmd

File transcription in CI

Transcribe audio files from scripts, cron jobs, or CI pipelines without a microphone. Process meeting recordings, voicemails, or dictation files with the same provider choice and processing pipeline as live recording.
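As a minimal sketch, a CI step could loop over audio files using only the flags shown in the pipeline examples (ostt transcribe FILE -p ACTION -o OUT). The helper name transcribe_all, its argument order, and writing each result next to its source file are hypothetical conventions for this example, not part of OSTT. It assumes providers and actions are already configured on the runner.

```shell
#!/usr/bin/env bash
# Sketch: batch transcription for CI or cron, no microphone needed.
# Uses only the flags from the pipeline examples; assumes providers and
# actions are already configured. transcribe_all is a hypothetical helper.

# Usage: transcribe_all ACTION FILE...
# Writes each result next to its source file, e.g. standup.mp3 -> standup.md.
transcribe_all() {
  local action="$1"
  shift
  local f
  for f in "$@"; do
    # Stop at the first failure so the CI job fails instead of skipping a file.
    ostt transcribe "$f" -p "$action" -o "${f%.*}.md" || return 1
  done
}
```

For example, transcribe_all summary recordings/*.mp3 would write recordings/standup.md next to recordings/standup.mp3. Returning on the first failure makes a broken provider or missing file fail the job loudly.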

Agent-ready transcription

Use OSTT with OpenCode, Claude Code, Gemini CLI, Codex CLI, OpenClaw, Hermes, or other agentic harnesses when their built-in voice tools are too limited. Configure providers, models, prompts, and shell actions once, then route flexible transcription into the agents you already use.
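One hedged way to hand a dictated prompt to an agent: take the processed text from ostt process, which the pipeline examples show printing to stdout, and pass it to the agent's non-interactive mode. This sketch assumes that index 1 selects the most recent recording, that a "clean" action is configured, and that claude -p (Claude Code's print mode) is the agent command; voice_to_agent is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: send a dictated, cleaned-up prompt to an agent CLI.
# Assumptions (not confirmed OSTT behavior): index 1 is the most recent
# recording, a "clean" action is configured, and `claude -p` is the agent.
# voice_to_agent is a hypothetical helper name.

voice_to_agent() {
  local index="${1:-1}" prompt
  # ostt process prints the processed text to stdout.
  prompt="$(ostt process "$index" clean)" || return 1
  # Refuse to start the agent on a blank transcription.
  if [[ -z "${prompt//[[:space:]]/}" ]]; then
    echo "voice_to_agent: empty transcription, agent not called" >&2
    return 1
  fi
  claude -p "$prompt"
}
```

Swapping the last line for another agent's non-interactive command keeps the rest of the flow unchanged; the empty-input check avoids spending an agent call on a silent recording.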

Terminal-native automation

Pipe transcription through jq, sed, awk, or any CLI tool. Chain recording, processing, and output into a single shell pipeline. Alias your most common workflow and run it from a hotkey.
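A small sketch of that idea: the formatter below is plain awk, and the wrapper sends OSTT output through it into a notes file. It assumes ostt process 1 clean prints the latest recording, processed by a configured "clean" action, to stdout; to_bullets, vnote, and NOTES_FILE are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# Sketch: treat OSTT output as plain text in a shell pipeline.
# Assumption: `ostt process 1 clean` prints the latest recording, processed
# by a configured "clean" action, to stdout. to_bullets, vnote and
# NOTES_FILE are hypothetical names for this example.

# Trim each line of stdin and print non-empty lines as Markdown bullets.
to_bullets() {
  awk '{ sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") } NF { print "- " $0 }'
}

# Append the latest transcription to a notes file (default ~/notes.md).
# The subshell body keeps pipefail local, so an ostt failure is reported.
vnote() (
  set -o pipefail
  ostt process 1 clean | to_bullets >> "${NOTES_FILE:-$HOME/notes.md}"
)
```

Source the file from ~/.bashrc and add an alias such as alias vn=vnote to run the whole pipeline with one short command. Keeping the formatter separate from the ostt call means it can be tested or reused on any text.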