Provider choice is the big difference
OSTT is not tied to one engine family. Use OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, ElevenLabs, Mistral, built-in local Whisper, or a custom command/HTTP engine.
Voxtype and OSTT are both open-source speech-to-text tools for people who want voice input without a proprietary dictation subscription. Voxtype is strongest as an integrated Linux dictation daemon with local engine setup. OSTT is strongest when you want provider choice, cloud or local models, paste output, saved recordings, retry, files, stdout, and shell or AI processing.
Short Answer
Voxtype is a serious Rust-native dictation project with detailed local model, daemon, hotkey, output, GPU, and packaging docs. OSTT takes a different route: it treats speech-to-text as a terminal primitive that can paste into apps, write files, stream through stdout, call cloud or local providers, retry a saved recording, and post-process text through AI prompts or bash commands.
# Dictate into the focused app
ostt launch --paste
# Clean the transcript, then paste it
ostt launch --paste -p clean
# Transcribe a file and summarize it
ostt transcribe meeting.mp3 -p summary -o notes.md
# Retry the same saved audio with another model
ostt retry -m deepgram/nova-3
Feature Comparison
| Capability | OSTT | Voxtype |
|---|---|---|
| Open source | ✅ MIT, Rust | ✅ MIT, Rust |
| Linux support | ✅ Linux-first docs for Omarchy/Hyprland, GNOME, KDE, and other desktops | ✅ Linux-first, optimized for Wayland and documented for multiple distros |
| macOS support | ✅ macOS support with Homebrew and Shortcuts.app hotkeys | ✅ Apple Silicon macOS support documented |
| Focused app insertion | ✅ --paste inserts into the focused app and can restore the previous clipboard | ✅ Public docs describe typing/clipboard output backends and direct dictation flow |
| Cloud transcription providers | ✅ OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, ElevenLabs, Mistral | ⚠️ Primarily local-engine oriented, with documented remote/OpenAI-style options |
| Built-in local transcription | ✅ Whisper-compatible local models through whisper-rs | ✅ Whisper and ONNX engine setup documented |
| Custom local engines | ✅ command/<profile> wrappers and http/<profile> OpenAI-compatible endpoints | ✅ Public docs describe multiple local engines and engine variants |
| Provider/model switching | ✅ ostt model, -m PROVIDER/MODEL, and per-run --param overrides | ✅ Engine/model configuration and setup flows |
| Retry same recording with another model | ✅ First-class ostt retry -m PROVIDER/MODEL | ⚠️ Not the main documented workflow |
| File/stdout/shell pipelines | ✅ Core workflow: stdout, files, clipboard, paste, scripts, and CI-friendly file transcription | ⚠️ File transcription is documented, but the product focus is dictation |
| AI and shell post-processing | ✅ ostt -p and ostt process for AI prompts or bash commands | ✅ Public docs describe profiles and post-processing hooks/configuration |
| Text cleanup | ✅ Keywords plus deterministic ostt replace rules for casing, acronyms, and names | ✅ Public docs describe prompts, replacements, and spoken punctuation features |
| Visual desktop integration | ⚠️ Popup terminal and platform hotkey guides | ✅ More desktop-dictation-oriented setup, notifications, and OSD options |
If one transcription is wrong, run ostt retry -m openai/gpt-4o-transcribe or ostt retry -m deepgram/nova-3 on the same saved recording.
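To compare several models on the same saved recording, the retry command can be wrapped in a small loop. The sketch below is a hypothetical helper, not part of OSTT: `retry_models` and the output file naming are made up for illustration, and it assumes `ostt retry` prints the transcript to stdout.

```shell
# Retry the saved recording with each model and keep one transcript per model.
# Assumption: `ostt retry -m PROVIDER/MODEL` writes the transcript to stdout.
# The body runs in a subshell, so its variables do not leak into the caller.
retry_models() (
  outdir="$1"
  shift
  mkdir -p "$outdir"
  for model in "$@"; do
    # Turn provider/model into a safe file name: provider_model.txt
    file="$outdir/$(printf '%s' "$model" | tr '/' '_').txt"
    if ostt retry -m "$model" > "$file"; then
      echo "saved $file"
    else
      echo "failed: $model" >&2
    fi
  done
)

# Usage:
#   retry_models transcripts openai/gpt-4o-transcribe deepgram/nova-3
```

The resulting files can then be compared with `diff` or read side by side to pick the model that handled the recording best.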
ostt launch --paste records from a popup, closes it, waits for focus to return, pastes into the focused app, and restores the previous clipboard when configured.
Run faster-whisper, Parakeet, Speaches, LocalAI, Cohere Transcribe, or your own ASR wrapper separately, then connect OSTT with command/* or http/*.
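Before pointing an http/* profile at a separately running server, it can help to confirm that the server answers OpenAI-style transcription requests at all. The sketch below checks the server only; it does not show OSTT's profile configuration. The base URL, audio file name, and model name are placeholders, and `check_asr_endpoint` is a hypothetical helper that assumes the server exposes the usual `/v1/audio/transcriptions` route.

```shell
# Send one audio file to an OpenAI-compatible transcription endpoint.
# Assumptions: base URL, file name, and model name are placeholders for
# whatever your local server actually uses.
check_asr_endpoint() (
  base_url="${1:-http://localhost:8000}"
  audio="${2:-sample.wav}"
  model="${3:-whisper-1}"
  # -F sends a multipart/form-data POST; --fail makes HTTP errors non-zero.
  curl -sS --fail \
    "$base_url/v1/audio/transcriptions" \
    -F "file=@$audio" \
    -F "model=$model"
)

# Usage:
#   check_asr_endpoint http://localhost:8000 sample.wav whisper-1
```

If the request returns a JSON body with the transcribed text, the endpoint is likely ready to be connected through an http/* profile.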
Use keywords to help recognition, ostt replace to fix final casing, and processing actions to clean, translate, summarize, or reshape text for code and AI tools.
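As an illustration of the kind of bash command a processing action could run, here is a small standalone filter. It is not how `ostt replace` is implemented or configured. `fix_terms` and its term list are hypothetical, and the sketch assumes the transcript arrives on stdin and the cleaned text is read from stdout.

```shell
# Normalize a few commonly mis-split product names in a transcript.
# Assumption: the transcript is piped in on stdin and read back from stdout.
# Patterns are plain substring matches, so keep the list short and specific.
fix_terms() {
  sed -e 's/[Gg]it [Hh]ub/GitHub/g' \
      -e 's/[Jj]ava [Ss]cript/JavaScript/g' \
      -e 's/[Pp]ost [Gg]res/Postgres/g' \
      -e 's/[[:space:]]*$//'   # drop trailing whitespace on each line
}

# Usage:
#   printf 'push the Java script fix to git hub\n' | fix_terms
```

Because the filter only reads stdin and writes stdout, it can be tested on its own with `printf` or `echo` before being wired into any dictation workflow.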
If your priority is a dedicated Linux dictation daemon with deep local-engine and desktop-output setup, Voxtype may fit. If you want speech as a Unix-style pipeline, choose OSTT.
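A pipeline-style workflow might look like the following batch job, which reuses the `ostt transcribe FILE -p summary -o OUTPUT` form shown earlier. `transcribe_dir` is a hypothetical wrapper, and the `.mp3` to `.md` naming and the skip-if-exists rule are illustrative choices rather than OSTT behavior.

```shell
# Summarize every .mp3 in a directory into a matching .md file.
# Files that already have a .md next to them are skipped, so the job
# can be re-run safely after a failure.
transcribe_dir() (
  dir="${1:-.}"
  for audio in "$dir"/*.mp3; do
    # If nothing matches, the glob stays literal; skip that case.
    [ -e "$audio" ] || continue
    out="${audio%.mp3}.md"
    if [ -e "$out" ]; then
      continue
    fi
    ostt transcribe "$audio" -p summary -o "$out" || echo "failed: $audio" >&2
  done
)

# Usage:
#   transcribe_dir ~/recordings
```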
Use Cases