OSTT vs Voxtype: which open source speech-to-text tool should you choose?

Voxtype and OSTT are both open source speech-to-text tools for people who want voice input without a proprietary dictation subscription. Voxtype is strongest as an integrated Linux dictation daemon with local engine setup. OSTT is strongest when you want provider choice, cloud or local models, paste output, saved recordings, retry, files, stdout, and shell or AI processing.


Short Answer

Choose OSTT when model choice and developer workflow matter.

Voxtype is a serious Rust-native dictation project with detailed local model, daemon, hotkey, output, GPU, and packaging docs. OSTT takes a different route: it treats speech-to-text as a terminal primitive that can paste into apps, write files, stream through stdout, call cloud or local providers, retry a saved recording, and post-process text through AI prompts or bash commands.

# Dictate into the focused app
ostt launch --paste

# Clean the transcript, then paste it
ostt launch --paste -p clean

# Transcribe a file and summarize it
ostt transcribe meeting.mp3 -p summary -o notes.md

# Retry the same saved audio with another model
ostt retry -m deepgram/nova-3

Feature Comparison

OSTT and Voxtype solve overlapping but different jobs.

Capability	OSTT	Voxtype
Open source	✅ MIT, Rust	✅ MIT, Rust
Linux support	✅ Linux-first docs for Omarchy/Hyprland, GNOME, KDE, and other desktops	✅ Linux-first, optimized for Wayland and documented for multiple distros
macOS support	✅ macOS support with Homebrew and Shortcuts.app hotkeys	✅ Apple Silicon macOS support documented
Focused app insertion	✅ `--paste` inserts into the focused app and can restore the previous clipboard	✅ Public docs describe typing/clipboard output backends and direct dictation flow
Cloud transcription providers	✅ OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, ElevenLabs, Mistral	⚠️ Primarily local-engine oriented, with documented remote/OpenAI-style options
Built-in local transcription	✅ Whisper-compatible local models through `whisper-rs`	✅ Whisper and ONNX engine setup documented
Custom local engines	✅ `command/<profile>` wrappers and `http/<profile>` OpenAI-compatible endpoints	✅ Public docs describe multiple local engines and engine variants
Provider/model switching	✅ `ostt model`, `-m PROVIDER/MODEL`, and per-run `--param` overrides	✅ Engine/model configuration and setup flows
Retry same recording with another model	✅ First-class `ostt retry -m PROVIDER/MODEL`	⚠️ Not the main documented workflow
File/stdout/shell pipelines	✅ Core workflow: stdout, files, clipboard, paste, scripts, and CI-friendly file transcription	⚠️ File transcription is documented, but the product focus is dictation
AI and shell post-processing	✅ `ostt -p` and `ostt process` for AI prompts or bash commands	✅ Public docs describe profiles and post-processing hooks/configuration
Text cleanup	✅ Keywords plus deterministic `ostt replace` rules for casing, acronyms, and names	✅ Public docs describe prompts, replacements, and spoken punctuation features
Visual desktop integration	⚠️ Popup terminal and platform hotkey guides	✅ More desktop-dictation-oriented setup, notifications, and OSD options

Provider choice is the big difference

OSTT is not tied to one engine family. Use OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, ElevenLabs, Mistral, built-in local Whisper, or a custom command/HTTP engine.

Retry makes model choice testable

If one transcription is wrong, run `ostt retry -m openai/gpt-4o-transcribe` or `ostt retry -m deepgram/nova-3` on the same saved recording.
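A comparison like this can be scripted with a small shell loop. The sketch below is illustrative only: it assumes `ostt retry -m PROVIDER/MODEL` writes the transcript to stdout and that each model ID is already configured. The function name `compare_models` and the output directory are hypothetical.

```shell
# Retry the saved recording with several models and keep each transcript
# in its own file for side-by-side comparison.
# Assumption: `ostt retry -m PROVIDER/MODEL` writes the transcript to stdout.
compare_models() (
  out_dir=$1
  shift
  mkdir -p "$out_dir"
  for model in "$@"; do
    # provider/model -> provider_model so the ID is usable as a file name
    file="$out_dir/$(printf '%s' "$model" | tr '/' '_').txt"
    if ostt retry -m "$model" > "$file"; then
      echo "saved $file"
    else
      echo "failed: $model" >&2
    fi
  done
)
```

Calling `compare_models transcripts openai/gpt-4o-transcribe deepgram/nova-3` would leave one text file per model in `transcripts/`, which can then be compared with `diff`.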

Paste is now first-class

`ostt launch --paste` records from a popup, closes it, waits for focus to return, pastes into the focused app, and restores the previous clipboard when configured.

External engines without bundling bloat

Run faster-whisper, Parakeet, Speaches, LocalAI, Cohere Transcribe, or your own ASR wrapper separately, then connect OSTT with `command/*` or `http/*`.
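Before wiring a separately running server into an `http/*` profile, it can help to confirm that the server answers an OpenAI-style transcription request. The sketch below uses plain `curl` against the standard `/v1/audio/transcriptions` route. The base URL, audio file, and model name are placeholders, `check_endpoint` is a hypothetical helper, and the OSTT profile configuration itself is not shown here.

```shell
# Send one audio file to an OpenAI-compatible transcription endpoint.
# Assumptions: the server exposes /v1/audio/transcriptions under the given
# base URL, and the model name matches whatever the server has loaded.
check_endpoint() (
  base_url=$1   # e.g. http://localhost:8000 (placeholder)
  audio=$2      # path to a short test recording
  model=$3      # server-specific model name
  curl -sS --fail \
    -F "file=@${audio}" \
    -F "model=${model}" \
    "${base_url}/v1/audio/transcriptions"
)
```

A compatible server typically replies with a JSON object containing a `text` field. With `--fail`, an HTTP error makes `curl` exit non-zero instead of printing the error body.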

Developer text cleanup

Use keywords to help recognition, `ostt replace` to fix final casing, and processing actions to clean, translate, summarize, or reshape text for code and AI tools.

Voxtype is still strong

If your priority is a dedicated Linux dictation daemon with deep local-engine and desktop-output setup, Voxtype may fit. If you want speech as a Unix-style pipeline, choose OSTT.

Use Cases

When OSTT is the better Voxtype alternative.

1. You use multiple providers. Switch between cloud, local, and custom engines when quality, latency, language, or privacy changes.

2. You work in terminals. Send voice to stdout, files, scripts, AI prompts, or shell commands without leaving the keyboard.

3. You compare models. Save recordings and retry the same audio with a different provider instead of re-recording.

4. You need cleanup. Combine keywords, replace rules, and processing actions for developer vocabulary and consistent formatting.
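For the terminal and scripting use cases above, batch file transcription might look like the sketch below. It relies only on the `ostt transcribe FILE -o OUTPUT` form shown earlier. The function name `transcribe_dir` and the `.mp3` to `.md` naming are illustrative assumptions.

```shell
# Transcribe every .mp3 in a directory to a matching .md file,
# skipping recordings that already have a transcript next to them.
transcribe_dir() (
  src_dir=$1
  for audio in "$src_dir"/*.mp3; do
    # The glob stays literal when the directory has no .mp3 files
    [ -e "$audio" ] || continue
    notes="${audio%.mp3}.md"
    if [ -e "$notes" ]; then
      continue
    fi
    ostt transcribe "$audio" -o "$notes" || echo "failed: $audio" >&2
  done
)
```

Because existing transcripts are skipped, the function can be rerun from a cron job or CI step without transcribing the same audio twice.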

Try OSTT as your Voxtype alternative.
