
Why OSTT?

OSTT exists for developers and power users who want speech-to-text to behave like the rest of their toolchain: transparent, scriptable, provider-agnostic, and comfortable on Linux.

Many dictation apps are polished GUI products built around a proprietary cloud service, a subscription, or a Mac-first workflow. OSTT takes a narrower approach: it is open source, terminal-native, and bring-your-own-key, and it is designed so transcription output can move through shell pipelines, AI prompts, files, clipboards, hotkeys, and scripts.

Where OSTT Is Different

Linux Is a First-Class Target

Many popular dictation tools focus on macOS, Windows, and mobile. OSTT is built with Linux desktop workflows in mind: Wayland clipboard handling, Omarchy/Hyprland setup, GNOME and KDE guides, AUR packaging, native Linux packages, and a popup launcher that works from global hotkeys.

Provider Choice

OSTT does not force one transcription backend. You can choose from OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, and ElevenLabs, then switch models when your needs change.

This matters for quality, latency, price, language support, and regulatory requirements. For example, Berget provides Swedish/EU-hosted transcription options for users who care about data residency.

Retry With Another Model

OSTT stores recordings locally so you can re-transcribe the same audio later.

```bash
ostt retry
ostt retry 3 -c
```

If one model gets a term wrong, switch provider or model with `ostt auth` and retry the same recording without speaking again.

Scriptable Post-Processing

OSTT can run processing actions after transcription. AI actions work like configurable dictation modes in GUI tools: transcription goes into a prompt, processed text comes out.

Bash actions are the unusual part. They let you pipe transcribed text through any command or script, so processing can be as simple as text cleanup or as specific as your own local workflow.
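The shell function below is a rough sketch of a bash action that tidies dictated text. It rests on two assumptions: a bash action receives the transcription on stdin, and whatever the command prints to stdout becomes the processed result. Check the processing documentation for the actual contract and for how actions are registered in the config. The function name `clean_transcript` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup filter for an OSTT bash post-processing action.
# Assumption: the transcription arrives on stdin and stdout is used as the result.

clean_transcript() {
  # 1. Flatten newlines into spaces.
  # 2. Collapse runs of whitespace into one space and trim both ends.
  # 3. Capitalize the first character and append a period if the text
  #    does not already end in . ! or ?
  tr '\n' ' ' \
    | sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' \
    | awk '{
        text = toupper(substr($0, 1, 1)) substr($0, 2)
        if (text != "" && text !~ /[.!?]$/) text = text "."
        print text
      }'
}
```

To try it by hand, source the file and run `printf 'the invoice\nis due friday' | clean_transcript`. This prints `The invoice is due friday.` When the file is saved as a standalone action script, it would end with a bare `clean_transcript` call so that it reads stdin when invoked.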

Terminal-Native Workflow

OSTT treats stdout, files, clipboard output, command aliases, logs, shell completions, and pipelines as core features rather than afterthoughts.

```bash
ostt | grep invoice
ostt -p clean -c
ostt transcribe meeting.mp3 -p summary -o summary.txt
ostt process 2 cmd
```

For users who live in the terminal, this is the main idea.
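Because plain `ostt` writes the transcription to stdout, it composes with ordinary shell functions. The sketch below keeps a timestamped log of dictated notes. The `append_note` helper is hypothetical and is not part of OSTT.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OSTT): append stdin to a notes file,
# prefixed with a timestamp, e.g. `ostt | append_note ~/notes.md`.

append_note() {
  local file="${1:?usage: append_note FILE}"
  local text
  text="$(cat)"                 # read the full transcription from stdin
  [ -n "$text" ] || return 0    # write nothing for empty input
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$text" >> "$file"
}
```

Usage: `ostt | append_note ~/notes.md`. Empty input is skipped, so nothing is written when the transcription comes back empty.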

Comparison

This table is a high-level comparison against common voice-to-text tools. Capabilities change over time, so treat it as positioning rather than a permanent claim.

| Capability | OSTT | Typical GUI Dictation Apps |
|---|---|---|
| Linux-first setup | Yes | Usually no |
| macOS support | Yes | Usually yes |
| Open source | Yes | Usually no |
| BYO API key | Yes | Sometimes |
| Multiple transcription providers | Yes | Often one provider or model family |
| Terminal-native workflow | Yes | No |
| Global hotkey popup | Yes | Yes |
| File transcription | Yes | Sometimes |
| Local transcription history | Yes | Usually yes |
| Retry same recording with another model | Yes | Rare |
| AI post-processing | Yes | Often yes |
| Bash/shell post-processing | Yes | No |
| Context/screen awareness | No | Often yes |
| Streaming transcription | No | Often yes |
| Direct text injection | Clipboard by default | Often yes |
| On-device transcription | No | Sometimes |

What OSTT Does Not Try To Be

OSTT is not trying to be a mobile dictation app, a real-time streaming transcription overlay, or a screen-aware assistant that reads every app you use. Those are valid tools, but they pull in a different direction from a terminal-native, scriptable tool.

OSTT optimizes for control, composability, provider choice, and developer workflows. If you want voice input that behaves like a command-line tool, OSTT is built for that niche.

Compared With Other Open Source Dictation Tools

OSTT overlaps with Linux dictation tools such as Voxtype and Hyprwhspr, but the focus is different. Those projects are strongest when you want a dedicated system-wide dictation app. OSTT is strongest when you want speech-to-text to work as part of a developer workflow: provider switching, file transcription, retry, stdout, paste output, text replace rules, and AI or shell processing.

For a more detailed comparison, see: