Skip to content

Configuration

OSTT stores its main configuration at ~/.config/ostt/ostt.toml. The default config is generated on first run with inline documentation for every option.

Open it with:

bash
ostt config

API keys are stored separately at ~/.local/share/ostt/credentials with restricted permissions (0600). They are not part of ostt.toml.

Audio

toml
[audio]
device = "default"
peak_volume_threshold = 90
reference_level_db = -20
output_format = "mp3 -ab 16k -ar 12000"
visualization = "spectrum"

device

Audio input device. Run ostt config list-devices to see available options.

ostt config list-devices

Available audio input devices:

  ID: 0
    Name: default [DEFAULT]
    Config: (44100Hz, 2 channels)

  ID: 2
    Name: USB Microphone
    Config: (48000Hz, 1 channels)

device accepts:

ValueExampleDescription
"default"device = "default"System default input device
Numeric indexdevice = "0"Device by index from config list-devices
Device namedevice = "USB Microphone"Device by name from config list-devices

peak_volume_threshold

Peak volume threshold for the red clipping indicator (0--100, percentage of reference level). Default 90 means the indicator activates at 90% of reference_level_db, giving 10% headroom before clipping.

reference_level_db

Reference level in dBFS for the 100% meter display. Set this to your audio card's maximum input level for accurate metering.

ValueDescription
-6Very hot recording (near clipping)
-12Hot recording (professional standard)
-18Moderate recording level
-20Conservative level, typical audio card max

If your volume meter never reaches 100%, run ostt record, maximize your microphone gain, note the peak dBFS value, and set reference_level_db to match.

output_format

Output audio format for API calls. All audio is saved as mono with ffmpeg handling resampling and compression. Format: "codec [ffmpeg_options]".

Format stringBitrateSizeUse case
"mp3 -ab 16k -ar 12000"16 kbps~1.8 MB/hourWhisper API optimal (default)
"libopus -ab 32k -ar 16000"32 kbps~3.6 MB/hourBetter quality, smaller than AAC
"aac -ab 32k -ar 16000"32 kbps~3.6 MB/hourGood quality, wide support
"flac -ar 16000"Lossless~20 MB/hourLossless, no quality loss
"pcm_s16le -ar 16000"UncompressedLargestRequired for local transcription

Local models use output_format = "pcm_s16le -ar 16000". Configure it under [whisper] or [whisper."model"] when you want only local Whisper recordings to use that format.

visualization

Display mode during recording.

ValueDescription
"spectrum"Frequency spectrum showing energy distribution across frequency bands, optimized for the human voice range (100--1500 Hz)
"waveform"Time-domain waveform showing amplitude over time, classic oscilloscope-style display

Transcription Params

Transcription request params are configured under provider tables. Provider params apply to every model for that provider; model params override provider params:

toml
[deepgram.params]
smart_format = true

[deepgram.nova-3.params]
smart_format = true
diarize = true
keyterm = ["OSTT", "VitePress"]

[openai.gpt-4o-transcribe.params]
language = "en"
prompt = "Technical dictation with project names."

[whisper.turbo.params]
language = "auto"
temperature = 0.0
no_context = true

For a single invocation, pass repeatable --param key=value overrides:

bash
ostt record --param smart_format=true --param diarize=true
ostt transcribe meeting.mp3 -m deepgram/nova-3 --param keyterm=OSTT,VitePress
ostt retry 2 -m whisper/turbo --param language=en --param temperature=0.0

--param overrides apply only to the current command. They do not change ostt.toml and they take precedence over persistent params for the selected model.

Validation happens before transcription starts:

  • keys must be supported by the selected provider/model
  • duplicate --param keys fail
  • values must match the param type
  • provider-specific ranges and conflicts are checked
  • unknown keys show the valid param names for that model

CLI list values use commas, for example --param keyterm=OSTT,VitePress. TOML list values must use TOML list syntax, for example keyterm = ["OSTT", "VitePress"].

To list supported params for a model, run:

bash
ostt model params
ostt model params openai/gpt-4o-transcribe
ostt model params whisper/turbo --format json

See Providers and Models for supported model IDs and provider-specific param tables.

Text Replace

Text replace rules are deterministic, provider-neutral post-transcription find-and-replace rules. They fix final text casing, acronyms, project names, and common misrecognitions without using AI.

toml
[text.replace]
"ostt" = "OSTT"
"api" = "API"
"typescript" = "TypeScript"
"github" = "GitHub"
"open ai" = "OpenAI"

You can also manage rules interactively:

bash
ostt replace

Replace rules are applied after transcription and before processing actions, output, and history saves. This means a processing action receives already-fixed text such as OSTT API when those rules are configured.

Matching behavior:

  • keys are literal text, not regex patterns
  • matching is case-insensitive
  • word boundaries prevent replacing inside larger words, so api does not change capital
  • phrase boundaries apply at phrase edges, so open ai. can become OpenAI.
  • target values are preserved exactly as configured
  • replace output is not recursively processed by later rules

Keywords and replace rules have different jobs. Use ostt keyword when the model mishears a term. Use [text.replace] when the model hears the term but formats it wrong.

Paste Output

Paste output sends the final transcript into the currently focused application. It is opt-in with --paste; stdout remains the default output mode.

bash
ostt --paste
ostt launch --paste
ostt launch --paste -p clean
ostt transcribe voice.ogg --paste
ostt retry 2 --paste
ostt process clean --paste

Configure paste behavior under [output.paste]:

toml
[output.paste]
paste_key = "ctrl+v"
restore_clipboard = true
restore_delay_ms = 750
post_popup_delay_ms = 1000

Default paste_key values:

  • macOS: cmd+v
  • Omarchy: shift+insert
  • other Linux desktops: ctrl+v

restore_clipboard controls whether OSTT restores the previous clipboard contents after pasting. Paste mode uses the clipboard as transport, so disabling restore leaves the transcript on the clipboard.

restore_delay_ms gives the focused app time to read the clipboard before OSTT restores the previous contents. Increase it if paste works intermittently or the old clipboard appears instead of the transcript.

post_popup_delay_ms is used by ostt launch --paste. The popup closes first, then OSTT waits for focus to return before sending the paste shortcut on supported desktops. On desktops where focus return cannot be detected, this value is used as the fallback delay.

Linux paste shortcuts differ by app and desktop. GUI apps commonly use ctrl+v; terminals often use ctrl+shift+v; Omarchy maps SUPER+v to shift+insert. OSTT cannot reliably detect whether a paste succeeded, so it does not try multiple shortcuts automatically.

On macOS, paste mode uses osascript/System Events to send cmd+v. macOS may prompt for Accessibility permission for the terminal app running OSTT, such as Ghostty. If paste does not work, open System Settings > Privacy & Security > Accessibility and enable the app shown in the permission prompt.

Local Transcription

Local models are selected and managed with ostt model. Built-in local Whisper uses the whisper provider. [whisper.params] controls global Whisper inference defaults; [whisper."model".params] overrides them for a specific model. The param meanings are the same in both places.

toml
[whisper]
output_format = "pcm_s16le -ar 16000"

[whisper.params]
language = "auto"
no_timestamps = true
no_context = true
temperature = 0.0
entropy_thold = 2.4
no_speech_thold = 0.6
ParamDefaultDescription
language"auto"Language hint for local inference. Use "auto" or an ISO code such as "en" or "sv".
no_timestampstrueSuppress timestamp output.
no_contexttrueDo not reuse text context between segments.
temperature0.0Sampling temperature. 0.0 uses greedy deterministic decoding.
entropy_thold2.4Entropy threshold for fallback behavior.
no_speech_thold0.6No-speech probability threshold.

See Local Models for setup, storage, and audio-format requirements.

External Engines

External command and HTTP engines are configured as provider/model profiles. Use command/<profile> for shell-command wrappers and http/<profile> for OpenAI-compatible /v1/audio/transcriptions endpoints:

toml
[command.parakeet-fast]
command = "~/.config/ostt/backends/parakeet-wrapper {audio_path}"
output_format = "pcm_s16le -ar 16000"

[http.cohere-transcribe]
endpoint = "http://localhost:8080/v1/audio/transcriptions"
output_format = "pcm_s16le -ar 16000"

[http.cohere-transcribe.params]
model = "cohere-transcribe"
language = "en"
response_format = "json"

See External Engines for full setup examples and rationale.

Processing Actions

Actions are defined as named tables under [process.actions]. The table key becomes the action's id on the CLI. Each action has a type of either "ai" (runs an AI CLI tool) or "bash" (runs a shell command).

AI actions can inherit their tool and model from [process] defaults. Set tool or model on an individual action only when that action should override the default.

See Processing Actions for full examples and common recipes.

AI Actions

toml
[process]
default_tool = "opencode"
default_model = "anthropic/claude-sonnet-4-6"

[process.actions.clean]
name = "Clean up text"
type = "ai"
# tool_binary = "/usr/local/bin/opencode"  # Override binary path
# tool_args = ["--quiet"]                   # Extra CLI arguments
inputs = [
  { role = "system", content = "Clean up the transcribed text." },
  { role = "user", source = "transcription" },
]

For every AI action, OSTT must resolve both a tool and a model from either the action itself or [process] defaults. If either value is missing, config validation fails.

The inputs field is an array of inline tables. Each entry has a role and exactly one content source.

Supported AI tools:

toolBinaryRequired arguments
opencodeopencode--pure run --model <model>
claude-codeclaude-p --model <model> --no-session-persistence --mcp-config <json> --strict-mcp-config --allowedTools ""
gemini-cligemini-p -m <model>
codex-clicodexexec --model <model>

The selected tool must be installed and authenticated outside OSTT.

Bash Actions

toml
[process.actions.upper]
name = "UPPERCASE"
type = "bash"
command = "tr '[:lower:]' '[:upper:]'"

Bash actions receive the transcription on stdin and return stdout as the processed result.

Input Sources

Each AI input entry has a role ("system" or "user") and exactly one content source. If multiple sources are specified in the same entry, precedence is: source > file > content.

SourceExampleDescription
source = "transcription"{ role = "user", source = "transcription" }Dynamic content: the recorded transcription
source = "keywords"{ role = "user", source = "keywords" }Dynamic content: the keyword list
content = "..."{ role = "system", content = "You are a helpful assistant." }Literal inline text
file = "..."{ role = "system", file = "~/prompts/clean.txt" }Path to a file whose contents become the message

Popup window settings for ostt launch.

toml
[popup]
# terminal = "ghostty"
# x = 630
# y = 790
width = 90
height = 15
font_size = 6
borderless = true
OptionDefaultDescription
terminalauto-detectTerminal emulator for popup. Skips auto-detection if set (faster startup).
x630Window position in pixels from left of screen. Ignored on GNOME Wayland (compositor controls placement).
y790Window position in pixels from top of screen. Ignored on GNOME Wayland.
width90Window width in terminal columns.
height15Window height in terminal rows.
font_size6Font size for the popup terminal.
borderlesstrueHide window decorations (titlebar, borders) when supported.

Terminal Auto-Detection

If terminal is unset, OSTT probes terminals in this order:

  1. Preferred: ghostty, kitty, alacritty
  2. Fallbacks: foot, konsole, gnome-terminal, xfce4-terminal

On macOS, Terminal.app does not support true color. Install a preferred terminal from the supported list.

Logging

OSTT logs all activity to daily-rotated log files. Logs are kept for the 7 most recent days; older logs are automatically cleaned up on startup.

CommandDescription
ostt logsShow the last 50 lines of the most recent log
RUST_LOG=debug ostt recordEnable debug output for troubleshooting

Available log levels: error, warn, info (default), debug, trace.

Recording Controls

During recording:

KeyAction
EnterStop recording and transcribe
SpacePause or resume recording
Esc, q, Ctrl+CCancel without saving

Display elements visible during recording:

  • Visualization: Real-time audio display (spectrum or waveform)
    • Spectrum mode: Frequency distribution across the voice range (100--1500 Hz focus)
    • Waveform mode: Amplitude envelope over time
  • Vol %: Current volume level
  • Peak %: Maximum volume in the last 3 seconds
  • Red indicator: Clipping warning when volume exceeds peak_volume_threshold

File Locations

~/.config/ostt/
├── ostt.toml              # Main configuration

~/.local/share/ostt/
├── credentials            # API keys (0600 permissions)
├── recordings/            # Saved recordings
├── models/                # Local model files and metadata

~/.local/state/ostt/
├── ostt.log.*             # Daily-rotated logs (7-day retention)