Open weights, Apache 2.0
Voxtral is open-source. The 3B and 24B model weights are available on Hugging Face for self-hosting, private deployment, or on-premise use. If you'd rather not manage infrastructure, the API serves a transcription-optimised version of the Mini model.
Best price-performance
At $0.003/min, Voxtral Mini Transcribe V2 costs half as much as GPT-4o-mini-transcribe and one-fifth as much as ElevenLabs Scribe v2, while matching or beating both on accuracy benchmarks. For high-volume transcription work, no other API comes close on that price-to-accuracy ratio.
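To make the per-minute figures concrete, here is a small cost sketch. Voxtral's $0.003/min rate comes from the paragraph above; the other two rates are only derived from the stated "half" and "one-fifth" ratios (so $0.006/min and $0.015/min), not taken from those vendors' price lists. The 1,000-hour monthly volume is a hypothetical example.

```python
# Per-minute rates in USD. Voxtral's rate is stated in the text; the other two
# are implied by the "half" and "one-fifth" ratios, not checked against vendor pricing.
VOXTRAL_RATE = 0.003
RATES = {
    "Voxtral Mini Transcribe V2": VOXTRAL_RATE,
    "GPT-4o-mini-transcribe (implied)": VOXTRAL_RATE * 2,
    "ElevenLabs Scribe v2 (implied)": VOXTRAL_RATE * 5,
}


def monthly_cost(hours_per_month: float, rate_per_min: float) -> float:
    """Cost in USD for a number of audio hours billed at a per-minute rate."""
    return round(hours_per_month * 60 * rate_per_min, 2)


if __name__ == "__main__":
    hours = 1_000  # hypothetical monthly transcription volume
    for name, rate in RATES.items():
        print(f"{name}: ${monthly_cost(hours, rate):,.2f}")
```

At 1,000 hours a month, that works out to $180 for Voxtral versus $360 and $900 at the implied competitor rates.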
Context biasing for technical vocabulary
Provide up to 100 words or phrases to steer the model toward the correct spellings of names, technical terms, and domain-specific vocabulary. OSTT automatically sends your configured keywords to Voxtral as context_bias terms.
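Before keywords are sent as context-bias terms, it helps to clean the list so the 100-term limit is spent on distinct entries. The helper below is a hypothetical sketch, not OSTT's actual code, and it does not show the request format the Voxtral API expects; `prepare_context_bias` and its normalisation rules (trim, drop blanks, case-insensitive de-duplication) are assumptions chosen for illustration.

```python
MAX_CONTEXT_BIAS_TERMS = 100  # limit stated in the text above


def prepare_context_bias(keywords: list[str]) -> list[str]:
    """Hypothetical helper: normalise a keyword list before sending it as bias terms.

    Trims whitespace, drops blank entries, removes case-insensitive duplicates
    (keeping the first spelling seen), and enforces the 100-term limit.
    """
    seen: set[str] = set()
    terms: list[str] = []
    for raw in keywords:
        term = raw.strip()
        if not term:
            continue  # skip empty or whitespace-only entries
        key = term.casefold()
        if key in seen:
            continue  # keep only the first spelling of a repeated term
        seen.add(key)
        terms.append(term)
    if len(terms) > MAX_CONTEXT_BIAS_TERMS:
        raise ValueError(
            f"{len(terms)} terms given; at most {MAX_CONTEXT_BIAS_TERMS} allowed"
        )
    return terms


if __name__ == "__main__":
    print(prepare_context_bias(["Kubernetes", " kubectl ", "kubernetes", "", "PostgreSQL"]))
```

Running it prints `['Kubernetes', 'kubectl', 'PostgreSQL']`. Raising an error instead of silently truncating past 100 terms is a deliberate choice here, so a user notices when some keywords would otherwise be dropped.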
13 languages with speaker diarization
Voxtral Mini Transcribe V2 supports English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch — with speaker diarization and word-level timestamps in all 13.
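Diarization labels plus word-level timestamps are usually turned into readable speaker turns. The sketch below merges consecutive words from the same speaker; the `Word` shape (a speaker label, start and end in seconds, and text) is a hypothetical format chosen for illustration, not the actual structure of Voxtral's response.

```python
from dataclasses import dataclass


@dataclass
class Word:
    speaker: str  # diarization label (hypothetical format, e.g. "speaker_1")
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: list[Word]) -> list[Turn]:
    """Merge consecutive words from the same speaker into speaker turns."""
    turns: list[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            last = turns[-1]
            last.end = w.end          # extend the current turn
            last.text += " " + w.text
        else:
            turns.append(Turn(w.speaker, w.start, w.end, w.text))  # speaker changed
    return turns


if __name__ == "__main__":
    words = [
        Word("A", 0.0, 0.4, "Hello"),
        Word("A", 0.4, 0.8, "there."),
        Word("B", 1.0, 1.3, "Hi!"),
    ]
    for turn in group_into_turns(words):
        print(f"[{turn.start:.1f}-{turn.end:.1f}s] {turn.speaker}: {turn.text}")
```

Because the grouping depends only on labels and timestamps, the same logic works whichever of the 13 languages the audio is in.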
3-hour recordings in a single request
Most transcription APIs cap a single request at around 25 MB or 25 minutes of audio, which forces you to split longer recordings into chunks. Voxtral processes recordings up to 3 hours in one request, so you can transcribe a long meeting, lecture, or interview without writing chunking logic.
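The difference is easy to quantify. This short sketch counts how many requests a recording needs under a per-request duration cap; it considers only duration and ignores the 25 MB size limit, which depends on the audio format and bitrate.

```python
import math


def requests_needed(recording_minutes: float, max_minutes_per_request: float) -> int:
    """Number of API calls needed when each call accepts at most max_minutes_per_request."""
    if recording_minutes <= 0:
        return 0
    return math.ceil(recording_minutes / max_minutes_per_request)


if __name__ == "__main__":
    three_hours = 180  # minutes
    print(requests_needed(three_hours, 25))   # 8  (25-minute cap)
    print(requests_needed(three_hours, 180))  # 1  (3-hour cap)
```

Eight chunks means seven boundaries where a word or sentence can be cut in half and has to be stitched back together; a single request avoids that work entirely.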
Global hotkey
Bind OSTT to a system-wide shortcut. Press it once to open the recorder, speak, then press it again to stop. Voxtral transcribes the audio and the result lands in your clipboard or stdout, all without touching the mouse.
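The press-to-start, press-to-stop behaviour is a two-state toggle. The class below is a hypothetical sketch of that logic, not OSTT's implementation: `start_recording`, `stop_and_transcribe` and `deliver` are placeholder callbacks standing in for the recorder, the Voxtral API call and the clipboard/stdout output, and registering the actual global hotkey is left to whatever OS-level mechanism the app uses.

```python
from typing import Callable


class HotkeyToggle:
    """Hypothetical press-to-start, press-to-stop recorder control."""

    def __init__(
        self,
        start_recording: Callable[[], None],
        stop_and_transcribe: Callable[[], str],
        deliver: Callable[[str], None],
    ) -> None:
        self._start = start_recording
        self._stop = stop_and_transcribe
        self._deliver = deliver
        self.recording = False

    def on_press(self) -> None:
        """Call this from the global hotkey handler on every press."""
        if not self.recording:
            self._start()
            self.recording = True
        else:
            # Reset state first so a failed transcription leaves the toggle idle.
            self.recording = False
            text = self._stop()
            self._deliver(text)  # e.g. copy to clipboard or print to stdout


if __name__ == "__main__":
    toggle = HotkeyToggle(
        start_recording=lambda: print("recording..."),
        stop_and_transcribe=lambda: "hello world",  # stand-in for the API call
        deliver=print,
    )
    toggle.on_press()  # first press: start recording
    toggle.on_press()  # second press: stop, transcribe, deliver
```

Injecting the three callbacks keeps the toggle logic independent of any particular audio library, clipboard tool or hotkey backend, which also makes it easy to test.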