
Groq

Groq runs Whisper models on LPU infrastructure for very fast transcription. OSTT supports Groq’s OpenAI-compatible transcription endpoint.

API documentation: Groq audio transcriptions

Models

| Model ID | Notes |
| --- | --- |
| groq/whisper-large-v3 | Full Whisper Large V3. Groq documents this as the accuracy-sensitive choice with transcription and translation support. |
| groq/whisper-large-v3-turbo | Fine-tuned, pruned Whisper Large V3 Turbo variant. Groq documents this as the best price/performance choice for multilingual transcription. |

Groq documents whisper-large-v3 at 189x real-time speed with a 10.3% word error rate (WER), and whisper-large-v3-turbo at 216x real-time speed with a 12% WER. Turbo does not support translation, but OSTT currently uses Groq’s transcription endpoint rather than translation, so that difference does not affect OSTT usage.
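To put the real-time factors in perspective, dividing the audio duration by the documented factor gives a rough lower bound on processing time. The sketch below assumes the factor applies uniformly to the whole file and ignores upload, queueing, and network latency; `estimate_seconds` is a hypothetical helper, not an OSTT command.

```bash
# Hypothetical helper: rough processing time = audio seconds / real-time factor.
# Ignores upload time, queueing, and network latency.
estimate_seconds() {
  awk -v duration="$1" -v factor="$2" 'BEGIN { printf "%.1f\n", duration / factor }'
}

estimate_seconds 3600 189   # one hour on whisper-large-v3: prints 19.0
estimate_seconds 3600 216   # one hour on whisper-large-v3-turbo: prints 16.7
```

On these numbers the two models differ by only a few seconds per hour of audio, so the choice between them is more likely to turn on accuracy and price than on speed.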

Params

toml
[groq.whisper-large-v3-turbo.params]
language = "en"
prompt = "Meeting about Rust, OSTT, and terminal transcription."
temperature = 0.0
response_format = "verbose_json"
timestamp_granularities = ["word", "segment"]

bash
ostt transcribe meeting.mp3 -m groq/whisper-large-v3-turbo --param language=en --param response_format=verbose_json --param timestamp_granularities=word,segment
ostt model params groq/whisper-large-v3-turbo --format json

OSTT always returns plain transcript text. verbose_json and timestamp params are supported because the response still contains a top-level text field; timestamp metadata is not emitted in command output.

| Param | Type | Description |
| --- | --- | --- |
| language | string | Optional ISO-639-1 language hint such as en. Groq documents this as improving accuracy and latency when the source language is known. |
| prompt | string | Context prompt for terminology, spelling, or output style. Groq documents a 224-token prompt limit. Saved ostt keyword terms are used as fallback only when prompt is not set. |
| temperature | number | Sampling temperature, 0.0 to 1.0. Groq recommends the default 0 for transcription. |
| response_format | string | Supported values in OSTT: json, verbose_json. Groq also documents text, but OSTT does not expose it because this provider path parses JSON. |
| timestamp_granularities | string list | Supported values: word, segment. Requires response_format = "verbose_json"; OSTT sets verbose_json automatically if timestamps are set without an explicit response format. |
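The value constraints in the table can be checked in a wrapper script before calling `ostt`. The functions below are a minimal sketch under that assumption: their names are hypothetical, they only reject values the table rules out, and OSTT's own validation may behave differently.

```bash
# Hypothetical pre-flight checks that mirror the params table.
# OSTT's own validation may differ; these only reject values the table rules out.

valid_response_format() {
  case "$1" in
    json|verbose_json) return 0 ;;
    *) return 1 ;;
  esac
}

valid_granularities() {
  # Expects a comma-separated list such as "word,segment".
  local items item
  IFS=',' read -r -a items <<< "$1"
  [ "${#items[@]}" -gt 0 ] || return 1
  for item in "${items[@]}"; do
    case "$item" in
      word|segment) ;;
      *) return 1 ;;
    esac
  done
}

valid_temperature() {
  # Accepts plain decimals between 0.0 and 1.0 inclusive.
  awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]*\.?[0-9]+$/ && t + 0 >= 0 && t + 0 <= 1) }'
}
```

For example, `valid_response_format text` fails because OSTT does not expose the text format, while `valid_granularities word,segment` succeeds.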

Audio Limits

Groq documents direct uploads for flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Current limits are 25 MB on the free tier and 100 MB on the dev tier, with a 10-second minimum billed length. Groq downsamples audio to 16 kHz mono before transcription.
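These limits can be checked locally before a file is sent. The helpers below are a sketch with hypothetical names, not OSTT features. They assume 1 MB means 1,000,000 bytes (the more conservative reading of the limit) and that the billing minimum simply rounds short clips up to 10 seconds.

```bash
# Hypothetical helpers based on the documented limits; not part of OSTT.

fits_upload_limit() {
  # Usage: fits_upload_limit FILE LIMIT_MB  (25 for free tier, 100 for dev tier)
  # Assumes 1 MB = 1,000,000 bytes, the more conservative reading of the limit.
  local size limit
  size=$(( $(wc -c < "$1") ))
  limit=$(( $2 * 1000000 ))
  [ "$size" -le "$limit" ]
}

billed_seconds() {
  # Clips shorter than 10 seconds are billed as 10 seconds.
  awk -v d="$1" 'BEGIN { print (d < 10 ? 10 : d) }'
}

# One way to prepare a large or video source before upload is to re-encode it
# to 16 kHz mono FLAC, the format Groq downsamples to anyway (requires ffmpeg):
#   ffmpeg -i input.mp4 -ar 16000 -ac 1 -map 0:a -c:a flac output.flac
```

Whether re-encoding actually shrinks a file depends on the source codec. It tends to help with uncompressed audio or video containers, and may not help with audio that is already heavily compressed.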