Groq
Groq runs Whisper models on its LPU (Language Processing Unit) infrastructure for very fast transcription. OSTT supports Groq’s OpenAI-compatible transcription endpoint.
API documentation: Groq audio transcriptions
Models
| Model ID | Notes |
|---|---|
| `groq/whisper-large-v3` | Full Whisper Large V3. Groq documents it as the choice for accuracy-sensitive workloads, with both transcription and translation support. |
| `groq/whisper-large-v3-turbo` | Whisper Large V3 Turbo, a fine-tuned and pruned version of Large V3. Groq documents it as the best price/performance choice for multilingual transcription. |
Groq documents `whisper-large-v3` at 189x real-time speed with a 10.3% word error rate (WER), and `whisper-large-v3-turbo` at 216x real-time speed with 12% WER. Turbo does not support translation, which does not affect OSTT because it currently uses only Groq’s transcription endpoint.
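A real-time factor converts directly into processing time: divide the audio duration by the factor. The Python sketch below does that arithmetic for both models. The helper name `estimated_seconds` is hypothetical, and the result is an idealized figure that ignores upload time, queueing, and network latency.

```python
# Real-time speed factors that Groq documents for each model.
SPEED_FACTORS = {
    "groq/whisper-large-v3": 189,
    "groq/whisper-large-v3-turbo": 216,
}


def estimated_seconds(model: str, audio_seconds: float) -> float:
    """Idealized processing time: audio length divided by the real-time factor.

    Ignores upload time, queueing, and network latency (an assumption of
    this sketch, not a Groq guarantee).
    """
    return audio_seconds / SPEED_FACTORS[model]


if __name__ == "__main__":
    one_hour = 60 * 60
    for model in SPEED_FACTORS:
        print(f"{model}: ~{estimated_seconds(model, one_hour):.1f} s per hour of audio")
```

Run as a script, this prints about 19.0 s for `whisper-large-v3` and 16.7 s for Turbo per hour of audio, so the speed difference between the two models is a few seconds per hour of input.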
Params
```toml
[groq.whisper-large-v3-turbo.params]
language = "en"
prompt = "Meeting about Rust, OSTT, and terminal transcription."
temperature = 0.0
response_format = "verbose_json"
timestamp_granularities = ["word", "segment"]
```

The same params can be passed on the command line with `--param`:

```shell
ostt transcribe meeting.mp3 -m groq/whisper-large-v3-turbo --param language=en --param response_format=verbose_json --param timestamp_granularities=word,segment
```
To print a model’s params as JSON:

```shell
ostt model params groq/whisper-large-v3-turbo --format json
```

OSTT always outputs plain transcript text. The `verbose_json` format and the timestamp params are still accepted because the response contains a top-level `text` field, but the timestamp metadata is not included in command output.
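Because only the top-level `text` field is used, reading a `verbose_json` body reduces to one key lookup, and any `segments` or `words` arrays are ignored. The Python sketch below illustrates this. The sample body is made up and abridged, its `segments`/`words` shape is an assumption modeled on OpenAI-style responses, and the helper name `transcript_text` is hypothetical rather than OSTT’s own code.

```python
import json

# Illustrative, abridged verbose_json body. Values are invented; the shape of
# "segments" and "words" is an assumption based on OpenAI-style responses.
SAMPLE = """
{
  "text": " Meeting about Rust and OSTT.",
  "segments": [{"start": 0.0, "end": 2.1, "text": " Meeting about Rust and OSTT."}],
  "words": [{"word": "Meeting", "start": 0.0, "end": 0.4}]
}
"""


def transcript_text(body: str) -> str:
    """Return only the top-level transcript, discarding timestamp metadata.

    Works the same for json and verbose_json bodies. Stripping surrounding
    whitespace is a choice made in this sketch, not documented OSTT behavior.
    """
    data = json.loads(body)
    text = data.get("text")
    if not isinstance(text, str):
        raise ValueError("response has no top-level 'text' field")
    return text.strip()


if __name__ == "__main__":
    print(transcript_text(SAMPLE))
```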
| Param | Type | Description |
|---|---|---|
| `language` | string | Optional ISO-639-1 language hint such as `en`. Groq documents that setting it improves accuracy and latency when the source language is known. |
| `prompt` | string | Context prompt for terminology, spelling, or output style. Groq documents a 224-token prompt limit. Saved OSTT keyword terms are used as a fallback only when `prompt` is not set. |
| `temperature` | number | Sampling temperature from 0.0 to 1.0. Groq recommends keeping the default of 0 for transcription. |
| `response_format` | string | Values supported in OSTT: `json`, `verbose_json`. Groq also documents `text`, but OSTT does not expose it because this provider path parses JSON responses. |
| `timestamp_granularities` | string list | Supported values: `word`, `segment`. Requires `response_format = "verbose_json"`; if timestamps are set without an explicit response format, OSTT sets `verbose_json` automatically. |
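The rules in the table can be summarized as a small resolution step applied before a request is sent. The Python below is a minimal sketch under stated assumptions, not OSTT’s implementation: `resolve_params` is a hypothetical name; raising an error when timestamps are combined with an explicit `json` format is assumed (OSTT might instead override or ignore it); joining keyword terms with `", "` is an assumed prompt format; and the 224-token prompt limit is not checked because counting tokens needs Groq’s tokenizer.

```python
from typing import Sequence

# Values OSTT accepts for Groq, per the params table.
RESPONSE_FORMATS = {"json", "verbose_json"}
GRANULARITIES = {"word", "segment"}


def resolve_params(params: dict, keyword_terms: Sequence[str] = ()) -> dict:
    """Apply the documented Groq param rules to a copy of `params`."""
    out = dict(params)

    fmt = out.get("response_format")
    if fmt is not None and fmt not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {fmt!r}")

    temp = out.get("temperature")
    if temp is not None and not 0.0 <= temp <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    grans = out.get("timestamp_granularities")
    if grans:
        unknown = set(grans) - GRANULARITIES
        if unknown:
            raise ValueError(f"unsupported granularities: {sorted(unknown)}")
        if fmt is None:
            # Timestamps without an explicit format: default to verbose_json.
            out["response_format"] = "verbose_json"
        elif fmt != "verbose_json":
            # Assumption: treat an explicit non-verbose format as an error.
            raise ValueError("timestamp_granularities requires verbose_json")

    # Saved keyword terms fill in only when no prompt is set.
    # The ", " join is a hypothetical format, not documented OSTT behavior.
    if out.get("prompt") is None and keyword_terms:
        out["prompt"] = ", ".join(keyword_terms)

    return out
```

The function copies its input so a shared config dict is never mutated when one invocation adds a default `response_format` or a fallback prompt.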
Audio Limits
Groq accepts direct uploads in flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm formats. The current maximum file size is 25 MB on the free tier and 100 MB on the dev tier, and each request is billed for at least 10 seconds of audio. Groq downsamples audio to 16 kHz mono before transcription.
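These limits can be checked locally before uploading. The Python below is a hedged sketch: `check_upload` and `billed_seconds` are hypothetical helpers, detecting the format from the file extension is a simplification, and treating "MB" as 1,000,000 bytes is a conservative assumption (Groq’s documentation is the authority on the exact byte count).

```python
from pathlib import PurePath

SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
# Per-tier upload limits. Decimal megabytes are an assumption; if Groq
# counts 1024 * 1024 bytes per MB, this check is merely stricter than needed.
MAX_BYTES = {"free": 25 * 1_000_000, "dev": 100 * 1_000_000}
MIN_BILLED_SECONDS = 10.0


def check_upload(filename: str, size_bytes: int, tier: str = "free") -> None:
    """Raise ValueError if the file's extension or size breaks the limits.

    An unknown tier raises KeyError.
    """
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(no extension)'}")
    limit = MAX_BYTES[tier]
    if size_bytes > limit:
        raise ValueError(f"{size_bytes} bytes exceeds the {tier} tier limit of {limit}")


def billed_seconds(duration_seconds: float) -> float:
    """Billed length: actual duration, but never less than the 10 s minimum."""
    return max(duration_seconds, MIN_BILLED_SECONDS)
```

For example, a 3-second clip is billed as 10 seconds, while a 42.5-second clip is billed at its actual length.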