DeepInfra

DeepInfra hosts open speech-recognition models behind its inference API. OSTT supports the DeepInfra-hosted Whisper and Voxtral speech-recognition models listed below.

DeepInfra documentation:

Models

Model ID	Notes
`deepinfra/openai/whisper-large-v3`	High-accuracy multilingual Whisper Large V3 model.
`deepinfra/openai/whisper-large-v3-turbo`	Faster pruned Whisper Large V3 Turbo model. DeepInfra lists this at `$0.00020 / minute`.
`deepinfra/openai/whisper-large`	Whisper Large model, which DeepInfra documents as the best-accuracy Whisper model.
`deepinfra/openai/whisper-medium`	Faster, lighter Whisper model.
`deepinfra/openai/whisper-small`	Smaller Whisper model for lightweight transcription.
`deepinfra/openai/whisper-base`	Smallest supported Whisper model.
`deepinfra/openai/whisper-timestamped-medium`	Whisper Medium variant documented for per-word timestamp segmentation.
`deepinfra/mistralai/Voxtral-Mini-3B-2507`	Voxtral Mini speech-recognition model for transcription, translation, and audio understanding.
`deepinfra/mistralai/Voxtral-Small-24B-2507`	Larger Voxtral speech-recognition model.
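The per-minute price listed for `whisper-large-v3-turbo` can be turned into a rough cost estimate. The sketch below assumes that cost scales linearly with audio duration; `estimate_cost` is a hypothetical helper, not an OSTT command, and actual billing granularity and current pricing should be checked with DeepInfra.

```bash
#!/usr/bin/env bash
# Rough cost estimate for whisper-large-v3-turbo at the listed $0.00020 per minute.
# Assumption: cost is simply minutes * rate, with no rounding or minimum charge.

RATE_PER_MINUTE="0.00020"

# estimate_cost MINUTES -> prints the estimated cost in USD with four decimals.
# (Hypothetical helper, not part of OSTT.)
estimate_cost() {
  awk -v minutes="$1" -v rate="$RATE_PER_MINUTE" \
    'BEGIN { printf "%.4f\n", minutes * rate }'
}

estimate_cost 90    # a 90-minute recording; prints 0.0180
```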

Params

```toml
[deepinfra."openai/whisper-large-v3".params]
language = "sv"
initial_prompt = "Names: OSTT, DeepInfra, Whisper."
temperature = 0.0
task = "transcribe"
chunk_level = "segment"
chunk_length_s = 30
```
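In the configuration example, the table name appears to be the model ID with the `deepinfra/` prefix split off and the remainder quoted. The sketch below derives a table header for another model on that assumption; the pattern is inferred from the single example here, and `params_table` is a hypothetical helper, not an OSTT command.

```bash
#!/usr/bin/env bash
# params_table MODEL_ID -> prints a TOML table header for that model's params.
# Assumption: the header is <provider>."<rest of the model ID>".params,
# as in the one example shown in this section.
params_table() {
  local provider="${1%%/*}"   # text before the first slash, e.g. deepinfra
  local model="${1#*/}"       # everything after it, e.g. openai/whisper-large-v3
  printf '[%s."%s".params]\n' "$provider" "$model"
}

params_table deepinfra/openai/whisper-large-v3-turbo
# prints [deepinfra."openai/whisper-large-v3-turbo".params]
```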

```bash
ostt transcribe meeting.mp3 -m deepinfra/openai/whisper-large-v3 --param language=sv --param initial_prompt=OSTT
ostt model params deepinfra/openai/whisper-large-v3-turbo --format json
```

Param	Type	Description
`language`	string	Optional language hint.
`initial_prompt`	string	Optional text prompt for the first transcription window. Saved `ostt keyword` terms are used as fallback only when `initial_prompt` is not set.
`temperature`	number	Sampling temperature, `0.0` to `1.0`.
`task`	string	Supported values: `transcribe`, `translate`.
`chunk_level`	string	Supported values: `segment`, `word`. DeepInfra documents this as the chunk level for timestamp segmentation.
`chunk_length_s`	integer	Chunk length in seconds. DeepInfra documents `1` to `30`, default `30`.
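The ranges and allowed values in the table can be checked in a script before a request is sent. The following is a minimal sketch of such pre-flight checks; the `valid_*` functions are hypothetical helpers, and OSTT or DeepInfra may validate these values differently.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks based on the documented ranges above.

# valid_chunk_length N -> succeeds if N is an integer from 1 to 30.
valid_chunk_length() {
  [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 >= 1 && 10#$1 <= 30 ))
}

# valid_temperature T -> succeeds if T is a decimal number from 0.0 to 1.0.
valid_temperature() {
  awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]*\.?[0-9]+$/ && t + 0 >= 0 && t + 0 <= 1) }'
}

# valid_task VALUE -> succeeds for the two documented task values.
valid_task() {
  [[ "$1" == "transcribe" || "$1" == "translate" ]]
}

# valid_chunk_level VALUE -> succeeds for the two documented chunk levels.
valid_chunk_level() {
  [[ "$1" == "segment" || "$1" == "word" ]]
}

# Usage (commented out so the sketch runs without OSTT installed):
# if valid_chunk_length 20 && valid_task translate; then
#   ostt transcribe meeting.mp3 -m deepinfra/openai/whisper-large-v3 \
#     --param task=translate --param chunk_length_s=20
# fi
```

Checking locally keeps an out-of-range value from costing a round trip, but the provider's own validation remains authoritative.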

Audio Formats

DeepInfra's speech API documents direct upload support for `mp3` and `wav` files. Responses include a top-level `text` field and segment timestamps; OSTT returns the transcript text.
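For recordings in other containers, one option is to convert to WAV before transcribing. The sketch below assumes `ffmpeg` is installed and that conversion is needed at all; OSTT may already handle other formats itself, which this section does not state. `is_direct_upload` and `to_wav` are hypothetical helpers, not OSTT commands.

```bash
#!/usr/bin/env bash
# is_direct_upload FILE -> succeeds when the file extension is mp3 or wav
# (case-insensitive), the two formats documented for direct upload.
is_direct_upload() {
  [[ "$1" == *.* ]] || return 1          # no extension at all
  local ext="${1##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  [[ "$ext" == "mp3" || "$ext" == "wav" ]]
}

# to_wav FILE -> converts FILE to a WAV file next to it and prints the new path.
# Assumes ffmpeg is available on PATH.
to_wav() {
  local out="${1%.*}.wav"
  ffmpeg -nostdin -loglevel error -y -i "$1" "$out" && printf '%s\n' "$out"
}

# Usage (commented out so the sketch runs without ffmpeg or OSTT installed):
# input="interview.m4a"
# is_direct_upload "$input" || input="$(to_wav "$input")"
# ostt transcribe "$input" -m deepinfra/openai/whisper-large-v3
```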
