Berget
Berget is a Swedish cloud provider. OSTT uses Berget for Swedish-optimized and Norwegian-optimized Whisper models, as well as general-purpose Whisper Large V3, with processing hosted on European infrastructure.
Berget documentation:
Models
| Model ID | Notes |
|---|---|
| berget/KBLab/kb-whisper-large | Swedish-optimized KB Whisper Large. KBLab reports 50,000+ hours of Swedish speech training and 47% average WER reduction versus OpenAI Whisper Large V3 across FLEURS, CommonVoice, and NST. |
| berget/NbAiLab/nb-whisper-large | Norwegian-optimized NB-Whisper Large. NbAiLab reports 66,000 hours of training data and support for Norwegian, Bokmål, Nynorsk, and English. |
| berget/openai/whisper-large-v3 | General-purpose multilingual Whisper Large V3. |
Berget lists all three speech-to-text models at €3.00 / 1,000 min.
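
As a rough illustration of the listed rate, the sketch below estimates the cost of a transcription from its audio length. It assumes billing scales linearly with audio minutes, with no minimum charge or per-request rounding; that is an assumption, not something the price list states. The function name `estimate_cost_eur` is hypothetical.

```python
from decimal import Decimal

# Rate from the listing above: EUR 3.00 per 1,000 audio minutes.
RATE_EUR_PER_1000_MIN = Decimal("3.00")


def estimate_cost_eur(audio_minutes):
    """Estimate the cost in EUR for the given number of audio minutes.

    Assumption: purely linear billing (no minimum charge, no rounding
    up per request).
    """
    minutes = Decimal(str(audio_minutes))
    cost = minutes * RATE_EUR_PER_1000_MIN / Decimal(1000)
    # Keep four decimal places so short recordings do not round to zero.
    return cost.quantize(Decimal("0.0001"))


# A 90-minute meeting recording at the listed rate.
print(estimate_cost_eur(90))
```

At this rate a single long meeting costs well under one euro, so per-file cost is rarely the deciding factor between the three models.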
Params
[berget."KBLab/kb-whisper-large".params]
language = "sv"
hotwords = ["OSTT", "KBLab", "Berget"]
prompt = "Swedish technical dictation."
temperature = 0.0
response_format = "verbose_json"
align = true
diarize = true

ostt transcribe meeting.mp3 -m berget/KBLab/kb-whisper-large --param language=sv --param hotwords=OSTT,KBLab --param align=true
ostt model params berget/KBLab/kb-whisper-large --format json

OSTT always returns plain transcript text. The verbose_json response format and the word alignment and diarization params are accepted because Berget responses still include a top-level text field; the additional metadata they produce is not emitted in command output.
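
The behavior described above can be sketched as follows: whatever extra metadata a response carries, only the top-level text field is read. This is a minimal illustration, not OSTT's actual implementation. The function name `transcript_text` and the shape of the sample payload beyond its `text` field are assumptions.

```python
import json


def transcript_text(response_body):
    """Return only the top-level "text" field of a JSON response body."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError("response has no top-level 'text' string")
    # Everything else in the payload is ignored.
    return payload["text"]


# Hypothetical verbose_json-style body; only "text" is relied upon here.
sample = json.dumps({
    "text": "Hej och tack.",
    "segments": [{"start": 0.0, "end": 1.2, "text": "Hej och tack."}],
})
print(transcript_text(sample))
```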
| Param | Type | Description |
|---|---|---|
| language | string | Optional language hint, such as sv or no. |
| hotwords | string list | Berget keyword boosting terms. Saved ostt keyword terms are used as fallback only when hotwords is not set. |
| prompt | string | Whisper-compatible context prompt. Saved ostt keyword terms are used as fallback only when prompt is not set. |
| temperature | number | Sampling temperature, 0.0 to 1.0. |
| response_format | string | Supported values in OSTT: json, verbose_json. Berget also documents text, srt, and vtt, but OSTT does not expose them because this provider path parses JSON. |
| timestamp_granularities | string list | Supported values: word, segment. |
| align | boolean | Enable word-level timestamp alignment. Berget documents this as adding word start/end timestamps and confidence scores. |
| diarize | boolean | Enable speaker diarization with automatic speaker labels. |
| speaker_embeddings | boolean | Enable speaker embeddings. |
| chunk_size | integer | Chunk size in seconds, 1 to 60. |
| batch_size | integer | Processing batch size, 1 to 32. |
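
The ranges and allowed values in the table can be checked before a request is sent. The sketch below is one way to do that, assuming the params have already been parsed into a plain dictionary. `validate_params` is a hypothetical helper, not part of OSTT, and it covers only the constraints the table states explicitly.

```python
# Allowed values and ranges copied from the parameter table above.
ALLOWED_RESPONSE_FORMATS = ("json", "verbose_json")
ALLOWED_GRANULARITIES = ("word", "segment")
INT_RANGES = {"chunk_size": (1, 60), "batch_size": (1, 32)}


def validate_params(params):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []

    temperature = params.get("temperature")
    if temperature is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = (isinstance(temperature, (int, float))
                     and not isinstance(temperature, bool))
        if not is_number or not 0.0 <= temperature <= 1.0:
            problems.append("temperature must be a number from 0.0 to 1.0")

    response_format = params.get("response_format")
    if (response_format is not None
            and response_format not in ALLOWED_RESPONSE_FORMATS):
        problems.append("response_format must be json or verbose_json")

    granularities = params.get("timestamp_granularities")
    if granularities is not None:
        if not isinstance(granularities, list) or any(
                g not in ALLOWED_GRANULARITIES for g in granularities):
            problems.append(
                "timestamp_granularities may only contain word and segment")

    for name, (low, high) in INT_RANGES.items():
        value = params.get(name)
        if value is None:
            continue
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            problems.append(f"{name} must be an integer from {low} to {high}")

    return problems


# One out-of-range value and one valid value.
print(validate_params({"temperature": 1.5, "chunk_size": 30}))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid param at once.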
Limits
Berget documents supported upload formats mp3, mp4, mpeg, mpga, m4a, wav, and webm, with a maximum file size of 100 MB and a maximum processing time of 30 minutes per request. Streaming is documented as not yet implemented, so OSTT does not expose stream.
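
A local pre-flight check against the documented format and size limits might look like the sketch below. `check_upload` is a hypothetical helper, not part of OSTT. It judges the format by file extension only, and it interprets 100 MB as 100,000,000 bytes, which is a conservative assumption since the limit's exact byte count is not stated here. The 30-minute processing limit cannot be checked locally and is not covered.

```python
from pathlib import Path

# Formats and size limit as documented above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: decimal megabytes, the stricter reading of "100 MB".
MAX_UPLOAD_BYTES = 100 * 1000 * 1000


def check_upload(path, size_bytes=None):
    """Return a list of problems with a file before uploading it.

    If size_bytes is omitted, the size is read from the file on disk.
    """
    file_path = Path(path)
    problems = []

    extension = file_path.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")

    if size_bytes is None:
        size_bytes = file_path.stat().st_size
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the 100 MB limit")

    return problems


# The size is passed explicitly here so the example needs no real file.
print(check_upload("meeting.mp3", size_bytes=5_000_000))
```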