loop utterance

by vinnitu - opened Oct 17, 2025

How do I handle long audio files?

import gigaam

model = gigaam.load_model("ctc", use_flash=False)
recognition_result = model.transcribe_longform("long_example.wav")

# Each utterance carries its text and its (start, end) boundaries
for utterance in recognition_result:
    transcription = utterance["transcription"]
    start, end = utterance["boundaries"]
    print(f"[{gigaam.format_time(start)} - {gigaam.format_time(end)}]: {transcription}")

This is how the non-ONNX version does it, as shown in this notebook:

https://colab.research.google.com/github/salute-developers/GigaAM/blob/main/inference_example.ipynb#scrollTo=DI_tb_N918FS

istupakov

Owner Oct 17, 2025

You can use VAD (voice activity detection). There's an example at the link:
https://github.com/istupakov/onnx-asr?tab=readme-ov-file#vad
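
A rough sketch of what the VAD approach could look like, mirroring the per-utterance loop from the question. The calls `onnx_asr.load_vad("silero")`, `load_model(...).with_vad(vad)` and `model.recognize(path)` follow the linked README section as best I recall it. The model name `gigaam-v2-ctc` and the `start`, `end` and `text` attributes on each result are assumptions to check against the README. `format_time`, `format_segments` and `transcribe_long` are hypothetical helpers written for this example, not part of onnx-asr or GigaAM.

```python
from typing import Iterable, List


def format_time(seconds: float) -> str:
    """Format seconds as MM:SS.ss (hypothetical stand-in for gigaam.format_time)."""
    # Round to centiseconds first so the seconds field never shows 60.00
    total_cs = round(seconds * 100)
    minutes, cs = divmod(total_cs, 6000)
    return f"{minutes:02d}:{cs / 100:05.2f}"


def format_segments(segments: Iterable) -> List[str]:
    """Build one "[start - end]: text" line per segment.

    Assumes each segment exposes .start, .end (in seconds) and .text.
    """
    return [
        f"[{format_time(seg.start)} - {format_time(seg.end)}]: {seg.text}"
        for seg in segments
    ]


def transcribe_long(path: str, model_name: str = "gigaam-v2-ctc"):
    """Split a long file with VAD and recognize each speech segment.

    Requires the onnx-asr package; the model name is an assumption.
    """
    import onnx_asr  # imported lazily so the helpers above work without it

    vad = onnx_asr.load_vad("silero")
    model = onnx_asr.load_model(model_name).with_vad(vad)
    return model.recognize(path)


if __name__ == "__main__":
    for line in format_segments(transcribe_long("long_example.wav")):
        print(line)
```

If the result objects turn out to use different attribute names, only `format_segments` needs to change.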
