loop utterance

#2
by vinnitu - opened

How can I handle long files?

For example, something like this:

import gigaam

model = gigaam.load_model("ctc", use_flash=False)
recognition_result = model.transcribe_longform("long_example.wav")

# Each utterance carries its text and its (start, end) boundaries
for utterance in recognition_result:
    transcription = utterance["transcription"]
    start, end = utterance["boundaries"]
    print(f"[{gigaam.format_time(start)} - {gigaam.format_time(end)}]: {transcription}")

as in the non-ONNX version, shown here:

https://colab.research.google.com/github/salute-developers/GigaAM/blob/main/inference_example.ipynb#scrollTo=DI_tb_N918FS

You can use VAD (voice activity detection). There's an example at the link:
https://github.com/istupakov/onnx-asr?tab=readme-ov-file#vad
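A rough sketch of what that could look like, modeled on the VAD section of the onnx-asr README. The model name "gigaam-v2-ctc", the file name, and the segment attributes start, end and text are assumptions here and should be checked against the README; format_time is a hypothetical helper standing in for gigaam.format_time.

```python
def format_time(seconds: float) -> str:
    """Hypothetical helper: render a time in seconds as MM:SS.ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def transcribe_long(path: str, model_name: str = "gigaam-v2-ctc"):
    """Yield (start, end, text) for each speech segment found by the VAD.

    Assumes onnx-asr is installed and that the recognition results expose
    start, end and text attributes (an assumption, see the README).
    """
    import onnx_asr  # imported lazily so the helper above works without it

    vad = onnx_asr.load_vad("silero")
    model = onnx_asr.load_model(model_name).with_vad(vad)
    for segment in model.recognize(path):
        yield segment.start, segment.end, segment.text


if __name__ == "__main__":
    # "long_example.wav" is a placeholder file name
    for start, end, text in transcribe_long("long_example.wav"):
        print(f"[{format_time(start)} - {format_time(end)}]: {text}")
```

The VAD splits the recording into speech segments, so each one is short enough for the model, and the loop prints them in the same format as the gigaam example above.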
