nocaptions

by mmichelli - opened Sep 22

Sep 22


The <|nocaptions|> token is missing from the tokenizer of NbAiLab/nb-whisper-large.

from faster_whisper import WhisperModel

model = WhisperModel("NbAiLab/nb-whisper-large", device="cuda")
nocaptions_token_id = model.hf_tokenizer.token_to_id("<|nocaptions|>")
print(f"<|nocaptions|> token ID: {nocaptions_token_id}")

Output:
<|nocaptions|> token ID: None

With the tiny model, which has been updated more recently, the same lookup prints:
<|nocaptions|> token ID: 50362

versae

Nasjonalbiblioteket AI Lab org Sep 22

Hi,

It was renamed to <|nospeech|> in later versions of the large Whisper model.

...
    {
      "id": 50363,
      "content": "<|nospeech|>",
      "single_word": false,
      "lstrip": false,
      "rstrip": false,
      "normalized": false,
      "special": true
    },
...
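
Since the token name differs between vocabularies, a lookup that tries both spellings may be more robust than hard-coding one. The following is only a minimal sketch: the helper name `find_no_speech_token` and the two stand-in dictionaries are hypothetical, built from the IDs quoted in this thread rather than from the real tokenizer files.

```python
from typing import Callable, Optional, Sequence, Tuple

# Both spellings of the same special token: older vocabularies use
# <|nocaptions|>, newer ones use <|nospeech|>.
NO_SPEECH_NAMES = ("<|nospeech|>", "<|nocaptions|>")


def find_no_speech_token(
    token_to_id: Callable[[str], Optional[int]],
    names: Sequence[str] = NO_SPEECH_NAMES,
) -> Optional[Tuple[str, int]]:
    """Return (name, id) for the first spelling the lookup knows, else None."""
    for name in names:
        token_id = token_to_id(name)
        if token_id is not None:
            return name, token_id
    return None


# Hypothetical stand-in vocabularies using the IDs reported in this thread.
large_vocab = {"<|nospeech|>": 50363}
tiny_vocab = {"<|nocaptions|>": 50362}

print(find_no_speech_token(large_vocab.get))
print(find_no_speech_token(tiny_vocab.get))
```

With faster_whisper, the same helper should accept `model.hf_tokenizer.token_to_id` as the lookup, assuming it returns None for unknown tokens as the output in the first post suggests.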

Cheers.

versae changed discussion status to closed Sep 22

mmichelli

Sep 25

Thanks :)
