nocaptions

#4
by mmichelli - opened

The <|nocaptions|> token is missing from the tokenizer of NbAiLab/nb-whisper-large:

from faster_whisper import WhisperModel

model = WhisperModel("NbAiLab/nb-whisper-large", device="cuda")
nocaptions_token_id = model.hf_tokenizer.token_to_id("<|nocaptions|>")
print(f"<|nocaptions|> token ID: {nocaptions_token_id}")

<|nocaptions|> token ID: None

With the tiny model, which has been updated more recently:
<|nocaptions|> token ID: 50362

Nasjonalbiblioteket AI Lab org

Hi,

It was renamed to <|nospeech|> in later versions of the large Whisper model.

...
    {
      "id": 50363,
      "content": "<|nospeech|>",
      "single_word": false,
      "lstrip": false,
      "rstrip": false,
      "normalized": false,
      "special": true
    },
...
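If the same code has to work across checkpoints that use either name, one option is to try both spellings and take whichever the tokenizer knows. This is only a sketch: the helper name `find_no_speech_token` and the `NO_SPEECH_TOKENS` tuple are made up for illustration, and it assumes only that the tokenizer exposes `token_to_id()` returning `None` for unknown tokens, as in the snippet above.

```python
# Hypothetical helper, not part of faster_whisper or tokenizers.
# Candidate names for the "no speech" special token, newer name first.
NO_SPEECH_TOKENS = ("<|nospeech|>", "<|nocaptions|>")


def find_no_speech_token(tokenizer, candidates=NO_SPEECH_TOKENS):
    """Return (token, token_id) for the first candidate the tokenizer knows.

    `tokenizer` is any object with a `token_to_id(str)` method that
    returns None when the token is not in the vocabulary.
    """
    for token in candidates:
        token_id = tokenizer.token_to_id(token)
        if token_id is not None:
            return token, token_id
    raise KeyError(f"none of {candidates!r} found in the tokenizer vocabulary")
```

With the model from the first post, this would be called as `find_no_speech_token(model.hf_tokenizer)`.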

Cheers.

versae changed discussion status to closed

Thanks :)
