nocaptions
#4, opened by mmichelli
The <|nocaptions|> token is missing from the tokenizer of NbAiLab/nb-whisper-large:
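from faster_whisper import WhisperModel

# Load the model and look up the token in its Hugging Face tokenizer.
model = WhisperModel("NbAiLab/nb-whisper-large", device="cuda")
# token_to_id returns None when the token is not in the vocabulary.
nocaptions_token_id = model.hf_tokenizer.token_to_id("<|nocaptions|>")
print(f"<|nocaptions|> token ID: {nocaptions_token_id}")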
<|nocaptions|> token ID: None
With the tiny model, which has been updated more recently:
<|nocaptions|> token ID: 50362
Hi,
It was renamed to <|nospeech|> in later versions of the large Whisper model.
...
{
  "id": 50363,
  "content": "<|nospeech|>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": false,
  "special": true
},
...
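If the same code has to work with both tokenizer revisions, one option is to try both names and use whichever one exists. This is only a rough sketch: `find_no_speech_token` and `NO_SPEECH_TOKEN_NAMES` are hypothetical names made up for illustration, and the two dictionaries are stand-ins built from the IDs quoted in this thread rather than real tokenizers.

```python
from typing import Callable, Optional, Tuple

# Names reported in this thread for the same special token
# (hypothetical constant, not part of faster_whisper or tokenizers).
NO_SPEECH_TOKEN_NAMES = ("<|nospeech|>", "<|nocaptions|>")


def find_no_speech_token(
    token_to_id: Callable[[str], Optional[int]],
) -> Optional[Tuple[str, int]]:
    """Return (name, id) for the first known name that resolves, else None."""
    for name in NO_SPEECH_TOKEN_NAMES:
        token_id = token_to_id(name)
        if token_id is not None:
            return name, token_id
    return None


# Stand-in vocabularies using only the values quoted in this thread.
large_vocab = {"<|nospeech|>": 50363}
tiny_vocab = {"<|nocaptions|>": 50362}

print(find_no_speech_token(large_vocab.get))  # ('<|nospeech|>', 50363)
print(find_no_speech_token(tiny_vocab.get))   # ('<|nocaptions|>', 50362)
```

With a loaded model, the lookup from the original post should plug in directly, i.e. `find_no_speech_token(model.hf_tokenizer.token_to_id)`, since that call returns None for unknown tokens.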
Cheers.
versae changed discussion status to closed
Thanks :)