This demo loads the FlaxCLIPVisionBertForMaskedLM present in the model directory of this repository. The checkpoint is loaded from flax-community/clip-vision-bert-cc12m-70k, which is a checkpoint pre-trained for 70k steps. 100 random validation set examples are present in cc12m_data/vqa_val.tsv, with their respective images in the cc12m_data/images_data directory.

You can get a random example by clicking on the Get a random example button. The caption is tokenized and a random token is masked by replacing it with [MASK].

We provide an English Translation of the caption for users who are not well acquainted with the other languages. This is done using mtranslate to keep things flexible, and it needs an internet connection because it uses the Google Translate API.

The model predicts the scores for tokens from the bert-base-multilingual-uncased checkpoint. The top-5 predictions are displayed below, and their respective confidence scores are shown in the form of a bar plot.
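
The masking and top-5 steps described above can be illustrated with a small, dependency-free sketch. It is not the demo's own code: whitespace splitting stands in for the bert-base-multilingual-uncased tokenizer, the vocabulary and scores are made-up values, and turning scores into confidences with a softmax is an assumption. The helper names mask_random_token, softmax and top_k are hypothetical.

```python
import math
import random

MASK_TOKEN = "[MASK]"


def mask_random_token(tokens, rng):
    """Replace one randomly chosen token with [MASK].

    Returns the masked copy, the masked position, and the original token.
    """
    index = rng.randrange(len(tokens))
    masked = list(tokens)  # copy so the caller's list is left unchanged
    original = masked[index]
    masked[index] = MASK_TOKEN
    return masked, index, original


def softmax(scores):
    """Turn raw scores into probabilities (assumed confidence measure)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(vocab, scores, k=5):
    """Return the k (token, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Whitespace splitting is a stand-in for the real subword tokenizer.
caption_tokens = "a dog runs on the beach".split()
masked_tokens, position, answer = mask_random_token(caption_tokens, random.Random(0))
print(" ".join(masked_tokens), "| masked token:", answer)

# Made-up vocabulary and scores for the masked position (illustration only).
toy_vocab = ["dog", "cat", "beach", "runs", "the", "a", "on"]
toy_scores = [2.5, 1.0, 0.3, 0.1, -0.5, -1.0, -2.0]
for token, prob in top_k(toy_vocab, toy_scores, k=5):
    print(f"{token:>6} {prob:.3f}")
```

In the demo itself, the scores would come from the model's output at the masked position rather than from a hand-written list, and the resulting probabilities would feed the bar plot.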