AI & ML interests

Creation of Language Resources and Models for Bulgarian and Multilingual NLP.

Artificial Intelligence and Language Technologies Department at IICT-BAS

Welcome to the Hugging Face organization page of the Artificial Intelligence and Language Technologies Department at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences!

The department works on language resources, theoretical machine learning, information retrieval, speech recognition and generation, and language model development for Bulgarian NLP applications.

This organization offers openly available pre-trained language models for Bulgarian:

  • ModernBERT-based models
    • base (149M) and large (395M) variants with an 8192-token context length
  • BERT-based models
    • base (124M) and large (355M) variants, each in uncased and cased versions
    • extra-large variant (859M)
  • T5-based models
    • 403M and 1.1B variants
    • 470M variant with character-level tokenization, suitable for spelling correction tasks
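
To illustrate why character-level tokenization suits spelling correction, here is a minimal Python sketch. It is not the tokenizer shipped with the 470M T5 variant; the function names (`build_vocab`, `encode`, `differing_positions`) and the ID scheme are hypothetical and exist only for this illustration. The point it shows is that a one-letter typo changes exactly one token, so input and target sequences stay aligned position by position.

```python
# Illustrative character-level tokenizer (hypothetical, not the released model's tokenizer).

def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Map every distinct character in the corpus to an integer ID.

    ID 0 is reserved for characters not seen in the corpus.
    """
    chars = sorted({ch for text in corpus for ch in text})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode(text: str, vocab: dict[str, int], unk_id: int = 0) -> list[int]:
    """Encode a string as one token ID per character."""
    return [vocab.get(ch, unk_id) for ch in text]


def differing_positions(a: list[int], b: list[int]) -> list[int]:
    """Return the indices at which two equal-length ID sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


correct = "книга"  # "book"
typo = "кнега"     # same word with one wrong letter

vocab = build_vocab([correct, typo])
correct_ids = encode(correct, vocab)
typo_ids = encode(typo, vocab)

# Only the position of the wrong letter differs between the two sequences.
print(differing_positions(correct_ids, typo_ids))
```

A subword tokenizer could split the misspelled word into entirely different pieces than the correct one, whereas here the sequences differ only at the position of the error.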
