Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ language:
 - lv
 - et
 - bg
--
+- no
 - ca
 - hr
 - ga
@@ -41,93 +41,4 @@ base_model:
 - utter-project/EuroLLM-9B
 ---
 
-
-This is the model card for EuroLLM-9B-Instruct. You can also check the pre-trained version: [EuroLLM-9B](https://huggingface.co/utter-project/EuroLLM-9B).
-
-- **Developed by:** Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
-- **Funded by:** European Union.
-- **Model type:** A 9B parameter multilingual transformer LLM.
-- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
-- **License:** Apache License 2.0.
-
-## Model Details
-
-The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
-EuroLLM-9B is a 9B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: web data, parallel data (en-xx and xx-en), and high-quality datasets.
-EuroLLM-9B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with a focus on general instruction-following and machine translation.
-
-### Model Description
-
-EuroLLM uses a standard, dense Transformer architecture:
-- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
-- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
-- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
-- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.
-
-For pre-training, we use 400 Nvidia H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 2,800 sequences (approximately 12 million tokens), the Adam optimizer, and BF16 precision.
-Here is a summary of the model hyper-parameters:
-
-| Hyper-parameter          | Value                |
-|--------------------------|----------------------|
-| Sequence Length          | 4,096                |
-| Number of Layers         | 42                   |
-| Embedding Size           | 4,096                |
-| FFN Hidden Size          | 12,288               |
-| Number of Heads          | 32                   |
-| Number of KV Heads (GQA) | 8                    |
-| Activation Function      | SwiGLU               |
-| Position Encodings       | RoPE (\Theta=10,000) |
-| Layer Norm               | RMSNorm              |
-| Tied Embeddings          | No                   |
-| Embedding Parameters     | 0.524B               |
-| LM Head Parameters       | 0.524B               |
-| Non-embedding Parameters | 8.105B               |
-| Total Parameters         | 9.154B               |
-
-## Run the model
-
-    from transformers import AutoModelForCausalLM, AutoTokenizer
-
-    model_id = "utter-project/EuroLLM-9B-Instruct"
-    tokenizer = AutoTokenizer.from_pretrained(model_id)
-    model = AutoModelForCausalLM.from_pretrained(model_id)
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
-        },
-        {
-            "role": "user", "content": "What is the capital of Portugal? How would you describe it?"
-        },
-    ]
-
-    inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
-    outputs = model.generate(inputs, max_new_tokens=1024)
-    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-
-## Results
-
-### EU Languages
-
-**Table 1:** Comparison of open-weight LLMs on multilingual benchmarks. The Borda count corresponds to the average ranking of the models (see [Colombo et al., 2022](https://arxiv.org/abs/2202.03799)). For ARC-Challenge, HellaSwag, and MMLU we use the Okapi datasets ([Lai et al., 2023](https://aclanthology.org/2023.emnlp-demo.28/)), which include 11 languages. For MMLU-Pro and MuSR we translate the English version with Tower ([Alves et al., 2024](https://arxiv.org/abs/2402.17733)) into 6 EU languages.
-\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.
-
-The results in Table 1 highlight EuroLLM-9B's superior performance on multilingual tasks compared to other European-developed models (as shown by the Borda count of 1.0), as well as its strong competitiveness with non-European models, achieving results comparable to Gemma-2-9B and outperforming the rest on most benchmarks.
-
-### English
-
-**Table 2:** Comparison of open-weight LLMs on English general benchmarks.
-\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.
-
-The results in Table 2 demonstrate EuroLLM's strong performance on English tasks, surpassing most European-developed models and matching the performance of Mistral-7B (obtaining the same Borda count).
-
-## Bias, Risks, and Limitations
-
-EuroLLM-9B has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).
+EuroLLM 9B Instruct but ungated
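The "Model Description" section removed in this commit describes grouped query attention with 32 query heads sharing 8 key-value heads. The snippet below is a minimal, self-contained sketch of that mechanism only; it is not EuroLLM's actual implementation (RoPE, the output projection, and the rest of the block are omitted, and the batch/sequence sizes are arbitrary toy values):

```python
# Toy sketch of grouped query attention (GQA): 32 query heads share 8 key/value heads,
# matching the head counts quoted in the removed hyper-parameter table.
import torch
import torch.nn.functional as F

hidden_size, n_heads, n_kv_heads = 4096, 32, 8
head_dim = hidden_size // n_heads          # 128
group_size = n_heads // n_kv_heads         # 4 query heads per KV head

batch, seq = 1, 16                         # arbitrary toy sizes
x = torch.randn(batch, seq, hidden_size)

q_proj = torch.nn.Linear(hidden_size, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)

q = q_proj(x).view(batch, seq, n_heads, head_dim).transpose(1, 2)     # (B, 32, T, 128)
k = k_proj(x).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)  # (B, 8, T, 128)
v = v_proj(x).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)  # (B, 8, T, 128)

# Each group of 4 query heads attends to the same key/value head.
k = k.repeat_interleave(group_size, dim=1)                            # (B, 32, T, 128)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)         # (B, 32, T, 128)
print(out.shape)
```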
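The removed hyper-parameter table is also easy to sanity-check with a few lines of arithmetic. The vocabulary size used below is not stated anywhere in the diff; it is inferred from the embedding parameter count and should be treated as an assumption:

```python
# Rough consistency check of the removed hyper-parameter table.
hidden_size = 4096
seq_len = 4096
n_heads = 32
batch_sequences = 2_800

# 2,800 sequences of 4,096 tokens is roughly the quoted ~12M tokens per step.
print(f"tokens per batch ≈ {batch_sequences * seq_len / 1e6:.1f}M")      # ~11.5M

# Head dimension implied by 32 heads over a 4,096-dim embedding.
print(f"head dim = {hidden_size // n_heads}")                            # 128

# Vocabulary size inferred from the 0.524B embedding parameters (assumption, not in the card).
embedding_params = 0.524e9
print(f"inferred vocab ≈ {embedding_params / hidden_size:,.0f} tokens")  # ~128k

# With untied embeddings, total = non-embedding + embedding + LM head.
total = 8.105e9 + 2 * embedding_params
print(f"total params ≈ {total / 1e9:.3f}B")                              # ~9.153B vs the quoted 9.154B
```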
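Finally, the removed "Run the model" snippet loads the checkpoint with default settings. The variant below keeps the original model id, chat-template call, and prompt, and only adds bfloat16 weights and automatic device placement; those two settings are assumptions on my part (common for a 9B model, but not specified in the card, and device_map="auto" requires the accelerate package):

```python
# Hedged variant of the removed "Run the model" snippet.
# torch_dtype=torch.bfloat16 and device_map="auto" are assumptions, not card-specified settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: matches the BF16 training precision
    device_map="auto",           # assumption: requires `accelerate` to be installed
)

messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages "
                   "that provides safe, educational and helpful answers.",
    },
    {
        "role": "user",
        "content": "What is the capital of Portugal? How would you describe it?",
    },
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```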
