Add XLM-R tokenizer files (#2)
- Copy XLM-R tokenizer to this repo (007e82656cdb52dda51a3f6b674aa57df98ba058)
Co-authored-by: Jannis Vamvas <jvamvas@users.noreply.huggingface.co>
- README.md +2 -8
- tokenizer.json +0 -0
- tokenizer_config.json +4 -0
README.md
CHANGED
@@ -93,13 +93,7 @@ Because it has been pre-trained with language-specific modular components (_lang
 # Usage
 
 ## Tokenizer
-This model reuses the tokenizer of [XLM-R](https://huggingface.co/xlm-roberta-base)
-
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
-```
+This model reuses the tokenizer of [XLM-R](https://huggingface.co/xlm-roberta-base).
 
 ## Input Language
 Because this model uses language adapters, you need to specify the language of your input so that the correct adapter can be activated:
@@ -107,7 +101,7 @@ Because this model uses language adapters, you need to specify the language of y
 ```python
 from transformers import XmodModel
 
-model = XmodModel.from_pretrained("
+model = XmodModel.from_pretrained("facebook/xmod-base")
 model.set_default_language("en_XX")
 ```
 
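Since this commit copies the tokenizer files into the model repo, the tokenizer and the model can presumably be loaded from the same identifier. The following is a minimal sketch under that assumption, not part of the README: the helper name `encode` and its parameters are hypothetical, the repo id `facebook/xmod-base` is taken from the updated snippet above, and PyTorch is assumed to be installed for `return_tensors="pt"`.

```python
def encode(text, language="en_XX", repo_id="facebook/xmod-base"):
    """Return the last hidden states for `text`, loading tokenizer and model from one repo.

    `encode` and its parameters are illustrative names, not part of the model card.
    """
    # Imported lazily so that defining this helper requires neither
    # transformers nor network access.
    from transformers import AutoTokenizer, XmodModel

    # Assumption: the tokenizer files added by this commit make the
    # model repo usable as the tokenizer source as well.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = XmodModel.from_pretrained(repo_id)

    # Activate the language adapter that matches the input language.
    model.set_default_language(language)

    inputs = tokenizer(text, return_tensors="pt")
    return model(**inputs).last_hidden_state
```

Calling `encode("Hello, world!")` would download the files on first use and return a tensor with one hidden vector per token; passing a different `language` code selects a different adapter.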
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,4 @@
+{
+  "tokenizer_class": "XLMRobertaTokenizer"
+}
+
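The only setting in the new `tokenizer_config.json` is `tokenizer_class`, which, as far as I understand, is the field `AutoTokenizer` consults when choosing which tokenizer class to instantiate. A small sketch of reading that field with the standard library; `CONFIG_TEXT` and `tokenizer_class_name` are hypothetical names introduced here for illustration, and the string mirrors the file contents added above.

```python
import json

# Contents of tokenizer_config.json as added in this commit
# (indentation is an assumption; JSON ignores it).
CONFIG_TEXT = '{\n  "tokenizer_class": "XLMRobertaTokenizer"\n}\n'


def tokenizer_class_name(config_text):
    """Return the tokenizer class named in a tokenizer_config.json string."""
    return json.loads(config_text)["tokenizer_class"]


print(tokenizer_class_name(CONFIG_TEXT))  # prints XLMRobertaTokenizer
```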