Markus Heiervang (marksverdhei)
AI & ML interests: NLP
Reacted to tomaarsen's post with 🔥 · 29 days ago
📦🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support, and more. Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just pass device=["cuda:0", "cuda:1"] or device=["cpu"]*4 to the model.predict or model.rank calls (see the sketch after this list).
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.
- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
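Here is a minimal sketch of how the code-facing options above might look in practice. The keyword names (device on predict/rank, dataset_id on NanoBEIREvaluator, output_scores on mine_hard_negatives) are taken from the post itself, and the model and dataset IDs are only illustrative; double-check against the release notes before relying on them.

```python
# Hedged sketch of the new v5.2.0 options described above; IDs are illustrative
# and keyword names follow the post rather than a verified API reference.
from datasets import load_dataset
from sentence_transformers import CrossEncoder, SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers.util import mine_hard_negatives

# 1) CrossEncoder multi-processing: spread reranking across two GPUs (or CPU workers).
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [
    ("what is python", "Python is a programming language."),
    ("what is python", "A python is a large snake."),
]
scores = reranker.predict(pairs, device=["cuda:0", "cuda:1"])  # or device=["cpu"] * 4

# 2) Multilingual NanoBEIR: evaluate a model on a community translation via dataset_id.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluator = NanoBEIREvaluator(dataset_id="lightonai/NanoBEIR-de")  # German benchmark
results = evaluator(model)

# 3) Hard negatives mining with similarity scores returned, handy for distillation losses.
train = load_dataset("sentence-transformers/natural-questions", split="train[:1000]")
mined = mine_hard_negatives(train, model, num_negatives=5, output_scores=True)
```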
Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
Reacted to tomaarsen's post with ❤️ · 4 months ago
ModernBERT goes MULTILINGUAL! One of the most requested models I've seen: The Johns Hopkins University's CLSP has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT.
Model details:
- 2 model sizes:
- jhu-clsp/mmBERT-small
- jhu-clsp/mmBERT-base
- Uses the ModernBERT architecture, but with the Gemma2 multilingual tokenizer (so: flash attention, alternating global/local attention, unpadding/sequence packing, etc.)
- Maximum sequence length of 8192 tokens, on the high end for encoders
- Trained on 1833 languages using DCLM, FineWeb2, and many more sources
- 3 training phases: 2.3T tokens pretraining on 60 languages, 600B tokens mid-training on 110 languages, and 100B tokens decay training on all 1833 languages.
- Both models are MIT Licensed, and the full datasets and intermediary checkpoints are also publicly released
Evaluation details:
- Very competitive with ModernBERT at equivalent sizes on English (GLUE, MTEB v2 English after finetuning)
- Consistently outperforms equivalently sized models on all Multilingual tasks (XTREME, classification, MTEB v2 Multilingual after finetuning)
- In short: beats commonly used multilingual base models like mDistilBERT, XLM-R (multilingual RoBERTa), multilingual MiniLM, etc.
- Additionally: the ModernBERT-based mmBERT is much faster than the alternatives due to its architectural benefits. Easily up to 2x throughput in common scenarios.
Check out the full blogpost with more details. It's super dense & gets straight to the point: https://huggingface.co/blog/mmbert
Based on these results, mmBERT should be the new go-to multilingual encoder base model at 300M parameters and below. Do note that the mmBERT models are "base" models, i.e. they're currently only trained to perform mask filling. They'll need to be finetuned for downstream tasks like semantic search, classification, clustering, etc.
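As a minimal usage sketch, assuming the standard transformers fill-mask pipeline works with these checkpoints out of the box (the model ID comes from the post; the mask token is read from the tokenizer rather than hard-coded, since mmBERT pairs ModernBERT with the Gemma2 tokenizer):

```python
# Hedged sketch: mask filling with mmBERT-base via the transformers pipeline.
from transformers import pipeline

pipe = pipeline("fill-mask", model="jhu-clsp/mmBERT-base")
mask = pipe.tokenizer.mask_token  # avoid hard-coding the mask token
for prediction in pipe(f"Paris is the capital of {mask}."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

For downstream uses like semantic search or classification, the base checkpoints would still need finetuning, e.g. with Sentence Transformers, as noted above.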
Reacted to MonsterMMORPG's post with 🔥 · about 1 year ago
FLUX Redux is a hidden Gem
I am still doing extensive research to publish a comprehensive, fully public (no paywall) tutorial, but this was generated via SwarmUI.
- Style Model Merge Strength: 0.5
- FLUX Guidance Scale: 6
- Base model: my FLUX fine-tuned model, trained on 256 images via Kohya SS GUI as shown in this tutorial (https://youtu.be/FvpWy1x5etM), 70 epochs
- Prompt: anime ohwx man walking in a jungle <segment:yolo-face_yolov9c.pt-1,0.7,0.5> ohwx man, anime
Reacted to mlabonne's post with 🔥 · over 1 year ago
✂️ Uncensor any LLM with abliteration
I wrote an article about abliteration and how NeuralDaredevil-8B was created. Beyond removing alignment, I believe it's an interesting technique with a lot of potential. It's basically fine-tuning without retraining.
In this article, we see how it works, implement it in Google Colab, and heal the abliterated model to recover the performance lost to this technique. The final model is an uncensored, high-quality model with the highest MMLU score on the Open LLM Leaderboard (8B category).
https://huggingface.co/blog/mlabonne/abliteration
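For intuition, here is a rough, hedged sketch of the core idea behind abliteration, not mlabonne's actual Colab code: estimate a "refusal direction" as the difference of mean residual-stream activations on harmful vs. harmless prompts, then orthogonalize the weights that write into the residual stream against that direction. All tensors below are random placeholders.

```python
# Hedged sketch of abliteration's core math; not the article's exact implementation.
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Both inputs: (n_prompts, d_model) activations collected at a chosen layer/position."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a matrix whose output space is the residual
    stream (shape (d_model, d_in), used as y = W @ x): W_new = W - r (r^T W)."""
    return weight - torch.outer(direction, direction @ weight)

# Toy usage with random placeholder tensors.
d_model, d_in = 4096, 4096
r = refusal_direction(torch.randn(32, d_model), torch.randn(32, d_model))
W = torch.randn(d_model, d_in)
W_abliterated = orthogonalize(W, r)
print(torch.allclose(r @ W_abliterated, torch.zeros(d_in), atol=1e-4))  # outputs orthogonal to r
```

The healing step mentioned in the post (recovering the benchmark performance lost to ablation) is a separate stage and is not shown here.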