Improve model card: Add library_name, tags, and sample usage

#1
by nielsr HF Staff - opened
Files changed (1): README.md (+35 -4)
README.md CHANGED
@@ -1,18 +1,50 @@
  ---
- license: apache-2.0
  language:
  - en
  - zh
+ license: apache-2.0
  pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - moe
+ - llm
+ - acceleration
  ---

  # BlockFFN-Large

  This is the original 0.8B BlockFFN checkpoint used in the paper *BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity* for acceleration tests.
- You can load and use this model simply by using `AutoTokenizer` and `AutoModelForCausalLM`.

  Links: [[Paper](https://arxiv.org/pdf/2507.08771)] [[Codes](https://github.com/thunlp/BlockFFN)]

+ ### How to use
+
+ You can load and use this model directly with the `transformers` library. Ensure you set `trust_remote_code=True` due to the custom architecture.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_name = "SparseLLM/BlockFFN-Large"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True
+ )
+ model.eval()  # Set model to evaluation mode
+
+ text = "The quick brown fox jumps over the lazy"
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ # Generate text
+ outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, temperature=0.8, top_p=0.8)
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
  ### Citation

  If you find our work useful for your research, please kindly cite our paper as follows:
@@ -24,5 +56,4 @@ If you find our work useful for your research, please kindly cite our paper as f
  journal={arXiv preprint arXiv:2507.08771},
  year={2025},
  url={https://arxiv.org/pdf/2507.08771},
- }
- ```
+ }
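
For reference, once `library_name: transformers` and `pipeline_tag: text-generation` are set in the metadata, the model should also be usable through the high-level `pipeline` API. A minimal sketch, assuming the same `trust_remote_code=True` requirement as the snippet above:

```python
from transformers import pipeline

# Illustrative quick check: load the model via the text-generation pipeline.
# trust_remote_code=True is assumed to be needed for the custom BlockFFN architecture.
generator = pipeline(
    "text-generation",
    model="SparseLLM/BlockFFN-Large",
    trust_remote_code=True,
)

print(generator("The quick brown fox jumps over the lazy", max_new_tokens=20)[0]["generated_text"])
```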