Upload folder using huggingface_hub

- README.md +12 -10
- model.safetensors +2 -2
README.md CHANGED

@@ -1,5 +1,6 @@
 ---
-
+base_model:
+- BlinkDL/rwkv-7-world
 language:
 - en
 - zh
@@ -9,10 +10,9 @@ language:
 - ar
 - es
 - pt
+license: apache-2.0
 metrics:
 - accuracy
-base_model:
-- BlinkDL/rwkv-7-world
 pipeline_tag: text-generation
 library_name: transformers
 ---
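For reference, the reshuffled YAML front matter above is exactly what the Hub parses into model-card metadata. A minimal sketch of reading it programmatically (illustrative, not part of this commit; it uses `huggingface_hub`, the same library named in the commit title, and the repo id from the usage example further down):

```python
# Minimal sketch: read the model-card metadata defined by the YAML
# front matter above (base_model, license, pipeline_tag, ...).
from huggingface_hub import ModelCard

card = ModelCard.load("fla-hub/rwkv7-191M-world")
print(card.data.base_model)    # expected: ['BlinkDL/rwkv-7-world']
print(card.data.license)       # expected: 'apache-2.0'
print(card.data.pipeline_tag)  # expected: 'text-generation'
```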
@@ -44,15 +44,15 @@ This is RWKV-7 model under flash-linear attention format.
 <!-- Provide the basic links for the model. -->
 
 - **Repository:** https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
-- **Paper:**
+- **Paper:** https://arxiv.org/abs/2503.14456
 
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-Install `flash-linear-attention`
+Install `flash-linear-attention` and the latest version of `transformers` before using this model:
 
 ```bash
-pip install
+pip install git+https://github.com/fla-org/flash-linear-attention
 pip install 'transformers>=4.48.0'
 ```
 
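A quick way to confirm the two installs above resolved correctly is the sketch below (illustrative, not part of this commit; `fla` is the import name of the flash-linear-attention package, and the distribution name is assumed):

```python
# Illustrative sanity check, not part of the diff: confirm both packages
# import and that transformers meets the pinned minimum version.
from importlib.metadata import version

import fla           # flash-linear-attention installs under the import name `fla`
import transformers

print("flash-linear-attention:", version("flash-linear-attention"))  # distribution name assumed
print("transformers:", transformers.__version__)                     # should be >= 4.48.0
```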
@@ -64,11 +64,9 @@ You can use this model just as any other HuggingFace models:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-191M-world', trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-191M-world', trust_remote_code=True)
-model = model.cuda()
+model = model.cuda()  # Supported on NVIDIA/AMD/Intel GPUs, e.g. model.xpu()
 prompt = "What is a large language model?"
 messages = [
-    {"role": "user", "content": "Who are you?"},
-    {"role": "assistant", "content": "I am a GPT-3 based model."},
     {"role": "user", "content": prompt}
 ]
 text = tokenizer.apply_chat_template(
@@ -81,7 +79,11 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
 generated_ids = model.generate(
     **model_inputs,
-    max_new_tokens=
+    max_new_tokens=4096,
+    do_sample=True,
+    temperature=1.0,
+    top_p=0.3,
+    repetition_penalty=1.2
 )
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
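The README snippet is cut off at this point in the diff view. For context, a typical continuation (assumed from the standard transformers chat-template recipe, not shown in this commit) trims the prompt tokens from each output and decodes the reply:

```python
# Assumed continuation of the README example (standard transformers recipe,
# not visible in this diff): trim the prompt tokens, then decode the reply.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```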
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b96a8bdc21e15f71e0c95653dcc3be89e564b619ad5073c9edbfbd07f7849453
+size 382111072
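Note: model.safetensors is stored as a Git LFS pointer, so the diff shows only the pointer's recorded sha256 and byte size, not the weights themselves. A minimal, illustrative way to verify a downloaded copy against the new pointer (the local path is assumed; not part of this commit):

```python
# Illustrative check: verify a downloaded file against the sha256 and size
# recorded in the Git LFS pointer above.
import hashlib
import os

path = "model.safetensors"  # assumed local path to the downloaded weights
expected_oid = "b96a8bdc21e15f71e0c95653dcc3be89e564b619ad5073c9edbfbd07f7849453"
expected_size = 382111072

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        digest.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert digest.hexdigest() == expected_oid, "sha256 mismatch"
print("model.safetensors matches the LFS pointer")
```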