Improve model card: Add pipeline tag, library name, tags, and citation
This PR enhances the model card for `Spark-VL-7B` by adding crucial metadata and completing the citation:
* **`pipeline_tag: video-text-to-text`**: Improves discoverability on the Hugging Face Hub among multimodal models that generate text from image and video inputs, matching the model's capabilities as an LVLM.
* **`library_name: transformers`**: Enables the automated "How to use" widget, as the model is fully compatible with the `transformers` library, evidenced by the provided sample usage (a brief loading sketch follows this summary).
* **`tags: [lvlm, reasoning, multimodal, qwen]`**: Adds additional descriptive tags for better searchability and categorization, aligned with the model's architecture (Qwen base) and capabilities (LVLM, reasoning, multimodal).
* **Citation**: Populates the **Citation** section with the correct BibTeX entry from the associated paper, replacing the "TBD" placeholder.
These changes will make the model more easily discoverable and ensure its information is consistent and accurate.
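For context, this is roughly the kind of `transformers` flow the widget would surface. It is a minimal sketch following the standard Qwen2.5-VL loading pattern, not the card's own sample code: the repo id `internlm/Spark-VL-7B`, the image path, and the prompt are illustrative assumptions.

```python
# Minimal sketch, assuming transformers >= 4.49 (Qwen2.5-VL support) and the
# qwen-vl-utils helper package; repo id, image path, and prompt are placeholders.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "internlm/Spark-VL-7B"  # assumed Hub repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "example_problem.png"},
        {"type": "text", "text": "Solve the problem in the image. Reason step by step."},
    ],
}]

# Build the chat prompt and pack the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

With `library_name: transformers` and the pipeline tag in place, the Hub can render a snippet of this shape automatically in the model's "Use this model" widget.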
@@ -1,9 +1,16 @@
 ---
-license: mit
-datasets:
-- TIGER-Lab/ViRL39K
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+datasets:
+- TIGER-Lab/ViRL39K
+license: mit
+library_name: transformers
+pipeline_tag: video-text-to-text
+tags:
+- lvlm
+- reasoning
+- multimodal
+- qwen
 ---
 
 <p align="center">
@@ -106,7 +113,53 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
 ```
 
 
+## Training
+
+### Spark Training
+After downloading the dataset, you can start training using the following example bash script. Our bash scripts are in ```/Spark/Lmm_XC/XC/scripts/spark_training```
+You need to modify the dataset paths and model paths to your own locations.
+```
+export WORKSPACE_DIR="/fs-computility/....../Lmm_XC" # Path to project root directory
+export DATASET_PATH="/fs-computility/....../infer_data_ViRL_19k.json" # Path to your dataset
+export PRETRAIN_MODEL_PATH="/fs-computility/....../Qwen2.5-VL-7B-Instruct" # Path to pretrained model
+export WANDB_PROJECT="Observation" # Name for this project
+export MODEL_CPK_NAME="Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2" # Name for this training run
+export LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt' #Log file save path
+
+
+export WANDB_API_KEY="......"
+export SAVE_PATH="/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}" # Absolute path to save everything about this training run
+export CKPT_PATH="${SAVE_PATH}/ckpt" # Path to save checkpoints
+export FINAL_CKPT_PATH="${SAVE_PATH}/final_ckpt" # Path to save final checkpoints
+export TIMESTAMP=$(date +%Y%m%d_%H%M%S) # Timestamp
+export CUR_LOG_DIR="${SAVE_PATH}/training_logs/${TIMESTAMP}" # Path to save current run logs
+export LOG_DIR="${SAVE_PATH}/tb_logs"
+```
+⏰ Attention:
+```
+export DEV_MODE=0 # Set to 1 for debug mode on single dev machine
+```
+
+## Evaluation
+The integrated multimodal mathematics dataset can be downloaded from 🤗<a href="https://huggingface.co/datasets/internlm/Spark-Data">datasets</a> and evaluated using the scripts provided in the `Evaluation` folder. The evaluation results will be stored, and accuracy can subsequently be computed with the `calculate_acc.py` file.
+```
+bash ./Evaluation/eval_spark_vl_7b.sh
+python calculate_acc.py --result_path ./your_result_path.json
+```
+
 ## ✒️Citation
+```bibtex
+@article{liu2025spark,
+  title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
+  author={Ziyu Liu and Yuhang Zang and Shengyuan Ding and Yuhang Cao and Xiaoyi Dong and Haodong Duan and Dahua Lin and Jiaqi Wang},
+  journal={arXiv preprint arXiv:2509.22624},
+  year={2025}
+}
 ```
-
-
+
+## 📄 License
+  **Usage and License Notices**: The data and code are intended and licensed for research use only.
+License: Attribution-NonCommercial 4.0 International It should abide by the policy of OpenAI: https://openai.com/policies/terms-of-use
+
+## Acknowledgement
+We sincerely thank projects <a href="https://github.com/TideDra/lmm-r1">lmm-r1</a> and <a href="https://github.com/OpenRLHF/OpenRLHF">OpenRLHF</a> for providing their open-source resources.