Improve model card: Update `library_name`, add relevant tags, and update paper links (#4)
Improve model card: Update `library_name`, add relevant tags, and update paper links (bb4f0fd524b3ab6ac6b2602d935540e8370abfe4)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,13 +1,17 @@
 ---
-license: mit
-library_name: LongCat-Video
-pipeline_tag: text-to-video
-tags:
-- transformers
 language:
 - en
 - zh
+library_name: diffusers
+license: mit
+pipeline_tag: text-to-video
+tags:
+- transformers
+- diffusers
+- image-to-video
+- video-continuation
 ---
+
 # LongCat-Video
 
 <div align="center">
@@ -17,7 +21,7 @@ language:
 
 <div align="center" style="line-height: 1;">
 <a href='https://meituan-longcat.github.io/LongCat-Video/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
-<a href='https://
+<a href='https://huggingface.co/papers/2510.22200'><img src='https://img.shields.io/badge/Paper-HuggingFace-red'></a>
 <a href='https://huggingface.co/meituan-longcat/LongCat-Video'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a>
 </div>
 
@@ -39,7 +43,13 @@ We introduce LongCat-Video, a foundational video generation model with 13.6B par
 - 🌟 **Efficient inference**: LongCat-Video generates $720p$, $30fps$ videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions.
 - 🌟 **Strong performance with multi-reward RLHF**: Powered by multi-reward Group Relative Policy Optimization (GRPO), comprehensive evaluations on both internal and public benchmarks demonstrate that LongCat-Video achieves performance comparable to leading open-source video generation models as well as the latest commercial solutions.
 
-For more detail, please refer to the comprehensive [***LongCat-Video Technical Report***](https://
+For more details, please refer to the comprehensive [***LongCat-Video Technical Report***](https://huggingface.co/papers/2510.22200).
+
+## 🎥 Teaser Video
+
+<div align="center">
+<video src="https://github.com/user-attachments/assets/00fa63f0-9c4e-461a-a79e-c662ad596d7d" width="2264" height="384"> </video>
+</div>
 
 ## Quick Start
 
@@ -126,6 +136,16 @@ torchrun run_demo_long_video.py --checkpoint_dir=./weights/LongCat-Video --enable_compile
 torchrun --nproc_per_node=2 run_demo_long_video.py --context_parallel_size=2 --checkpoint_dir=./weights/LongCat-Video --enable_compile
 ```
 
+### Run Interactive Video Generation
+
+```shell
+# Single-GPU inference
+torchrun run_demo_interactive_video.py --checkpoint_dir=./weights/LongCat-Video --enable_compile
+
+# Multi-GPU inference
+torchrun --nproc_per_node=2 run_demo_interactive_video.py --context_parallel_size=2 --checkpoint_dir=./weights/LongCat-Video --enable_compile
+```
+
 ### Run Streamlit
 
 ```shell
@@ -166,13 +186,20 @@ The *Image-to-Video* MOS evaluation results on our internal benchmark.
 | Motion Quality↑ | 3.77 | 3.80 | 3.79 | 3.59 |
 | Overall Quality↑ | 3.35 | 3.27 | 3.26 | 3.17 |
 
+## Community Works
+
+Community works are welcome! Please open a PR, or let us know in an issue, to add your work.
+
+- [CacheDiT](https://github.com/vipshop/cache-dit) offers full cache acceleration support for LongCat-Video with DBCache and TaylorSeer, achieving a nearly 1.7x speedup without obvious loss of precision. Visit their [example](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_longcat_video.py) for more details.
+
+
 ## License Agreement
 
 The **model weights** are released under the **MIT License**.
 
 Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.
 
-See the [LICENSE](LICENSE) file for the full license text.
+See the [LICENSE](https://huggingface.co/meituan-longcat/LongCat-Video/blob/main/LICENSE) file for the full license text.
 
 ## Usage Considerations
 This model has not been specifically designed or comprehensively evaluated for every possible downstream application.
@@ -186,14 +213,14 @@ Nothing in this Model Card should be interpreted as altering or restricting the
 We kindly encourage citation of our work if you find it useful.
 
 ```
-@misc{
-
-
-
-
-
-
-
+@misc{meituanlongcatteam2025longcatvideotechnicalreport,
+      title={LongCat-Video Technical Report},
+      author={Meituan LongCat Team and Xunliang Cai and Qilong Huang and Zhuoliang Kang and Hongyu Li and Shijun Liang and Liya Ma and Siyu Ren and Xiaoming Wei and Rixu Xie and Tong Zhang},
+      year={2025},
+      eprint={2510.22200},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2510.22200},
 }
 ```
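The multi-GPU commands in the diff pass `--context_parallel_size=2`, which shards a single generation across ranks along the sequence (frame) axis. As a minimal sketch of that idea only — not LongCat-Video's actual implementation, whose details live in the demo scripts — a hypothetical helper `partition_frames` could assign each context-parallel rank a contiguous slice of latent frames:

```python
def partition_frames(num_frames: int, context_parallel_size: int, rank: int) -> range:
    """Illustrative only: evenly split the latent frame axis across
    context-parallel ranks, giving earlier ranks the remainder frames."""
    base, rem = divmod(num_frames, context_parallel_size)
    start = rank * base + min(rank, rem)
    length = base + (1 if rank < rem else 0)
    return range(start, start + length)

# With 93 latent frames and --context_parallel_size=2:
print(partition_frames(93, 2, 0))  # range(0, 47)
print(partition_frames(93, 2, 1))  # range(47, 93)
```

Each rank then attends over its own slice, exchanging activations at slice boundaries; the real scheme (overlap, attention sharding, load balancing) is defined by the LongCat-Video code itself.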