zhuoliang and nielsr (HF Staff) committed on
Commit 03b5552 · verified · 1 Parent(s): 842b9ca

Improve model card: Update `library_name`, add relevant tags, and update paper links (#4)


- Improve model card: Update `library_name`, add relevant tags, and update paper links (bb4f0fd524b3ab6ac6b2602d935540e8370abfe4)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+43 −16)
README.md CHANGED
@@ -1,13 +1,17 @@
  ---
- license: mit
- library_name: LongCat-Video
- pipeline_tag: text-to-video
- tags:
- - transformers
  language:
  - en
  - zh
  ---
  # LongCat-Video

  <div align="center">
@@ -17,7 +21,7 @@ language:

  <div align="center" style="line-height: 1;">
  <a href='https://meituan-longcat.github.io/LongCat-Video/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
- <a href='https://github.com/meituan-longcat/LongCat-Video/blob/main/longcatvideo_tech_report.pdf'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
  <a href='https://huggingface.co/meituan-longcat/LongCat-Video'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a>
  </div>

@@ -39,7 +43,13 @@ We introduce LongCat-Video, a foundational video generation model with 13.6B par
  - 🌟 **Efficient inference**: LongCat-Video generates $720p$, $30fps$ videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions
  - 🌟 **Strong performance with multi-reward RLHF**: Powered by multi-reward Group Relative Policy Optimization (GRPO), comprehensive evaluations on both internal and public benchmarks demonstrate that LongCat-Video achieves performance comparable to leading open-source video generation models as well as the latest commercial solutions.

- For more detail, please refer to the comprehensive [***LongCat-Video Technical Report***](https://github.com/meituan-longcat/LongCat-Video/blob/main/longcatvideo_tech_report.pdf).

  ## Quick Start

@@ -126,6 +136,16 @@ torchrun run_demo_long_video.py --checkpoint_dir=./weights/LongCat-Video --enabl
  torchrun --nproc_per_node=2 run_demo_long_video.py --context_parallel_size=2 --checkpoint_dir=./weights/LongCat-Video --enable_compile
  ```

  ### Run Streamlit

  ```shell
@@ -166,13 +186,20 @@ The *Image-to-Video* MOS evaluation results on our internal benchmark.
  | Motion Quality↑ | 3.77 | 3.80 | 3.79 | 3.59 |
  | Overall Quality↑ | 3.35 | 3.27 | 3.26 | 3.17 |

  ## License Agreement

  The **model weights** are released under the **MIT License**.

  Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

- See the [LICENSE](LICENSE) file for the full license text.

  ## Usage Considerations
  This model has not been specifically designed or comprehensively evaluated for every possible downstream application.
@@ -186,14 +213,14 @@ Nothing in this Model Card should be interpreted as altering or restricting the
  We kindly encourage citation of our work if you find it useful.

  ```
- @misc{meituan2025longcatvideotechnicalreport,
- title={LongCat-Video Technical Report},
- author={Meituan LongCat Team},
- year={2025},
- eprint={xxx},
- archivePrefix={arXiv},
- primaryClass={cs.CL},
- url={https://arxiv.org/abs/xxx},
  }
  ```

  ---
  language:
  - en
  - zh
+ library_name: diffusers
+ license: mit
+ pipeline_tag: text-to-video
+ tags:
+ - transformers
+ - diffusers
+ - image-to-video
+ - video-continuation
  ---
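For reference, the updated frontmatter assembled from the hunks above reads as a single YAML block (field order as in the diff):

```yaml
---
language:
- en
- zh
library_name: diffusers
license: mit
pipeline_tag: text-to-video
tags:
- transformers
- diffusers
- image-to-video
- video-continuation
---
```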
+
  # LongCat-Video

  <div align="center">
 

  <div align="center" style="line-height: 1;">
  <a href='https://meituan-longcat.github.io/LongCat-Video/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
+ <a href='https://huggingface.co/papers/2510.22200'><img src='https://img.shields.io/badge/Paper-HuggingFace-red'></a>
  <a href='https://huggingface.co/meituan-longcat/LongCat-Video'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a>
  </div>

  - 🌟 **Efficient inference**: LongCat-Video generates $720p$, $30fps$ videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions
  - 🌟 **Strong performance with multi-reward RLHF**: Powered by multi-reward Group Relative Policy Optimization (GRPO), comprehensive evaluations on both internal and public benchmarks demonstrate that LongCat-Video achieves performance comparable to leading open-source video generation models as well as the latest commercial solutions.

+ For more detail, please refer to the comprehensive [***LongCat-Video Technical Report***](https://huggingface.co/papers/2510.22200).
+
+ ## 🎥 Teaser Video
+
+ <div align="center">
+ <video src="https://github.com/user-attachments/assets/00fa63f0-9c4e-461a-a79e-c662ad596d7d" width="2264" height="384"> </video>
+ </div>

  ## Quick Start

  torchrun --nproc_per_node=2 run_demo_long_video.py --context_parallel_size=2 --checkpoint_dir=./weights/LongCat-Video --enable_compile
  ```

+ ### Run Interactive Video Generation
+
+ ```shell
+ # Single-GPU inference
+ torchrun run_demo_interactive_video.py --checkpoint_dir=./weights/LongCat-Video --enable_compile
+
+ # Multi-GPU inference
+ torchrun --nproc_per_node=2 run_demo_interactive_video.py --context_parallel_size=2 --checkpoint_dir=./weights/LongCat-Video --enable_compile
+ ```
+
  ### Run Streamlit

  ```shell
  | Motion Quality↑ | 3.77 | 3.80 | 3.79 | 3.59 |
  | Overall Quality↑ | 3.35 | 3.27 | 3.26 | 3.17 |

+ ## Community Works
+
+ Community works are welcome! Please open a PR, or let us know in an issue, to have your work added.
+
+ - [CacheDiT](https://github.com/vipshop/cache-dit) offers full cache-acceleration support for LongCat-Video with DBCache and TaylorSeer, achieving nearly a 1.7x speedup with no noticeable loss of precision. Visit their [example](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_longcat_video.py) for more details.
+
  ## License Agreement

  The **model weights** are released under the **MIT License**.

  Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

+ See the [LICENSE](https://huggingface.co/meituan-longcat/LongCat-Video/blob/main/LICENSE) file for the full license text.

  ## Usage Considerations
  This model has not been specifically designed or comprehensively evaluated for every possible downstream application.
 
  We kindly encourage citation of our work if you find it useful.

  ```
+ @misc{meituanlongcatteam2025longcatvideotechnicalreport,
+ title={LongCat-Video Technical Report},
+ author={Meituan LongCat Team and Xunliang Cai and Qilong Huang and Zhuoliang Kang and Hongyu Li and Shijun Liang and Liya Ma and Siyu Ren and Xiaoming Wei and Rixu Xie and Tong Zhang},
+ year={2025},
+ eprint={2510.22200},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2510.22200},
  }
  ```