Hello, I ran into the same problem and found a solution in the PruLong GitHub repository: https://github.com/princeton-pli/PruLong/blob/main/prulong/training/dataset.py
StreamingDataset provides a replication parameter, which you can set to seq_parallel_size. That way, every rank in the same sequence-parallel group receives the same inputs.
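In case it helps, here is a small pure-Python sketch of what replication means for which rank sees which samples. It is not the StreamingDataset implementation, only an illustration of the idea. The helper names (data_rank, samples_for_rank) are made up for this example, and it assumes that consecutive global ranks form one sequence-parallel group and that world_size is divisible by seq_parallel_size.

```python
from typing import List, Sequence


def data_rank(global_rank: int, seq_parallel_size: int) -> int:
    """Map a global rank to the index of its data partition.

    Assumption: consecutive ranks form one sequence-parallel group,
    so all ranks in a group share the same data partition index.
    """
    return global_rank // seq_parallel_size


def samples_for_rank(
    samples: Sequence[int],
    global_rank: int,
    world_size: int,
    seq_parallel_size: int,
) -> List[int]:
    """Return the samples a rank would read when data is replicated
    across each sequence-parallel group (hypothetical helper)."""
    if world_size % seq_parallel_size != 0:
        raise ValueError("world_size must be divisible by seq_parallel_size")
    # Number of distinct data partitions, one per sequence-parallel group.
    num_data_ranks = world_size // seq_parallel_size
    # Strided split over partitions, not over individual ranks.
    start = data_rank(global_rank, seq_parallel_size)
    return list(samples[start::num_data_ranks])


if __name__ == "__main__":
    samples = list(range(8))
    for rank in range(4):
        picked = samples_for_rank(samples, rank, world_size=4, seq_parallel_size=2)
        print(f"rank {rank}: {picked}")
```

With 4 ranks and seq_parallel_size=2, ranks 0 and 1 read the same samples, and ranks 2 and 3 read the other half. With the real library you would get this by passing replication=seq_parallel_size when you construct the StreamingDataset, as the PruLong dataset.py linked above does.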