xiangan committed · verified
Commit 9072dbf · 1 Parent(s): ae3e394

Update README.md

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -199,4 +199,21 @@ If you find *LLaVA-OneVision-1.5* useful in your research, please consider to ci
  journal={Transactions on Machine Learning Research}
  year={2024}
  }
- ```
+ ```
+
+
+ ## Acknowledgement
+
+ We extend our sincere gratitude to the **AIAK team of the** [**Baige AI computing platform**](https://cloud.baidu.com/product/aihc.html) **from Baidu AI Cloud** for providing an exceptional training framework. AIAK-Training-LLM and AIAK-Megatron significantly accelerated our training and were instrumental in achieving our research goals. For full AIAK support, you can contact Baidu Cloud.
+
+ We acknowledge the support of [Synvo AI](https://synvo.ai/) for contributing part of the data annotation in this work, and we thank the maintainers and contributors of the following open-source projects, whose work greatly inspired and supported our research:
+
+ - LLaVA: Large Language-and-Vision Assistant — [LLaVA](https://github.com/haotian-liu/LLaVA)
+ - LLaVA-NeXT: Next-generation multi-modal assistant — [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT)
+ - lmms-eval: A standardized evaluation framework for Large Multimodal Models — [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)
+ - Megatron-LM: Efficient, scalable training for large language models — [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+ - Qwen2.5-VL: Strong vision-language foundation model — [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
+ - InternVL: Open-source large-scale vision-language foundation model — [InternVL](https://github.com/OpenGVLab/InternVL)
+ - Qwen3: Next-generation Qwen LLM — [Qwen](https://github.com/QwenLM/Qwen)
+ - MetaCLIP: Scalable contrastive pretraining — [MetaCLIP](https://github.com/facebookresearch/MetaCLIP)
+ - FineVision: Open Data Is All You Need — [FineVision](https://huggingface.co/spaces/HuggingFaceM4/FineVision)