Train 400x faster Static Embedding Models with Sentence Transformers Article • Jan 15 • 216
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 134
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 152
MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching Paper • 2502.12852 • Published Feb 18 • 3
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking Paper • 2502.13766 • Published Feb 19 • 3
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild Paper • 2502.12769 • Published Feb 18 • 3
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 544
Centurio Collection Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models" • 6 items • Updated Feb 4 • 4
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models Paper • 2501.05122 • Published Jan 9 • 20
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 159
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Jul 21 • 226
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published Dec 18, 2024 • 18
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published Dec 19, 2024 • 73
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111
LLaVA-OneVision Collection LLaVA-OneVision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 16