Daniel van Strien PRO

davanstrien

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

new activity 30 minutes ago

nvidia/Nemotron-VLM-Dataset-v2:Switch to use hf client library for download

liked a model about 4 hours ago

nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16

updated a dataset about 5 hours ago

data-is-better-together/fineweb-c-progress

upvoted an article about 17 hours ago

Article

NVIDIA Releases 8 Million Sample Open Dataset and Tooling for OCR, Image Reasoning, Image and Video QA Tasks

By and 6 others • about 17 hours ago • 11

upvoted a collection about 20 hours ago

Granite 4.0 Nano Language Models

9 items • Updated about 19 hours ago • 43

upvoted an article about 20 hours ago

Article

Granite 4.0 Nano: Just how small can you go?

By and 1 other • about 20 hours ago • 48

upvoted 2 articles 2 days ago

Article

Streaming datasets: 100x More Efficient

2 days ago • 25

Article

Parquet Content-Defined Chunking

Jul 25 • 67

upvoted a collection 5 days ago

Qwen 3 VL - CATMuS

A collection of finetunes of Qwen 3 VL. These models were finetuned on the CATMuS dataset via TRL SFT. • 3 items • Updated 5 days ago • 2

upvoted an article 5 days ago

Article

Unlock the power of images with AI Sheets

8 days ago • 23

upvoted a paper 5 days ago

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

Paper • 2510.19060 • Published 8 days ago • 2

upvoted a collection 6 days ago

LightOnOCR

The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR • 6 items • Updated 2 days ago • 12

upvoted 2 articles 6 days ago

Article

LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR

By and 2 others • 6 days ago • 52

Article

Building the Open Agent Ecosystem Together: Introducing OpenEnv

6 days ago • 100

upvoted 3 papers 6 days ago

Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs

Paper • 2508.15877 • Published Aug 21 • 1

olmOCR 2: Unit Test Rewards for Document OCR

Paper • 2510.19817 • Published 7 days ago • 10

DeepSeek-OCR: Contexts Optical Compression

Paper • 2510.18234 • Published 8 days ago • 63

upvoted a changelog 6 days ago

Changelog

Cleaner Collection URLs

6 days ago • 59

upvoted an article 7 days ago

Article

Sentence Transformers is joining Hugging Face!

7 days ago • 71

upvoted an article 8 days ago

Article

Supercharge your OCR Pipelines with Open Models

8 days ago • 210

upvoted a collection 9 days ago

Icelandic OCR

3 items • Updated 28 days ago • 1

upvoted a collection 16 days ago

Nanonets-OCR2

2 items • Updated 16 days ago • 24

upvoted an article 20 days ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

Feb 20 • 309