AI & ML interests

The Fellowship is a network of exceptional people from different backgrounds who contribute to open-source machine learning 🧙‍♂️🦸‍♀️🦹🧝‍♂️

Recent Activity

prithivMLmods posted an update 4 days ago
Introducing demos for new SOTA models from AI2: SAGE-MM (Smart Any-Horizon Agents for Long-Video Reasoning) and Molmo-2, an open vision-language model that supports multi-image (QA and pointing) and video (QA, pointing, and tracking). The respective demo-related collections are listed below. 🎃🔥

✨ SAGE-MM [Video-Reasoning]: prithivMLmods/SAGE-MM-Video-Reasoning
✨ Molmo2 [Demo]: prithivMLmods/Molmo2-HF-Demo

🎃 GitHub[SAGE-MM]: https://github.com/PRITHIVSAKTHIUR/SAGE-MM-Video-Reasoning
🎃 GitHub[Molmo2]: https://github.com/PRITHIVSAKTHIUR/Molmo2-HF-Demo
🎃 Multimodal Implementations: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 5 days ago
Introducing TRELLIS.2 Text-to-3D. The demo for the TRELLIS.2-4B (Image-to-3D) model is paired with the Z-Image Turbo image generation model to enable Text-to-3D functionality. No input assets are needed, which makes it a small leap forward for ideation. It also supports Image-to-3D inference directly from image assets. Find the demo and related collections below... 🤗🔥

✨ TRELLIS.2-Text-to-3D [Demo]: prithivMLmods/TRELLIS.2-Text-to-3D
✨ Multimodal Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ GitHub: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 7 days ago
Demo for Molmo2 on Hugging Face is live now, including Single/Multi-Image VQA, Visual Pointing/Grounding, Video VQA, and Video Point Tracking. Find the demo and related collections below. 🔥🤗

● Molmo2 HF Demo 🖥️: prithivMLmods/Molmo2-HF-Demo
● Model Collection: https://huggingface.co/collections/allenai/molmo2
● Related Multimodal Space Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 8 days ago
Introducing the Z Image Turbo LoRA DLC App, a gallery space for plug-and-play Z-Image-Turbo LoRAs. It features a curated collection of impressive LoRAs for generating high-quality images. By default, it runs on the base model. Simply choose a LoRA, type your prompt, and generate images. You can find the app and more details below. 🤗🧪

● Space [Demo]: prithivMLmods/Z-Image-Turbo-LoRA-DLC
● Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
● Check the list of Z-Image LoRAs: https://huggingface.co/models?other=base_model:adapter:Tongyi-MAI/Z-Image-Turbo
● GitHub: https://github.com/PRITHIVSAKTHIUR/Z-Image-Turbo-LoRA-DLC

Other related image generation Spaces:

● FLUX-LoRA-DLC2: prithivMLmods/FLUX-LoRA-DLC2
● FLUX-LoRA-DLC: prithivMLmods/FLUX-LoRA-DLC
● Qwen-Image-LoRA-DLC: prithivMLmods/Qwen-Image-LoRA-DLC
● Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
● Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion

& more...

To know more about it, visit the app page or the respective model page!
tomaarsen posted an update 13 days ago
🐦‍🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:

- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls.

- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.

- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!

- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!

- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
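To make the hard-negatives item above concrete, here is a minimal, standard-library-only sketch of the idea behind mining hard negatives and returning their similarity scores. It is not the Sentence Transformers implementation: the function name `mine_hard_negatives_sketch`, the toy two-dimensional vectors, and the document ids are all hypothetical, and only the `output_scores` flag name is borrowed from the post. A hard negative here is simply a non-positive document that scores close to the query.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_hard_negatives_sketch(
    query_vec: Vector,
    positive_id: str,
    corpus: Dict[str, Vector],
    num_negatives: int = 2,
    output_scores: bool = False,
) -> Union[List[str], List[Tuple[str, float]]]:
    """Hypothetical helper: pick the non-positive documents most similar to the query.

    With output_scores=True, each negative is returned together with its
    similarity score, which a distillation-style loss could use as a soft label.
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in corpus.items()
        if doc_id != positive_id  # the known positive is never a negative
    ]
    # Highest similarity first: these are the "hardest" negatives.
    scored.sort(key=lambda item: item[1], reverse=True)
    top = scored[:num_negatives]
    if output_scores:
        return top
    return [doc_id for doc_id, _ in top]


# Toy data (assumed for illustration only).
corpus = {
    "doc_pos": [1.0, 0.0],
    "doc_near": [0.9, 0.1],
    "doc_mid": [0.5, 0.5],
    "doc_far": [0.0, 1.0],
}
query = [1.0, 0.0]

negatives = mine_hard_negatives_sketch(
    query, "doc_pos", corpus, num_negatives=2, output_scores=True
)
for doc_id, score in negatives:
    print(f"{doc_id}: {score:.4f}")
```

In the real library the embeddings come from a model and the mining function offers more filtering options; the sketch only shows why keeping the scores alongside the mined negatives can be handy when a loss needs a graded signal rather than a binary label.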

Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0

I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
prithivMLmods posted an update 16 days ago
Introducing the D.Markdown Experimental Models, Proxima and Epsilon OCR models, built on top of Qwen3-VL and Qwen2.5-VL respectively. Proxima is optimized for Markdown generation and is capable of embedding inline programming code snippets and generating rich nodes such as HTML, XML, JSON, and YAML. Epsilon is optimized for reconstructing complex layouts including tables, forms, and mathematical content. 🌌✨

● proxima-ocr-d.markdown-post3.0.l: prithivMLmods/proxima-ocr-d.markdown-post3.0.l
● epsilon-ocr-d.markdown-post3.0.m: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m
● proxima-ocr-d.markdown-post3.0.l-gguf: prithivMLmods/proxima-ocr-d.markdown-post3.0.l-GGUF
● epsilon-ocr-d.markdown-post3.0.m-gguf: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m-GGUF

● Collection: https://huggingface.co/collections/prithivMLmods/dynamic-markdowns
● Multimodal Apps: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

👉 These models are stage progression models, and currently they may contain artifacts.

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 17 days ago
Try the CUA GUI Operator 🖥️ Space, a demo of several ultra-compact multimodal Computer Use Agent (CUA) models in a single app, including Fara-7B, UI-TARS-1.5-7B, and Holo models, for GUI localization tasks.

● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Other related multimodal spaces

● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en

I plan to add Chrome sandboxes to streamline it and turn it into a browser-based CUA multimodal tool. They will be added to the same Space soon.

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 19 days ago
One speech model with seven voices, streamlined with multimodal capabilities for vision tasks. It performs vision (image-text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. The Vision to VibeVoice (EN) demo is live. 🗣️🔥

🤗 Vision-to-VibeVoice-en [Demo]: prithivMLmods/Vision-to-VibeVoice-en
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Speech [VibeVoice-Realtime-0.5B]: microsoft/VibeVoice-Realtime-0.5B
✨ Vision [Qwen2.5-VL]: Qwen/Qwen2.5-VL-7B-Instruct

To know more about it, visit the app page or the respective model page!
mrfakename posted an update 20 days ago
Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
prithivMLmods posted an update 23 days ago
Hello everyone,

The strangerzonehf [HF] Community / Organization Page, which is maintained by me, has reached the Top 10 Developer Pages ranking at 6th place, contributing 3.4% in the calendar cycle from August 2024 to August 2025. It is also the only South Asia / Indian page in the list. I could not be more proud to be doing things for the community. ❤️🤗

Source: https://www.dataprovenance.org/economies-of-open-intelligence.pdf

It is a pleasure to be a part of it.
Thank you!
@prithivMLmods
prithivMLmods posted an update 27 days ago
Introducing the Super-OCRs Demo, a comparison of state-of-the-art multimodal OCR VLMs, including HunyuanOCR, DeepSeekOCR, Dots, and Nanonets in one space for performing OCR, rendering LaTeX and Markdown, and visual grounding (layout). Find the related Spaces and models below. 🤗🔥

✨ Super-OCRs [Demo]: prithivMLmods/Super-OCRs-Demo
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ GitHub: https://github.com/PRITHIVSAKTHIUR/Super-OCRs-Demo

⭐ Models Used:
✦ HunyuanOCR: tencent/HunyuanOCR
✦ DeepSeek-OCR: (-) deepseek-ai/DeepSeek-OCR (+) prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
✦ Dots.OCR: (-) rednote-hilab/dots.ocr (+) prithivMLmods/Dots.OCR-Latest-BF16
✦ Nanonets-OCR2-3B: nanonets/Nanonets-OCR2-3B

⭐ Some Other Relevant Apps:
✦ Qwen3-VL-HF-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✦ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
✦ Multimodal-OCR: prithivMLmods/Multimodal-OCR
✦ Multimodal-OCR2: prithivMLmods/Multimodal-OCR2
✦ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
✦ DeepSeek-OCR-experimental: prithivMLmods/DeepSeek-OCR-experimental

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update about 1 month ago
Introducing the advanced sketch-board editor "Nano-Banana-Pro-Sketch-Board", powered by the Gemini 2.5 Flash Image and Gemini 3 Pro Preview Image models through the Gemini API. This version includes more features than the Nano-Banana-AIO app for drawing and prompt-based concept transformation of freestyle sketches. 🔥🍌

✨ Nano-Banana-Pro-Sketch-Board: prithivMLmods/Nano-Banana-Pro-Sketch-Board
✨ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
✨ GitHub: https://github.com/PRITHIVSAKTHIUR/Nano-Banana-Pro-Sketch-Board
✨ Model-Garden: https://tinyurl.com/4xxs9dvy

Some Other Relevant Apps [OSS]

⭐ Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⭐ Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⭐ Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⭐ Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Note: The Nano-Banana-Pro-Sketch-Board demo requires a Gemini API key for the editing process. Your API key will be removed when the app is reloaded or closed. Your key remains safe and will not be exposed to any medium. Also, the Gemini 3 Pro Preview Image model may require a paid API key from a Google Cloud project with billing enabled.

To know more about it, visit the app info section or the respective Model Garden page!
prithivMLmods posted an update about 1 month ago
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It is capable of comprehensive text understanding and document structure analysis in a given document, and can provide bounding boxes with coordinates.

⭐ Space [Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-OCR
⭐ Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐ Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Some relevant Spaces

⭐ DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3

Check out the other spaces in the multimodal implementation collection.

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update about 1 month ago
Try the all-new trending Qwen-Image-Edit-2509 (Multi-Image-Edits) specialized adapter demos, including Cloth-Design-Fuse, Texture Edit, Guided-Objects-Patching, and more, all in a single Hugging Face Space. The demo link is provided below. 🤗🔥

⮞ Space [Demo]: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⮞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
⮞ Base Model: Qwen/Qwen-Image-Edit-2509

Similar applications ↗️

⮞ Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2
⮞ Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⮞ Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update about 1 month ago
Made a demo Space for multimodal understanding with Qwen3-VL, covering tasks including point annotation, detection, captioning, guided text inference, and more. Find the demo link below. 🤗↗️

⮞ Space [Demo]: prithivMLmods/Qwen3-VL-HF-Demo
⮞ Model Used: Qwen/Qwen3-VL-4B-Instruct
⮞ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
⮞ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-3VL-Multimodal-Understanding

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update about 1 month ago
Made a small write-up and experimental fine-tuning guide for MetaCLIP2 for Image Classification on Downstream Tasks. The blog, titled "Fine Tuning MetaCLIP 2 for Image Classification on Downstream Tasks", demonstrates step-by-step fine-tuning using CIFAR10 and can be adapted to other datasets. For more details, check out the linked blog below. 🤗↗️

⮞ Blog Article: https://huggingface.co/blog/prithivMLmods/metaclip2-downstream-finetune
⮞ Demo Space [Zero-Shot Classification]: prithivMLmods/metaclip-2-demo

Some other models
╰› MetaCLIP-2-Cifar10: prithivMLmods/MetaCLIP-2-Cifar10
╰› MetaCLIP-2-Age-Range-Estimator: prithivMLmods/MetaCLIP-2-Age-Range-Estimator
╰› MetaCLIP-2-Gender-Identifier: prithivMLmods/MetaCLIP-2-Gender-Identifier
╰› MetaCLIP-2-Open-Scene: prithivMLmods/MetaCLIP-2-Open-Scene

⮞ Collection: https://huggingface.co/collections/prithivMLmods/metaclip2-image-classification-experiments

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update about 1 month ago
Try the all-new trending Qwen-Image-Edit specialized adapter demos, including Photo-to-Anime, Light Restoration, Multi-Angle Edits, Relighting, and more, all in a single Hugging Face Space. Below is the demo link. 🤗🌠

⮞ Demo-Space: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⮞ How-to-Use: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast#2
⮞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update about 2 months ago
Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-right/left-view and its reverse, as well as general kontext-specified object removal. Below is the list of demos and adapters. 🔥🤗

➤ Spaces [Demo]: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Kontext-Adapters:
✦ Kontext-Bottom-Up-View: prithivMLmods/Kontext-Bottom-Up-View
✦ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✦ Kontext-Top-Down-View: prithivMLmods/Kontext-Top-Down-View
✦ Kontext-CAM-Left-View: prithivMLmods/Kontext-CAM-Left-View
✦ Kontext-Unblur-Upscale: prithivMLmods/Kontext-Unblur-Upscale
✦ Kontext-0811-exp: prithivMLmods/Kontext-0811-exp

Photo-Mate Collection:
✦ Kontext CAM Angles: https://huggingface.co/collections/prithivMLmods/kontext-cam-angles
✦ i2i - Kontext (exp): https://huggingface.co/collections/prithivMLmods/i2i-kontext-exp
✦ LZO-1 (Lossless Zoom Operator): https://huggingface.co/collections/prithivMLmods/lzo-1-lossless-zoom-operator

Related-Apps:
✦ Photo-Mate [Version 1.0]: prithivMLmods/Photo-Mate-i2i
✦ Image Generation Apps [Collection]: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To know more about it, visit the app page or the respective model page!
@prithivMLmods