One of the best explanations of embedding I've ever read. Well done, @hesamation !
Had to share this: hesamation/primer-llm-embedding
Selma, fine-tuning is a great way to start, but for enterprise use cases the real leverage comes after we “crack” the problem. Once the business workflow and KPIs are clear, a domain‑specific pretrained model (say 3B–7B) becomes viable and impactful. Budget‑wise, pretraining in that range can run around ~$400k (a very, very approximate estimate), but the point is that after validating the use case and TAM, a domain‑specific pretrained model becomes a big multiplier for retrieval, reasoning, and integration across the stack, far beyond what incremental fine‑tuning alone can deliver. I think the play is to find an enterprise use case and train for it (like B2B SaaS).
I think domain-specific use cases are plentiful; in every vertical there is more nuance as we try to crack deeper-level problems. I believe it’s worth investing time in experimentation first, then identifying the TAM, and deciding whether or not to train further. What do you think?
imagen-4.0-fast-generate-001 model for Image Generation (Text-to-Image) and Multi-Image Editing (Image-to-Image), and Draw-to-Image powered by gemini-2.5-flash-image (aka Nano Banana).

…head?" or "if multiple licenses were found, do they contradict each other?", which makes further filtering a breeze.

torchao Int8WeightOnlyConfig is already working flawlessly in our tests.

import spaces
from diffusers import FluxPipeline
from torchao.quantization.quant_api import Int8WeightOnlyConfig, quantize_
pipeline = FluxPipeline.from_pretrained(...).to('cuda')
quantize_(pipeline.transformer, Int8WeightOnlyConfig()) # Or any other component(s)
@spaces.GPU
def generate(prompt: str):
    return pipeline(prompt).images[0]
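For anyone curious what weight-only int8 quantization does under the hood, here is a minimal numpy sketch of the core idea (the function names are illustrative, not torchao's API): each output channel of a weight matrix gets its own scale so its largest weight maps to 127, weights are stored as int8, and they are dequantized on the fly at matmul time while activations stay in full precision.

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a weight matrix.

    w has shape (out_features, in_features). Each row gets its own scale
    so that the largest-magnitude weight in the row maps to 127.
    """
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0  # shape (out, 1)
    scales = np.where(scales == 0, 1.0, scales)            # avoid div-by-zero
    w_int8 = np.clip(np.round(w / scales), -128, 127).astype(np.int8)
    return w_int8, scales

def linear_int8(x: np.ndarray, w_int8: np.ndarray, scales: np.ndarray):
    """Weight-only linear layer: dequantize weights, keep activations fp32."""
    w_deq = w_int8.astype(np.float32) * scales
    return x @ w_deq.T

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((2, 8)).astype(np.float32)

w_int8, scales = quantize_weights_int8(w)
err = np.abs(x @ w.T - linear_int8(x, w_int8, scales)).max()
print(f"max abs error vs fp32 matmul: {err:.4f}")  # typically small
```

The storage win is the point: int8 weights take a quarter of the memory of fp32 (half of fp16), which is why quantizing just the transformer component of a large pipeline, as in the snippet above, is often enough to fit it on a single GPU.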