
Matricardi Fabio

FM-1976


https://medium.com/@fabio.matricardi

AI & ML interests

Control system engineering, AI, LLMs with Python. ThePoorGPUguy on Substack.

Recent Activity

liked a model 4 days ago

LiquidAI/LFM2-2.6B-Exp

reacted to codelion's post with 🚀 4 days ago

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
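For a concrete sense of the token budget described in the post, here is a small arithmetic sketch. It assumes the 50-30-20 mix is applied by percentage to the full 1B pretraining tokens in the order listed (PDFs, filtered web, educational content); the helper name `split_token_budget` and the category labels are hypothetical, and the post does not state how the mix was actually sampled.

```python
# Hypothetical helper: split a token budget by integer percentages.
# Assumption: the 50-30-20 mix from the post maps, in order, to
# PDFs / filtered web / educational content over the 1B pretraining tokens.

def split_token_budget(total_tokens: int, mix: dict[str, int]) -> dict[str, int]:
    """Return the number of tokens allotted to each source."""
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {source: total_tokens * pct // 100 for source, pct in mix.items()}


PRETRAIN_TOKENS = 1_000_000_000   # "trained on 1B tokens"
CONVERSION_TOKENS = 100_000_000   # "100M additional tokens" for the diffusion conversion

mix = {"pdfs": 50, "filtered_web": 30, "educational": 20}
budget = split_token_budget(PRETRAIN_TOKENS, mix)

for source, tokens in budget.items():
    # Report each share in millions of tokens.
    print(f"{source:>13}: {tokens / 1e6:,.0f}M tokens")

# The conversion stage is measured relative to the pretraining budget.
print(f"conversion overhead: {CONVERSION_TOKENS / PRETRAIN_TOKENS:.0%}")
```

Under these assumptions the split is 500M / 300M / 200M tokens, and the 100M-token conversion stage amounts to 10% of the pretraining budget.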

liked a model 4 days ago

codelion/dhara-70m


Organizations

None yet

FM-1976's Spaces (9)

Gemma3-1b-it GradioCHAT

Gradio Chatbot with Gemma 3 1B Instruct

TweetGeneration

Gradio and HF free tools - from articles to tweets

Gemma2 2B Reflection

OuteWorlderAI LiteMistral150M

Gemma2 2B Instruct ST

StableLM-Zepyhr-3B Playground

Starling7B PlayGround

MyFirstMiniChat

MyFAVModels