0.7 TFLOPS
11
37
414
Matricardi Fabio
FM-1976
18 followers · 99 following
https://medium.com/@fabio.matricardi
ThePoorGpuGuy
fabiomatricardi
AI & ML interests
Control system engineering, AI, and LLMs with Python. ThePoorGPUguy on Substack.
Recent Activity
Liked a model 4 days ago: LiquidAI/LFM2-2.6B-Exp
Reacted to codelion's post with 🚀 4 days ago:
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
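As a rough illustration of the token budget the post describes, the sketch below splits the 1B pretraining tokens by the 50-30-20 mix and compares the 100M conversion tokens against that total. The mapping of percentages to sources (PDFs 50, filtered web 30, educational 20) is an assumption based on the order in which the post lists them, and the names `token_budget` and `MIX_PERCENT` are hypothetical, not taken from the linked blog or model card.

```python
# Figures quoted in the post: 1B pretraining tokens, 100M conversion tokens.
PRETRAIN_TOKENS = 1_000_000_000
CONVERSION_TOKENS = 100_000_000

# Assumption: the 50-30-20 percentages follow the order the sources are listed.
MIX_PERCENT = {"pdfs": 50, "filtered_web": 30, "educational": 20}


def token_budget(total_tokens, mix_percent):
    """Split a token total across sources using integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {name: total_tokens * pct // 100 for name, pct in mix_percent.items()}


budget = token_budget(PRETRAIN_TOKENS, MIX_PERCENT)
# Extra tokens spent on the diffusion conversion, relative to pretraining.
conversion_overhead = CONVERSION_TOKENS / PRETRAIN_TOKENS

for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")
print(f"conversion overhead: {conversion_overhead:.0%}")
```

Under these assumptions the split is 500M, 300M, and 200M tokens, and the conversion step adds 10% on top of the pretraining budget.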
Liked a model 4 days ago: codelion/dhara-70m
Organizations
None yet
FM-1976's Spaces (9)
Gemma3-1b-it GradioCHAT 🦀 · Runtime error · Gradio Chatbot with Gemma 3 1B Instruct
TweetGeneration 👁 · Sleeping · 1 like · Gradio and HF free tools - from articles to tweets
Gemma2 2B Reflection 📉 · Build error
OuteWorlderAI LiteMistral150M 📈 · Sleeping
Gemma2 2B Instruct ST 🐢 · Build error
StableLM-Zepyhr-3B Playground 🌍 · Runtime error · 5 likes
Starling7B PlayGround 🦀 · Runtime error · 2 likes
MyFirstMiniChat 🏃 · Sleeping · 1 like
MyFAVModels 🏃 · No application file