s3nh PRO
s3nh
AI & ML interests
Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh
Recent Activity
liked
a model
11 days ago
mradermacher/EduHelp-8B-GGUF
liked
a model
11 days ago
mradermacher/EduHelp-8B-i1-GGUF
Organizations
reacted to
appvoid's
post with 👍
2 days ago
Post
4031
Just tried to create an educational assistant for younger people who may struggle to visualise 'what this sorcery is all about'.
It's the first of my spare-time projects: SFT on Qwen3-8B.
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
s3nh/EduHelp-8B
Glad to share my work, have a wonderful day!
reacted to
ZennyKenny's
post with 👍
16 days ago
posted
an
update
16 days ago
Post
455
EduHelp with more empathy, based on a model fine-tuned on psychotherapeutic preferences, just landed.
Beck-8B as the base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
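A rough sketch of the kind of training run described above, assuming TRL's SFTTrainer; apart from the 13,000-step budget, the base checkpoint name, dataset, and hyperparameters are illustrative, not the author's actual recipe:

```python
# Sketch only: SFT continuation from a Beck-8B-style base for 13,000 steps.
# "s3nh/Beck-8B" and all hyperparameters except max_steps are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Educational dataset (the one from the earlier EduHelp post, used here as a stand-in).
# Depending on its columns, you may need to set a text field or a formatting function.
dataset = load_dataset("ajibawa-2023/Education-Young-Children", split="train")

config = SFTConfig(
    output_dir="eduhelp-beck-8b",
    max_steps=13_000,                 # step count mentioned in the post
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    logging_steps=50,
)

trainer = SFTTrainer(
    model="s3nh/Beck-8B",             # hypothetical base model id, for illustration
    train_dataset=dataset,
    args=config,
)
trainer.train()
```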
replied to
their
post
17 days ago
Thanks!
posted
an
update
18 days ago
Post
4031
Just tried to create an educational assistant for younger people who may struggle to visualise 'what this sorcery is all about'.
It's the first of my spare-time projects: SFT on Qwen3-8B.
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
s3nh/EduHelp-8B
Glad to share my work, have a wonderful day!
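A minimal sketch of this kind of PEFT/LoRA SFT setup (not the author's exact recipe), assuming TRL + PEFT on the Qwen3-8B base with the dataset named above; the LoRA rank, target modules, and training hyperparameters are illustrative:

```python
# Sketch only: LoRA SFT of Qwen3-8B on the Education-Young-Children dataset.
# Hyperparameters and target modules are assumptions, not the released configuration.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("ajibawa-2023/Education-Young-Children", split="train")

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",            # base model named in the post
    train_dataset=dataset,
    peft_config=lora,                 # parameter-efficient fine-tuning via LoRA adapters
    args=SFTConfig(
        output_dir="eduhelp-8b",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```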
reacted to
Severian's
post with 👀
18 days ago
Post
300
New Technique to Deeply Poison AI on Images and Prove Creative Provenance
I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving 20-348% disruption in AI training convergence in benchmark tests.
Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.
You can see examples and learn more about how and WHY it works better than current methods:
https://severian-poisonous-shield-for-images.static.hf.space
If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. Still working on expanding it to a production-grade usable app.
This is not intended as a pure self-promotion post. I genuinely want to help creators and to gauge interest from different communities. I've spent the past year and a half building this from scratch, with new math and code, to try to solve this massive problem.
reacted to
Severian's
post with 👍
23 days ago
Post
3154
MLX port of BDH (Baby Dragon Hatchling) is up!
I’ve ported the BDH ( https://github.com/pathwaycom/bdh ) model to MLX for Apple Silicon. It’s a faithful conversion of the PyTorch version: same math, same architecture (byte-level vocab, shared weights across layers, ReLU sparsity, RoPE attention with Q=K), with MLX-friendly APIs and a detailed README explaining the few API-level differences and why results are equivalent.
Code, docs, and training script are ready to use. You may need to adjust the training script a bit to fit your own custom dataset. Only tested on M4 so far, but it should work perfectly for any M1/M2/M3 users out there.
I’m currently training this MLX build on my Internal Knowledge Map (IKM) dataset Severian/Internal-Knowledge-Map
Training’s underway; expect a day or so before I publish weights. When it’s done, I’ll upload the checkpoint to Hugging Face for anyone to test.
Repo: https://github.com/severian42/BDH-MLX
HF model (coming soon): Severian/BDH-MLX
If you try it on your own data, feedback and PRs are welcome.
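For readers unfamiliar with the architecture notes above, here is a toy PyTorch sketch of the "RoPE attention with Q=K" plus ReLU-sparsity idea; it is not the BDH or BDH-MLX code, and the shapes and hyperparameters are assumptions for illustration only:

```python
# Toy illustration: a single attention block where one projection serves as both
# Q and K, rotary embeddings are applied, and a ReLU keeps outputs sparse.
import torch
import torch.nn as nn
import torch.nn.functional as F


def rope(x: torch.Tensor) -> torch.Tensor:
    """Apply rotary position embeddings to a (batch, heads, seq, head_dim) tensor."""
    b, h, t, d = x.shape
    half = d // 2
    freqs = 1.0 / (10000 ** (torch.arange(half, dtype=x.dtype) / half))
    angles = torch.arange(t, dtype=x.dtype)[:, None] * freqs[None, :]   # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class QKSharedAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.qk = nn.Linear(dim, dim, bias=False)   # one projection reused for Q and K
        self.v = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        split = lambda y: y.view(b, t, self.heads, self.head_dim).transpose(1, 2)
        qk, v = rope(split(self.qk(x))), split(self.v(x))
        out = F.scaled_dot_product_attention(qk, qk, v, is_causal=True)  # Q == K
        out = out.transpose(1, 2).reshape(b, t, d)
        return F.relu(self.out(out))                # ReLU sparsity on the block output


x = torch.randn(1, 16, 256)
print(QKSharedAttention()(x).shape)  # torch.Size([1, 16, 256])
```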
reacted to
mitkox's
post with 🚀
23 days ago
Post
380
Hermes4 70B synthetic dataset generation on my desktop Z8 GPU rig:
307 tok/sec
1.1M tok/hour
The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.
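For reference, the two throughput figures above are consistent; a quick check:

```python
# 307 tokens/second sustained for an hour is roughly 1.1M tokens/hour.
tokens_per_second = 307
tokens_per_hour = tokens_per_second * 3600
print(f"{tokens_per_hour:,} tokens/hour")  # 1,105,200
```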
reacted to
andywu-kby's
post with 👍
27 days ago
Post
2389
Hello everyone,
I hope you’re doing well.
We’re currently developing a chatbot that can analyze and forecast sales directly from Excel files. Do you think this would be useful?
Miragic-AI/Miragic-Sales-Pilot
Please share your feedback by 👍 or 👎 this post.
Best regards,
reacted to
Xenova's
post with 👍🔥
28 days ago
Post
4375
The next generation of AI-powered websites is going to be WILD! 🤯
In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.
To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU
As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
reacted to
AdinaY's
post with 🔥
about 1 month ago
Post
1963
Qwen3Guard 🛡️ a series of safety moderation models built upon Qwen3
Qwen/qwen3guard-68d2729abbfae4716f3343a1
✨ 0.6B/4B/8B - Apache2.0
✨ Two variants: Gen & Stream
✨ Trained on a dataset of 1.19 million prompts
✨ Classifies content into Safe / Unsafe / Controversial
✨ Supports 119 languages & dialects
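A hedged sketch of how the generative variant might be queried with Transformers; the checkpoint ID and the exact format of the safety verdict are assumptions based on this post, so check the model cards for the real usage:

```python
# Sketch only: moderating a user prompt with a Qwen3Guard generative checkpoint.
# The model ID and output format are assumptions, not verified usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3Guard-Gen-0.6B"  # assumed checkpoint name from the collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How do I pick a lock?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The model is expected to generate a moderation verdict such as Safe / Unsafe / Controversial.
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```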
reacted to
vikhyatk's
post with 🔥
about 1 month ago
Post
4195
Just released a preview of Moondream 3!
moondream/moondream3-preview
This is a 9B parameter, 2B active MoE VLM with state of the art visual reasoning capabilities.
More details in the release blog post: https://moondream.ai/blog/moondream-3-preview
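A speculative usage sketch, assuming the preview exposes the same trust_remote_code interface as earlier Moondream releases (a query method taking an image and a question); the method name is an assumption, so defer to the model card:

```python
# Sketch only: visual question answering with the Moondream 3 preview.
# The `query` method is assumed from earlier Moondream releases and may differ here.
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "moondream/moondream3-preview",
    trust_remote_code=True,
    device_map="auto",
)

image = Image.open("photo.jpg")
answer = model.query(image, "How many people are in this picture?")  # assumed API
print(answer)
```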
reacted to
MikeDoes's
post with 🚀
3 months ago
Post
2019
🛡️ At Ai4Privacy, our goal is to empower researchers to build a safer AI ecosystem. Today, we're highlighting crucial research that does just that by exposing a new vulnerability.
The paper "Forget to Flourish" details a new model poisoning technique. It's a reminder that as we fine-tune LLMs, our anonymization and privacy strategies must evolve to counter increasingly sophisticated threats.
We're proud that the Ai4Privacy dataset was instrumental in this study. It served two key purposes:
Provided a Realistic Testbed: It gave the researchers access to a diverse set of synthetic and realistic PII samples in a safe, controlled environment.
Enabled Impactful Benchmarking: It allowed them to measure the actual effectiveness of their data extraction attack, proving it could compromise specific, high-value information.
This work reinforces our belief that progress in AI security is a community effort. By providing robust tools for benchmarking, we can collectively identify weaknesses and build stronger, more resilient systems. A huge congratulations to the authors on this important contribution.
🔗 Read the full paper: https://arxiv.org/html/2408.17354v1
#OpenSource #DataPrivacy #LLM #Anonymization #AIsecurity #HuggingFace #Ai4Privacy #World's largest open privacy masking dataset
The paper "Forget to Flourish" details a new model poisoning technique. It's a reminder that as we fine-tune LLMs, our anonymization and privacy strategies must evolve to counter increasingly sophisticated threats.
We're proud that the Ai4Privacy dataset was instrumental in this study. It served two key purposes:
Provided a Realistic Testbed: It gave the researchers access to a diverse set of synthetic and realistic PII samples in a safe, controlled environment.
Enabled Impactful Benchmarking: It allowed them to measure the actual effectiveness of their data extraction attack, proving it could compromise specific, high-value information.
This work reinforces our belief that progress in AI security is a community effort. By providing robust tools for benchmarking, we can collectively identify weaknesses and build stronger, more resilient systems. A huge congratulations to the authors on this important contribution.
🔗 Read the full paper: https://arxiv.org/html/2408.17354v1
#OpenSource #DataPrivacy #LLM #Anonymization #AIsecurity #HuggingFace #Ai4Privacy #World's largest open privacy masking dataset
reacted to
AtAndDev's
post with 🚀
3 months ago
Post
542
Qwen 3 Coder is a personal attack on K2, and I love it.
It achieves near SOTA on LCB while not having reasoning.
Finally people are understanding that reasoning isn't necessary for high benches...
Qwen ftw!
DECENTRALIZE DECENTRALIZE DECENTRALIZE
reacted to
KnutJaegersberg's
post with ❤️
6 months ago
Post
2758
The Intelligence Curse
The document warns of the "intelligence curse," a potential consequence of advanced AI (AGI) where powerful entities lose their incentive to invest in people as AI automates work[cite: 13, 297]. This could lead to job displacement, reduced social mobility, and a concentration of power and wealth based on AI ownership, similar to the "resource curse" in resource-rich states[cite: 17, 18, 31, 329, 353]. To counter this, the authors propose averting AI catastrophes to prevent centralization, diffusing AI widely to keep humans economically relevant, and democratizing institutions to remain anchored to human needs[cite: 22, 23, 25, 35, 36, 37, 566].
https://intelligence-curse.ai/intelligence-curse.pdf
reacted to
loubnabnl's
post with ❤️
6 months ago
Post
5134
SmolVLM is now available on PocketPal — you can run it offline on your smartphone to interpret the world around you. 🌍📱
And check out this real-time camera demo by @ngxson , powered by llama.cpp:
https://github.com/ngxson/smolvlm-realtime-webcam
https://x.com/pocketpal_ai
reacted to
merve's
post with 👍🚀
6 months ago
Post
6686
A real-time object detector much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers 🔥
D-FINE is the sota real-time object detector that runs on T4 (free Colab) 🤩
> Collection with all checkpoints and demo ustc-community/d-fine-68109b427cbe6ee36b4e7352
Notebooks:
> Tracking https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
> Inference https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
> Fine-tuning https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper 🎩
Regular object detectors attempt to predict bounding boxes in (x, y, w, h) pixel perfect coordinates, which is very rigid and hard to solve 🥲☹️
D-FINE formulates object detection as a distribution for bounding box coordinates, refines them iteratively, and it's more accurate 🤩
Another core idea behind this model is Global Optimal Localization Self-Distillation ⤵️
this model uses the final layer's distribution output (sort of like a teacher) to distill into earlier layers, making them more performant.
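A minimal inference sketch using the standard Transformers object-detection API; the checkpoint ID below is illustrative, so pick a real one from the linked ustc-community collection:

```python
# Sketch only: D-FINE inference via the generic object-detection classes.
# The checkpoint name is an assumption; substitute one from the collection above.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

ckpt = "ustc-community/dfine-medium-coco"  # illustrative checkpoint id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForObjectDetection.from_pretrained(ckpt)

image = Image.open("street.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into labeled detections at the original image resolution.
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())
```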