@Severian on Hugging Face

MLX port of BDH (Baby Dragon Hatchling) is up!

I’ve ported the BDH model (https://github.com/pathwaycom/bdh) to MLX for Apple Silicon. It’s a faithful conversion of the PyTorch version: same math and same architecture (byte-level vocab, shared weights across layers, ReLU sparsity, RoPE attention with Q=K), exposed through MLX-friendly APIs. A detailed README explains the few API-level differences and why the results are equivalent.
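For anyone unfamiliar with the terms above, here is a rough pure-Python sketch of three of them: a byte-level vocab, one weight matrix shared across layers, and ReLU sparsity. It is not code from the repo; `encode`, `decode`, `forward`, and the toy sizes are hypothetical names and values chosen only for illustration, and the real model has far more structure than this.

```python
import random

VOCAB_SIZE = 256  # byte-level vocab: one token id per possible byte value


def encode(text: str) -> list[int]:
    """Turn text into token ids by taking its raw UTF-8 bytes."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """Turn token ids back into text."""
    return bytes(ids).decode("utf-8", errors="replace")


def relu(xs: list[float]) -> list[float]:
    """Zero out negative activations; this is where the sparsity comes from."""
    return [max(0.0, x) for x in xs]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def forward(x: list[float], w: list[list[float]], n_layers: int) -> list[float]:
    """Apply the SAME weight matrix at every layer (shared weights)."""
    for _ in range(n_layers):
        x = relu(matvec(w, x))
    return x


rng = random.Random(0)
dim = 8  # toy width, assumed for illustration only
w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
x = [rng.gauss(0, 1) for _ in range(dim)]

h = forward(x, w, n_layers=3)
sparsity = sum(1 for v in h if v == 0.0) / len(h)  # fraction of exact zeros

ids = encode("Baby Dragon Hatchling")
print(ids[:4], decode(ids))
print(f"fraction of zero activations: {sparsity:.2f}")
```

Because the vocab is just the 256 byte values, there is no tokenizer to train or ship, and any UTF-8 text round-trips through `encode` and `decode`.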

Code, docs, and a training script are ready to use. You may need to adjust the training script a bit to fit your own dataset. I’ve only tested on M4 so far, but it should work just as well on M1/M2/M3.
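If you do adapt it to your own data, the byte-level vocab means any file can be read as raw bytes and sliced into next-token training pairs. Below is a minimal sketch of that idea in plain Python. The helper names (`load_byte_tokens`, `get_batch`) and the `block_size` / `batch_size` values are my own placeholders, not the names used in the repo’s training script, so treat it as a starting point rather than a drop-in patch.

```python
import random


def load_byte_tokens(path: str) -> list[int]:
    """Read any file as raw bytes; each byte (0-255) is one token id."""
    with open(path, "rb") as f:
        return list(f.read())


def get_batch(tokens: list[int], block_size: int, batch_size: int,
              rng: random.Random) -> tuple[list[list[int]], list[list[int]]]:
    """Sample random windows; targets are the inputs shifted by one token."""
    if len(tokens) <= block_size:
        raise ValueError("dataset must be longer than block_size")
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(tokens) - block_size)
        xs.append(tokens[start:start + block_size])
        ys.append(tokens[start + 1:start + block_size + 1])
    return xs, ys


# Demo on an in-memory string; for a real dataset you would call
# load_byte_tokens("your_file.txt") instead (hypothetical path).
tokens = list("hello world, hello bytes".encode("utf-8"))
xs, ys = get_batch(tokens, block_size=8, batch_size=4, rng=random.Random(0))
print(len(xs), len(xs[0]))
```

The nested lists would then be converted to whatever array type the training loop expects before being fed to the model.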

I’m currently training this MLX build on my Internal Knowledge Map (IKM) dataset, Severian/Internal-Knowledge-Map. Training is underway; expect a day or so before I publish weights. When it’s done, I’ll upload the checkpoint to Hugging Face for anyone to test.

Repo: https://github.com/severian42/BDH-MLX
HF model (coming soon): Severian/BDH-MLX

If you try it on your own data, feedback and PRs are welcome.