Run DeepSeek-V3.1 locally on 170GB RAM with Dynamic 1-bit GGUFs!
GGUFs: unsloth/DeepSeek-V3.1-GGUF
The 715GB model is reduced to 170GB (-80% in size) by selectively quantizing layers. The 1-bit GGUF passes all our code tests, and we fixed the chat template for llama.cpp-supported backends.
Guide: https://docs.unsloth.ai/basics/deepseek-v3.1
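A minimal sketch of what running it locally can look like with llama-cpp-python, assuming the repo follows Unsloth's usual quant naming; the UD-TQ1_0 tag and the shard filename below are assumptions, so check the repo's file listing and the guide for the exact names:

```python
# Sketch: download the Dynamic 1-bit quant and load it with llama-cpp-python.
# Requires: pip install huggingface_hub llama-cpp-python
from huggingface_hub import snapshot_download
from llama_cpp import Llama

# Fetch only the 1-bit files; "UD-TQ1_0" is an assumed quant name,
# verify it against the files in unsloth/DeepSeek-V3.1-GGUF.
local_dir = snapshot_download(
    repo_id="unsloth/DeepSeek-V3.1-GGUF",
    allow_patterns=["*UD-TQ1_0*"],
)

llm = Llama(
    # Hypothetical first-shard filename; llama.cpp loads the remaining shards.
    model_path=f"{local_dir}/UD-TQ1_0/DeepSeek-V3.1-UD-TQ1_0-00001-of-00004.gguf",
    n_ctx=8192,       # context length; raise it if RAM allows
    n_gpu_layers=-1,  # offload whatever fits onto the GPU, rest stays in RAM
)
out = llm("Write a one-line docstring for a quicksort function.", max_tokens=64)
print(out["choices"][0]["text"])
```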
You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs: unsloth/Kimi-K2-Thinking-GGUF
We shrank the 1T-parameter model to 245GB (-62%) and retained ~85% of its accuracy on Aider Polyglot. Run on >247GB RAM for fast inference.
We also collaborated with the Moonshot AI Kimi team on a system prompt fix!
Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally
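Since the fix concerns the system prompt, a chat-style call is where it matters. A sketch with llama-cpp-python, whose chat API applies the template embedded in the GGUF; the filename and system prompt text below are placeholders, not the actual fix, which is documented in the linked guide:

```python
# Sketch: chat with the K2 Thinking GGUF via llama-cpp-python's chat API,
# which applies the chat template stored in the GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-K2-Thinking-UD-TQ1_0.gguf",  # hypothetical local filename
    n_ctx=16384,
)
resp = llm.create_chat_completion(
    messages=[
        # Placeholder system prompt; see the guide for the recommended one.
        {"role": "system", "content": "You are Kimi, a helpful assistant."},
        {"role": "user", "content": "Explain what a Dynamic 1-bit GGUF is."},
    ],
    max_tokens=256,
)
print(resp["choices"][0]["message"]["content"])
```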
Qwen3-Next can now be run locally! (30GB RAM)
Instruct GGUF: unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
Thinking GGUF: unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF
The model comes in Thinking and Instruct versions and uses a new architecture that gives ~10x faster inference than Qwen3-32B.
Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
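The ~10x figure follows from the architecture: the "A3B" in the model name means only ~3B of the 80B parameters are active per token, while a dense 32B model reads all of its weights on every token. A quick back-of-the-envelope check (the bandwidth-bound assumption is ours, not from the post):

```python
# Decode speed is roughly memory-bandwidth bound, i.e. proportional to the
# number of weights read per generated token.
active_params_moe = 3e9   # Qwen3-Next-80B-A3B: ~3B active parameters per token
dense_params = 32e9       # Qwen3-32B: all 32B parameters touched per token
speedup = dense_params / active_params_moe
print(f"~{speedup:.0f}x fewer weights read per token")  # ~11x, matching the ~10x claim
```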
huihui-ai/Huihui-Kimi-Linear-REAP-35B-A3B-Instruct-abliterated Text Generation • 35B • Updated Nov 27, 2025 • 65 downloads • 5 likes
huihui-ai/Huihui-Orchestrator-8B-abliterated Text Generation • 8B • Updated Nov 30, 2025 • 53 downloads • 5 likes
ArliAI/GLM-4.5-Air-Derestricted-W8A8-INT8 Text Generation • 107B • Updated Nov 24, 2025 • 25 downloads • 7 likes