DeepSeek-R1-256x21B-0528-BF16 GGUF?

#12
opened by Thireus

Would you be able to share how you've obtained or produced DeepSeek-R1-256x21B-0528-BF16...gguf please?

Heya @Thireus, sounds like you're making progress! I created this bf16 using the evshiron llama.cpp fork + triton-cpu method. It's discussed somewhat scattered around ik_llama.cpp; I have a very rough set of commands buried inside a fold called "Click here for how to make your own custom quants including repacking" in the Custom Quants section of that GitHub discussion, listed as "option b" hah.. Sorry, I can't direct link to headers in discussions; for some reason HTML anchors don't work...
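Very roughly, that route looks like the sketch below; the repo layout, file paths, and output name here are assumptions from memory, so treat the linked discussion as the source of truth:

```bash
# Rough sketch: convert DeepSeek's fp8 safetensors straight to a bf16 GGUF
# using the evshiron llama.cpp fork. triton-cpu supplies CPU versions of
# the triton kernels the fp8 dequant path expects (no GPU required).
git clone https://github.com/evshiron/llama.cpp evshiron-llama.cpp
cd evshiron-llama.cpp
pip install -r requirements.txt

# triton-cpu generally has to be built from source; see the discussion
# for the exact commit/patches known to work.

python convert_hf_to_gguf.py \
  --outtype bf16 \
  --outfile /models/DeepSeek-R1-0528-256x21B-BF16.gguf \
  /models/DeepSeek-R1-0528/
```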

I haven't yet added this in more detail to my quant cookers guide, as I didn't want to overwhelm folks just getting started with smaller models.

If you get stuck, holler at me. saood06 and I did our best to smooth out the triton-cpu compiling issue for folks, so hopefully you don't need his patch anymore.

Good luck!

Found it. Thank you so much!

My next plan was to compute an imatrix, but then I read: "you might need like 1.5TB RAM to do this with bf16 model, but is easier to make q8_0_r8 quant first, and use that to generate the imatrix.dat with only ~715G RAM"

😭

> My next plan was to compute an imatrix

haha yeah, one of the most challenging steps in terms of hardware requirements is that you need enough RAM+VRAM to run inference on a model equivalent to the original, e.g.

  • bf16 safetensors -> bf16 GGUF
  • fp8 safetensors -> Q8_0 GGUF (though some might try bf16; good luck lol)

The goal is to create the imatrix using a model as similar to the original as possible.
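And regarding the q8_0_r8 tip you quoted, that intermediate step is a single llama-quantize call in ik_llama.cpp. A minimal sketch with illustrative paths (double-check the exact type name against llama-quantize --help on your build):

```bash
# Sketch: requantize the bf16 GGUF down to Q8_0_R8 so llama-imatrix can
# run in roughly 715G of RAM instead of ~1.5T for the full bf16 model.
./build/bin/llama-quantize \
  /models/DeepSeek-R1-0528-256x21B-BF16.gguf \
  /models/DeepSeek-R1-0528-Q8_0_R8.gguf \
  Q8_0_R8
```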

Do you have a specific imatrix corpus / methodology in mind?

I've given my latest methodology and corpus in the quant cookers basic guide. If you send me a (secret) gist with your corpus text and the ik_llama.cpp llama-imatrix command to run, I might be able to find time on a big remote rig I'm using to run it for you on the R1-0528 Q8_0 like I did. Then I can upload the ~1GB file somewhere for you to use. A UTF-8 imatrix corpus text file under 2 MiB would probably take like 4-6 hours depending on the exact settings.
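For reference, the command has roughly this shape; the corpus filename, context size, and thread count below are placeholders rather than the exact settings I used:

```bash
# Sketch of an ik_llama.cpp llama-imatrix run over the Q8_0_R8 model.
# -f is the plain-text calibration corpus, -o the resulting imatrix file.
./build/bin/llama-imatrix \
  -m /models/DeepSeek-R1-0528-Q8_0_R8.gguf \
  -f imatrix_corpus.txt \
  -o imatrix-DeepSeek-R1-0528.dat \
  --ctx-size 512 \
  --threads 64
```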

Not at the moment, I think I'll be relying on yours for the time being. I've read about ik's and your research on the topic (comparing different imatrix files), which was eye-opening!

I've started uploading the BF16 GGUF version I converted to https://huggingface.co/Thireus/DeepSeek-R1-0528-BF16-GGUF.

I will let you know of my progress. Thank you so much.
