DeepSeek-R1-256x21B-0528-BF16 GGUF?

#12
opened by Thireus

Would you be able to share how you've obtained or produced DeepSeek-R1-256x21B-0528-BF16...gguf please?

Heya @Thireus, sounds like you're making progress! I created this bf16 using the evshiron llama.cpp fork + triton-cpu method. It's discussed somewhat scattered around ik_llama.cpp; I have a very rough set of commands buried inside a fold called "Click here for how to make your own custom quants including repacking" in the Custom Quants section of that GitHub discussion, listed as "option b" hah.. Sorry, I can't direct link to headers in discussions; for some reason HTML anchors don't work...
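Very roughly, that route looks like the sketch below; the repo layout, file paths, and output name here are assumptions from memory, so treat the linked discussion as the source of truth:

```bash
# Rough sketch: convert DeepSeek's fp8 safetensors straight to a bf16 GGUF
# using the evshiron llama.cpp fork. triton-cpu supplies CPU versions of
# the triton kernels the fp8 dequant path expects (no GPU required).
git clone https://github.com/evshiron/llama.cpp evshiron-llama.cpp
cd evshiron-llama.cpp
pip install -r requirements.txt

# triton-cpu generally has to be built from source; see the discussion
# for the exact commit/patches known to work.

python convert_hf_to_gguf.py \
  --outtype bf16 \
  --outfile /models/DeepSeek-R1-0528-256x21B-BF16.gguf \
  /models/DeepSeek-R1-0528/
```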

I haven't yet added this in more detail to my quant cookers guide, as I didn't want to overwhelm folks just getting started with smaller models.

If you get stuck, holler at me. saood06 and I did our best to smooth out the triton-cpu compiling issue for folks, so hopefully you don't need his patch anymore.

Good luck!

Found it. Thank you so much!

My next plan was to compute an imatrix, but then I read: "you might need like 1.5TB RAM to do this with bf16 model, but is easier to make q8_0_r8 quant first, and use that to generate the imatrix.dat with only ~715G RAM"

😭

> My next plan was to compute an imatrix

haha yeah, one of the most challenging steps in terms of hardware requirements is that you need enough RAM+VRAM to run inference on a model equivalent to the original, e.g.

  • bf16 safetensors -> bf16 GGUF
  • fp8 safetensors -> Q8_0 GGUF (though some might try bf16; good luck lol)

The goal is to create the imatrix using a model as similar to the original as possible.
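And regarding the q8_0_r8 tip you quoted, that intermediate step is a single llama-quantize call in ik_llama.cpp. A minimal sketch with illustrative paths (double-check the exact type name against llama-quantize --help on your build):

```bash
# Sketch: requantize the bf16 GGUF down to Q8_0_R8 so llama-imatrix can
# run in roughly 715G of RAM instead of ~1.5T for the full bf16 model.
./build/bin/llama-quantize \
  /models/DeepSeek-R1-0528-256x21B-BF16.gguf \
  /models/DeepSeek-R1-0528-Q8_0_R8.gguf \
  Q8_0_R8
```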

Do you have a specific imatrix corpus / methodology in mind?

I've given my latest methodology and corpus in the quant cookers basic guide. If you send me a (secret) gist with your corpus text and the ik_llama.cpp llama-imatrix command to run, I might be able to find time on a big remote rig I'm using to run it for you on the R1-0528 Q8_0 like I did. Then I can upload the ~1GB file somewhere for you to use. A UTF-8 imatrix corpus text file under 2 MiB would probably take like 4-6 hours depending on the exact settings.
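For reference, the command has roughly this shape; the corpus filename, context size, and thread count below are placeholders rather than the exact settings I used:

```bash
# Sketch of an ik_llama.cpp llama-imatrix run over the Q8_0_R8 model.
# -f is the plain-text calibration corpus, -o the resulting imatrix file.
./build/bin/llama-imatrix \
  -m /models/DeepSeek-R1-0528-Q8_0_R8.gguf \
  -f imatrix_corpus.txt \
  -o imatrix-DeepSeek-R1-0528.dat \
  --ctx-size 512 \
  --threads 64
```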

Not at the moment, I think I'll be relying on yours for the time being. I've read about ik's and your research on the topic (comparing different imatrix files), which was eye-opening!

I've started uploading the BF16 GGUF version I converted to https://huggingface.co/Thireus/DeepSeek-R1-0528-BF16-GGUF.

I will let you know of my progress. Thank you so much.
