2.0 bpw

#1
by malamen4 - opened

Thanks a lot for sharing this! Could you please check that 2.0 bpw is uploaded? The folder only has a readme. 3.0 bpw is there.

Owner

My bad, I got confused as to which one was currently uploading. 3.0 is done, 2.0 is uploading, 4.0 is queued.

MikeRoz changed discussion status to closed

Thanks again! I'm going to download 2.0 bpw and 4.0 bpw, then merge the shared experts and self-attention layers from 4 bpw, like this: https://huggingface.co/turboderp/GLM-4.5-Air-exl3/discussions/2#6892c19b360fb5b3b6fe6351

This was about 2.12 bpw for 4.5.
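
For anyone wondering how a mix like this ends up slightly above 2.0 bpw, here is a rough sketch of the arithmetic. It is not the actual merge procedure from the linked discussion: the tensor names, the parameter counts, and the helper names `pick_source` and `effective_bpw` are all made up for illustration, and it assumes every tensor is stored at exactly its nominal bitrate.

```python
# Sketch only: tensor names and parameter counts below are hypothetical,
# not taken from the real model or from any exllamav3 tooling.

def pick_source(tensor_name, base="2.0bpw", donor="4.0bpw",
                donor_markers=("shared_experts", "self_attn")):
    """Return which quant a tensor would be taken from.

    Tensors whose name contains one of the donor markers come from the
    higher-bitrate quant; everything else stays at the base bitrate.
    """
    if any(marker in tensor_name for marker in donor_markers):
        return donor
    return base


def effective_bpw(param_counts, bpw_by_source, choose=pick_source):
    """Parameter-weighted average bits per weight of the mixed quant."""
    total_bits = 0.0
    total_params = 0
    for name, n_params in param_counts.items():
        total_bits += n_params * bpw_by_source[choose(name)]
        total_params += n_params
    return total_bits / total_params


if __name__ == "__main__":
    # Made-up split: 94% routed experts, 6% shared experts + self-attention.
    counts = {
        "model.layers.1.mlp.experts.0.up_proj": 94,
        "model.layers.1.mlp.shared_experts.up_proj": 4,
        "model.layers.1.self_attn.q_proj": 2,
    }
    rates = {"2.0bpw": 2.0, "4.0bpw": 4.0}
    print(f"{effective_bpw(counts, rates):.2f} bpw")
```

Under this simplified model the average is 2 + 2f, where f is the fraction of weights taken from 4 bpw, so a figure around 2.12 bpw would correspond to roughly 6% of the weights coming from the 4 bpw quant. Real quants carry extra per-tensor overhead, so treat this as a back-of-the-envelope estimate only.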

Owner

I'm currently running turboderp's measurement script to compare the 2.0 bpw and 3.0 bpw quants, which I should then be able to feed into their optimization script. I plan to do a write-up of the process in the model card and upload any resulting optimized quants that are worthwhile.

MikeRoz changed discussion status to open

I don't know if that script can see the shared experts and self-attention layers from 4 bpw. It was worth +2 bpw for these. If you want to try, you could merge 2 and 3 first, then merge the (2,3) result with 4, or try some other combination, keeping the same target bitrate.

> Thanks again! I'm going to download 2.0 bpw and 4.0 bpw, then merge shared experts and self attn from 4 bpw like this https://huggingface.co/turboderp/GLM-4.5-Air-exl3/discussions/2#6892c19b360fb5b3b6fe6351
>
> This was about 2.12 bpw for 4.5.

Did this work well? Could you upload it?