Why are the int8 and fp16 model sizes both 31 GB?

#1 by snomile - opened

The original model is 61 GB in BF16, so I am confused...

I was unable to make it work with either file; both show the following error:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen3-omni'
llama_model_load_from_file_impl: failed to load model
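This error usually means the installed llama.cpp build does not recognize the architecture string stored in the GGUF metadata, so it refuses to load the file. As a minimal sketch (assuming the `gguf` Python package published by the llama.cpp project, and a placeholder filename), you can confirm which architecture string the file actually declares:

```python
# Sketch: print the architecture key stored in a GGUF file.
# Assumes `pip install gguf`; the filename is a placeholder for the downloaded file.
from gguf import GGUFReader

reader = GGUFReader("qwen3-omni.gguf")
field = reader.fields["general.architecture"]
arch = bytes(field.parts[-1]).decode("utf-8")
print(arch)  # llama.cpp can only load the file if it supports this architecture
```

If the printed architecture is not supported by your llama.cpp version, no quantization variant of the file will load until support for it lands upstream.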

F16 uses 2 bytes per weight and Q8 uses 1 byte per weight, so the F16 file should be roughly twice the size of the Q8 file.
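A rough back-of-the-envelope check, assuming only the 61 GB BF16 figure from this thread and ignoring non-quantized tensors, shows what sizes one would expect:

```python
# Rough size estimate from the thread's numbers (61 GB in BF16 = 2 bytes/weight).
BYTES_PER_GB = 1024**3

n_weights = 61 * BYTES_PER_GB / 2.0                    # ~32.7 billion weights

# F16 also uses 2 bytes/weight, so an F16 GGUF stays close to the BF16 size.
f16_size_gb = n_weights * 2.0 / BYTES_PER_GB           # ~61 GB

# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale per block,
# i.e. (32 + 2) / 32 = 1.0625 bytes/weight.
q8_size_gb = n_weights * (34 / 32) / BYTES_PER_GB      # ~32 GB

print(f"F16 ~= {f16_size_gb:.0f} GB, Q8_0 ~= {q8_size_gb:.0f} GB")
```

Under these assumptions, ~31 GB is plausible for a real Q8_0 quant of this model, but an F16 file should be close to the original 61 GB, not 31 GB.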


Because there is no actual F16 file here: the two uploaded files are completely identical.
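If you want to verify this yourself, a quick hash comparison of the two downloads shows whether they are byte-identical (the filenames below are placeholders for the repo's Q8 and F16 files):

```python
# Sketch: compare SHA-256 hashes of the two downloaded GGUF files.
import hashlib

def sha256(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

print(sha256("model-q8_0.gguf") == sha256("model-f16.gguf"))  # True -> same file
```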

Yes, I provided the wrong file. I'm very sorry.


Are you uploading the correct quantized one soon? Thanks.
