Why are the int8 and fp16 model sizes both 31 GB?

#1 by snomile - opened

The original model is 61 GB in BF16, so I am confused...

I was unable to make it work with either file; both show the following error:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen3-omni'
llama_model_load_from_file_impl: failed to load model
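This error usually means the installed llama.cpp build does not recognize the architecture string stored in the GGUF metadata, so it refuses to load the file. As a minimal sketch (assuming the `gguf` Python package published by the llama.cpp project, and a placeholder filename), you can confirm which architecture string the file actually declares:

```python
# Sketch: print the architecture key stored in a GGUF file.
# Assumes `pip install gguf`; the filename is a placeholder for the downloaded file.
from gguf import GGUFReader

reader = GGUFReader("qwen3-omni.gguf")
field = reader.fields["general.architecture"]
arch = bytes(field.parts[-1]).decode("utf-8")
print(arch)  # llama.cpp can only load the file if it supports this architecture
```

If the printed architecture is not supported by your llama.cpp version, no quantization variant of the file will load until support for it lands upstream.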

F16 uses 2 bytes per weight and Q8 uses 1 byte per weight, so the F16 file should be roughly twice the size of the Q8 file.
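A rough back-of-the-envelope check, assuming only the 61 GB BF16 figure from this thread and ignoring non-quantized tensors, shows what sizes one would expect:

```python
# Rough size estimate from the thread's numbers (61 GB in BF16 = 2 bytes/weight).
BYTES_PER_GB = 1024**3

n_weights = 61 * BYTES_PER_GB / 2.0                    # ~32.7 billion weights

# F16 also uses 2 bytes/weight, so an F16 GGUF stays close to the BF16 size.
f16_size_gb = n_weights * 2.0 / BYTES_PER_GB           # ~61 GB

# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale per block,
# i.e. (32 + 2) / 32 = 1.0625 bytes/weight.
q8_size_gb = n_weights * (34 / 32) / BYTES_PER_GB      # ~32 GB

print(f"F16 ~= {f16_size_gb:.0f} GB, Q8_0 ~= {q8_size_gb:.0f} GB")
```

Under these assumptions, ~31 GB is plausible for a real Q8_0 quant of this model, but an F16 file should be close to the original 61 GB, not 31 GB.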


Because there is no actual F16 file here: the two uploaded files are completely identical.
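If you want to verify this yourself, a quick hash comparison of the two downloads shows whether they are byte-identical (the filenames below are placeholders for the repo's Q8 and F16 files):

```python
# Sketch: compare SHA-256 hashes of the two downloaded GGUF files.
import hashlib

def sha256(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

print(sha256("model-q8_0.gguf") == sha256("model-f16.gguf"))  # True -> same file
```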

Yes, I provided the wrong file. I'm very sorry.


Are you uploading the correct quantized one soon? Thanks.
