FP8 please

#18

by aliquis-pe - opened Sep 14

Discussion

aliquis-pe

Sep 14

Any chances for FP8 variant, like you did for Qwen3-30B-A3B models? Will be much appreciated.

traphix

Sep 18

•

edited Sep 18

it's coming #25079

aliquis-pe

Sep 18

Super!

minethrower

Sep 18

Is there any receipt to produce correct FP8 quant for qwen3-next preserving MTP Draft model? llmcompressor? It looks like FP8 Qwen3-Next-80B-A3B produces no Accepted Tokens.

[metrics.py:96] SpecDecoding metrics: Mean acceptance length: 1.00, Accepted throughput: 0.00 tokens/s, Drafted throughput: 69.20 tokens/s, Accepted: 0 tokens, Drafted: 692 tokens, Per-position acceptance rate: 0.000, 0.000, Avg Draft acceptance rate: 0.0%
I use this model: https://huggingface.co/DevQuasar/Qwen.Qwen3-Next-80B-A3B-Instruct-FP8-Dynamic

Is it possible to support correct FP8 quantization of the Draft model for spec decoding? Any receipt for llmcompressor is highly apppreciated!

traphix

Sep 19

Currently qwen3-next fp8 support is not good, lots of bugs.

maybe we should wait for the official fp8 release.

traphix

Sep 22

official fp8 has come

Qwen/Qwen3-Next-80B-A3B-Instruct-FP8

aliquis-pe

Sep 22

Unfortunately doesn't work with vLLM when offloading to CPU is enabled.
RuntimeError: Worker failed with error 'Cannot re-initialize the input batch when CPU weight offloading is enabled. See https://github.com/vllm-project/vllm/pull/18298 for more details.'
OK, will wait for llama.cpp to develop Qwen-Next support and then for quantized GGUFs to become available.
In any case, thanks to Qwen team for their models!

traphix

Sep 22

vllm v0.10.2 does not support qwen3 next fp8

maybe latest can run

aliquis-pe

Sep 22

I ran it with the latest (pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly) according to the recommendations in the model description. vllm v0.10.2 indeed fails earlier because of quantization.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment