This Instruct model responds like a Thinking model - the prompt template includes `<think>` as well.
I tried replacing just the prompt template with the one from the base model's tokenizer_config and it made no difference, so I suspect this is actually just a second FP8 quantization of the Thinking model. I didn't compare the weights to the other one, but it's definitely not the Instruct model.
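For anyone who wants to try the same swap, here's a minimal sketch of what it amounts to. The paths are placeholders, and it assumes the template lives in the `chat_template` field of `tokenizer_config.json` (as it does for most Qwen repos):

```python
import json

def copy_chat_template(base_cfg_path, quant_cfg_path):
    """Overwrite the quant's chat_template with the base model's.

    Both arguments are paths to tokenizer_config.json files from
    locally downloaded repos.
    """
    with open(base_cfg_path) as f:
        base = json.load(f)
    with open(quant_cfg_path) as f:
        quant = json.load(f)

    # Only the template is swapped; every other field is left untouched.
    quant["chat_template"] = base["chat_template"]

    with open(quant_cfg_path, "w") as f:
        json.dump(quant, f, indent=2)

# Hypothetical local directories; point these at your own downloads:
# copy_chat_template(
#     "Qwen3-VL-32B-Instruct/tokenizer_config.json",
#     "Qwen3-VL-32B-Instruct-FP8/tokenizer_config.json",
# )
```

As noted above, though, swapping the template doesn't change the behavior, which is what points at the weights themselves being wrong.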
Despite replies starting with reasoning content followed by the final text, enabling a reasoning parser in vLLM (--reasoning-parser qwen3) still returns everything as delta->content rather than reasoning content, so this quant isn't even a working reasoning model. I suspect it's just broken and needs to be redone properly.
The safetensors files have the same SHA256 as Qwen/Qwen3-VL-32B-Thinking-FP8, so it looks like an accidental upload
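If anyone wants to reproduce the hash comparison, a quick sketch (shard filenames are placeholders; use the actual files from each repo):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so multi-GB safetensors shards
    don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical shard names; substitute the real filenames from each repo.
# inst = sha256_of("Qwen3-VL-32B-Instruct-FP8/model-00001-of-00005.safetensors")
# think = sha256_of("Qwen3-VL-32B-Thinking-FP8/model-00001-of-00005.safetensors")
# print(inst == think)  # identical digests for every shard => identical weights
```

You can also just compare the SHA256 values the Hub shows on each file's page instead of hashing locally.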
I also verified this.
fixed