This Instruct model responds like a Thinking model - the prompt template even includes <think>

#1
by CED6688 - opened

I tried replacing just the prompt template with the one from the base model's tokenizer_config and it made no difference, so I suspect this is actually just a second FP8 quantization of the Thinking model. I didn't compare the weights against that repo, but it's definitely not the Instruct.

Even though replies start with reasoning content followed by the final text, enabling a reasoning parser in vLLM (--reasoning-parser qwen3) still returns everything as delta->content rather than as reasoning content, so this quant doesn't even work as a reasoning model. I suspect it's just broken and needs to be redone properly.
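For reference, this is how the parser mentioned above was enabled; a minimal sketch, assuming this repo's id is Qwen/Qwen3-VL-32B-Instruct-FP8 (the flag and parser name are standard vLLM options):

```shell
# Serve the model with vLLM's Qwen3 reasoning parser enabled.
# With a correctly working Thinking model, streamed deltas should then
# carry the chain-of-thought as reasoning content, separate from content.
vllm serve Qwen/Qwen3-VL-32B-Instruct-FP8 --reasoning-parser qwen3
```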

The safetensors files have the same SHA256 as Qwen/Qwen3-VL-32B-Thinking-FP8, so it looks like an accidental upload
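The comparison above boils down to hashing each downloaded shard and checking it against the other repo's published values. A minimal sketch of a streaming SHA-256 helper (plain hashlib; the shard paths you feed it are placeholders for whatever files you actually downloaded):

```python
import hashlib


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large
    safetensors shards never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

The Hub also shows each LFS file's SHA256 on its file page, so the hashes of the Thinking repo can be compared without downloading it.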

I also verified this.

Fixed.

littlebird13 changed discussion status to closed