Need4Speed (company)

Activity Feed

AI & ML interests: None defined yet.

Recent Activity

wenhuach posted an update 1 day ago

🚀 AutoRound (https://github.com/intel/auto-round) is now supported by SGLang!

After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
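
As a rough illustration of what deployment can look like, here is a minimal sketch using SGLang's offline engine with an AutoRound-format checkpoint. The repo id is a placeholder, and the Engine API shown is from recent SGLang releases, so details may vary by version.

```python
# Minimal sketch: running an AutoRound-format model with SGLang's offline engine.
# SGLang reads the quantization config from the checkpoint itself.
import sglang as sgl

llm = sgl.Engine(model_path="your-org/your-model-w4g128-AutoRound")  # placeholder repo id

outputs = llm.generate(
    ["Explain weight-only INT4 quantization in one sentence."],
    {"temperature": 0.7, "max_new_tokens": 64},
)
print(outputs[0]["text"])

llm.shutdown()
```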

💡 We've also enhanced the RTN mode (--iters 0), cutting quantization costs significantly for low-resource users.
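
As a rough sketch of the cheaper RTN path through the Python API, assuming a recent AutoRound release: iters=0 mirrors the CLI's --iters 0, and the model id and output directory are placeholders.

```python
# Minimal sketch: round-to-nearest quantization by skipping the tuning loop.
# iters=0 is the Python-API counterpart of the CLI flag --iters 0.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize_and_save("./model-w4g128-rtn", format="auto_round")
```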

⭐ Star our repo and stay tuned for more exciting updates!

wenhuach posted an update 13 days ago
wenhuach posted an update about 2 months ago
wenhuach posted an update 3 months ago

🚀 AutoRound (https://github.com/intel/auto-round) Now Supports GGUF Export & Custom Bit Settings!

We're excited to announce that AutoRound now supports:
✅ GGUF format export – for seamless compatibility with popular inference engines.
✅ Custom bit settings – tailor quantization to your needs for optimal performance (see the sketch below).
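
For reference, a minimal sketch of what both features look like through the Python API, assuming a recent AutoRound release. The model id and layer name are placeholders, and the layer_config shape and the "gguf:q4_k_m" format string follow the project README at the time of writing, so exact names may differ across versions.

```python
# Hedged sketch: GGUF export plus a per-layer bit override via AutoRound's
# Python API. Model id, output dir, and the exact layer name are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; pick any causal LM
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Keep one sensitive projection at 8 bits while the rest of the model uses 4.
layer_config = {"model.layers.0.self_attn.v_proj": {"bits": 8}}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
                      layer_config=layer_config)
autoround.quantize_and_save("./model-gguf-q4km", format="gguf:q4_k_m")
```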

Check out these newly released models:
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹 Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound

Stay tuned! An even more advanced algorithm for some configurations is coming soon.

wenhuach posted an update 5 months ago

AutoRound (https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.

Besides, we strongly recommend using AutoRound to generate AWQ INT4 models: AutoAWQ is no longer maintained, and manually configuring new models there is not trivial because custom layer mappings are needed.
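
For illustration, a minimal sketch of loading an AutoRound-formatted checkpoint in a vLLM release that includes the integration; the repo id is a placeholder. For the AWQ route, the AutoRound docs describe exporting an AWQ-compatible format from the same quantize-and-save flow shown earlier.

```python
# Minimal sketch: serving an AutoRound-formatted model with vLLM.
# vLLM detects the quantization scheme from the model's config.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-model-w4g128-AutoRound")  # placeholder repo id
params = SamplingParams(temperature=0.7, max_tokens=64)

for request_output in llm.generate(["What is weight-only quantization?"], params):
    print(request_output.outputs[0].text)
```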

loubnabnl posted an update 6 months ago
wenhuach posted an update 6 months ago
wenhuach posted an update 8 months ago

Check out the DeepSeek-R1 INT2 model (OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though inference is quite slow due to a kernel issue.

| Task          | BF16   | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu          | 0.8514 | 0.8302     |
| hellaswag     | 0.6935 | 0.6657     |
| winogrande    | 0.7932 | 0.7940     |
| arc_challenge | 0.6212 | 0.6084     |
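
For anyone reproducing numbers like these, a minimal sketch with lm-evaluation-harness (v0.4+ API); the checkpoint id is a placeholder, and running the INT2 mixed model itself requires the matching kernels and substantial memory.

```python
# Hedged sketch: scoring a model on the four tasks from the table above
# with lm-evaluation-harness. The checkpoint id is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-quantized-model",  # placeholder
    tasks=["mmlu", "hellaswag", "winogrande", "arc_challenge"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```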

wenhuach posted an update 8 months ago