Nemotron models that have been converted and/or quantized to work well in vLLM
Michael Goin
mgoin
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
updated a model 15 days ago: RedHatAI/Llama-3.2-1B-FP8
new activity 28 days ago: kernels-community/vllm-flash-attn3: Support for B200s?
liked a model about 1 month ago: moondream/moondream3-preview