# MNLP_M3_quantized_model
This model is a quantized version of the best-performing MCQA model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM multiple-choice question answering tasks.
## Model Summary
- Base model: hssawhney/Best-Performing-Model
- Quantization type: Post-Training Quantization (PTQ)
- Precision: W8A8 (8-bit weights, 8-bit activations)
- Method: SmoothQuant + GPTQ via LLMCompressor
- Excluded layers: lm_head (to preserve logit quality)
- Final model size: ~717 MB
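As a rough illustration of what the W8A8 precision setting means (this is a minimal sketch of symmetric per-tensor INT8 quantization, not the actual LLMCompressor pipeline, which additionally uses SmoothQuant to migrate activation outliers into the weights before applying GPTQ):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q."""
    scale = np.abs(x).max() / 127.0  # map the largest magnitude to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from INT8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q.dtype, float(np.abs(w - w_hat).max()))  # int8, small reconstruction error
```

The maximum reconstruction error is bounded by half the quantization step, which is why excluding the lm_head (whose logits are consumed directly) from quantization helps preserve answer quality.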
## Calibration Details
- Calibration dataset: 512 samples randomly selected from zay25/MNLP_M3_quantized_dataset
- The calibration set preserves the original STEM MCQA format and was selected to represent a broad distribution of question types.
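The calibration selection can be sketched as a seeded random draw without replacement over the dataset records (the in-memory list below is a stand-in for zay25/MNLP_M3_quantized_dataset; the exact sampling code and seed are assumptions, not taken from the project):

```python
import random

def select_calibration_set(dataset, num_samples: int = 512, seed: int = 0):
    """Randomly select calibration examples without replacement,
    keeping the original MCQA record format intact."""
    rng = random.Random(seed)
    indices = rng.sample(range(len(dataset)), num_samples)
    return [dataset[i] for i in indices]

# Stand-in for the real dataset: a list of MCQA records.
dataset = [
    {"question": f"Q{i}", "choices": ["A", "B", "C", "D"]}
    for i in range(10_000)
]
calib = select_calibration_set(dataset)
print(len(calib))  # 512
```

Sampling without replacement keeps each calibration example distinct, so the activation statistics gathered during PTQ are not skewed by duplicates.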
## Intended Use
This model is intended for:
- STEM-focused multiple-choice question answering
- Educational assistant systems
- Low-resource inference environments (e.g., CPU, edge devices)
It is not intended for freeform generation or use outside the MCQA format.
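To illustrate the MCQA format the model expects, here is a small prompt-formatting helper; the letter-based template below is a hypothetical example, not the exact prompt layout used in the project:

```python
def format_mcqa_prompt(question: str, choices: list[str]) -> str:
    """Render a question and its options as a letter-based MCQA prompt
    (hypothetical template, for illustration only)."""
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_mcqa_prompt(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Pascal", "Watt"],
)
print(prompt)
```

Constraining generation to a single answer letter after "Answer:" is what makes the model suitable for MCQA scoring but not for free-form generation.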
## License
This model inherits the license of the base model; see the hssawhney/Best-Performing-Model repository for the license terms.