MNLP_M3_quantized_model

This model is a quantized version of the best-performing MCQA model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM multiple-choice question answering tasks.

Model Summary

  • Base model: hssawhney/Best-Performing-Model
  • Quantization type: Post-Training Quantization (PTQ)
  • Precision: W8A8
  • Method: SmoothQuant + GPTQ via LLMCompressor
  • Excluded layers: lm_head (to preserve logits quality)
  • Final model size: ~717 MB
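W8A8 means both the weights and the activations are stored as 8-bit integers (some tensors, such as the excluded lm_head, stay in higher precision). As a rough illustration of the idea only, not of LLMCompressor's actual implementation, symmetric int8 quantization maps each value to an integer through a shared scale:

```python
def quantize_sym_int8(values):
    """Symmetric int8 quantization: scale = max|x| / 127, then round and clamp."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid scale == 0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_sym_int8(w)   # q == [50, -127, 3, 100]
w_hat = dequantize(q, s)      # close to w, within one quantization step
```

On top of this basic scheme, SmoothQuant migrates activation outliers into the weights before quantization, and GPTQ picks the quantized weights that minimize layer-wise reconstruction error; both are applied here through LLMCompressor.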

Calibration Details

  • Calibration dataset: 512 samples randomly selected from zay25/MNLP_M3_quantized_dataset
  • The calibration set preserves the original format (STEM MCQA) and was selected to represent a broad distribution of question types.

Intended Use

This model is intended for:

  • STEM-focused multiple-choice question answering
  • Educational assistant systems
  • Low-resource inference environments (e.g., CPU, edge devices)

It is not intended for free-form text generation or for use outside the MCQA format.
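The card does not specify the exact prompt template, so the following `format_mcqa_prompt` helper is hypothetical; the choice letters and the trailing "Answer:" cue are assumptions about what an MCQA-style input might look like:

```python
def format_mcqa_prompt(question, choices):
    """Render one multiple-choice item; the model is expected to answer with a letter."""
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_mcqa_prompt(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Watt", "Pascal"],
)
```

Constraining inputs to this shape keeps the model within the MCQA distribution it was calibrated on.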

License

This model inherits the license of the base model; see the hssawhney/Best-Performing-Model repository for the license terms.

Model Details

  • Format: Safetensors
  • Model size: 0.6B parameters
  • Tensor types: F16, I8