MNLP_M3_quantized_model

This model is a quantized version of the best-performing MCQA model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM multiple-choice question answering tasks.

Model Summary

  • Base model: hssawhney/Best-Performing-Model
  • Quantization type: Post-Training Quantization (PTQ)
  • Precision: W8A8
  • Method: SmoothQuant + GPTQ via LLMCompressor
  • Excluded layers: lm_head (to preserve logits quality)
  • Final model size: ~717 MB
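W8A8 means both the weights and the activations are stored as 8-bit integers (some tensors, such as the excluded lm_head, stay in higher precision). As a rough illustration of the idea only, not of LLMCompressor's actual implementation, symmetric int8 quantization maps each value to an integer through a shared scale:

```python
def quantize_sym_int8(values):
    """Symmetric int8 quantization: scale = max|x| / 127, then round and clamp."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid scale == 0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_sym_int8(w)   # q == [50, -127, 3, 100]
w_hat = dequantize(q, s)      # close to w, within one quantization step
```

On top of this basic scheme, SmoothQuant migrates activation outliers into the weights before quantization, and GPTQ picks the quantized weights that minimize layer-wise reconstruction error; both are applied here through LLMCompressor.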

Calibration Details

  • Calibration dataset: 512 samples randomly selected from zay25/MNLP_M3_quantized_dataset
  • The calibration set preserves the original format (STEM MCQA) and was selected to represent a broad distribution of question types.

Intended Use

This model is intended for:

  • STEM-focused multiple-choice question answering
  • Educational assistant systems
  • Low-resource inference environments (e.g., CPU, edge devices)

It is not intended for free-form text generation or for use outside the MCQA format.
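The card does not specify the exact prompt template, so the following `format_mcqa_prompt` helper is hypothetical; the choice letters and the trailing "Answer:" cue are assumptions about what an MCQA-style input might look like:

```python
def format_mcqa_prompt(question, choices):
    """Render one multiple-choice item; the model is expected to answer with a letter."""
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_mcqa_prompt(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Watt", "Pascal"],
)
```

Constraining inputs to this shape keeps the model within the MCQA distribution it was calibrated on.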

License

This model inherits the license of the base model; see the hssawhney/Best-Performing-Model repository for the license terms.

Model Details

  • Format: Safetensors
  • Model size: 0.6B parameters
  • Tensor types: F16, I8