title: ResNet-50 ImageNet-1k Classifier
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
ResNet-50 ImageNet-1k Classifier
An image classifier built on the ResNet-50 architecture and trained on the ImageNet-1k dataset.
Model Overview
- Architecture: ResNet-50 with Bottleneck blocks [3, 4, 6, 3]
- Dataset: ImageNet-1k (1000 classes)
- Parameters: ~25.6M
- Input Size: 224x224 RGB images
- Target Accuracy: 78%+ (Top-1), 94%+ (Top-5)
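The ~25.6M parameter figure can be checked by hand from the block layout [3, 4, 6, 3]. The sketch below assumes the standard ResNet-50 layout (bias-free convolutions, BatchNorm with a learnable scale and shift, bottleneck expansion factor 4); the function names are illustrative and the code in this repository may differ in detail.

```python
# Assumed layout (standard ResNet-50): bias-free convolutions, BatchNorm with
# a learnable scale and shift per channel, bottleneck expansion factor 4.
EXPANSION = 4


def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Learnable scale and shift of a BatchNorm layer."""
    return 2 * channels


def bottleneck_params(c_in, width, downsample):
    """1x1 reduce -> 3x3 -> 1x1 expand, plus an optional 1x1 projection shortcut."""
    c_out = width * EXPANSION
    total = conv_params(c_in, width, 1) + bn_params(width)
    total += conv_params(width, width, 3) + bn_params(width)
    total += conv_params(width, c_out, 1) + bn_params(c_out)
    if downsample:
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet50_params(blocks=(3, 4, 6, 3), widths=(64, 128, 256, 512), num_classes=1000):
    total = conv_params(3, 64, 7) + bn_params(64)  # 7x7 stem convolution
    c_in = 64
    for n_blocks, width in zip(blocks, widths):
        for i in range(n_blocks):
            # The first block of each stage changes the channel count,
            # so it needs a projection shortcut.
            total += bottleneck_params(c_in, width, downsample=(i == 0))
            c_in = width * EXPANSION
    total += c_in * num_classes + num_classes  # fully connected layer with bias
    return total


print(f"{resnet50_params():,}")  # prints 25,557,032
```

Under these assumptions the total is 25,557,032, which rounds to the ~25.6M quoted above.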
Training Features
This model was trained using modern optimization techniques:
- Progressive Resizing: 128→160→192→224 px for better convergence
- Data Augmentation: CutMix and MixUp for improved generalization
- Label Smoothing: 0.1 to reduce overfitting
- Exponential Moving Average (EMA): For stable predictions
- Automatic Mixed Precision (AMP): Faster training with FP16
- PyTorch 2.0 Compilation: Optimized compute graphs
- FFCV DataLoader: High-performance data loading
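Three of the techniques above reduce to a few lines of arithmetic. The following is a minimal sketch on plain Python lists, not the training code itself: the function names, the EMA decay of 0.999, and the Beta parameter 0.2 are assumptions chosen for illustration. Only the label-smoothing value 0.1 comes from the list above.

```python
import random


def smooth_one_hot(label, num_classes, eps=0.1):
    """Label smoothing: move eps of the probability mass to a uniform distribution."""
    off = eps / num_classes
    target = [off] * num_classes
    target[label] += 1.0 - eps
    return target


def mixup_targets(target_a, target_b, lam):
    """MixUp: blend two targets with the same weight lam used to blend the two images."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(target_a, target_b)]


def ema_update(ema_weights, weights, decay=0.999):
    """EMA: keep a slowly moving average of the model weights (decay is an assumed value)."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Example: mix the smoothed targets of class 0 and class 2 in a 4-class problem.
lam = random.betavariate(0.2, 0.2)  # mixing weight; alpha = 0.2 is an assumption
mixed = mixup_targets(smooth_one_hot(0, 4), smooth_one_hot(2, 4), lam)
print(mixed, sum(mixed))  # the mixed target still sums to 1 (up to rounding)
```

CutMix builds its target the same way, except that `lam` is the fraction of the image area kept from the first sample after a rectangular patch is pasted in from the second.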
Performance
| Metric | Score |
|---|---|
| Top-1 Accuracy | 78%+ |
| Top-5 Accuracy | 94%+ |
| Training Time | ~90 min (8x A100) |
| Inference Time | ~5ms per image |
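For reference, Top-1 and Top-5 accuracy count a prediction as correct when the true label is among the 1 or 5 highest-scoring classes. A small sketch of that definition follows; `top_k_accuracy` is an illustrative helper, not a function from this repository.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 3 samples and 3 classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

In the toy example one of three samples is correct at k=1 and two of three at k=2, which shows why Top-5 is always at least as high as Top-1.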
Usage
Local Testing
# Install dependencies
pip install -r requirements.txt
# Test the model architecture
python test_model.py
# Run the Gradio app locally
python app.py
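The app turns the model's 1000 raw outputs into named predictions. How app.py does this is not shown here, so the sketch below is only one plausible way to do it: it assumes the class labels are available as a list indexed by class id, and `softmax` and `top_predictions` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_predictions(logits, class_names, k=5):
    """Return the k most likely classes as a {label: probability} mapping."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return {class_names[i]: probs[i] for i in ranked}


# Toy example with 3 made-up classes instead of the 1000 ImageNet labels.
print(top_predictions([2.0, 1.0, 0.0], ["cat", "dog", "fish"], k=2))
```

A label-to-probability mapping like this is a convenient shape for display, since insertion order preserves the ranking.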
Training Your Own Model
The training code lives in the assignment_9 repository (github.com/arghyaiitb/assignment_9):
# Quick test with partial dataset
python main.py train --partial-dataset --partial-size 5000 --use-ffcv --epochs 5
# Full training for 78%+ accuracy
python main.py distributed --use-ffcv --batch-size 2048 --epochs 100 --progressive-resize --use-ema --compile
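With `--progressive-resize`, the input resolution grows over the course of training. The exact schedule used by main.py is not documented here; the sketch below assumes the four sizes 128, 160, 192, 224 are spread over equal-length stages, and `resolution_for_epoch` is a hypothetical helper.

```python
def resolution_for_epoch(epoch, total_epochs=100, sizes=(128, 160, 192, 224)):
    """Training resolution for a zero-based epoch, assuming equal-length stages."""
    stage = min(epoch * len(sizes) // total_epochs, len(sizes) - 1)
    return sizes[stage]


# With 100 epochs, each resolution is used for 25 epochs.
for epoch in (0, 24, 25, 50, 75, 99):
    print(epoch, resolution_for_epoch(epoch))
```

Starting at low resolution makes early epochs cheaper, while the final stage at 224 px matches the resolution used at inference time.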
Files
- app.py - Main Gradio application
- imagenet_classes.json - ImageNet-1k class labels (downloaded from HuggingFace)
- requirements.txt - Python dependencies
- test_model.py - Model architecture verification
- best_model.pt - Trained model checkpoint (add after training)
- .gitignore - Git ignore rules
Model Architecture
ResNet-50
├── Conv1 (7x7, stride 2)
├── MaxPool (3x3, stride 2)
├── Layer 1: 3 Bottleneck blocks (64 channels)
├── Layer 2: 4 Bottleneck blocks (128 channels)
├── Layer 3: 6 Bottleneck blocks (256 channels)
├── Layer 4: 3 Bottleneck blocks (512 channels)
├── AdaptiveAvgPool
└── FC (2048 → 1000 classes)
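The channel counts in the diagram are the bottleneck widths. With the usual expansion factor of 4, each stage outputs four times that many channels, which is why the final layer sees 2048 features. The sketch below traces channels and spatial size for a 224x224 input; the padding values and the placement of the stride-2 convolution follow the standard ResNet-50 layout and are assumptions about this implementation.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling window along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=224):
    """Return (stage name, output channels, spatial size) for each stage of the diagram."""
    shapes = []
    size = conv_out(size, kernel=7, stride=2, padding=3)
    shapes.append(("Conv1", 64, size))
    size = conv_out(size, kernel=3, stride=2, padding=1)
    shapes.append(("MaxPool", 64, size))
    for i, width in enumerate((64, 128, 256, 512), start=1):
        if i > 1:
            # Layers 2-4 halve the resolution with a stride-2 3x3 convolution.
            size = conv_out(size, kernel=3, stride=2, padding=1)
        shapes.append((f"Layer {i}", width * 4, size))
    shapes.append(("AdaptiveAvgPool", 2048, 1))
    return shapes


for name, channels, size in trace_shapes():
    print(f"{name:16s} {channels:5d} x {size} x {size}")
```

For a 224x224 input this gives a spatial size of 112 after Conv1, 56 after MaxPool, and then 56, 28, 14, and 7 after Layers 1 to 4, with 256, 512, 1024, and 2048 output channels respectively.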
Citation
Based on the original ResNet paper:
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
License
MIT License
Links
- Training Code: github.com/arghyaiitb/assignment_9
- HuggingFace Space: huggingface.co/spaces/arghyaiitb/resnet50-imagenet-1k
- ImageNet Dataset: image-net.org