Felladrin committed · Commit 4ea3d66 · verified · 1 Parent(s): 6c6d16c

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,160 @@
---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers.js
tags:
- deep-fake
- ViT
- detection
- Image
- transformers-4.49.0.dev0
- precision-92.12
- v2
base_model:
- prithivMLmods/Deep-Fake-Detector-v2-Model
---

# Deep-Fake-Detector-v2-Model (ONNX)

This is an ONNX version of [prithivMLmods/Deep-Fake-Detector-v2-Model](https://huggingface.co/prithivMLmods/Deep-Fake-Detector-v2-Model). It was automatically converted and uploaded using [this Hugging Face Space](https://huggingface.co/spaces/onnx-community/convert-to-onnx).

## Usage with Transformers.js

See the pipeline documentation for `image-classification`: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.ImageClassificationPipeline

---

![fake q.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/PVkTbLOEBr-qNkTws3UsD.gif)

# **Deep-Fake-Detector-v2-Model**

# **Overview**

The **Deep-Fake-Detector-v2-Model** is a state-of-the-art deep learning model designed to detect deepfake images. It leverages the **Vision Transformer (ViT)** architecture, specifically the `google/vit-base-patch16-224-in21k` model, fine-tuned on a dataset of real and deepfake images. The model is trained to classify images as either "Realism" or "Deepfake" with high accuracy, making it a powerful tool for detecting manipulated media.

```
Classification report:

              precision    recall  f1-score   support

     Realism     0.9683    0.8708    0.9170     28001
    Deepfake     0.8826    0.9715    0.9249     28000

    accuracy                         0.9212     56001
   macro avg     0.9255    0.9212    0.9210     56001
weighted avg     0.9255    0.9212    0.9210     56001
```

**Confusion Matrix**:
```
[[True Positives, False Negatives],
 [False Positives, True Negatives]]
```
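For reference, a report and confusion matrix in this format can be produced with scikit-learn. The snippet below is a minimal sketch, not code from this repository: `y_true` and `y_pred` are hypothetical label indices (0 = Realism, 1 = Deepfake) collected from an evaluation set.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical ground-truth and predicted label indices (0 = Realism, 1 = Deepfake)
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]

# Per-class precision, recall, F1, and support, as in the report above
print(classification_report(y_true, y_pred, target_names=["Realism", "Deepfake"], digits=4))

# Rows correspond to true classes, columns to predicted classes
print(confusion_matrix(y_true, y_pred))
```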

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/VLX0QDcKkSLIJ9c5LX-wt.png)

**<span style="color:red;">Update:</span>** The previous model checkpoint was obtained using a smaller classification dataset. Although it performed well in evaluation scores, its real-time performance was average due to limited variation in the training set. The new update includes a larger dataset to improve the detection of fake images.

| Repository | Link |
|------------|------|
| Deep Fake Detector v2 Model | [GitHub Repository](https://github.com/PRITHIVSAKTHIUR/Deep-Fake-Detector-Model) |

# **Key Features**
- **Architecture**: Vision Transformer (ViT) - `google/vit-base-patch16-224-in21k`.
- **Input**: RGB images resized to 224x224 pixels.
- **Output**: Binary classification ("Realism" or "Deepfake").
- **Training Dataset**: A curated dataset of real and deepfake images.
- **Fine-Tuning**: The model is fine-tuned using Hugging Face's `Trainer` API with advanced data augmentation techniques.
- **Performance**: Achieves high accuracy and F1 score on validation and test datasets.

# **Model Architecture**
The model is based on the **Vision Transformer (ViT)**, which treats images as sequences of patches and applies a transformer encoder to learn spatial relationships. Key components include:
- **Patch Embedding**: Divides the input image into fixed-size patches (16x16 pixels).
- **Transformer Encoder**: Processes patch embeddings using multi-head self-attention mechanisms.
- **Classification Head**: A fully connected layer for binary classification.

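To make these components concrete, the sketch below inspects the model configuration (the same values appear in the `config.json` included in this upload) and prints the corresponding architecture hyperparameters. It only fetches the config file, not the full weights.

```python
from transformers import AutoConfig

# Download only config.json for the fine-tuned checkpoint
config = AutoConfig.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")

print(config.model_type)           # "vit"
print(config.image_size)           # 224 -> input resolution
print(config.patch_size)           # 16  -> 16x16 patches, so (224 / 16)^2 = 196 patches per image
print(config.hidden_size)          # 768 -> patch embedding dimension
print(config.num_hidden_layers)    # 12  -> transformer encoder blocks
print(config.num_attention_heads)  # 12  -> self-attention heads per block
print(config.id2label)             # {0: 'Realism', 1: 'Deepfake'} -> binary classification head
```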
# **Training Details**
- **Optimizer**: AdamW with a learning rate of `1e-6`.
- **Batch Size**: 32 for training, 8 for evaluation.
- **Epochs**: 2.
- **Data Augmentation**:
  - Random rotation (±90 degrees).
  - Random sharpness adjustment.
  - Random resizing and cropping.
- **Loss Function**: Cross-Entropy Loss.
- **Evaluation Metrics**: Accuracy, F1 Score, and Confusion Matrix.

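The snippet below is a minimal sketch of a fine-tuning setup matching these hyperparameters. It is illustrative only: the curated dataset is not published, so `train_dataset`/`eval_dataset` are placeholders, and the exact augmentation parameters (rotation range, sharpness factor, crop scale) are assumptions.

```python
from torchvision import transforms
from transformers import (
    Trainer,
    TrainingArguments,
    ViTForImageClassification,
    ViTImageProcessor,
)

base_model = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(base_model)

# ViTForImageClassification applies cross-entropy loss for single-label classification
model = ViTForImageClassification.from_pretrained(
    base_model,
    num_labels=2,
    id2label={0: "Realism", 1: "Deepfake"},
    label2id={"Realism": 0, "Deepfake": 1},
)

# Augmentations corresponding to the bullets above (parameter values are assumptions)
augment = transforms.Compose([
    transforms.RandomRotation(degrees=90),
    transforms.RandomAdjustSharpness(sharpness_factor=2),
    transforms.RandomResizedCrop(size=224),
])

def preprocess(example):
    # `example` is assumed to have a PIL "image" and an integer "label" field
    pixel_values = processor(images=augment(example["image"]), return_tensors="pt")["pixel_values"][0]
    return {"pixel_values": pixel_values, "labels": example["label"]}

training_args = TrainingArguments(
    output_dir="deep-fake-detector-v2",
    learning_rate=1e-6,              # AdamW is the Trainer's default optimizer
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
)

# train_dataset / eval_dataset are placeholders for the curated real-vs-deepfake dataset
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```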
# **Inference with Hugging Face Pipeline**
```python
from transformers import pipeline

# Load the model
pipe = pipeline('image-classification', model="prithivMLmods/Deep-Fake-Detector-v2-Model", device=0)

# Predict on an image
result = pipe("path_to_image.jpg")
print(result)
```

# **Inference with PyTorch**
```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load the model and processor
model = ViTForImageClassification.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")
processor = ViTImageProcessor.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")

# Load and preprocess the image
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
predicted_class = torch.argmax(logits, dim=1).item()

# Map class index to label
label = model.config.id2label[predicted_class]
print(f"Predicted Label: {label}")
```
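Since this repository contains the converted ONNX graphs (see the `onnx/` files below), here is a minimal sketch of running the same prediction with ONNX Runtime in Python. It assumes `onnxruntime` and `huggingface_hub` are installed; the repository id is a placeholder, and the full-precision `onnx/model.onnx` can be swapped for one of the quantized variants (e.g. `onnx/model_fp16.onnx`) to trade accuracy for size.

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import ViTImageProcessor

# Placeholder: replace with the id of this ONNX repository
repo_id = "<this-onnx-repo-id>"

# Download the full-precision ONNX graph and load the matching image processor
onnx_path = hf_hub_download(repo_id=repo_id, filename="onnx/model.onnx")
processor = ViTImageProcessor.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")

# Preprocess exactly as in the PyTorch example, but keep the tensors as NumPy arrays
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")

# Run the graph (the export has a single pixel_values input and a single logits output)
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
(logits,) = session.run(None, {input_name: inputs["pixel_values"]})

# Map the highest-scoring class index to its label (see id2label in config.json)
predicted_class = int(np.argmax(logits, axis=-1)[0])
id2label = {0: "Realism", 1: "Deepfake"}
print(f"Predicted Label: {id2label[predicted_class]}")
```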

# **Dataset**
The model is fine-tuned on a curated dataset containing:
- **Real Images**: Authentic images of human faces.
- **Fake Images**: Deepfake images generated using advanced AI techniques.

# **Limitations**
- The model is trained on a specific dataset and may not generalize well to other deepfake datasets or domains.
- Performance may degrade on low-resolution or heavily compressed images.
- The model is designed for image classification and does not detect deepfake videos directly.

# **Ethical Considerations**

- **Misuse**: This model should not be used for malicious purposes, such as creating or spreading deepfakes.
- **Bias**: The model may inherit biases from the training dataset. Care should be taken to ensure fairness and inclusivity.
- **Transparency**: Users should be informed when deepfake detection tools are used to analyze their content.

# **Future Work**
- Extend the model to detect deepfake videos.
- Improve generalization by training on larger and more diverse datasets.
- Incorporate explainability techniques to provide insights into model predictions.

# **Citation**

```bibtex
@misc{Deep-Fake-Detector-v2-Model,
  author = {prithivMLmods},
  title = {Deep-Fake-Detector-v2-Model},
  initial = {21 Mar 2024},
  second_updated = {31 Jan 2025},
  latest_updated = {02 Feb 2025}
}
```
config.json ADDED
@@ -0,0 +1,33 @@
{
  "_attn_implementation_autoset": true,
  "_name_or_path": "prithivMLmods/Deep-Fake-Detector-v2-Model",
  "architectures": [
    "ViTForImageClassification"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
    "0": "Realism",
    "1": "Deepfake"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "Deepfake": 1,
    "Realism": 0
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "problem_type": "single_label_classification",
  "qkv_bias": true,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0"
}
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2e0662a81744aca769e240fd1daa3f44225c50d0565fd8fb8f702d1f26609c91
size 343401688
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cafa8f6543f2892d7d63263c5484077b5416bf7507b4e5eb7a42f7ee7637b442
size 51450010
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07eee7514086ff66fdec3e3f55155e76cdb8e00a788c185e4c95d58040811191
size 171801382
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8238e449fc640c2f0959e36e8aa778bd006065c0a8af4c7f6f54c8be9c9d9e4
size 87333629
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e239476e94466b225d490147c2f3c1ea64e969f0e09ae4f772e15a16d8e393a9
size 56757898
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3ac808c1e632abe7d62160e67db63d10f6a952ebc0507cf3b6d5cd0de4a2594
size 49718585
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3519c22b9695f99ddc00821228eeac91239065a90bfbdb4917858b3ec1dcfc42
size 87333629
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3519c22b9695f99ddc00821228eeac91239065a90bfbdb4917858b3ec1dcfc42
size 87333629
preprocessor_config.json ADDED
@@ -0,0 +1,23 @@
{
  "do_convert_rgb": null,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "ViTFeatureExtractor",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 224,
    "width": 224
  }
}
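For reference, this configuration corresponds to the following preprocessing arithmetic (a sketch, not code shipped in this repository): resize to 224x224 with bilinear resampling ("resample": 2), rescale pixel values by 1/255 (rescale_factor), then normalize each channel with mean 0.5 and std 0.5, which maps inputs from [0, 255] to [-1, 1].

```python
import numpy as np
from PIL import Image

def preprocess(image: Image.Image) -> np.ndarray:
    """NumPy equivalent of the settings in preprocessor_config.json."""
    image = image.convert("RGB").resize((224, 224), resample=Image.BILINEAR)  # "resample": 2 = bilinear
    pixels = np.asarray(image, dtype=np.float32) * 0.00392156862745098        # rescale_factor = 1/255
    pixels = (pixels - 0.5) / 0.5                                             # (x - image_mean) / image_std -> [-1, 1]
    return pixels.transpose(2, 0, 1)[np.newaxis]                              # HWC -> NCHW, shape (1, 3, 224, 224)
```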
quantize_config.json ADDED
@@ -0,0 +1,18 @@
{
  "modes": [
    "fp16",
    "q8",
    "int8",
    "uint8",
    "q4",
    "q4f16",
    "bnb4"
  ],
  "per_channel": true,
  "reduce_range": true,
  "block_size": null,
  "is_symmetric": true,
  "accuracy_level": null,
  "quant_type": 1,
  "op_block_list": null
}