ufal/vit-historical-page

@@ -26,21 +26,20 @@ There are currently 2 version of the model available for download, both of them
 but different data annotations. The latest `v5.3` is considered to be default and can be found in the `main` branch
 of HF 😊 hub [^1] 🔗
-| Version | Base                             | Pages |   PDFs    | Description                                                                         |
-|--------:|----------------------------------|:-----:|:---------:|:------------------------------------------------------------------------------------|
-|  `v2.0` | `vit-base-patch16-224`           | 10073 | **3896**  | annotations with mistakes, more heterogenous data                                   |
-|  `v2.1` | `vit-base-patch16-224`           | 11940 | **5002**  | `main`: more diverse pages in each category, less annotation mistakes               |
-|  `v2.2` | `vit-base-patch16-224`           | 15855 | **5730**  | same data as `v2.1` + some restored pages from `v2.0`                               |
-|  `v3.2` | `vit-base-patch16-384`           | 15855 | **5730**  | same data as `v2.2`, but a bit larger model base with higher resolution             |
-|  `v5.2` | `vit-large-patch16-384`          | 15855 | **5730**  | same data as `v2.2`, but the largest model base with higher resolution              |
-|  `v1.2` | `efficientnetv2_s.in21k`         | 15855 | **5730**  | same data as `v2.2`, but the smallest model base (CNN)                              |
-|  `v4.2` | `efficientnetv2_l.in21k_ft_in1k` | 15855 | **5730**  | same data as `v2.2`, CNN base model smaller than the largest, may be more accurate  |
-|  `v2.3` | `vit-base-patch16-224`           | 38625 | **37328** | new data annotation phase data, more single-page documents used                     |
-|  `v3.3` | `vit-base-patch16-384`           | 38625 | **37328** | same data as `v2.3`, but a bit larger model base with higher resolution             |
-|  `v5.3` | `vit-large-patch16-384`          | 38625 | **37328** | same data as `v2.3`, but the largest model base with higher resolution              |
-|  `v1.3` | `efficientnetv2_m.in21k_ft_in1k` | 38625 | **37328** | same data as `v2.3`, but the smallest model base (CNN)                              |
-|  `v4.3` | `regnety_160.swag_ft_in1k`       | 38625 | **37328** | same data as `v2.3`, CNN base model bigger than the smallest,, may be more accurate |
-|  `v6.3` | `regnety_640.seer`               | 38625 | **37328** | same data as `v2.3`, CNN base model smaller than the largest, may be less accurate  |
 | **Version**                      | **Parameters (M)** | Resolution (px) | Revision |
@@ -53,6 +52,7 @@ of HF 😊 hub [^1] 🔗
 | `vit-large-patch16-384`          | 305                | 384             | v5.X     |
 | `regnety_640.seer`               | 281                | 384             | v6.3     |
 | Base Model                                 | Revision | max_cat | Best_Prec (%) | Best_Acc (%) | Fold | Note         |
 |--------------------------------------------|----------|---------|---------------|--------------|------|--------------|
 | **google/vit-base-patch16-224**            | **v2.3** | 14,000  | **98.79**     | **98.79**    | 5    | OK & Small   |
@@ -63,7 +63,7 @@ of HF 😊 hub [^1] 🔗
 | microsoft/dit-large                        | v11.3    | 14,000  | 98.53         | 98.53        | 2    |              |
 | timm/regnety_120.sw_in12k_ft_in1k          | v12.3    | 14,000  | 98.29         | 98.29        | 3    |              |
 | **timm/regnety_160.swag_ft_in1k**          | **v4.3** | 14,000  | **99.17**     | **99.16**    | 1    | Best & Small |
-| **timm/regnety_640.seer**                  | **v6.3** | 14,000  | **98.79**     | **98.79**    | 5    | OK & Large   |
 | timm/tf_efficientnetv2_l.in21k_ft_in1k     | v8.3     | 14,000  | 98.62         | 98.62        | 5    |              |
 | **timm/tf_efficientnetv2_m.in21k_ft_in1k** | **v1.3** | 14,000  | **98.83**     | **98.83**    | 1    | Good & Small |
 | timm/tf_efficientnetv2_s.in21k             | v7.3     | 14,000  | 97.90         | 97.87        | 1    |              |
@@ -182,13 +182,14 @@ During training the following transforms were applied randomly with a 50% chance
 | `v3.2`       | 96.49     | 99.94     |
 | `v4.2`       | 97.73     | 99.87     |
 | `v5.2`       | 97.86     | 99.87     |
-| `v1.3`       | 98.83     | 99.96     |
 | `v2.3`       | 98.79     | 99.96     |
 | `v3.3`       | 98.92     | 99.98     |
-| `v4.3`       | **99.16** | 99.96     |
 | `v5.3`       | **99.12** | 99.94     |
 | `v6.3`       | 98.79     | 99.94     |
 **v2.2** Evaluation set's accuracy (**Top-1**):  **97.54%**
 ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20250701-1136_model_v220105p_conf_mat_TOP-1.png?raw=true)
@@ -211,27 +212,27 @@ During training the following transforms were applied randomly with a 50% chance
 **v1.3** Evaluation set's accuracy (**Top-1**):  **98.83%**
-![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/model_v13_conf_mat_TOP-1.png?raw=true)
 **v2.3** Evaluation set's accuracy (**Top-1**):  **98.79%**
-![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/model_v23_conf_mat_TOP-1.png?raw=true)
 **v3.3** Evaluation set's accuracy (**Top-1**):  **98.92%**
-![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/model_v33_conf_mat_TOP-1.png?raw=true)
 **v4.3** Evaluation set's accuracy (**Top-1**):  **98.16%**
-![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/model_v43_conf_mat_TOP-1.png?raw=true)
 **v5.3** Evaluation set's accuracy (**Top-1**):  **99.12%**
-![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/model_v53_conf_mat_TOP-1.png?raw=true)
 **v6.3** Evaluation set's accuracy (**Top-1**):  **98.79%**
-![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/model_v63_conf_mat_TOP-1.png?raw=true)
@@ -257,17 +258,17 @@ During training the following transforms were applied randomly with a 50% chance
 - **v4.2** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20250710-1921_model_v120106l_TOP-3_EVAL.csv) 🔗
-- **v1.3** Manually ✍  **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251019-1328_5449_model_v13_TOP-1_EVAL.csv) 📎
-- **v2.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1136_5449_model_v23_TOP-1_EVAL.csv) 📎
-- **v3.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-0827_5449_model_v33_TOP-1_EVAL.csv) 📎
-- **v4.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1135_5449_model_v43_TOP-1_EVAL.csv) 📎
-- **v5.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251019-1411_5449_model_v53_TOP-1_EVAL.csv) 📎
-- **v6.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1123_5449_model_v63_TOP-1_EVAL.csv) 📎
 #### Table columns

 but different data annotations. The latest `v5.3` is considered to be default and can be found in the `main` branch
 of HF 😊 hub [^1] 🔗
+| Version | Base                             | Pages |   PDFs    | Description                                                                        |
+|--------:|----------------------------------|:-----:|:---------:|:-----------------------------------------------------------------------------------|
+|  `v2.0` | `vit-base-patch16-224`           | 10073 | **3896**  | annotations with mistakes, more heterogeneous data                                 |
+|  `v2.1` | `vit-base-patch16-224`           | 11940 | **5002**  | `main`: more diverse pages in each category, fewer annotation mistakes             |
+|  `v2.2` | `vit-base-patch16-224`           | 15855 | **5730**  | same data as `v2.1` + some restored pages from `v2.0`                              |
+|  `v3.2` | `vit-base-patch16-384`           | 15855 | **5730**  | same data as `v2.2`, but a bit larger model base with higher resolution            |
+|  `v5.2` | `vit-large-patch16-384`          | 15855 | **5730**  | same data as `v2.2`, but the largest model base with higher resolution             |
+|  `v1.2` | `efficientnetv2_s.in21k`         | 15855 | **5730**  | same data as `v2.2`, but the smallest model base (CNN)                             |
+|  `v4.2` | `efficientnetv2_l.in21k_ft_in1k` | 15855 | **5730**  | same data as `v2.2`, CNN base model smaller than the largest, may be more accurate |
+|  `v2.3` | `vit-base-patch16-224`           | 38625 | **37328** | new data annotation phase data, more single-page documents used, transformer model |
+|  `v3.3` | `vit-base-patch16-384`           | 38625 | **37328** | same data as `v2.3`, but a bit larger model base with higher resolution            |
+|  `v5.3` | `vit-large-patch16-384`          | 38625 | **37328** | same data as `v2.3`, but the largest model base with higher resolution             |
+|  `v1.3` | `efficientnetv2_m.in21k_ft_in1k` | 38625 | **37328** | same data as `v2.3`, but the smallest model base (CNN)                             |
+|  `v4.3` | `regnety_160.swag_ft_in1k`       | 38625 | **37328** | same data as `v2.3`, CNN base model bigger than the smallest, may be more accurate |
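The version tags in the table above can be mapped to a Hub revision. Below is a minimal sketch, assuming each tag is published as a branch of the `ufal/vit-historical-page` repository and that the ViT-based revisions load through `transformers`. The helper names `resolve_revision` and `load_classifier` are hypothetical and not part of the project; only the ViT rows are listed, since the CNN-based revisions may need a different loader.

```python
# Version tags and base models copied from the version table above (ViT rows only).
VIT_REVISIONS = {
    "v2.3": "vit-base-patch16-224",
    "v3.3": "vit-base-patch16-384",
    "v5.3": "vit-large-patch16-384",  # described as the default on `main`
}


def resolve_revision(version=None):
    """Return the Hub revision string for a version tag (hypothetical helper).

    With no tag given, fall back to the `main` branch, which the model card
    says holds the default model.
    """
    if version is None:
        return "main"
    if version not in VIT_REVISIONS:
        known = ", ".join(sorted(VIT_REVISIONS))
        raise ValueError(f"unknown version {version!r}; expected one of: {known}")
    return version


def load_classifier(version=None, repo_id="ufal/vit-historical-page"):
    """Download the classifier for the given version tag (needs network access).

    Assumption: every tag exists as a branch of `repo_id` on the Hub.
    """
    # Imported lazily so resolve_revision() works without transformers installed.
    from transformers import AutoModelForImageClassification

    return AutoModelForImageClassification.from_pretrained(
        repo_id, revision=resolve_revision(version)
    )
```

Calling `load_classifier("v3.3")` would request the `v3.3` branch, while `load_classifier()` falls back to `main`.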
 | **Version**                      | **Parameters (M)** | Resolution (px) | Revision |
 | `vit-large-patch16-384`          | 305                | 384             | v5.X     |
 | `regnety_640.seer`               | 281                | 384             | v6.3     |
 | Base Model                                 | Revision | max_cat | Best_Prec (%) | Best_Acc (%) | Fold | Note         |
 |--------------------------------------------|----------|---------|---------------|--------------|------|--------------|
 | **google/vit-base-patch16-224**            | **v2.3** | 14,000  | **98.79**     | **98.79**    | 5    | OK & Small   |
 | microsoft/dit-large                        | v11.3    | 14,000  | 98.53         | 98.53        | 2    |              |
 | timm/regnety_120.sw_in12k_ft_in1k          | v12.3    | 14,000  | 98.29         | 98.29        | 3    |              |
 | **timm/regnety_160.swag_ft_in1k**          | **v4.3** | 14,000  | **99.17**     | **99.16**    | 1    | Best & Small |
+| timm/regnety_640.seer                      | v6.3     | 14,000  | 98.79         | 98.79        | 5    | OK & Large   |
 | timm/tf_efficientnetv2_l.in21k_ft_in1k     | v8.3     | 14,000  | 98.62         | 98.62        | 5    |              |
 | **timm/tf_efficientnetv2_m.in21k_ft_in1k** | **v1.3** | 14,000  | **98.83**     | **98.83**    | 1    | Good & Small |
 | timm/tf_efficientnetv2_s.in21k             | v7.3     | 14,000  | 97.90         | 97.87        | 1    |              |
 | `v3.2`       | 96.49     | 99.94     |
 | `v4.2`       | 97.73     | 99.87     |
 | `v5.2`       | 97.86     | 99.87     |
+| `v1.3`       | 96.81     | 99.78     |
 | `v2.3`       | 98.79     | 99.96     |
 | `v3.3`       | 98.92     | 99.98     |
+| `v4.3`       | 98.92     | **100.0** |
 | `v5.3`       | **99.12** | 99.94     |
 | `v6.3`       | 98.79     | 99.94     |
 **v2.2** Evaluation set's accuracy (**Top-1**):  **97.54%**
 ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20250701-1136_model_v220105p_conf_mat_TOP-1.png?raw=true)
 **v1.3** Evaluation set's accuracy (**Top-1**):  **98.83%**
+![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20251020-1835_model_v13_conf_mat_TOP-1.png?raw=true)
 **v2.3** Evaluation set's accuracy (**Top-1**):  **98.79%**
+![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20251020-1841_model_v23_conf_mat_TOP-1.png?raw=true)
 **v3.3** Evaluation set's accuracy (**Top-1**):  **98.92%**
+![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20251020-1849_model_v33_conf_mat_TOP-1.png?raw=true)
 **v4.3** Evaluation set's accuracy (**Top-1**):  **98.16%**
+![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20251020-1856_model_v43_conf_mat_TOP-1.png?raw=true)
 **v5.3** Evaluation set's accuracy (**Top-1**):  **99.12%**
+![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20251020-1905_model_v53_conf_mat_TOP-1.png?raw=true)
 **v6.3** Evaluation set's accuracy (**Top-1**):  **98.79%**
+![TOP-1 confusion matrix](https://github.com/ufal/atrium-page-classification/blob/vit/result/plots/20251020-1913_model_v63_conf_mat_TOP-1.png?raw=true)
 - **v4.2** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20250710-1921_model_v120106l_TOP-3_EVAL.csv) 🔗
+- **v1.3** Manually ✍  **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1825_5449_model_v13_TOP-1_EVAL.csv) 📎
+- **v2.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1835_5449_model_v23_TOP-1_EVAL.csv) 📎
+- **v3.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1841_5449_model_v33_TOP-1_EVAL.csv) 📎
+- **v4.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1849_5449_model_v43_TOP-1_EVAL.csv) 📎
+- **v5.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1856_5449_model_v53_TOP-1_EVAL.csv) 📎
+- **v6.3** Manually ✍ **checked** evaluation dataset (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/vit/result/tables/20251020-1906_5449_model_v63_TOP-1_EVAL.csv) 📎
 #### Table columns