---
library_name: transformers
license: llama3.1
base_model: meta-llama/Llama-3.1-8B
language: en
datasets:
- Word2Li/MiddOptimized
tags:
- llama-factory
- full
pipeline_tag: text-generation
model-index:
- name: Llama3.1-8B-Middo-Alpaca-4o-mini
  results:
    - task:
        type: text-generation
      dataset:
        name: MMLU
        type: MMLU
      metrics:
        - name: weighted accuracy
          type: weighted accuracy
          value: 44.69
          verified: true
    - task:
        type: text-generation
      dataset:
        name: IFEval
        type: IFEval
      metrics:
        - name: overall accuracy
          type: overall accuracy
          value: 47.96
          verified: true
    - task:
        type: text-generation
      dataset:
        name: GSM8K
        type: GSM8K
      metrics:
        - name: accuracy
          type: accuracy
          value: 57.62
          verified: true
    - task:
        type: text-generation
      dataset:
        name: MATH
        type: MATH
      metrics:
        - name: accuracy
          type: accuracy
          value: 18.50
          verified: true
    - task:
        type: text-generation
      dataset:
        name: HumanEval
        type: HumanEval
      metrics:
        - name: humaneval_pass@1
          type: humaneval_pass@1
          value: 52.44
          verified: true
    - task:
        type: text-generation
      dataset:
        name: MBPP
        type: MBPP
      metrics:
        - name: score
          type: score
          value: 45.40
          verified: true
    - task:
        type: text-generation
      dataset:
        name: HellaSwag
        type: HellaSwag
      metrics:
        - name: accuracy
          type: accuracy
          value: 57.37
          verified: true
    - task:
        type: text-generation
      dataset:
        name: GPQA
        type: GPQA
      metrics:
        - name: accuracy
          type: accuracy
          value: 19.70
          verified: true
metrics:
- accuracy
---

# Llama3.1-8B-Middo-Alpaca-4o-mini

Paper: [Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning](https://arxiv.org/abs/2508.21589)

Code: https://github.com/Word2VecT/Middo

## Model description

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the [MiddOptimized/llama_alpaca_4o_mini](https://huggingface.co/datasets/Word2Li/MiddOptimized/viewer/default/llama_alpaca_4o_mini) dataset.
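
A minimal usage sketch with the `transformers` library is shown below. The repository id `Word2Li/Llama3.1-8B-Middo-Alpaca-4o-mini` and the example prompt are assumptions for illustration, not taken from the original card.

```python
# Minimal text-generation sketch; the repository id below is assumed from the model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Word2Li/Llama3.1-8B-Middo-Alpaca-4o-mini"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)

prompt = "Explain the difference between supervised fine-tuning and preference tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```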

## Training and evaluation data

### Training data

Middo optimized [Word2Li/Alpaca-4o-mini](https://huggingface.co/datasets/Word2Li/Alpaca-4o-mini) on [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B).
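
As an illustration, the optimized training split can presumably be loaded with the `datasets` library; the split name `llama_alpaca_4o_mini` is taken from the dataset viewer link above and may need adjusting if the dataset layout differs.

```python
# Sketch: load the Middo-optimized training split (split name assumed from the viewer link).
from datasets import load_dataset

ds = load_dataset("Word2Li/MiddOptimized", split="llama_alpaca_4o_mini")
print(ds)     # number of rows and column names
print(ds[0])  # one training example
```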

### Evaluation data

- General
  - MMLU
  - IFEval
- Math
  - GSM8K
  - MATH
- Code
  - HumanEval
  - MBPP
- Reasoning
  - HellaSwag
  - GPQA

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough code equivalent follows the list):

- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1.0
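
For orientation, the setup above corresponds roughly to the following `TrainingArguments`. This is a hedged sketch only; the actual run was driven by LLaMA-Factory, the output directory is a placeholder, and the `bf16` setting is an assumption.

```python
# Rough TrainingArguments equivalent of the hyperparameters listed above (sketch only).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3.1-8b-middo-alpaca-4o-mini",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,   # 4 per device x 8 GPUs x 8 accumulation = 256 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumed mixed-precision setting
)
```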

### Training results

- epoch: 0.9964556962025316
- total_flos: 2.1359726465573192e+18
- train_loss: 0.9420681825982846
- train_runtime: 3147.8466
- train_samples_per_second: 20.072
- train_steps_per_second: 0.078

### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1