derek-thomas committed
Commit · 5b52ebb
1 Parent(s): dcfeee1
Updating for falcon
- 01-poe-dataset-creation.ipynb +1 -3
- 02-autotrain.ipynb +171 -35
 
    	
        01-poe-dataset-creation.ipynb
    CHANGED
    
    | 
         @@ -110,10 +110,8 @@ 
     | 
|
| 110 | 
         
             
                "\n",
         
     | 
| 111 | 
         
             
                "In each of these scenarios I will build prompts with strucutred generation to fine-tune with. I noticed some difficulty in a first pass with getting consistent response formats, but thats out of scope, so structured generation can help a lot here.\n",
         
     | 
| 112 | 
         
             
                "\n",
         
     | 
| 113 | 
         
            -
                "Datasets wont store complex structures like lists of dicts of different types (needed for structured generation, so its easiest if I tokenize. Ill be using Mistral, so Ill skip the system prompt. Its simple enough to come back and change this for a different model in this notebook.\n",
         
     | 
| 114 | 
         
            -
                "\n",
         
     | 
| 115 | 
         
             
                "## Implementation\n",
         
     | 
| 116 | 
         
            -
                "To explore this goal, we will start with [layoric/labeled-multiple-choice-explained](https://huggingface.co/datasets/layoric/labeled-multiple-choice-explained) as our dataset. It has explanations already provided by GPT-3.5-turbo. Given that these explanations are a bit different than what  
     | 
| 117 | 
         
             
                "\n",
         
     | 
| 118 | 
         
             
                "In this notebook we will format our data such that we can try each experiment and then we will push it to my repo: [derek-thomas/labeled-multiple-choice-explained](https://huggingface.co/datasets/derek-thomas/labeled-multiple-choice-explained)."
         
     | 
| 119 | 
         
             
               ]
         
     | 
| 
         | 
|
| 110 | 
         
             
                "\n",
         
     | 
| 111 | 
         
             
                "In each of these scenarios I will build prompts with strucutred generation to fine-tune with. I noticed some difficulty in a first pass with getting consistent response formats, but thats out of scope, so structured generation can help a lot here.\n",
         
     | 
| 112 | 
         
             
                "\n",
         
     | 
| 113 | 
         
             
                "## Implementation\n",
         
     | 
| 114 | 
         
            +
                "To explore this goal, we will start with [layoric/labeled-multiple-choice-explained](https://huggingface.co/datasets/layoric/labeled-multiple-choice-explained) as our dataset. It has explanations already provided by GPT-3.5-turbo. Given that these explanations are a bit different than what falcon would do, it might be useful if we generate some from falcon as well. Based on [this notebook](./poe-generate-falcon-reasoning.ipynb) we have been able to generate falcon reasoning in this refined dataset [derek-thomas/labeled-multiple-choice-explained-falcon-reasoning](https://huggingface.co/datasets/derek-thomas/labeled-multiple-choice-explained-falcon-reasoning).\n",
         
     | 
| 115 | 
         
             
                "\n",
         
     | 
| 116 | 
         
             
                "In this notebook we will format our data such that we can try each experiment and then we will push it to my repo: [derek-thomas/labeled-multiple-choice-explained](https://huggingface.co/datasets/derek-thomas/labeled-multiple-choice-explained)."
         
     | 
| 117 | 
         
             
               ]
         
     | 
    	
        02-autotrain.ipynb
    CHANGED
    
    | 
         @@ -50,7 +50,7 @@ 
     | 
|
| 50 | 
         
             
                {
         
     | 
| 51 | 
         
             
                 "data": {
         
     | 
| 52 | 
         
             
                  "application/vnd.jupyter.widget-view+json": {
         
     | 
| 53 | 
         
            -
                   "model_id": " 
     | 
| 54 | 
         
             
                   "version_major": 2,
         
     | 
| 55 | 
         
             
                   "version_minor": 0
         
     | 
| 56 | 
         
             
                  },
         
     | 
| 
         @@ -63,7 +63,7 @@ 
     | 
|
| 63 | 
         
             
                }
         
     | 
| 64 | 
         
             
               ],
         
     | 
| 65 | 
         
             
               "source": [
         
     | 
| 66 | 
         
            -
                "from huggingface_hub import login, get_token\n",
         
     | 
| 67 | 
         
             
                "login()"
         
     | 
| 68 | 
         
             
               ]
         
     | 
| 69 | 
         
             
              },
         
     | 
| 
         @@ -96,15 +96,15 @@ 
     | 
|
| 96 | 
         
             
                "# Base config\n",
         
     | 
| 97 | 
         
             
                "config_template = {\n",
         
     | 
| 98 | 
         
             
                "    \"task\": \"llm-sft\",\n",
         
     | 
| 99 | 
         
            -
                "    \"base_model\": \" 
     | 
| 100 | 
         
             
                "    \"project_name\": \"\",\n",
         
     | 
| 101 | 
         
             
                "    \"log\": \"tensorboard\",\n",
         
     | 
| 102 | 
         
             
                "    \"backend\": \"spaces-l4x1\",\n",
         
     | 
| 103 | 
         
             
                "    \"data\": {\n",
         
     | 
| 104 | 
         
            -
                "        \"path\": \"derek-thomas/labeled-multiple-choice-explained- 
     | 
| 105 | 
         
             
                "        \"train_split\": \"train\",\n",
         
     | 
| 106 | 
         
             
                "        \"valid_split\": None,\n",
         
     | 
| 107 | 
         
            -
                "        \"chat_template\": \" 
     | 
| 108 | 
         
             
                "        \"column_mapping\": {\n",
         
     | 
| 109 | 
         
             
                "            \"text_column\": \"\"\n",
         
     | 
| 110 | 
         
             
                "            },\n",
         
     | 
| 
         @@ -112,9 +112,9 @@ 
     | 
|
| 112 | 
         
             
                "    \"params\": {\n",
         
     | 
| 113 | 
         
             
                "        \"block_size\": 512,\n",
         
     | 
| 114 | 
         
             
                "        \"model_max_length\": 1500,\n",
         
     | 
| 115 | 
         
            -
                "        \"epochs\":  
     | 
| 116 | 
         
             
                "        \"batch_size\": 1,\n",
         
     | 
| 117 | 
         
            -
                "        \"lr\": 3e- 
     | 
| 118 | 
         
             
                "        \"peft\": True,\n",
         
     | 
| 119 | 
         
             
                "        \"quantization\": \"int4\",\n",
         
     | 
| 120 | 
         
             
                "        \"target_modules\": \"all-linear\",\n",
         
     | 
| 
         @@ -191,40 +191,67 @@ 
     | 
|
| 191 | 
         
             
                 "output_type": "stream",
         
     | 
| 192 | 
         
             
                 "text": [
         
     | 
| 193 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_RFA_gpt3_5.yml\n",
         
     | 
| 194 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 195 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 196 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 197 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 198 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 199 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_RFA_falcon.yml\n",
         
     | 
| 200 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 201 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 202 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 203 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 204 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 205 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_FAR_gpt3_5.yml\n",
         
     | 
| 206 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 207 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 208 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 209 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 210 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 211 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_FAR_falcon.yml\n",
         
     | 
| 212 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 213 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 214 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 215 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 216 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 217 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_FA.yml\n",
         
     | 
| 218 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 219 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 220 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 221 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 222 | 
         
            -
                  "INFO     | 2025-01-08  
     | 
| 223 | 
         
             
                 ]
         
     | 
| 224 | 
         
             
                }
         
     | 
| 225 | 
         
             
               ],
         
     | 
| 226 | 
         
             
               "source": [
         
     | 
| 227 | 
         
             
                "# Generate configs and run commands\n",
         
     | 
| 228 | 
         
             
                "for project_suffix, text_column in zip(project_suffixes, text_columns):\n",
         
     | 
| 229 | 
         
             
                "    # Modify the config\n",
         
     | 
| 230 | 
         
             
                "    config = config_template.copy()\n",
         
     | 
| 
         @@ -236,15 +263,124 @@ 
     | 
|
| 236 | 
         
             
                "    with open(config_path, \"w\") as f:\n",
         
     | 
| 237 | 
         
             
                "        yaml.dump(config, f)\n",
         
     | 
| 238 | 
         
             
                "\n",
         
     | 
| 239 | 
         
            -
                "    # Run the command\n",
         
     | 
| 240 | 
         
             
                "    print(f\"Running autotrain with config: {config_path}\")\n",
         
     | 
| 241 | 
         
            -
                "    subprocess.run([\"autotrain\", \"--config\", config_path])"
         
     | 
| 242 | 
         
             
               ]
         
     | 
| 243 | 
         
             
              },
         
     | 
| 244 | 
         
             
              {
         
     | 
| 245 | 
         
             
               "cell_type": "code",
         
     | 
| 246 | 
         
             
               "execution_count": null,
         
     | 
| 247 | 
         
            -
               "id": " 
     | 
| 248 | 
         
             
               "metadata": {},
         
     | 
| 249 | 
         
             
               "outputs": [],
         
     | 
| 250 | 
         
             
               "source": []
         
     | 
| 
         | 
|
| 50 | 
         
             
                {
         
     | 
| 51 | 
         
             
                 "data": {
         
     | 
| 52 | 
         
             
                  "application/vnd.jupyter.widget-view+json": {
         
     | 
| 53 | 
         
            +
                   "model_id": "928f44f483504b438e0fdbd4df3d7dd5",
         
     | 
| 54 | 
         
             
                   "version_major": 2,
         
     | 
| 55 | 
         
             
                   "version_minor": 0
         
     | 
| 56 | 
         
             
                  },
         
     | 
| 
         | 
|
| 63 | 
         
             
                }
         
     | 
| 64 | 
         
             
               ],
         
     | 
| 65 | 
         
             
               "source": [
         
     | 
| 66 | 
         
            +
                "from huggingface_hub import login, get_token, whoami\n",
         
     | 
| 67 | 
         
             
                "login()"
         
     | 
| 68 | 
         
             
               ]
         
     | 
| 69 | 
         
             
              },
         
     | 
| 
         | 
|
| 96 | 
         
             
                "# Base config\n",
         
     | 
| 97 | 
         
             
                "config_template = {\n",
         
     | 
| 98 | 
         
             
                "    \"task\": \"llm-sft\",\n",
         
     | 
| 99 | 
         
            +
                "    \"base_model\": \"tiiuae/Falcon3-7B-Instruct\",\n",
         
     | 
| 100 | 
         
             
                "    \"project_name\": \"\",\n",
         
     | 
| 101 | 
         
             
                "    \"log\": \"tensorboard\",\n",
         
     | 
| 102 | 
         
             
                "    \"backend\": \"spaces-l4x1\",\n",
         
     | 
| 103 | 
         
             
                "    \"data\": {\n",
         
     | 
| 104 | 
         
            +
                "        \"path\": \"derek-thomas/labeled-multiple-choice-explained-falcon-tokenized\",\n",
         
     | 
| 105 | 
         
             
                "        \"train_split\": \"train\",\n",
         
     | 
| 106 | 
         
             
                "        \"valid_split\": None,\n",
         
     | 
| 107 | 
         
            +
                "        \"chat_template\": \"tokenizer\",\n",
         
     | 
| 108 | 
         
             
                "        \"column_mapping\": {\n",
         
     | 
| 109 | 
         
             
                "            \"text_column\": \"\"\n",
         
     | 
| 110 | 
         
             
                "            },\n",
         
     | 
| 
         | 
|
| 112 | 
         
             
                "    \"params\": {\n",
         
     | 
| 113 | 
         
             
                "        \"block_size\": 512,\n",
         
     | 
| 114 | 
         
             
                "        \"model_max_length\": 1500,\n",
         
     | 
| 115 | 
         
            +
                "        \"epochs\": 4,\n",
         
     | 
| 116 | 
         
             
                "        \"batch_size\": 1,\n",
         
     | 
| 117 | 
         
            +
                "        \"lr\": 3e-7,\n",
         
     | 
| 118 | 
         
             
                "        \"peft\": True,\n",
         
     | 
| 119 | 
         
             
                "        \"quantization\": \"int4\",\n",
         
     | 
| 120 | 
         
             
                "        \"target_modules\": \"all-linear\",\n",
         
     | 
| 
         | 
|
| 191 | 
         
             
                 "output_type": "stream",
         
     | 
| 192 | 
         
             
                 "text": [
         
     | 
| 193 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_RFA_gpt3_5.yml\n",
         
     | 
| 194 | 
         
            +
                  "INFO     | 2025-01-08 14:33:16 | autotrain.cli.autotrain:main:58 - Using AutoTrain configuration: ./autotrain_configs/conversation_RFA_gpt3_5.yml\n",
         
     | 
| 195 | 
         
            +
                  "INFO     | 2025-01-08 14:33:16 | autotrain.parser:__post_init__:165 - Running task: lm_training\n",
         
     | 
| 196 | 
         
            +
                  "INFO     | 2025-01-08 14:33:16 | autotrain.parser:__post_init__:166 - Using backend: spaces-l4x1\n",
         
     | 
| 197 | 
         
            +
                  "INFO     | 2025-01-08 14:33:16 | autotrain.parser:run:224 - {'model': 'tiiuae/Falcon3-7B-Instruct', 'project_name': 'falcon-v03-poe-RFA-gpt3-5', 'data_path': 'derek-thomas/labeled-multiple-choice-explained-falcon-tokenized', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 512, 'model_max_length': 1500, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'bf16', 'lr': 3e-07, 'epochs': 4, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 8, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': None, 'text_column': 'conversation_RFA_gpt3_5', 'rejected_text_column': None, 'push_to_hub': True, 'username': 'derek-thomas', 'token': '*****', 'unsloth': False, 'distributed_backend': None}\n",
         
     | 
| 198 | 
         
            +
                  "INFO     | 2025-01-08 14:33:23 | autotrain.parser:run:229 - Job ID: derek-thomas/autotrain-falcon-v03-poe-RFA-gpt3-5\n",
         
     | 
| 199 | 
         
            +
                  "\n",
         
     | 
| 200 | 
         
            +
                  "---\n",
         
     | 
| 201 | 
         
            +
                  "https://huggingface.co/spaces/derek-thomas/autotrain-falcon-v03-poe-RFA-gpt3-5\n",
         
     | 
| 202 | 
         
            +
                  "---\n",
         
     | 
| 203 | 
         
            +
                  "\n",
         
     | 
| 204 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_RFA_falcon.yml\n",
         
     | 
| 205 | 
         
            +
                  "INFO     | 2025-01-08 14:33:26 | autotrain.cli.autotrain:main:58 - Using AutoTrain configuration: ./autotrain_configs/conversation_RFA_falcon.yml\n",
         
     | 
| 206 | 
         
            +
                  "INFO     | 2025-01-08 14:33:26 | autotrain.parser:__post_init__:165 - Running task: lm_training\n",
         
     | 
| 207 | 
         
            +
                  "INFO     | 2025-01-08 14:33:26 | autotrain.parser:__post_init__:166 - Using backend: spaces-l4x1\n",
         
     | 
| 208 | 
         
            +
                  "INFO     | 2025-01-08 14:33:26 | autotrain.parser:run:224 - {'model': 'tiiuae/Falcon3-7B-Instruct', 'project_name': 'falcon-v03-poe-RFA-falcon', 'data_path': 'derek-thomas/labeled-multiple-choice-explained-falcon-tokenized', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 512, 'model_max_length': 1500, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'bf16', 'lr': 3e-07, 'epochs': 4, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 8, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': None, 'text_column': 'conversation_RFA_falcon', 'rejected_text_column': None, 'push_to_hub': True, 'username': 'derek-thomas', 'token': '*****', 'unsloth': False, 'distributed_backend': None}\n",
         
     | 
| 209 | 
         
            +
                  "INFO     | 2025-01-08 14:33:32 | autotrain.parser:run:229 - Job ID: derek-thomas/autotrain-falcon-v03-poe-RFA-falcon\n",
         
     | 
| 210 | 
         
            +
                  "\n",
         
     | 
| 211 | 
         
            +
                  "---\n",
         
     | 
| 212 | 
         
            +
                  "https://huggingface.co/spaces/derek-thomas/autotrain-falcon-v03-poe-RFA-falcon\n",
         
     | 
| 213 | 
         
            +
                  "---\n",
         
     | 
| 214 | 
         
            +
                  "\n",
         
     | 
| 215 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_FAR_gpt3_5.yml\n",
         
     | 
| 216 | 
         
            +
                  "INFO     | 2025-01-08 14:33:36 | autotrain.cli.autotrain:main:58 - Using AutoTrain configuration: ./autotrain_configs/conversation_FAR_gpt3_5.yml\n",
         
     | 
| 217 | 
         
            +
                  "INFO     | 2025-01-08 14:33:36 | autotrain.parser:__post_init__:165 - Running task: lm_training\n",
         
     | 
| 218 | 
         
            +
                  "INFO     | 2025-01-08 14:33:36 | autotrain.parser:__post_init__:166 - Using backend: spaces-l4x1\n",
         
     | 
| 219 | 
         
            +
                  "INFO     | 2025-01-08 14:33:36 | autotrain.parser:run:224 - {'model': 'tiiuae/Falcon3-7B-Instruct', 'project_name': 'falcon-v03-poe-FAR-gpt3-5', 'data_path': 'derek-thomas/labeled-multiple-choice-explained-falcon-tokenized', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 512, 'model_max_length': 1500, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'bf16', 'lr': 3e-07, 'epochs': 4, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 8, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': None, 'text_column': 'conversation_FAR_gpt3_5', 'rejected_text_column': None, 'push_to_hub': True, 'username': 'derek-thomas', 'token': '*****', 'unsloth': False, 'distributed_backend': None}\n",
         
     | 
| 220 | 
         
            +
                  "INFO     | 2025-01-08 14:33:41 | autotrain.parser:run:229 - Job ID: derek-thomas/autotrain-falcon-v03-poe-FAR-gpt3-5\n",
         
     | 
| 221 | 
         
            +
                  "\n",
         
     | 
| 222 | 
         
            +
                  "---\n",
         
     | 
| 223 | 
         
            +
                  "https://huggingface.co/spaces/derek-thomas/autotrain-falcon-v03-poe-FAR-gpt3-5\n",
         
     | 
| 224 | 
         
            +
                  "---\n",
         
     | 
| 225 | 
         
            +
                  "\n",
         
     | 
| 226 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_FAR_falcon.yml\n",
         
     | 
| 227 | 
         
            +
                  "INFO     | 2025-01-08 14:33:45 | autotrain.cli.autotrain:main:58 - Using AutoTrain configuration: ./autotrain_configs/conversation_FAR_falcon.yml\n",
         
     | 
| 228 | 
         
            +
                  "INFO     | 2025-01-08 14:33:45 | autotrain.parser:__post_init__:165 - Running task: lm_training\n",
         
     | 
| 229 | 
         
            +
                  "INFO     | 2025-01-08 14:33:45 | autotrain.parser:__post_init__:166 - Using backend: spaces-l4x1\n",
         
     | 
| 230 | 
         
            +
                  "INFO     | 2025-01-08 14:33:45 | autotrain.parser:run:224 - {'model': 'tiiuae/Falcon3-7B-Instruct', 'project_name': 'falcon-v03-poe-FAR-falcon', 'data_path': 'derek-thomas/labeled-multiple-choice-explained-falcon-tokenized', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 512, 'model_max_length': 1500, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'bf16', 'lr': 3e-07, 'epochs': 4, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 8, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': None, 'text_column': 'conversation_FAR_falcon', 'rejected_text_column': None, 'push_to_hub': True, 'username': 'derek-thomas', 'token': '*****', 'unsloth': False, 'distributed_backend': None}\n",
         
     | 
| 231 | 
         
            +
                  "INFO     | 2025-01-08 14:33:51 | autotrain.parser:run:229 - Job ID: derek-thomas/autotrain-falcon-v03-poe-FAR-falcon\n",
         
     | 
| 232 | 
         
            +
                  "\n",
         
     | 
| 233 | 
         
            +
                  "---\n",
         
     | 
| 234 | 
         
            +
                  "https://huggingface.co/spaces/derek-thomas/autotrain-falcon-v03-poe-FAR-falcon\n",
         
     | 
| 235 | 
         
            +
                  "---\n",
         
     | 
| 236 | 
         
            +
                  "\n",
         
     | 
| 237 | 
         
             
                  "Running autotrain with config: ./autotrain_configs/conversation_FA.yml\n",
         
     | 
| 238 | 
         
            +
                  "INFO     | 2025-01-08 14:33:54 | autotrain.cli.autotrain:main:58 - Using AutoTrain configuration: ./autotrain_configs/conversation_FA.yml\n",
         
     | 
| 239 | 
         
            +
                  "INFO     | 2025-01-08 14:33:54 | autotrain.parser:__post_init__:165 - Running task: lm_training\n",
         
     | 
| 240 | 
         
            +
                  "INFO     | 2025-01-08 14:33:54 | autotrain.parser:__post_init__:166 - Using backend: spaces-l4x1\n",
         
     | 
| 241 | 
         
            +
                  "INFO     | 2025-01-08 14:33:54 | autotrain.parser:run:224 - {'model': 'tiiuae/Falcon3-7B-Instruct', 'project_name': 'falcon-v03-poe-FA', 'data_path': 'derek-thomas/labeled-multiple-choice-explained-falcon-tokenized', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 512, 'model_max_length': 1500, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'bf16', 'lr': 3e-07, 'epochs': 4, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 8, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': None, 'text_column': 'conversation_FA', 'rejected_text_column': None, 'push_to_hub': True, 'username': 'derek-thomas', 'token': '*****', 'unsloth': False, 'distributed_backend': None}\n",
         
     | 
| 242 | 
         
            +
                  "INFO     | 2025-01-08 14:34:00 | autotrain.parser:run:229 - Job ID: derek-thomas/autotrain-falcon-v03-poe-FA\n",
         
     | 
| 243 | 
         
            +
                  "\n",
         
     | 
| 244 | 
         
            +
                  "---\n",
         
     | 
| 245 | 
         
            +
                  "https://huggingface.co/spaces/derek-thomas/autotrain-falcon-v03-poe-FA\n",
         
     | 
| 246 | 
         
            +
                  "---\n",
         
     | 
| 247 | 
         
            +
                  "\n"
         
     | 
| 248 | 
         
             
                 ]
         
     | 
| 249 | 
         
             
                }
         
     | 
| 250 | 
         
             
               ],
         
     | 
| 251 | 
         
             
               "source": [
         
     | 
| 252 | 
         
             
                "# Generate configs and run commands\n",
         
     | 
| 253 | 
         
            +
                "autotrain_spaces = []\n",
         
     | 
| 254 | 
         
            +
                "autotrain_models = []\n",
         
     | 
| 255 | 
         
             
                "for project_suffix, text_column in zip(project_suffixes, text_columns):\n",
         
     | 
| 256 | 
         
             
                "    # Modify the config\n",
         
     | 
| 257 | 
         
             
                "    config = config_template.copy()\n",
         
     | 
| 
         | 
|
| 263 | 
         
             
                "    with open(config_path, \"w\") as f:\n",
         
     | 
| 264 | 
         
             
                "        yaml.dump(config, f)\n",
         
     | 
| 265 | 
         
             
                "\n",
         
     | 
| 266 | 
         
            +
                "    # Run the command\n",
         
     | 
| 267 | 
         
             
                "    print(f\"Running autotrain with config: {config_path}\")\n",
         
     | 
| 268 | 
         
            +
                "    subprocess.run([\"autotrain\", \"--config\", config_path])\n",
         
     | 
| 269 | 
         
            +
                "\n",
         
     | 
| 270 | 
         
            +
                "    space_name = f\"{whoami()['name']}/autotrain-{config['project_name']}\"\n",
         
     | 
| 271 | 
         
            +
                "    model_name = f\"{whoami()['name']}/{config['project_name']}\"\n",
         
     | 
| 272 | 
         
            +
                "    autotrain_spaces.append(space_name)\n",
         
     | 
| 273 | 
         
            +
                "    autotrain_models.append(model_name)\n",
         
     | 
| 274 | 
         
            +
                "    print(f'\\n---\\nhttps://huggingface.co/spaces/{space_name}\\n---\\n')"
         
     | 
| 275 | 
         
            +
               ]
         
     | 
| 276 | 
         
            +
              },
         
     | 
| 277 | 
         
            +
              {
         
     | 
| 278 | 
         
            +
               "cell_type": "markdown",
         
     | 
| 279 | 
         
            +
               "id": "bf3d3324-202b-45df-8897-58cf89931c45",
         
     | 
| 280 | 
         
            +
               "metadata": {},
         
     | 
| 281 | 
         
            +
               "source": [
         
     | 
| 282 | 
         
            +
                "# Cleanup"
         
     | 
| 283 | 
         
            +
               ]
         
     | 
| 284 | 
         
            +
              },
         
     | 
| 285 | 
         
            +
              {
         
     | 
| 286 | 
         
            +
               "cell_type": "code",
         
     | 
| 287 | 
         
            +
               "execution_count": 7,
         
     | 
| 288 | 
         
            +
               "id": "adf09687-ab1e-4f1e-8bf9-317cc928467a",
         
     | 
| 289 | 
         
            +
               "metadata": {},
         
     | 
| 290 | 
         
            +
               "outputs": [],
         
     | 
| 291 | 
         
            +
               "source": [
         
     | 
| 292 | 
         
            +
                "from huggingface_hub import HfApi\n",
         
     | 
| 293 | 
         
            +
                "api = HfApi()"
         
     | 
| 294 | 
         
            +
               ]
         
     | 
| 295 | 
         
            +
              },
         
     | 
| 296 | 
         
            +
              {
         
     | 
| 297 | 
         
            +
               "cell_type": "code",
         
     | 
| 298 | 
         
            +
               "execution_count": 8,
         
     | 
| 299 | 
         
            +
               "id": "19d80d26-cda4-41fb-a125-06060c3f90ce",
         
     | 
| 300 | 
         
            +
               "metadata": {},
         
     | 
| 301 | 
         
            +
               "outputs": [
         
     | 
| 302 | 
         
            +
                {
         
     | 
| 303 | 
         
            +
                 "name": "stdout",
         
     | 
| 304 | 
         
            +
                 "output_type": "stream",
         
     | 
| 305 | 
         
            +
                 "text": [
         
     | 
| 306 | 
         
            +
                  "['derek-thomas/autotrain-falcon-v03-poe-RFA-gpt3-5',\n",
         
     | 
| 307 | 
         
            +
                  " 'derek-thomas/autotrain-falcon-v03-poe-RFA-falcon',\n",
         
     | 
| 308 | 
         
            +
                  " 'derek-thomas/autotrain-falcon-v03-poe-FAR-gpt3-5',\n",
         
     | 
| 309 | 
         
            +
                  " 'derek-thomas/autotrain-falcon-v03-poe-FAR-falcon',\n",
         
     | 
| 310 | 
         
            +
                  " 'derek-thomas/autotrain-falcon-v03-poe-FA']\n",
         
     | 
| 311 | 
         
            +
                  "\n",
         
     | 
| 312 | 
         
            +
                  "['derek-thomas/falcon-v03-poe-RFA-gpt3-5',\n",
         
     | 
| 313 | 
         
            +
                  " 'derek-thomas/falcon-v03-poe-RFA-falcon',\n",
         
     | 
| 314 | 
         
            +
                  " 'derek-thomas/falcon-v03-poe-FAR-gpt3-5',\n",
         
     | 
| 315 | 
         
            +
                  " 'derek-thomas/falcon-v03-poe-FAR-falcon',\n",
         
     | 
| 316 | 
         
            +
                  " 'derek-thomas/falcon-v03-poe-FA']\n"
         
     | 
| 317 | 
         
            +
                 ]
         
     | 
| 318 | 
         
            +
                }
         
     | 
| 319 | 
         
            +
               ],
         
     | 
| 320 | 
         
            +
               "source": [
         
     | 
| 321 | 
         
            +
                "from pprint import pprint\n",
         
     | 
| 322 | 
         
            +
                "pprint(autotrain_spaces)\n",
         
     | 
| 323 | 
         
            +
                "print()\n",
         
     | 
| 324 | 
         
            +
                "pprint(autotrain_models)"
         
     | 
| 325 | 
         
            +
               ]
         
     | 
| 326 | 
         
            +
              },
         
     | 
| 327 | 
         
            +
              {
         
     | 
| 328 | 
         
            +
               "cell_type": "markdown",
         
     | 
| 329 | 
         
            +
               "id": "0040e05b-39c2-4aff-a40e-01577a388eff",
         
     | 
| 330 | 
         
            +
               "metadata": {},
         
     | 
| 331 | 
         
            +
               "source": [
         
     | 
| 332 | 
         
            +
                "<span style=\"color:red; font-size:20px; font-weight:bold;\">\n",
         
     | 
| 333 | 
         
            +
                "WAIT TO RUN THIS UNTIL YOUR SPACES ARE FINISHED TRAINING!\n",
         
     | 
| 334 | 
         
            +
                "</span>"
         
     | 
| 335 | 
         
            +
               ]
         
     | 
| 336 | 
         
            +
              },
         
     | 
| 337 | 
         
            +
              {
         
     | 
| 338 | 
         
            +
               "cell_type": "code",
         
     | 
| 339 | 
         
            +
               "execution_count": null,
         
     | 
| 340 | 
         
            +
               "id": "f86ed8ad-4e38-454a-a2c1-b1f075399c37",
         
     | 
| 341 | 
         
            +
               "metadata": {},
         
     | 
| 342 | 
         
            +
               "outputs": [],
         
     | 
| 343 | 
         
            +
               "source": [
         
     | 
| 344 | 
         
            +
                "for space in autotrain_spaces:\n",
         
     | 
| 345 | 
         
            +
                "    confirm = input(f\"Are you sure you want to delete the space '{space}'? (y/n): \")\n",
         
     | 
| 346 | 
         
            +
                "    if confirm.lower() == 'y':\n",
         
     | 
| 347 | 
         
            +
                "        api.delete_repo(space, repo_type='space')\n",
         
     | 
| 348 | 
         
            +
                "        print(f\"Deleted {space}\")\n",
         
     | 
| 349 | 
         
            +
                "    else:\n",
         
     | 
| 350 | 
         
            +
                "        print(f\"Skipped {space}\")\n"
         
     | 
| 351 | 
         
            +
               ]
         
     | 
| 352 | 
         
            +
              },
         
     | 
| 353 | 
         
            +
              {
         
     | 
| 354 | 
         
            +
               "cell_type": "markdown",
         
     | 
| 355 | 
         
            +
               "id": "2182f8fe-8504-4cb9-a0a6-4b143541158d",
         
     | 
| 356 | 
         
            +
               "metadata": {},
         
     | 
| 357 | 
         
            +
               "source": [
         
     | 
| 358 | 
         
            +
                "<span style=\"color:red; font-size:20px; font-weight:bold;\">\n",
         
     | 
| 359 | 
         
            +
                "ONLY RUN THIS IF YOU NEED TO RESTART FROM SCRATCH.\n",
         
     | 
| 360 | 
         
            +
                "THIS WILL DELETE YOUR MODELS.\n",
         
     | 
| 361 | 
         
            +
                "</span>"
         
     | 
| 362 | 
         
            +
               ]
         
     | 
| 363 | 
         
            +
              },
         
     | 
| 364 | 
         
            +
              {
         
     | 
| 365 | 
         
            +
               "cell_type": "code",
         
     | 
| 366 | 
         
            +
               "execution_count": null,
         
     | 
| 367 | 
         
            +
               "id": "12939405-a731-4a7c-ab4a-e1a4f1850bb6",
         
     | 
| 368 | 
         
            +
               "metadata": {},
         
     | 
| 369 | 
         
            +
               "outputs": [],
         
     | 
| 370 | 
         
            +
               "source": [
         
     | 
| 371 | 
         
            +
                "# for model in autotrain_models:\n",
         
     | 
| 372 | 
         
            +
                "#     confirm = input(f\"Are you sure you want to delete the model '{model}'? (y/n): \")\n",
         
     | 
| 373 | 
         
            +
                "#     if confirm.lower() == 'y':\n",
         
     | 
| 374 | 
         
            +
                "#         api.delete_repo(model, repo_type='model')\n",
         
     | 
| 375 | 
         
            +
                "#         print(f\"Deleted {model}\")\n",
         
     | 
| 376 | 
         
            +
                "#     else:\n",
         
     | 
| 377 | 
         
            +
                "#         print(f\"Skipped {model}\")\n"
         
     | 
| 378 | 
         
             
               ]
         
     | 
| 379 | 
         
             
              },
         
     | 
| 380 | 
         
             
              {
         
     | 
| 381 | 
         
             
               "cell_type": "code",
         
     | 
| 382 | 
         
             
               "execution_count": null,
         
     | 
| 383 | 
         
            +
               "id": "c2a2c864-2082-4be9-8e28-92fd01833e38",
         
     | 
| 384 | 
         
             
               "metadata": {},
         
     | 
| 385 | 
         
             
               "outputs": [],
         
     | 
| 386 | 
         
             
               "source": []
         
     |