| [2025-10-24 23:55:28] ======================================== | |
| [2025-10-24 23:55:28] Job Name: testing__pvv2_lora | |
| [2025-10-24 23:55:28] Hostname: gl007.hpc.nyu.edu | |
| [2025-10-24 23:55:28] Number of nodes: 1 | |
| [2025-10-24 23:55:28] GPUs per node: 2 | |
| [2025-10-24 23:55:28] Start Time: Fri Oct 24 11:55:28 PM EDT 2025 | |
| [2025-10-24 23:55:28] Log file: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/pipeline.log | |
| [2025-10-24 23:55:28] ======================================== | |
| [2025-10-24 23:55:28] Sourcing secrets from: /scratch/zrs2020/LlamaFactoryHelper/secrets.env | |
| [2025-10-24 23:55:30] | |
| [2025-10-24 23:55:30] ======================================== | |
| [2025-10-24 23:55:30] Configuration Paths | |
| [2025-10-24 23:55:30] ======================================== | |
| [2025-10-24 23:55:30] Train Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/train_config.yaml | |
| [2025-10-24 23:55:30] Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/merge_config.yaml | |
| [2025-10-24 23:55:30] Dataset Info: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory/data/dataset_info.json | |
| [2025-10-24 23:55:30] Output Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints | |
| [2025-10-24 23:55:30] Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged | |
| [2025-10-24 23:55:30] HF Repo ID: TAUR-dev/testing__pvv2_lora | |
| [2025-10-24 23:55:30] | |
| [make-effective-cfg] tokenized_path: /scratch/zrs2020/.cache/hf_cache/home/llamafactory/tokenized/TAUR_dev_D_SFT_C_ours_cd3arg_10responses_reflections10_formats_C_full_fb94f2a3 | |
| [make-effective-cfg] wrote: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml | |
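The [make-effective-cfg] step resolves a tokenized_path and writes an "effective" copy of the train config before launch. A minimal sketch of what such a step could look like with PyYAML; the hashing scheme and function name are assumptions for illustration, not the helper's actual code:

    # Hypothetical sketch of a make-effective-cfg step (assumed logic, not the helper's implementation).
    import hashlib
    import yaml

    def make_effective_cfg(train_cfg_path, effective_cfg_path, tokenized_root, dataset_name):
        with open(train_cfg_path) as f:
            cfg = yaml.safe_load(f)
        # Derive a per-dataset cache directory name from the dataset name plus a short hash (assumed scheme).
        digest = hashlib.sha256(dataset_name.encode()).hexdigest()[:8]
        safe_name = dataset_name.replace("/", "_").replace("-", "_")
        cfg["tokenized_path"] = f"{tokenized_root}/{safe_name}_{digest}"
        with open(effective_cfg_path, "w") as f:
            yaml.safe_dump(cfg, f, sort_keys=False)
        return cfg["tokenized_path"]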
| [2025-10-24 23:55:30] | |
| [2025-10-24 23:55:30] ======================================== | |
| [2025-10-24 23:55:30] STAGE 0: Downloading Dataset | |
| [2025-10-24 23:55:30] Dataset: TAUR-dev/D-SFT_C-ours_cd3arg_10responses_reflections10_formats-C_full | |
| [2025-10-24 23:55:30] Start Time: Fri Oct 24 11:55:30 PM EDT 2025 | |
| [2025-10-24 23:55:30] ======================================== | |
| [dataset-download] Loading dataset from: TAUR-dev/D-SFT_C-ours_cd3arg_10responses_reflections10_formats-C_full | |
| [dataset-download] Dataset loaded successfully | |
| [dataset-download] Dataset info: DatasetDict({ | |
| train: Dataset({ | |
| features: ['conversations', 'sft_template_type_idx'], | |
| num_rows: 29130 | |
| }) | |
| }) | |
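Stage 0 only materializes the Hugging Face dataset into the local cache; the DatasetDict printed above is what load_dataset returns. A minimal sketch of an equivalent download, assuming the pipeline points the datasets library at the HF cache path logged later in this file:

    # Reproduce the Stage 0 download (sketch; cache_dir taken from the HF cache path shown in the log).
    from datasets import load_dataset

    ds = load_dataset(
        "TAUR-dev/D-SFT_C-ours_cd3arg_10responses_reflections10_formats-C_full",
        cache_dir="/scratch/zrs2020/.cache/hf_cache/home/datasets",
    )
    print(ds)                    # DatasetDict with a single 'train' split
    print(ds["train"].num_rows)  # 29130 rows: 'conversations', 'sft_template_type_idx'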
| [2025-10-24 23:55:32] | |
| [2025-10-24 23:55:32] ======================================== | |
| [2025-10-24 23:55:32] Dataset download completed | |
| [2025-10-24 23:55:32] End Time: Fri Oct 24 11:55:32 PM EDT 2025 | |
| [2025-10-24 23:55:32] ======================================== | |
| [2025-10-24 23:55:32] | |
| [2025-10-24 23:55:32] ======================================== | |
| [2025-10-24 23:55:32] STAGE 1: Training Model | |
| [2025-10-24 23:55:32] Start Time: Fri Oct 24 11:55:32 PM EDT 2025 | |
| [2025-10-24 23:55:32] ======================================== | |
| [2025-10-24 23:55:32] Job: testing__pvv2_lora | |
| [2025-10-24 23:55:32] Nodes: 1 | GPUs/node: 2 | |
| [2025-10-24 23:55:32] Master: 127.0.0.1:29500 | |
| [2025-10-24 23:55:32] LLaMA-Factory: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory | |
| [2025-10-24 23:55:32] Train cfg (effective): /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml | |
| [2025-10-24 23:55:32] HF cache: /scratch/zrs2020/.cache/hf_cache/home/datasets | |
| [2025-10-24 23:55:32] Launcher: torchrun | |
| [2025-10-24 23:55:32] | |
| [2025-10-24 23:55:32] Single-node training (2 GPU(s)) | |
| [2025-10-24 23:55:32] Executing command: llamafactory-cli train /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml | |
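Stage 1 is a thin wrapper around the llamafactory-cli call shown above; the CLI then spawns the two distributed workers itself (see the torchrun initialization below). A minimal sketch of the wrapper, assuming nothing beyond the logged command:

    # Sketch of the Stage 1 launch: run the logged command and fail the pipeline on a non-zero exit code.
    import subprocess

    subprocess.run(
        [
            "llamafactory-cli", "train",
            "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml",
        ],
        check=True,  # raise CalledProcessError if training fails
    )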
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. | |
| warnings.warn( | |
| [INFO|2025-10-24 23:55:40] llamafactory.launcher:143 >> Initializing 2 distributed tasks at: 127.0.0.1:29500 | |
| W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] | |
| W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] ***************************************** | |
| W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] ***************************************** | |
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. | |
| warnings.warn( | |
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. | |
| warnings.warn( | |
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. | |
| import pkg_resources | |
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. | |
| import pkg_resources | |
| [W1024 23:55:50.757874363 ProcessGroupNCCL.cpp:924] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator()) | |
| [W1024 23:55:50.757887679 ProcessGroupNCCL.cpp:924] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator()) | |
| [INFO|2025-10-24 23:55:50] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. | |
| [INFO|2025-10-24 23:55:50] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 2, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16 | |
| [INFO|2025-10-24 23:55:50] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 2, device: cuda:1, distributed training: True, compute dtype: torch.bfloat16 | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,441 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file added_tokens.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file special_tokens_map.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file chat_template.jinja from cache at None | |
| [INFO|tokenization_utils_base.py:2364] 2025-10-24 23:55:50,609 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. | |
| [INFO|configuration_utils.py:765] 2025-10-24 23:55:50,826 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:55:50,828 >> Model config Qwen2Config { | |
| "architectures": [ | |
| "Qwen2ForCausalLM" | |
| ], | |
| "attention_dropout": 0.0, | |
| "bos_token_id": 151643, | |
| "dtype": "bfloat16", | |
| "eos_token_id": 151645, | |
| "hidden_act": "silu", | |
| "hidden_size": 1536, | |
| "initializer_range": 0.02, | |
| "intermediate_size": 8960, | |
| "layer_types": [ | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention", | |
| "full_attention" | |
| ], | |
| "max_position_embeddings": 32768, | |
| "max_window_layers": 21, | |
| "model_type": "qwen2", | |
| "num_attention_heads": 12, | |
| "num_hidden_layers": 28, | |
| "num_key_value_heads": 2, | |
| "rms_norm_eps": 1e-06, | |
| "rope_scaling": null, | |
| "rope_theta": 1000000.0, | |
| "sliding_window": null, | |
| "tie_word_embeddings": true, | |
| "transformers_version": "4.57.1", | |
| "use_cache": true, | |
| "use_sliding_window": false, | |
| "vocab_size": 151936 | |
| } | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file added_tokens.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file special_tokens_map.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file chat_template.jinja from cache at None | |
| [INFO|tokenization_utils_base.py:2364] 2025-10-24 23:55:51,063 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. | |
| [WARNING|2025-10-24 23:55:51] llamafactory.data.loader:148 >> Loading dataset from disk will ignore other data arguments. | |
| [INFO|2025-10-24 23:55:51] llamafactory.data.loader:143 >> Loaded tokenized dataset from /scratch/zrs2020/.cache/hf_cache/home/llamafactory/tokenized/TAUR_dev_D_SFT_C_ours_cd3arg_10responses_reflections10_formats_C_full_fb94f2a3. | |
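Because tokenized_path already exists on disk, the loader reuses the cached Arrow dataset and skips re-tokenization, which is why other data arguments are ignored (warning above). A minimal sketch of inspecting that cache with datasets.load_from_disk:

    # Inspect the pre-tokenized dataset that training reuses (sketch; path copied from the log line above).
    from datasets import load_from_disk

    tok = load_from_disk(
        "/scratch/zrs2020/.cache/hf_cache/home/llamafactory/tokenized/"
        "TAUR_dev_D_SFT_C_ours_cd3arg_10responses_reflections10_formats_C_full_fb94f2a3"
    )
    print(tok)  # tokenized splits ready for the trainer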
| [INFO|configuration_utils.py:765] 2025-10-24 23:55:51,138 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:55:51,138 >> Model config Qwen2Config { ... } (identical to the config dump above; repeated dump omitted) | |
| [INFO|2025-10-24 23:55:51] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training. | |
| [WARNING|logging.py:328] 2025-10-24 23:55:51,492 >> `torch_dtype` is deprecated! Use `dtype` instead! | |
| [INFO|modeling_utils.py:1172] 2025-10-24 23:55:51,493 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/model.safetensors | |
| [INFO|modeling_utils.py:2341] 2025-10-24 23:55:51,494 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16. | |
| [INFO|configuration_utils.py:986] 2025-10-24 23:55:51,495 >> Generate config GenerationConfig { | |
| "bos_token_id": 151643, | |
| "eos_token_id": 151645, | |
| "use_cache": false | |
| } | |
| `torch_dtype` is deprecated! Use `dtype` instead! | |
| [INFO|configuration_utils.py:941] 2025-10-24 23:55:52,421 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/generation_config.json | |
| [INFO|configuration_utils.py:986] 2025-10-24 23:55:52,421 >> Generate config GenerationConfig { | |
| "bos_token_id": 151643, | |
| "do_sample": true, | |
| "eos_token_id": [ | |
| 151645, | |
| 151643 | |
| ], | |
| "pad_token_id": 151643, | |
| "repetition_penalty": 1.1, | |
| "temperature": 0.7, | |
| "top_k": 20, | |
| "top_p": 0.8 | |
| } | |
| [INFO|dynamic_module_utils.py:423] 2025-10-24 23:55:52,453 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-1.5B-Instruct. | |
| [INFO|2025-10-24 23:55:52] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. | |
| [INFO|2025-10-24 23:55:52] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. | |
| [INFO|2025-10-24 23:55:52] llamafactory.model.adapter:143 >> Upcasting trainable params to float32. | |
| [INFO|2025-10-24 23:55:52] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA | |
| [INFO|2025-10-24 23:55:52] llamafactory.model.model_utils.misc:143 >> Found linear modules: o_proj,gate_proj,q_proj,down_proj,v_proj,k_proj,up_proj | |
| [INFO|2025-10-24 23:55:52] llamafactory.model.loader:143 >> trainable params: 9,232,384 || all params: 1,552,946,688 || trainable%: 0.5945 | |
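The 9,232,384 trainable parameters are consistent with rank-8 LoRA adapters (LLaMA-Factory's default lora_rank) on all seven linear projections of each of the 28 layers; the rank is inferred from the numbers, not stated in the log. A quick check:

    # Verify trainable params for rank-8 LoRA on q/k/v/o/gate/up/down projections (rank 8 is an assumption).
    hidden, inter, n_layers = 1536, 8960, 28
    n_heads, n_kv_heads = 12, 2
    kv_dim = hidden // n_heads * n_kv_heads   # 256 (grouped-query attention)
    r = 8                                     # assumed LoRA rank

    # A LoRA adapter on a (d_in -> d_out) linear layer adds r * (d_in + d_out) parameters.
    shapes = [
        (hidden, hidden),  # q_proj
        (hidden, kv_dim),  # k_proj
        (hidden, kv_dim),  # v_proj
        (hidden, hidden),  # o_proj
        (hidden, inter),   # gate_proj
        (hidden, inter),   # up_proj
        (inter, hidden),   # down_proj
    ]
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    print(per_layer * n_layers)  # 9232384, matching the log line above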
| [WARNING|trainer.py:906] 2025-10-24 23:55:52,738 >> The model is already on multiple devices. Skipping the move to device specified in `args`. | |
| [INFO|trainer.py:699] 2025-10-24 23:55:52,740 >> max_steps is given, it will override any value given in num_train_epochs | |
| [INFO|trainer.py:749] 2025-10-24 23:55:52,740 >> Using auto half precision backend | |
| [WARNING|trainer.py:982] 2025-10-24 23:55:52,742 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. | |
| NCCL version 2.27.5+cuda12.9 | |
| [INFO|trainer.py:2519] 2025-10-24 23:55:53,120 >> ***** Running training ***** | |
| [INFO|trainer.py:2520] 2025-10-24 23:55:53,120 >> Num examples = 29,130 | |
| [INFO|trainer.py:2521] 2025-10-24 23:55:53,120 >> Num Epochs = 1 | |
| [INFO|trainer.py:2522] 2025-10-24 23:55:53,120 >> Instantaneous batch size per device = 1 | |
| [INFO|trainer.py:2525] 2025-10-24 23:55:53,120 >> Total train batch size (w. parallel, distributed & accumulation) = 2 | |
| [INFO|trainer.py:2526] 2025-10-24 23:55:53,120 >> Gradient Accumulation steps = 1 | |
| [INFO|trainer.py:2527] 2025-10-24 23:55:53,120 >> Total optimization steps = 10 | |
| [INFO|trainer.py:2528] 2025-10-24 23:55:53,122 >> Number of trainable parameters = 9,232,384 | |
| [INFO|integration_utils.py:867] 2025-10-24 23:55:53,220 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" | |
| wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin | |
| wandb: Tracking run with wandb version 0.22.2 | |
| wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251024_235553-oqx8ngeo | |
| wandb: Run `wandb offline` to turn off syncing. | |
| wandb: Syncing run testing__pvv2_lora | |
| wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory | |
| wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/oqx8ngeo | |
|  50%| 5/10 [00:03<00:03, 1.49it/s] | |
| [INFO|trainer.py:4309] 2025-10-24 23:55:57,737 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5 | |
| [INFO|configuration_utils.py:765] 2025-10-24 23:55:57,839 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:55:57,840 >> Model config Qwen2Config { ... } (identical to the config dump above; repeated dump omitted) | |
| [INFO|tokenization_utils_base.py:2421] 2025-10-24 23:55:58,067 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5/chat_template.jinja | |
| [INFO|tokenization_utils_base.py:2590] 2025-10-24 23:55:58,072 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2599] 2025-10-24 23:55:58,076 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5/special_tokens_map.json | |
| 100%| 10/10 [00:08<00:00, 1.23it/s] | |
| {'loss': 0.7188, 'grad_norm': 0.2177160233259201, 'learning_rate': 3.015368960704584e-08, 'epoch': 0.0} | |
| [INFO|trainer.py:4309] 2025-10-24 23:56:02,371 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10 | |
| [INFO|configuration_utils.py:765] 2025-10-24 23:56:02,490 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:56:02,491 >> Model config Qwen2Config { ... } (identical to the config dump above; repeated dump omitted) | |
| [INFO|tokenization_utils_base.py:2421] 2025-10-24 23:56:02,701 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10/chat_template.jinja | |
| [INFO|tokenization_utils_base.py:2590] 2025-10-24 23:56:02,706 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2599] 2025-10-24 23:56:02,710 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10/special_tokens_map.json | |
| [INFO|trainer.py:2810] 2025-10-24 23:56:03,258 >> | |
| Training completed. Do not forget to share your model on huggingface.co/models =) | |
| {'train_runtime': 10.137, 'train_samples_per_second': 1.973, 'train_steps_per_second': 0.986, 'train_loss': 0.718793535232544, 'epoch': 0.0} | |
| 100%| 10/10 [00:09<00:00, 1.10it/s] | |
| [INFO|trainer.py:4309] 2025-10-24 23:56:03,267 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints | |
| [INFO|configuration_utils.py:765] 2025-10-24 23:56:03,356 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:56:03,357 >> Model config Qwen2Config { ... } (identical to the config dump above; repeated dump omitted) | |
| [INFO|tokenization_utils_base.py:2421] 2025-10-24 23:56:03,588 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/chat_template.jinja | |
| [INFO|tokenization_utils_base.py:2590] 2025-10-24 23:56:03,592 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2599] 2025-10-24 23:56:03,596 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/special_tokens_map.json | |
| ***** train metrics ***** | |
| epoch = 0.0007 | |
| total_flos = 414519GF | |
| train_loss = 0.7188 | |
| train_runtime = 0:00:10.13 | |
| train_samples_per_second = 1.973 | |
| train_steps_per_second = 0.986 | |
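The metrics above are internally consistent: 10 optimization steps at a total batch size of 2 touch 20 of the 29,130 examples, and the throughput figures follow from the 10.137 s runtime. A quick check:

    # Sanity-check the reported train metrics against the run configuration logged earlier.
    steps, total_batch, num_examples, runtime = 10, 2, 29_130, 10.137

    samples_seen = steps * total_batch            # 20
    print(round(samples_seen / num_examples, 4))  # 0.0007 (epoch)
    print(round(samples_seen / runtime, 3))       # 1.973  (train_samples_per_second)
    print(round(steps / runtime, 3))              # 0.986  (train_steps_per_second)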
| [INFO|modelcard.py:456] 2025-10-24 23:56:03,838 >> Dropping the following result as it does not have all the necessary fields: | |
| {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}} | |
| [W1024 23:56:04.029787829 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator()) | |
| wandb: | |
| wandb: View run testing__pvv2_lora at: https://wandb.ai/ut_nlp_deduce/llamafactory/runs/oqx8ngeo | |
| wandb: Find logs at: wandb/run-20251024_235553-oqx8ngeo/logs | |
| [W1024 23:56:05.730735839 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator()) | |
| [W1024 23:56:05.132682733 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator()) | |
| [W1024 23:56:05.555229777 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator()) | |
| [2025-10-24 23:56:06] | |
| [2025-10-24 23:56:06] ======================================== | |
| [2025-10-24 23:56:06] Training completed successfully | |
| [2025-10-24 23:56:06] End Time: Fri Oct 24 11:56:06 PM EDT 2025 | |
| [2025-10-24 23:56:06] ======================================== | |
| [2025-10-24 23:56:06] | |
| [2025-10-24 23:56:06] ======================================== | |
| [2025-10-24 23:56:06] STAGE 2: Merging/Exporting Model | |
| [2025-10-24 23:56:06] Start Time: Fri Oct 24 11:56:06 PM EDT 2025 | |
| [2025-10-24 23:56:06] ======================================== | |
| [2025-10-24 23:56:06] Looking for checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints | |
| [2025-10-24 23:56:06] Analyzing checkpoints to find the one from current training run... | |
| [2025-10-24 23:56:06] - checkpoint-10: trainer_state.json modified at Fri Oct 24 11:56:03 PM EDT 2025 | |
| [2025-10-24 23:56:06] - checkpoint-5: trainer_state.json modified at Fri Oct 24 11:55:58 PM EDT 2025 | |
| [2025-10-24 23:56:06] | |
| [2025-10-24 23:56:06] Selected checkpoint: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10 | |
| [2025-10-24 23:56:06] This checkpoint has the most recently updated trainer_state.json | |
| [2025-10-24 23:56:06] Checkpoint details: | |
| [2025-10-24 23:56:06] Path: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10 | |
| [2025-10-24 23:56:06] Last modified: 2025-10-24 23:56:03.255712120 -0400 | |
| [2025-10-24 23:56:06] Training step: 10 | |
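Checkpoint selection compares the modification time of each checkpoint's trainer_state.json and takes the newest, which here resolves to checkpoint-10. A minimal sketch of that selection logic (assumed implementation, matching the behavior described above):

    # Pick the checkpoint whose trainer_state.json was written most recently (sketch).
    from pathlib import Path

    def latest_checkpoint(ckpt_root: str) -> Path:
        candidates = [
            d for d in Path(ckpt_root).glob("checkpoint-*")
            if (d / "trainer_state.json").is_file()
        ]
        return max(candidates, key=lambda d: (d / "trainer_state.json").stat().st_mtime)

    print(latest_checkpoint(
        "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints"
    ))  # .../checkpoints/checkpoint-10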
| [2025-10-24 23:56:06] Updating merge config to point to checkpoint... | |
| Successfully updated merge config | |
| [2025-10-24 23:56:06] Updated merge config to use: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10 | |
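Updating the merge config amounts to rewriting its adapter_name_or_path to the selected checkpoint before the export runs; the keys match the config contents printed just below. A minimal PyYAML sketch of that update (assumed implementation):

    # Point the merge config's adapter at the freshly selected checkpoint (sketch).
    import yaml

    merge_cfg = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/merge_config.yaml"
    checkpoint = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10"

    with open(merge_cfg) as f:
        cfg = yaml.safe_load(f)
    cfg["adapter_name_or_path"] = checkpoint
    with open(merge_cfg, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    print("Successfully updated merge config")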
| [2025-10-24 23:56:06] | |
| [2025-10-24 23:56:06] Merge config contents: | |
| [2025-10-24 23:56:06] template: qwen | |
| [2025-10-24 23:56:06] trust_remote_code: true | |
| [2025-10-24 23:56:06] export_dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged | |
| [2025-10-24 23:56:06] model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct | |
| [2025-10-24 23:56:06] adapter_name_or_path: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10 | |
| [2025-10-24 23:56:06] | |
| [2025-10-24 23:56:06] Executing command: llamafactory-cli export /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/merge_config.yaml | |
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. | |
| warnings.warn( | |
| /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. | |
| import pkg_resources | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,985 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file added_tokens.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file special_tokens_map.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file chat_template.jinja from cache at None | |
| [INFO|tokenization_utils_base.py:2364] 2025-10-24 23:56:14,157 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. | |
| [INFO|configuration_utils.py:765] 2025-10-24 23:56:14,372 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:56:14,374 >> Model config Qwen2Config { ... } (identical to the config dump above; repeated dump omitted) | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file added_tokens.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file special_tokens_map.json from cache at None | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file chat_template.jinja from cache at None | |
| [INFO|tokenization_utils_base.py:2364] 2025-10-24 23:56:14,608 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. | |
| [INFO|configuration_utils.py:765] 2025-10-24 23:56:14,663 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json | |
| [INFO|configuration_utils.py:839] 2025-10-24 23:56:14,663 >> Model config Qwen2Config { ... } (identical to the config dump above; repeated dump omitted) | |
| [WARNING|logging.py:328] 2025-10-24 23:56:14,663 >> `torch_dtype` is deprecated! Use `dtype` instead! | |
| [INFO|2025-10-24 23:56:14] llamafactory.model.model_utils.kv_cache:143 >> KV cache is enabled for faster generation. | |
| [WARNING|logging.py:328] 2025-10-24 23:56:15,013 >> `torch_dtype` is deprecated! Use `dtype` instead! | |
| [INFO|modeling_utils.py:1172] 2025-10-24 23:56:15,014 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/model.safetensors | |
| [INFO|modeling_utils.py:2341] 2025-10-24 23:56:15,015 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16. | |
| [INFO|configuration_utils.py:986] 2025-10-24 23:56:15,016 >> Generate config GenerationConfig { | |
| "bos_token_id": 151643, | |
| "eos_token_id": 151645 | |
| } | |
| [INFO|configuration_utils.py:941] 2025-10-24 23:56:15,118 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/generation_config.json | |
| [INFO|configuration_utils.py:986] 2025-10-24 23:56:15,119 >> Generate config GenerationConfig { | |
| "bos_token_id": 151643, | |
| "do_sample": true, | |
| "eos_token_id": [ | |
| 151645, | |
| 151643 | |
| ], | |
| "pad_token_id": 151643, | |
| "repetition_penalty": 1.1, | |
| "temperature": 0.7, | |
| "top_k": 20, | |
| "top_p": 0.8 | |
| } | |
| [INFO|dynamic_module_utils.py:423] 2025-10-24 23:56:15,148 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-1.5B-Instruct. | |
| [INFO|2025-10-24 23:56:15] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. | |
| [INFO|2025-10-24 23:56:17] llamafactory.model.adapter:143 >> Merged 1 adapter(s). | |
| [INFO|2025-10-24 23:56:17] llamafactory.model.adapter:143 >> Loaded adapter(s): /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10 | |
| [INFO|2025-10-24 23:56:17] llamafactory.model.loader:143 >> all params: 1,543,714,304 | |
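The exported model's parameter count is the training-time total minus the LoRA parameters, because merging folds the adapter deltas back into the base weight matrices instead of keeping them as separate tensors:

    # Reconcile the merged parameter count with the totals logged during training.
    all_params_train = 1_552_946_688  # base model + LoRA adapters during training
    lora_params = 9_232_384           # trainable LoRA parameters
    print(all_params_train - lora_params)  # 1543714304, the merged model's "all params"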
| [INFO|2025-10-24 23:56:17] llamafactory.train.tuner:143 >> Convert model dtype to: torch.bfloat16. | |
| [INFO|configuration_utils.py:491] 2025-10-24 23:56:17,909 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/config.json | |
| [INFO|configuration_utils.py:757] 2025-10-24 23:56:17,914 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/generation_config.json | |
| [INFO|modeling_utils.py:4181] 2025-10-24 23:56:21,705 >> Model weights saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/model.safetensors | |
| [INFO|tokenization_utils_base.py:2421] 2025-10-24 23:56:21,725 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/chat_template.jinja | |
| [INFO|tokenization_utils_base.py:2590] 2025-10-24 23:56:21,745 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/tokenizer_config.json | |
| [INFO|tokenization_utils_base.py:2599] 2025-10-24 23:56:21,765 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/special_tokens_map.json | |
| [INFO|2025-10-24 23:56:21] llamafactory.train.tuner:143 >> Ollama modelfile saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/Modelfile | |
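The merged directory is a self-contained Hugging Face model (weights, tokenizer, chat template, generation config), so it can be loaded directly for inference or pushed to the TAUR-dev/testing__pvv2_lora repo. A minimal usage sketch with transformers; device placement and generation settings are kept deliberately simple:

    # Load the merged checkpoint for a quick smoke test (sketch).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    merged_dir = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged"
    tokenizer = AutoTokenizer.from_pretrained(merged_dir)
    model = AutoModelForCausalLM.from_pretrained(merged_dir, dtype=torch.bfloat16)  # `dtype` per the deprecation notice above

    messages = [{"role": "user", "content": "Hello!"}]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))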
| [2025-10-24 23:56:22] | |
| [2025-10-24 23:56:22] ======================================== | |
| [2025-10-24 23:56:22] Merge/Export completed successfully | |
| [2025-10-24 23:56:22] End Time: Fri Oct 24 11:56:22 PM EDT 2025 | |
| [2025-10-24 23:56:22] ======================================== | |
| [2025-10-24 23:56:22] | |
| [2025-10-24 23:56:22] ======================================== | |
| [2025-10-24 23:56:22] Preparing Training Artifacts | |
| [2025-10-24 23:56:22] ======================================== | |
| [2025-10-24 23:56:22] Copying configuration files... | |
| [2025-10-24 23:56:22] Copying and cleaning training logs... | |