[2025-10-24 23:55:28] ========================================
[2025-10-24 23:55:28] Job Name: testing__pvv2_lora
[2025-10-24 23:55:28] Hostname: gl007.hpc.nyu.edu
[2025-10-24 23:55:28] Number of nodes: 1
[2025-10-24 23:55:28] GPUs per node: 2
[2025-10-24 23:55:28] Start Time: Fri Oct 24 11:55:28 PM EDT 2025
[2025-10-24 23:55:28] Log file: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/pipeline.log
[2025-10-24 23:55:28] ========================================
[2025-10-24 23:55:28] Sourcing secrets from: /scratch/zrs2020/LlamaFactoryHelper/secrets.env
[2025-10-24 23:55:30]
[2025-10-24 23:55:30] ========================================
[2025-10-24 23:55:30] Configuration Paths
[2025-10-24 23:55:30] ========================================
[2025-10-24 23:55:30] Train Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/train_config.yaml
[2025-10-24 23:55:30] Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/merge_config.yaml
[2025-10-24 23:55:30] Dataset Info: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory/data/dataset_info.json
[2025-10-24 23:55:30] Output Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints
[2025-10-24 23:55:30] Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged
[2025-10-24 23:55:30] HF Repo ID: TAUR-dev/testing__pvv2_lora
[2025-10-24 23:55:30]
[make-effective-cfg] tokenized_path: /scratch/zrs2020/.cache/hf_cache/home/llamafactory/tokenized/TAUR_dev_D_SFT_C_ours_cd3arg_10responses_reflections10_formats_C_full_fb94f2a3
[make-effective-cfg] wrote: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml
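[note] A minimal sketch (not the helper's actual implementation) of what the make-effective-cfg step could do: read the base train config, pin `tokenized_path` so LLaMA-Factory reuses the cached tokenized dataset, and write the effective copy used for launch. The paths are taken from this log; the helper's real logic may differ.

import yaml

base_cfg = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/train_config.yaml"
effective_cfg = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml"
tokenized_path = "/scratch/zrs2020/.cache/hf_cache/home/llamafactory/tokenized/TAUR_dev_D_SFT_C_ours_cd3arg_10responses_reflections10_formats_C_full_fb94f2a3"

with open(base_cfg) as f:
    cfg = yaml.safe_load(f)

# LLaMA-Factory saves/loads the pre-tokenized dataset at this path (see STAGE 1 below)
cfg["tokenized_path"] = tokenized_path

with open(effective_cfg, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)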
[2025-10-24 23:55:30]
[2025-10-24 23:55:30] ========================================
[2025-10-24 23:55:30] STAGE 0: Downloading Dataset
[2025-10-24 23:55:30] Dataset: TAUR-dev/D-SFT_C-ours_cd3arg_10responses_reflections10_formats-C_full
[2025-10-24 23:55:30] Start Time: Fri Oct 24 11:55:30 PM EDT 2025
[2025-10-24 23:55:30] ========================================
[dataset-download] Loading dataset from: TAUR-dev/D-SFT_C-ours_cd3arg_10responses_reflections10_formats-C_full
[dataset-download] Dataset loaded successfully
[dataset-download] Dataset info: DatasetDict({
train: Dataset({
features: ['conversations', 'sft_template_type_idx'],
num_rows: 29130
})
})
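[note] Loading the dataset directly reproduces the DatasetDict printed above; a minimal sketch using only the public `datasets` API.

from datasets import load_dataset

ds = load_dataset("TAUR-dev/D-SFT_C-ours_cd3arg_10responses_reflections10_formats-C_full")
print(ds)                         # DatasetDict with a single 'train' split
print(ds["train"].column_names)   # ['conversations', 'sft_template_type_idx']
print(ds["train"].num_rows)       # 29130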
[2025-10-24 23:55:32]
[2025-10-24 23:55:32] ========================================
[2025-10-24 23:55:32] Dataset download completed
[2025-10-24 23:55:32] End Time: Fri Oct 24 11:55:32 PM EDT 2025
[2025-10-24 23:55:32] ========================================
[2025-10-24 23:55:32]
[2025-10-24 23:55:32] ========================================
[2025-10-24 23:55:32] STAGE 1: Training Model
[2025-10-24 23:55:32] Start Time: Fri Oct 24 11:55:32 PM EDT 2025
[2025-10-24 23:55:32] ========================================
[2025-10-24 23:55:32] Job: testing__pvv2_lora
[2025-10-24 23:55:32] Nodes: 1 | GPUs/node: 2
[2025-10-24 23:55:32] Master: 127.0.0.1:29500
[2025-10-24 23:55:32] LLaMA-Factory: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory
[2025-10-24 23:55:32] Train cfg (effective): /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml
[2025-10-24 23:55:32] HF cache: /scratch/zrs2020/.cache/hf_cache/home/datasets
[2025-10-24 23:55:32] Launcher: torchrun
[2025-10-24 23:55:32]
[2025-10-24 23:55:32] Single-node training (2 GPU(s))
[2025-10-24 23:55:32] Executing command: llamafactory-cli train /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/logs/train_config.effective.yaml
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
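[note] The FutureWarning above recommends replacing TRANSFORMERS_CACHE with HF_HOME; a minimal sketch of the environment setup, assuming the cache root visible later in this log. It must run before transformers is imported.

import os

# HF_HOME covers the hub, datasets and other HF caches; TRANSFORMERS_CACHE is deprecated.
os.environ.pop("TRANSFORMERS_CACHE", None)
os.environ["HF_HOME"] = "/scratch/zrs2020/.cache/hf_cache/home"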
[INFO|2025-10-24 23:55:40] llamafactory.launcher:143 >> Initializing 2 distributed tasks at: 127.0.0.1:29500
W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803]
W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] *****************************************
W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1024 23:55:41.864000 3022854 site-packages/torch/distributed/run.py:803] *****************************************
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
[W1024 23:55:50.757874363 ProcessGroupNCCL.cpp:924] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[W1024 23:55:50.757887679 ProcessGroupNCCL.cpp:924] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[INFO|2025-10-24 23:55:50] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled.
[INFO|2025-10-24 23:55:50] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 2, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-24 23:55:50] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 2, device: cuda:1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,441 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,442 >> loading file chat_template.jinja from cache at None
[INFO|tokenization_utils_base.py:2364] 2025-10-24 23:55:50,609 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:765] 2025-10-24 23:55:50,826 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:55:50,828 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
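[note] A quick consistency check of the attention geometry in the Qwen2Config above (plain arithmetic, no library calls).

hidden_size  = 1536
num_heads    = 12
num_kv_heads = 2

head_dim = hidden_size // num_heads    # 128
kv_dim   = num_kv_heads * head_dim     # 256 = output width of k_proj/v_proj under GQA
group    = num_heads // num_kv_heads   # 6 query heads share each KV head

print(head_dim, kv_dim, group)         # 128 256 6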
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:55:50,899 >> loading file chat_template.jinja from cache at None
[INFO|tokenization_utils_base.py:2364] 2025-10-24 23:55:51,063 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|2025-10-24 23:55:51] llamafactory.data.loader:148 >> Loading dataset from disk will ignore other data arguments.
[INFO|2025-10-24 23:55:51] llamafactory.data.loader:143 >> Loaded tokenized dataset from /scratch/zrs2020/.cache/hf_cache/home/llamafactory/tokenized/TAUR_dev_D_SFT_C_ours_cd3arg_10responses_reflections10_formats_C_full_fb94f2a3.
[INFO|configuration_utils.py:765] 2025-10-24 23:55:51,138 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:55:51,138 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-10-24 23:55:51] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
[WARNING|logging.py:328] 2025-10-24 23:55:51,492 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|modeling_utils.py:1172] 2025-10-24 23:55:51,493 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/model.safetensors
[INFO|modeling_utils.py:2341] 2025-10-24 23:55:51,494 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:986] 2025-10-24 23:55:51,495 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645,
"use_cache": false
}
`torch_dtype` is deprecated! Use `dtype` instead!
[INFO|configuration_utils.py:941] 2025-10-24 23:55:52,421 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/generation_config.json
[INFO|configuration_utils.py:986] 2025-10-24 23:55:52,421 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8
}
[INFO|dynamic_module_utils.py:423] 2025-10-24 23:55:52,453 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-1.5B-Instruct.
[INFO|2025-10-24 23:55:52] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
[INFO|2025-10-24 23:55:52] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-10-24 23:55:52] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
[INFO|2025-10-24 23:55:52] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
[INFO|2025-10-24 23:55:52] llamafactory.model.model_utils.misc:143 >> Found linear modules: o_proj,gate_proj,q_proj,down_proj,v_proj,k_proj,up_proj
[INFO|2025-10-24 23:55:52] llamafactory.model.loader:143 >> trainable params: 9,232,384 || all params: 1,552,946,688 || trainable%: 0.5945
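[note] The trainable-parameter count above is consistent with LoRA rank 8 applied to all seven linear projections of all 28 layers; the rank itself is not printed in this log, so treat it as an assumption. A worked check in plain Python:

r = 8                                  # assumed LoRA rank
hidden, inter, kv = 1536, 8960, 256    # from the Qwen2Config above (kv = 2 KV heads * 128 head_dim)

# each adapted projection adds r * (d_in + d_out) parameters (A and B matrices)
per_layer = sum(r * (d_in + d_out) for d_in, d_out in [
    (hidden, hidden),   # q_proj
    (hidden, kv),       # k_proj
    (hidden, kv),       # v_proj
    (hidden, hidden),   # o_proj
    (hidden, inter),    # gate_proj
    (hidden, inter),    # up_proj
    (inter, hidden),    # down_proj
])

trainable = 28 * per_layer
print(trainable)                                   # 9232384
print(round(100 * trainable / 1_552_946_688, 4))   # 0.5945 (trainable %)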
[WARNING|trainer.py:906] 2025-10-24 23:55:52,738 >> The model is already on multiple devices. Skipping the move to device specified in `args`.
[INFO|trainer.py:699] 2025-10-24 23:55:52,740 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:749] 2025-10-24 23:55:52,740 >> Using auto half precision backend
[WARNING|trainer.py:982] 2025-10-24 23:55:52,742 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
The model is already on multiple devices. Skipping the move to device specified in `args`.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
NCCL version 2.27.5+cuda12.9
[INFO|trainer.py:2519] 2025-10-24 23:55:53,120 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-10-24 23:55:53,120 >> Num examples = 29,130
[INFO|trainer.py:2521] 2025-10-24 23:55:53,120 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-10-24 23:55:53,120 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-10-24 23:55:53,120 >> Total train batch size (w. parallel, distributed & accumulation) = 2
[INFO|trainer.py:2526] 2025-10-24 23:55:53,120 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2527] 2025-10-24 23:55:53,120 >> Total optimization steps = 10
[INFO|trainer.py:2528] 2025-10-24 23:55:53,122 >> Number of trainable parameters = 9,232,384
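[note] A quick check that the run header above is internally consistent (plain arithmetic).

per_device_bs, gpus, grad_accum, steps, num_examples = 1, 2, 1, 10, 29_130

total_bs = per_device_bs * gpus * grad_accum          # 2, matches "Total train batch size"
seen     = total_bs * steps                           # 20 examples consumed in 10 steps
print(total_bs, seen, round(seen / num_examples, 4))  # 2 20 0.0007 (the epoch fraction reported later)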
[INFO|integration_utils.py:867] 2025-10-24 23:55:53,220 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.22.2
wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251024_235553-oqx8ngeo
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run testing__pvv2_lora
wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory
wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/oqx8ngeo
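[note] As the integration message above notes, W&B reporting can be switched off (or kept local) before launching training; a minimal sketch.

import os

os.environ["WANDB_DISABLED"] = "true"   # disable the reporter entirely, or
os.environ["WANDB_MODE"] = "offline"    # log locally and sync later with `wandb sync`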
 50%| | 5/10 [00:03<00:03, 1.49it/s]
[INFO|trainer.py:4309] 2025-10-24 23:55:57,737 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5
[INFO|configuration_utils.py:765] 2025-10-24 23:55:57,839 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:55:57,840 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|tokenization_utils_base.py:2421] 2025-10-24 23:55:58,067 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-24 23:55:58,072 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-24 23:55:58,076 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-5/special_tokens_map.json
100%|| 10/10 [00:08<00:00, 1.23it/s]
{'loss': 0.7188, 'grad_norm': 0.2177160233259201, 'learning_rate': 3.015368960704584e-08, 'epoch': 0.0}
[INFO|trainer.py:4309] 2025-10-24 23:56:02,371 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10
[INFO|configuration_utils.py:765] 2025-10-24 23:56:02,490 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:56:02,491 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|tokenization_utils_base.py:2421] 2025-10-24 23:56:02,701 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-24 23:56:02,706 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-24 23:56:02,710 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10/special_tokens_map.json
[INFO|trainer.py:2810] 2025-10-24 23:56:03,258 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 10.137, 'train_samples_per_second': 1.973, 'train_steps_per_second': 0.986, 'train_loss': 0.718793535232544, 'epoch': 0.0}
100%|| 10/10 [00:09<00:00, 1.10it/s]
[INFO|trainer.py:4309] 2025-10-24 23:56:03,267 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints
[INFO|configuration_utils.py:765] 2025-10-24 23:56:03,356 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:56:03,357 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|tokenization_utils_base.py:2421] 2025-10-24 23:56:03,588 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-24 23:56:03,592 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-24 23:56:03,596 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/special_tokens_map.json
***** train metrics *****
epoch = 0.0007
total_flos = 414519GF
train_loss = 0.7188
train_runtime = 0:00:10.13
train_samples_per_second = 1.973
train_steps_per_second = 0.986
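[note] The throughput figures above follow directly from the runtime; a worked check.

runtime_s, steps, samples = 10.137, 10, 20
print(round(samples / runtime_s, 3))   # 1.973 train_samples_per_second
print(round(steps / runtime_s, 3))     # 0.986 train_steps_per_second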
[INFO|modelcard.py:456] 2025-10-24 23:56:03,838 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
[W1024 23:56:04.029787829 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
wandb:
wandb: View run testing__pvv2_lora at: https://wandb.ai/ut_nlp_deduce/llamafactory/runs/oqx8ngeo
wandb: Find logs at: wandb/run-20251024_235553-oqx8ngeo/logs
[W1024 23:56:05.730735839 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
[W1024 23:56:05.132682733 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
[W1024 23:56:05.555229777 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
[2025-10-24 23:56:06]
[2025-10-24 23:56:06] ========================================
[2025-10-24 23:56:06] Training completed successfully
[2025-10-24 23:56:06] End Time: Fri Oct 24 11:56:06 PM EDT 2025
[2025-10-24 23:56:06] ========================================
[2025-10-24 23:56:06]
[2025-10-24 23:56:06] ========================================
[2025-10-24 23:56:06] STAGE 2: Merging/Exporting Model
[2025-10-24 23:56:06] Start Time: Fri Oct 24 11:56:06 PM EDT 2025
[2025-10-24 23:56:06] ========================================
[2025-10-24 23:56:06] Looking for checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints
[2025-10-24 23:56:06] Analyzing checkpoints to find the one from current training run...
[2025-10-24 23:56:06] - checkpoint-10: trainer_state.json modified at Fri Oct 24 11:56:03 PM EDT 2025
[2025-10-24 23:56:06] - checkpoint-5: trainer_state.json modified at Fri Oct 24 11:55:58 PM EDT 2025
[2025-10-24 23:56:06]
[2025-10-24 23:56:06] Selected checkpoint: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10
[2025-10-24 23:56:06] This checkpoint has the most recently updated trainer_state.json
[2025-10-24 23:56:06] Checkpoint details:
[2025-10-24 23:56:06] Path: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10
[2025-10-24 23:56:06] Last modified: 2025-10-24 23:56:03.255712120 -0400
[2025-10-24 23:56:06] Training step: 10
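[note] A minimal sketch of the checkpoint-selection rule described above (the checkpoint with the newest trainer_state.json wins); the pipeline's actual script may differ.

import os, glob

ckpt_root = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints"

candidates = [
    d for d in glob.glob(os.path.join(ckpt_root, "checkpoint-*"))
    if os.path.isfile(os.path.join(d, "trainer_state.json"))
]
latest = max(candidates, key=lambda d: os.path.getmtime(os.path.join(d, "trainer_state.json")))
print(latest)   # .../checkpoints/checkpoint-10 for this run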
[2025-10-24 23:56:06] Updating merge config to point to checkpoint...
Successfully updated merge config
[2025-10-24 23:56:06] Updated merge config to use: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10
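[note] Updating the merge config to point at the selected checkpoint amounts to rewriting one YAML key; a minimal sketch assuming PyYAML (the pipeline's actual helper is not shown in this log).

import yaml

merge_cfg = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/merge_config.yaml"
checkpoint = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10"

with open(merge_cfg) as f:
    cfg = yaml.safe_load(f)
cfg["adapter_name_or_path"] = checkpoint
with open(merge_cfg, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)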
[2025-10-24 23:56:06]
[2025-10-24 23:56:06] Merge config contents:
[2025-10-24 23:56:06] template: qwen
[2025-10-24 23:56:06] trust_remote_code: true
[2025-10-24 23:56:06] export_dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged
[2025-10-24 23:56:06] model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
[2025-10-24 23:56:06] adapter_name_or_path: /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10
[2025-10-24 23:56:06]
[2025-10-24 23:56:06] Executing command: llamafactory-cli export /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/configs/merge_config.yaml
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,985 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:13,986 >> loading file chat_template.jinja from cache at None
[INFO|tokenization_utils_base.py:2364] 2025-10-24 23:56:14,157 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:765] 2025-10-24 23:56:14,372 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:56:14,374 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/vocab.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/merges.txt
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/tokenizer_config.json
[INFO|tokenization_utils_base.py:2095] 2025-10-24 23:56:14,443 >> loading file chat_template.jinja from cache at None
[INFO|tokenization_utils_base.py:2364] 2025-10-24 23:56:14,608 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:765] 2025-10-24 23:56:14,663 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
[INFO|configuration_utils.py:839] 2025-10-24 23:56:14,663 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[WARNING|logging.py:328] 2025-10-24 23:56:14,663 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|2025-10-24 23:56:14] llamafactory.model.model_utils.kv_cache:143 >> KV cache is enabled for faster generation.
[WARNING|logging.py:328] 2025-10-24 23:56:15,013 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|modeling_utils.py:1172] 2025-10-24 23:56:15,014 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/model.safetensors
[INFO|modeling_utils.py:2341] 2025-10-24 23:56:15,015 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:986] 2025-10-24 23:56:15,016 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|configuration_utils.py:941] 2025-10-24 23:56:15,118 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/generation_config.json
[INFO|configuration_utils.py:986] 2025-10-24 23:56:15,119 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8
}
[INFO|dynamic_module_utils.py:423] 2025-10-24 23:56:15,148 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-1.5B-Instruct.
[INFO|2025-10-24 23:56:15] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-10-24 23:56:17] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
[INFO|2025-10-24 23:56:17] llamafactory.model.adapter:143 >> Loaded adapter(s): /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10
[INFO|2025-10-24 23:56:17] llamafactory.model.loader:143 >> all params: 1,543,714,304
[INFO|2025-10-24 23:56:17] llamafactory.train.tuner:143 >> Convert model dtype to: torch.bfloat16.
[INFO|configuration_utils.py:491] 2025-10-24 23:56:17,909 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/config.json
[INFO|configuration_utils.py:757] 2025-10-24 23:56:17,914 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/generation_config.json
[INFO|modeling_utils.py:4181] 2025-10-24 23:56:21,705 >> Model weights saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-10-24 23:56:21,725 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-24 23:56:21,745 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-24 23:56:21,765 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/special_tokens_map.json
[INFO|2025-10-24 23:56:21] llamafactory.train.tuner:143 >> Ollama modelfile saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged/Modelfile
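[note] Conceptually, the export stage above does what this short PEFT snippet does: load the base model, apply the LoRA adapter, merge it into the weights, and save a standalone model. This is a sketch of the equivalent steps, not LLaMA-Factory's internal code; `dtype=` follows the deprecation notice for `torch_dtype` seen earlier in this log.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct", dtype=torch.bfloat16)
model = PeftModel.from_pretrained(
    base,
    "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/checkpoints/checkpoint-10",
)
model = model.merge_and_unload()   # fold the LoRA deltas into the base weights

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
export_dir = "/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged"
model.save_pretrained(export_dir)
tok.save_pretrained(export_dir)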
[2025-10-24 23:56:22]
[2025-10-24 23:56:22] ========================================
[2025-10-24 23:56:22] Merge/Export completed successfully
[2025-10-24 23:56:22] End Time: Fri Oct 24 11:56:22 PM EDT 2025
[2025-10-24 23:56:22] ========================================
[2025-10-24 23:56:22]
[2025-10-24 23:56:22] ========================================
[2025-10-24 23:56:22] Preparing Training Artifacts
[2025-10-24 23:56:22] ========================================
[2025-10-24 23:56:22] Copying configuration files...
[2025-10-24 23:56:22] Copying and cleaning training logs...
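[note] This section of the log ends before any Hub upload, but the "HF Repo ID" printed at the top suggests the merged model and artifacts are pushed afterwards; a minimal sketch using huggingface_hub, assuming a token is available in the environment (e.g. via secrets.env).

from huggingface_hub import HfApi

api = HfApi()
api.create_repo("TAUR-dev/testing__pvv2_lora", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="/scratch/zrs2020/LlamaFactoryHelper/experiments/testing__pvv2_lora/merged",
    repo_id="TAUR-dev/testing__pvv2_lora",
    repo_type="model",
)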