LLaMA 1B - Fine-tuned Language Model
This is a LLaMA-architecture model trained on the FineWeb-Edu dataset (100BT sample) with a tuned learning rate (4e-4) and a 4096-token sequence length.
Model Details
- Model Name: llama_1B_lr_4e-4_100bt
- Architecture: LLaMA (Large Language Model Meta AI)
- Parameters: ~1B
- Training Step: 340,000
- Sequence Length: 4096
- Vocabulary Size: 128256
 
Architecture Details
Model Configuration
- Hidden Dimension: 2048
- Number of Layers: 18
- Number of Heads: 16
- Head Dimension: None
- KV Heads: None
- Max Sequence Length: 4096
- RoPE Theta: 10000.0
- Norm Epsilon: 1e-05
- FFN Dimension Multiplier: None
- Weight Tying: False
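
These values map onto a standard Llama configuration; the entries listed as None fall back to framework defaults. The sketch below uses Hugging Face's LlamaConfig under the usual conventions (head dimension = hidden dimension / number of heads = 128; KV heads equal to attention heads, i.e. no grouped-query attention) and an assumed FFN intermediate size, since the multiplier is unset.
```python
from transformers import LlamaConfig

# A minimal sketch, not the training framework's own config class.
config = LlamaConfig(
    vocab_size=128256,
    hidden_size=2048,
    num_hidden_layers=18,
    num_attention_heads=16,
    num_key_value_heads=16,     # KV Heads: None -> assumed equal to attention heads
    intermediate_size=5632,     # assumption: standard Llama FFN sizing (2/3 * 4 * 2048, rounded up to a multiple of 256)
    max_position_embeddings=4096,
    rope_theta=10000.0,
    rms_norm_eps=1e-5,
    tie_word_embeddings=False,  # Weight Tying: False
)
# Head dimension is implied: hidden_size // num_attention_heads = 128.
```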
 
Training Details
Data
- Dataset: fineweb_edu_100bt_shuffled
- Batch Size: 9
- Tokenizer: tiktoken
- Tokenizer Path: /fsx-pretraining/home/chunyyyy/blt/bytelatent/tokenizers/original/tokenizer.model
- Add BOS Token: True
- Add EOS Token: True
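
The tokenizer is a tiktoken-based BPE model with a 128,256-token vocabulary. As a rough illustration of how the BOS/EOS flags are applied when encoding a document (encode, BOS_ID, and EOS_ID below are placeholders, not the framework's actual API):
```python
def encode_document(text, encode, BOS_ID, EOS_ID, add_bos=True, add_eos=True):
    """Encode one document, optionally wrapping it in BOS/EOS markers."""
    tokens = encode(text)             # plain BPE token IDs for the document text
    if add_bos:
        tokens = [BOS_ID] + tokens    # Add BOS Token: True
    if add_eos:
        tokens = tokens + [EOS_ID]    # Add EOS Token: True
    return tokens
```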
 
Optimization
- Learning Rate: 0.0004
- Weight Decay: 0.1
- Scheduler: cosine
- Warmup Steps: 5000
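
The schedule above corresponds roughly to the sketch below: linear warmup for 5,000 steps to a peak learning rate of 4e-4, then cosine decay. The optimizer family (AdamW with default betas) and the total step count (set to the reported training step) are assumptions, since they are not listed here.
```python
import math
import torch

def build_optimizer_and_scheduler(model, total_steps=340_000, warmup_steps=5_000):
    # Learning Rate: 4e-4, Weight Decay: 0.1 (betas/eps left at assumed defaults)
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)

    def lr_lambda(step):
        if step < warmup_steps:
            # linear warmup from 0 to the peak learning rate
            return step / max(1, warmup_steps)
        # cosine decay from the peak learning rate toward 0
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```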
 
Distributed Training
- Data Parallel Replicas: 8
- Model Dtype: bf16
- FSDP Type: full_shard
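
With 8 data-parallel replicas, full_shard sharding, and bf16 parameters, the FSDP wrapping looks roughly like the sketch below. build_model is a placeholder for the framework's LLaMA constructor, and a torchrun-style launcher is assumed to provide the process-group environment.
```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Keep parameters, gradient reductions, and buffers in bf16 (Model Dtype: bf16).
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

model = build_model()  # placeholder: the framework's LLaMA model constructor
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # FSDP Type: full_shard
    mixed_precision=bf16_policy,
    device_id=torch.cuda.current_device(),
)
```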
 
Usage
This model uses the LLaMA architecture, and the repository contains distributed model weights in PyTorch format (*.distcp files). After conversion to a consolidated checkpoint, the weights can be loaded with PyTorch or the Hugging Face transformers framework.
```python
# Example loading code
import torch
from transformers import LlamaForCausalLM, AutoTokenizer

# Note: this checkpoint was produced by the specific LLaMA training framework
# used here and is saved in distributed format, so it may need conversion
# before it can be loaded as a standard LlamaForCausalLM.
```
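
If the *.distcp shards follow PyTorch's distributed checkpoint (DCP) layout, a recent PyTorch release can typically consolidate them into a single state dict offline; the paths below are placeholders, and mapping the resulting keys onto a Hugging Face model may still require the training framework's own conversion script. A rough sketch:
```python
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Consolidate the sharded *.distcp files into a single torch.save file.
# "checkpoint_dir" and "consolidated.pt" are placeholder paths.
dcp_to_torch_save("checkpoint_dir", "consolidated.pt")

# Inspect the consolidated state dict (key names follow the training
# framework, not the Hugging Face LlamaForCausalLM naming).
state_dict = torch.load("consolidated.pt", map_location="cpu")
print(list(state_dict.keys())[:10])
```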
Evaluation Tasks
The model evaluation configuration:
- Validation Steps: 1000
- Validation Source: /fsx-pretraining/home/sllokega/intern_workspace/data/fineweb_edu_10bt_val
- Generator Max Tokens: 4096
- Temperature: 1.0
- Top-p: 0.95
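
For reference, the sampling settings above correspond to the following transformers generation call. This is a sketch that assumes the checkpoint has already been converted to a Hugging Face-compatible format; model_path and the prompt are placeholders.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/converted-llama-1b"   # placeholder for a converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)

inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,        # sample, as implied by the temperature/top-p settings
    temperature=1.0,       # Temperature: 1.0
    top_p=0.95,            # Top-p: 0.95
    max_new_tokens=256,    # the configured 4096 is the generator's upper bound
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```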
 
Training Configuration
The complete training configuration is preserved in the uploaded config.yaml file.
Files Description
- *.distcp: Distributed checkpoint files containing the model weights
- params.json: Model parameters and configuration
- train_state_*.json: Training state information, including optimizer states
- config.yaml: Complete training configuration
Citation
If you use this model, please cite the LLaMA paper and the FineWeb-Edu dataset.