LLaMA 1B - Pretrained Language Model

This is a ~1B-parameter LLaMA-architecture model trained on the FineWeb-Edu dataset with a learning rate of 4e-4 and a sequence length of 4096 tokens.

Model Details

  • Model Name: llama_1B_lr_4e-4_100bt
  • Architecture: LLaMA (Large Language Model Meta AI)
  • Parameters: ~1B parameters
  • Training Step: 340,000
  • Sequence Length: 4096
  • Vocabulary Size: 128256

Architecture Details

Model Configuration

  • Hidden Dimension: 2048
  • Number of Layers: 18
  • Number of Heads: 16
  • Head Dimension: None
  • KV Heads: None
  • Max Sequence Length: 4096
  • RoPE Theta: 10000.0
  • Norm Epsilon: 1e-05
  • FFN Dimension Multiplier: None
  • Weight Tying: False
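
Below is a minimal sketch for cross-checking these values against the params.json shipped with the checkpoint; the field names (dim, n_layers, n_heads) are assumptions based on common LLaMA training configs, not a documented schema.

# Minimal sketch (assumed field names) for checking the configuration above against params.json.
import json

with open("params.json") as f:   # file shipped with the checkpoint
    params = json.load(f)

# "dim", "n_layers", and "n_heads" are assumed key names; adjust to the actual schema.
print(params.get("dim"))        # expected: 2048
print(params.get("n_layers"))   # expected: 18
print(params.get("n_heads"))    # expected: 16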

Training Details

Data

  • Dataset: fineweb_edu_100bt_shuffled
  • Batch Size: 9
  • Tokenizer: tiktoken
  • Tokenizer Path: /fsx-pretraining/home/chunyyyy/blt/bytelatent/tokenizers/original/tokenizer.model
  • Add BOS Token: True
  • Add EOS Token: True
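
As a rough illustration of the Add BOS/EOS settings above, the sketch below frames an encoded document with BOS and EOS ids; encode, bos_id, and eos_id are placeholders, not the actual tiktoken-based tokenizer API used in training.

# Illustrative sketch only: how a document is typically framed when BOS/EOS are added.
# encode(), bos_id, and eos_id are placeholders, not the training tokenizer's actual API.
def frame_document(encode, text, bos_id, eos_id):
    token_ids = encode(text)                 # tokenize the raw document text
    return [bos_id] + token_ids + [eos_id]   # prepend BOS, append EOS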

Optimization

  • Learning Rate: 0.0004
  • Weight Decay: 0.1
  • Scheduler: cosine
  • Warmup Steps: 5000
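
A minimal sketch of what these settings imply for the learning-rate schedule (linear warmup for 5,000 steps, then cosine decay) is shown below; the decay horizon and the final learning-rate floor are assumptions, since the configuration does not state them.

import math

def lr_at_step(step, base_lr=4e-4, warmup_steps=5000, total_steps=340_000, min_lr=0.0):
    # Linear warmup to the base learning rate, then cosine decay toward min_lr.
    # The decay horizon (total_steps) and the floor (min_lr) are assumptions.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))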

Distributed Training

  • Data Parallel Replicas: 8
  • Model Dtype: bf16
  • FSDP Type: full_shard
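
Combined with the data settings above, these values give a rough token budget, sketched below under the assumption that Batch Size counts sequences per data-parallel replica per step.

# Back-of-the-envelope token budget; assumes Batch Size = 9 sequences per replica per step.
batch_size = 9
seq_len = 4096
dp_replicas = 8
steps = 340_000

tokens_per_step = batch_size * seq_len * dp_replicas   # 294,912 tokens per optimizer step
total_tokens = tokens_per_step * steps                 # ~1.0e11, consistent with the "100bt" name
print(f"{tokens_per_step:,} tokens/step, {total_tokens:,} total tokens")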

Usage

This repository contains distributed model weights in PyTorch format for a LLaMA-architecture model. After conversion to a standard format, the checkpoint can be loaded with PyTorch and the Hugging Face transformers library, as in the example below.

# Example loading code (placeholder path; assumes the checkpoint has already been
# converted to a Hugging Face-compatible format)
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Note: this checkpoint was produced by a specific LLaMA training framework and is
# saved in distributed (*.distcp) format, so it may need conversion before loading.
model = LlamaForCausalLM.from_pretrained("path/to/converted-checkpoint", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/converted-checkpoint")
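
If conversion is required, recent PyTorch releases include a utility for consolidating a distributed (DCP) checkpoint into a single torch.save file; the sketch below uses placeholder paths and assumes such a release is installed.

# Hypothetical conversion sketch with placeholder paths; requires a recent PyTorch
# release that ships torch.distributed.checkpoint.format_utils.
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Consolidate the sharded *.distcp files into a single torch.save checkpoint.
dcp_to_torch_save("path/to/distcp_checkpoint_dir", "path/to/consolidated.pth")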

Evaluation Tasks

The model evaluation configuration (a minimal generation sketch follows the list):

  • Validation Steps: 1000
  • Validation Source: /fsx-pretraining/home/sllokega/intern_workspace/data/fineweb_edu_10bt_val
  • Generator Max Tokens: 4096
  • Temperature: 1.0
  • Top-p: 0.95
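
Under these generation settings, sampling from a converted checkpoint might look like the following sketch; the checkpoint path and prompt are placeholders.

# Minimal sampling sketch with the settings above; placeholder path, assumes a checkpoint
# already converted to Hugging Face format as in the Usage section.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("path/to/converted-checkpoint", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/converted-checkpoint")

inputs = tokenizer("The history of education", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    max_new_tokens=256,  # the eval config allows up to 4096 generated tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))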

Training Configuration

The complete training configuration is preserved in config.yaml among the uploaded files; a minimal sketch for inspecting it follows.
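
A minimal sketch for inspecting that file, assuming PyYAML is installed and config.yaml is in the working directory:

# Minimal sketch for inspecting the preserved training configuration (requires PyYAML).
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(sorted(config))  # top-level sections of the training configuration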

Files Description

  • *.distcp: Distributed checkpoint files containing model weights
  • params.json: Model parameters and configuration
  • train_state_*.json: Training state information including optimizer states
  • config.yaml: Complete training configuration

Citation

If you use this model, please cite the LLaMA paper and the FineWeb-Edu dataset.
