EpstAIn-V2: A "From-Scratch" Transformer

This repository contains an experimental, decoder-only transformer model trained from scratch for educational purposes.

Model Description

This model, "EpstAIn-V2", is a ~28 million parameter MinimalGPT trained on a custom dataset of political files. The primary goal of this project was not to create a state-of-the-art model, but to serve as a hands-on guide for building, training, and running a transformer model from the ground up with limited resources.

The model was trained for approximately 1.77 hours on a consumer-grade GPU.

Model Details

  • Architecture: MinimalGPT, a custom decoder-only transformer based on nanoGPT/minGPT principles. The code for the architecture is in model.py.
  • Parameters: ~28M
  • Configuration:
    • Layers: 8
    • Attention Heads: 8
    • Embedding Dimension: 512
    • Context Length: 256
    • Vocab Size: 5000
  • Training Data: The model was trained on a 100MB CSV file containing various documents and communications. The tokenizer (epstain_tokenizer.json) was also trained from scratch on this data.
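
As a quick sanity check, the ~28M parameter count follows directly from the configuration above, assuming a standard GPT-2-style block (fused attention projections plus a 4x-expanded MLP) and a weight-tied output head; MinimalGPT may differ slightly in bias terms and layer norms.

# Back-of-the-envelope parameter count for the configuration above.
# Assumes a GPT-2-style block; small terms (biases, layer norms) are ignored.
n_layer, n_embd, block_size, vocab_size = 8, 512, 256, 5000
per_layer = 12 * n_embd ** 2  # qkv + output projection (4*d^2) and MLP (8*d^2)
total = vocab_size * n_embd + block_size * n_embd + n_layer * per_layer
print(f"~{total / 1e6:.1f}M parameters")  # ~27.9M, consistent with the reported ~28M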
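
A tokenizer like this one can be reproduced with the Hugging Face tokenizers library. The sketch below is illustrative only: the training file name ("data.csv") and the Whitespace pre-tokenizer are assumptions, not details taken from the actual training script.

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Illustrative sketch: "data.csv" and the Whitespace pre-tokenizer are assumptions.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=5000, special_tokens=["[UNK]"])
tokenizer.train(files=["data.csv"], trainer=trainer)
tokenizer.save("epstain_tokenizer.json")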

How to Use

This model is not compatible with the standard transformers AutoModel pipeline, as it uses custom class definitions. To use it, you must have the model.py file in your working directory.

import torch
from model import MinimalGPT, MVTConfig
from tokenizers import Tokenizer

# --- 1. Load Configuration and Tokenizer ---
# The config must match the trained model's architecture
config = MVTConfig()
config.n_layer = 8
config.n_head = 8
config.n_embd = 512
config.block_size = 256   # context length (attribute name assumed from nanoGPT; check MVTConfig in model.py)
config.vocab_size = 5000  # must match the tokenizer's vocabulary size

tokenizer = Tokenizer.from_file("epstain_tokenizer.json")
device = 'cuda' if torch.cuda.is_available() else 'cpu'


# --- 2. Initialize Model and Load Weights ---
model = MinimalGPT(config).to(device)
checkpoint = torch.load("EpstAIn_V2.pt", map_location=device)

# Note: This checkpoint was saved from our custom training script.
# It might contain more than just the model state, so we extract the 'model_state_dict'.
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# --- 3. Generate Text ---
# (See generate.py for a full text generation loop example)
prompt = "The future of this policy is"
print(f"Model loaded and ready for generation. Example prompt: '{prompt}'")

Limitations & Bias

This is an experimental model and is not suitable for any real-world application.

  • Incoherent Output: The model's output is largely incoherent and consists of "gibberish" text. It has learned the style and vocabulary of the training data but has not learned grammar, semantics, or reasoning.
  • Bias: The model was trained on a specific and potentially sensitive dataset. It will inevitably reflect the biases, viewpoints, and content present in that data.
  • No Safety Mechanisms: This model has no built-in safety guardrails or content filters.