LaaLM-exp-v1: Linux as a Language Model (Experimental v1)
A 3B-parameter conversational AI that emulates a Linux terminal through pure language model inference. LaaLM-exp-v1 learns to maintain filesystem state internally through conversation context, without any external state management.
Key Features
- Persistent State Tracking - Remembers files, directories, and content across the conversation
- 12 Linux Commands - pwd, ls, echo, touch, cat, mkdir, cd, rm, mv, cp, echo >, grep
- File Content Support - Write and read actual file contents with redirection
- Error Handling - Proper bash error messages for invalid operations
- No External State - Pure conversation-based memory, no simulators required
- 95.4% Benchmark Accuracy - Tested on 130 diverse scenarios
Performance
Overall Accuracy: 95.4% (124/130 tests passed)
| Category | Accuracy | Passed/Total |
|---|---|---|
| Basic Commands | 100% | 20/20 |
| File Creation | 100% | 20/20 |
| File Operations | 100% | 30/30 |
| File Content | 100% | 20/20 |
| Error Handling | 75% | 15/20 |
| Persistence | 95% | 19/20 |
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "LaaLM/LaaLM-exp-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "LaaLM/LaaLM-exp-v1",
    fix_mistral_regex=True  # Important for proper tokenization
)
model.eval()
```
Understanding the System Prompt
The system prompt is critical for LaaLM to function correctly. It establishes the initial filesystem state that the model will track throughout the conversation.
Required format:
```python
conversation = [
    {
        "role": "system",
        "content": """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: (empty)
Environment: USER=user, HOME=/home/user"""
    }
]
```
Key components:
- Identity declaration - "You are a Linux terminal emulator"
- Current directory - Starting working directory (typically /home/user)
- Initial files - List existing files, or state "(empty)" for a clean start
- Environment variables - USER and HOME at minimum
Important: The system prompt is only set once at the start of the conversation. Do not update it with current state - the model learns to track state changes from the command history.
Example with existing files:
```python
conversation = [
    {
        "role": "system",
        "content": """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: existing_file.txt
Environment: USER=user, HOME=/home/user"""
    }
]
```
Running Commands
```python
def run_command(cmd):
    # Add user command
    conversation.append({"role": "user", "content": cmd})

    # Format prompt
    prompt = tokenizer.apply_chat_template(
        conversation,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize and generate
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode response
    response = tokenizer.decode(
        outputs[0][inputs.input_ids.shape[1]:],
        skip_special_tokens=True
    ).strip()

    # Add to conversation history
    conversation.append({"role": "assistant", "content": response})
    return response

# Example session
print(run_command("pwd"))                     # /home/user
print(run_command("touch test.txt"))          # (empty)
print(run_command("ls"))                      # test.txt
print(run_command("echo hello > test.txt"))   # (empty)
print(run_command("cat test.txt"))            # hello
print(run_command("cp test.txt backup.txt"))  # (empty)
print(run_command("ls"))                      # backup.txt test.txt
print(run_command("rm test.txt"))             # (empty)
print(run_command("ls"))                      # backup.txt
```
Quantized Versions
GGUF quantizations are available for CPU inference and lower memory usage, covering Q2_K through fp16 (1.27GB - 6.18GB), for use with:
- llama.cpp
- Ollama
- llama-cpp-python
- Other GGUF-compatible tools
Recommended: Q4_K_M (1.93GB) for best quality/size balance.
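As a minimal sketch of using a quantization with llama-cpp-python: the GGUF filename below is illustrative (assume it matches whichever file you download), and greedy decoding is approximated with temperature 0.

```python
from llama_cpp import Llama

# Load a downloaded quantization; the filename here is an assumption.
llm = Llama(
    model_path="./laalm-exp-v1.Q4_K_M.gguf",
    n_ctx=4096,  # room for a long command history
)

conversation = [
    {
        "role": "system",
        "content": "You are a Linux terminal emulator. Initial state:\n"
                   "Current directory: /home/user\n"
                   "Files: (empty)\n"
                   "Environment: USER=user, HOME=/home/user",
    },
    {"role": "user", "content": "pwd"},
]

# temperature=0.0 mirrors do_sample=False in the transformers example.
result = llm.create_chat_completion(
    messages=conversation, temperature=0.0, max_tokens=150
)
print(result["choices"][0]["message"]["content"])  # expected: /home/user
```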
Supported Commands
| Command | Description | Example |
|---|---|---|
| `pwd` | Print working directory | `pwd` |
| `ls` | List files in current directory | `ls` |
| `echo` | Print text to stdout | `echo hello world` |
| `touch` | Create empty file | `touch file.txt` |
| `cat` | Display file contents | `cat file.txt` |
| `mkdir` | Create directory | `mkdir mydir` |
| `cd` | Change directory | `cd mydir` |
| `rm` | Remove file | `rm file.txt` |
| `mv` | Move or rename file | `mv old.txt new.txt` |
| `cp` | Copy file | `cp source.txt dest.txt` |
| `echo >` | Write content to file | `echo text > file.txt` |
| `grep` | Search pattern in file | `grep word file.txt` |
Technical Details
Training Configuration
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Training Data: 10,000 synthetic conversations (800k messages)
- Commands per conversation: 30-50
- Training Method: Full fine-tuning (no LoRA, no quantization)
- Precision: BF16 with Flash Attention 2
- Hardware: A100 80GB PCIe
- Training Time: 34 minutes
- Cost: $0.68
- Max Sequence Length: 640 tokens
- Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
- Batch Size: 8 per device, gradient accumulation 4 (effective batch size 32)
- Epochs: 3
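The training script itself is not published in this card, but the hyperparameters above map onto a standard Hugging Face Trainer setup roughly as follows. This is a sketch, not the actual script; dataset loading and tokenization (to at most 640 tokens) are elided.

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Full fine-tune in BF16 with Flash Attention 2, per the configuration above
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

args = TrainingArguments(
    output_dir="laalm-exp-v1",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size 32
    num_train_epochs=3,
    learning_rate=2e-5,
    weight_decay=0.01,              # Trainer's default optimizer is AdamW
    bf16=True,
)
# Trainer(model=model, args=args, train_dataset=...) would complete the setup.
```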
Data Generation
Training data was synthetically generated using a simulated Linux environment with:
- Random filenames with realistic character patterns
- Diverse command sequences with proper state tracking
- Error cases including non-existent files and invalid commands
- Multi-step operations requiring memory across turns
- File content persistence and modification tracking
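The generator itself is not included here, but the idea reduces to a sketch like the one below: a dict-backed simulated filesystem produces ground-truth outputs while random command sequences become user/assistant message pairs. The "(empty)" convention and error strings follow the examples in this card; everything else is an assumption.

```python
import random
import string

def random_name():
    # Random filenames with realistic character patterns
    stem = "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 8)))
    return f"{stem}.txt"

def generate_conversation(num_commands=40):
    files = {}  # name -> content: the simulated filesystem
    messages = [{"role": "system", "content":
                 "You are a Linux terminal emulator. Initial state:\n"
                 "Current directory: /home/user\nFiles: (empty)\n"
                 "Environment: USER=user, HOME=/home/user"}]
    for _ in range(num_commands):
        op = random.choice(["touch", "ls", "rm"])
        if op == "touch":
            name = random_name()
            cmd, out = f"touch {name}", "(empty)"
            files[name] = ""
        elif op == "ls":
            cmd, out = "ls", " ".join(sorted(files)) or "(empty)"
        else:
            # Error case: rm sometimes targets a non-existent file
            name = (random.choice(list(files))
                    if files and random.random() < 0.8 else random_name())
            cmd = f"rm {name}"
            if files.pop(name, None) is not None:
                out = "(empty)"
            else:
                out = f"rm: cannot remove '{name}': No such file or directory"
        messages += [{"role": "user", "content": cmd},
                     {"role": "assistant", "content": out}]
    return messages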
Architecture Approach
Unlike traditional terminal emulators that use external state management, LaaLM-exp-v1 learns to track filesystem state entirely through conversation context. The model:
- Receives initial state via system prompt
- Maintains full command history in conversation
- Infers current filesystem state from past commands
- Generates outputs based on learned state transitions
This demonstrates that language models can learn complex stateful behaviors through sequence modeling alone, without explicit memory mechanisms.
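One practical consequence: because the conversation is the state, a session can be saved and resumed simply by serializing the message list. A minimal sketch, reusing the run_command helper from Quick Start (the file name is illustrative):

```python
import json

# Save: the message list is the complete session state
with open("session.json", "w") as f:
    json.dump(conversation, f)

# Resume later: reload the history and keep appending commands
with open("session.json") as f:
    conversation = json.load(f)
print(run_command("ls"))  # the model re-infers the filesystem from history
```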
Benchmark Methodology
The model was evaluated on 130 automatically generated test cases across 6 categories:
- Basic Commands (20 tests): pwd, ls, echo with various inputs
- File Creation (20 tests): touch and echo > operations
- File Operations (30 tests): rm, mv, cp with state tracking validation
- File Content (20 tests): cat and grep on files with actual content
- Error Handling (20 tests): Invalid commands and missing file scenarios
- Persistence (20 tests): Multi-step sequences requiring memory retention
Each test consists of:
- Setup commands to establish state
- Test command to execute
- Expected output comparison
- Pass/fail determination
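The harness is not included in this card, but each test reduces to a loop like the following sketch, which reuses the Quick Start conversation and run_command helper; the test case shown is illustrative.

```python
def run_test(setup_cmds, test_cmd, expected):
    global conversation
    conversation = conversation[:1]  # fresh start: keep only the system message
    for cmd in setup_cmds:           # setup commands to establish state
        run_command(cmd)
    actual = run_command(test_cmd)   # test command to execute
    return actual == expected        # expected output comparison -> pass/fail

# Illustrative persistence test: does the model remember the earlier touch?
passed = run_test(["touch a.txt", "touch b.txt", "rm a.txt"], "ls", "b.txt")
print("PASS" if passed else "FAIL")
```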
Limitations
Command Support
- Limited to 12 commands - advanced utilities not yet supported
- No pipe operators, command chaining, or complex redirects
- No scripting features (variables, loops, conditionals)
Known Issues
- `cp` command occasionally fails to copy file content (structure only)
- `rm` on non-existent files sometimes returns empty output instead of an error
- Long conversations (50+ commands) may experience state degradation
- Very long filenames (>30 characters) can cause parsing issues
Scope
- Terminal emulation only - no actual system calls or execution
- Requires full conversation history for proper state tracking
- Context window limits maximum conversation length
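Since the context window bounds session length, it can help to track prompt size as commands accumulate. A sketch using the Quick Start tokenizer; the 32K limit is Qwen2.5's default context size and is an assumption here:

```python
def context_usage(conversation, limit=32768):
    # Token count of the full chat-formatted history
    ids = tokenizer.apply_chat_template(
        conversation, tokenize=True, add_generation_prompt=True
    )
    used = len(ids)
    if used > limit - 150:  # leave headroom for max_new_tokens
        print(f"Warning: {used}/{limit} tokens used; state may degrade soon")
    return used
```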
Model Lineage
Part of the LaaLM (Linux as a Language Model) project:
- LaaLM-v1 - State-based approach with external filesystem tracking (T5-base, 80k examples)
- LaaLM-exp-v1 - Conversation-based approach with internal state tracking (Qwen 3B, 800k messages) (current)
- LaaLM-v2 - Planned with bash scripting, pipes, and expanded command set
Key Innovation
This model demonstrates that language models can maintain complex system state through conversation history alone. The approach enables:
- Neural system components without explicit state machines
- Learned program execution through pattern recognition
- Conversational interfaces for system control
- Research into emergent state tracking in transformers
Use Cases
- Education - Interactive Linux command learning
- Prototyping - Shell script validation without execution
- AI Agents - Foundation for conversational system interfaces
- Research - Studying state tracking emergence in language models
- Accessibility - Natural language terminal interaction
Inference Recommendations
- Always initialize with the proper system prompt format
- Set `fix_mistral_regex=True` when loading the tokenizer
- Use greedy decoding (`do_sample=False`) for deterministic outputs
- Maintain full conversation context throughout the session
- Limit `max_new_tokens` to ~150 for efficiency
- Do not modify the system prompt after initialization
License
Apache 2.0 (inherited from Qwen 2.5 base model)
Acknowledgments
Built on Qwen 2.5-3B-Instruct by the Qwen team. Part of the LaaLM project exploring neural terminal emulation.
Related Models
- LaaLM-v1 (state-based approach)
Future Development
- LaaLM-v2 with expanded command set and bash scripting support