Comma v0.1-2T - GGUF
This is a GGUF conversion of common-pile/comma-v0.1-2t for use with llama.cpp and Ollama.
Model Details
- Original Model: Comma v0.1-2T
- Architecture: Llama 3 (7B parameters)
- Training: 2 trillion tokens from the Common Pile v0.1 dataset
- License: Apache 2.0
- Converted by: Community conversion
About Comma v0.1
Comma v0.1-2T is a 7 billion parameter language model trained exclusively on openly licensed and public domain text from the Common Pile v0.1 dataset. This model demonstrates that competitive performance can be achieved using only ethically sourced training data.
Performance is competitive with Llama 2 7B, OLMo, and DeepSeek LLM on knowledge-intensive and coding benchmarks.
GGUF Conversion Details
- Format: GGUF
- Quantization: F16 (non-quantized, full precision)
- Size: 14 GB
- Converted with: llama.cpp (with custom patch for Comma tokenizer)
- File: comma-v0.1-2t-f16.gguf
Conversion Notes
This conversion required patching llama.cpp to recognize Comma v0.1's tokenizer. The model uses Llama 3 style BPE, but its pre-tokenizer checksum was not in llama.cpp's list of known tokenizers at conversion time. The patch is included in this repository for others who want to convert similar models.
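After conversion, the output file can be sanity-checked by inspecting its header: every GGUF file begins with the ASCII magic `GGUF`, followed by a little-endian 32-bit version number. A minimal sketch (the path is the file produced above):

```python
import struct

def check_gguf_header(path):
    """Return the GGUF version number if the file carries the GGUF magic, else None."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return None
        # The format version is a little-endian uint32 immediately after the magic.
        (version,) = struct.unpack("<I", f.read(4))
        return version

# Example: check_gguf_header("comma-v0.1-2t-f16.gguf")
```

A `None` result means the file is not a GGUF at all (e.g. a truncated or interrupted download).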
Usage with Ollama
1. Download the GGUF file
Download from the Files tab above, or use the command line:
huggingface-cli download jadael/comma-v0.1-2t-GGUF comma-v0.1-2t-f16.gguf --local-dir .
2. Create a Modelfile
FROM ./comma-v0.1-2t-f16.gguf
TEMPLATE """{{ .Prompt }}"""
PARAMETER stop "<|end_of_text|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
Note: This is a base model (not instruction-tuned), so it will continue/complete text rather than follow chat instructions.
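The Modelfile above can also be assembled in code, which is convenient when scripting imports of several quantizations. A small sketch, using the same values as the Modelfile shown:

```python
def build_modelfile(gguf_path, params):
    """Assemble an Ollama Modelfile for a base (completion-only) model."""
    lines = [
        f"FROM {gguf_path}",
        'TEMPLATE """{{ .Prompt }}"""',  # pass the prompt through unchanged
    ]
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"

modelfile = build_modelfile(
    "./comma-v0.1-2t-f16.gguf",
    {
        "stop": '"<|end_of_text|>"',
        "temperature": 0.7,
        "top_p": 0.9,
        "repeat_penalty": 1.1,
    },
)
with open("Modelfile", "w") as f:
    f.write(modelfile)
```

The resulting file is byte-for-byte equivalent to writing the Modelfile by hand.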
3. Import the model
ollama create comma-v0.1-2t -f Modelfile
4. Run the model
ollama run comma-v0.1-2t
Example usage (text completion):
>>> Once upon a time in a land far away
[Model will continue the story...]
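Once imported, the model can also be queried through Ollama's local REST API (`POST /api/generate` on port 11434). A sketch of the request payload follows; the actual HTTP call is left commented out since it requires a running Ollama server:

```python
import json

# Request body for Ollama's /api/generate endpoint.
payload = {
    "model": "comma-v0.1-2t",
    "prompt": "Once upon a time in a land far away",
    "stream": False,  # return one JSON object instead of a token stream
    "options": {"temperature": 0.7, "top_p": 0.9},
}

body = json.dumps(payload)

# With Ollama running locally, something like:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urllib.request.urlopen(req))["response"])
```

Because this is a base model, the `prompt` is treated as text to continue, exactly as in the interactive example above.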
Usage with llama.cpp
./llama-cli -m comma-v0.1-2t-f16.gguf -p "Your prompt here" -n 128
Important Notes
- This is a base model (not instruction-tuned)
- Trained on Common Pile v0.1: openly licensed and public domain text only
- Suitable for further fine-tuning or use as a foundation model
- For chat/instruct capabilities, fine-tuning is recommended
Citation
If you use this model, please cite the original Common Pile work:
@article{commonpile2025,
  title={The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text},
  author={[Original authors from Common Pile team]},
  year={2025}
}
Links
- Original Model: https://huggingface.co/common-pile/comma-v0.1-2t
- Common Pile Dataset: https://huggingface.co/common-pile
- Project Blog: https://blog.eleuther.ai/common-pile/
- Ollama: https://ollama.com
- llama.cpp: https://github.com/ggerganov/llama.cpp
License
This GGUF conversion is released under Apache 2.0 (same as the original model).
Apache 2.0 was chosen because:
- It matches the original Comma v0.1-2T model license
- It's compatible with llama.cpp's MIT license
- It provides patent grants and protections
- It requires attribution and license preservation
- It's the most protective license compatible with all components
All conversions, tools, and documentation in this repository are Apache 2.0 licensed.