Comma v0.1-2T - GGUF
This is a GGUF conversion of common-pile/comma-v0.1-2t for use with llama.cpp and Ollama.
Model Details
- Original Model: Comma v0.1-2T
- Architecture: Llama 3 (7B parameters)
- Training: 2 trillion tokens from the Common Pile v0.1 dataset
- License: Apache 2.0
- Converted by: Community conversion
About Comma v0.1
Comma v0.1-2T is a 7 billion parameter language model trained exclusively on openly licensed and public domain text from the Common Pile v0.1 dataset. This model demonstrates that competitive performance can be achieved using only ethically sourced training data.
Performance is competitive with Llama 2 7B, OLMo, and DeepSeek LLM on knowledge-intensive and coding benchmarks.
GGUF Conversion Details
- Format: GGUF
- Quantization: F16 (non-quantized, full precision)
- Size: 14 GB
- Converted with: llama.cpp (with custom patch for Comma tokenizer)
- File: comma-v0.1-2t-f16.gguf
Conversion Notes
This conversion required patching llama.cpp to recognize Comma v0.1's tokenizer. The model uses Llama 3 style BPE, but its pre-tokenizer checksum was not in llama.cpp's list of known tokenizers at conversion time. The patch is included in this repository for others who want to convert similar models.
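After conversion, the output file can be sanity-checked by inspecting its header: every GGUF file begins with the ASCII magic `GGUF`, followed by a little-endian 32-bit version number. A minimal sketch (the path is the file produced above):

```python
import struct

def check_gguf_header(path):
    """Return the GGUF version number if the file carries the GGUF magic, else None."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return None
        # The format version is a little-endian uint32 immediately after the magic.
        (version,) = struct.unpack("<I", f.read(4))
        return version

# Example: check_gguf_header("comma-v0.1-2t-f16.gguf")
```

A `None` result means the file is not a GGUF at all (e.g. a truncated or interrupted download).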
Usage with Ollama
1. Download the GGUF file
Download from the Files tab above, or use the command line:
huggingface-cli download jadael/comma-v0.1-2t-GGUF comma-v0.1-2t-f16.gguf --local-dir .
2. Create a Modelfile
FROM ./comma-v0.1-2t-f16.gguf
TEMPLATE """{{ .Prompt }}"""
PARAMETER stop "<|end_of_text|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
Note: This is a base model (not instruction-tuned), so it will continue/complete text rather than follow chat instructions.
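The Modelfile above can also be assembled in code, which is convenient when scripting imports of several quantizations. A small sketch, using the same values as the Modelfile shown:

```python
def build_modelfile(gguf_path, params):
    """Assemble an Ollama Modelfile for a base (completion-only) model."""
    lines = [
        f"FROM {gguf_path}",
        'TEMPLATE """{{ .Prompt }}"""',  # pass the prompt through unchanged
    ]
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"

modelfile = build_modelfile(
    "./comma-v0.1-2t-f16.gguf",
    {
        "stop": '"<|end_of_text|>"',
        "temperature": 0.7,
        "top_p": 0.9,
        "repeat_penalty": 1.1,
    },
)
with open("Modelfile", "w") as f:
    f.write(modelfile)
```

The resulting file is byte-for-byte equivalent to writing the Modelfile by hand.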
3. Import the model
ollama create comma-v0.1-2t -f Modelfile
4. Run the model
ollama run comma-v0.1-2t
Example usage (text completion):
>>> Once upon a time in a land far away
[Model will continue the story...]
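Once imported, the model can also be queried through Ollama's local REST API (`POST /api/generate` on port 11434). A sketch of the request payload follows; the actual HTTP call is left commented out since it requires a running Ollama server:

```python
import json

# Request body for Ollama's /api/generate endpoint.
payload = {
    "model": "comma-v0.1-2t",
    "prompt": "Once upon a time in a land far away",
    "stream": False,  # return one JSON object instead of a token stream
    "options": {"temperature": 0.7, "top_p": 0.9},
}

body = json.dumps(payload)

# With Ollama running locally, something like:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urllib.request.urlopen(req))["response"])
```

Because this is a base model, the `prompt` is treated as text to continue, exactly as in the interactive example above.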
Usage with llama.cpp
./llama-cli -m comma-v0.1-2t-f16.gguf -p "Your prompt here" -n 128
Important Notes
- This is a base model (not instruction-tuned)
- Trained on Common Pile v0.1: openly licensed and public domain text only
- Suitable for further fine-tuning or use as a foundation model
- For chat/instruct capabilities, fine-tuning is recommended
Citation
If you use this model, please cite the original Common Pile work:
@article{commonpile2025,
  title={The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text},
  author={[Original authors from Common Pile team]},
  year={2025}
}
Links
- Original Model: https://huggingface.co/common-pile/comma-v0.1-2t
- Common Pile Dataset: https://huggingface.co/common-pile
- Project Blog: https://blog.eleuther.ai/common-pile/
- Ollama: https://ollama.com
- llama.cpp: https://github.com/ggerganov/llama.cpp
License
This GGUF conversion is released under Apache 2.0 (same as the original model).
Apache 2.0 was chosen because:
- It matches the original Comma v0.1-2T model license
- It's compatible with llama.cpp's MIT license
- It provides patent grants and protections
- It requires attribution and license preservation
- It's the most protective license compatible with all components
All conversions, tools, and documentation in this repository are Apache 2.0 licensed.