VellumMini-0.1-Qwen3-14B
Just a sneak peek of what I'm cooking in a little project called Vellum. This model was made to evaluate the quality of the CreativeGPT dataset, and how well Qwen3 trains on it. This is just one of many datasets that the final model will be trained on (which will also be using a different base model).
This got pretty good results compared to the regular instruct in my testing so thought I would share. I trained for 3 epochs, but both checkpoints at 2 epoch and 3 epoch were too overbaked. This checkpoint, at 1 epoch performed best.
I'm pretty surprised how decent this came out since Qwen models aren't that great at writing, especially at this size.
Usage
Use with thinking/chain-of-thought disabled. Use ChatML prompt format.
Qwen suggested sampler settings are recommended.
Temperature: 0.7
Top_P: 0.8
Top_K: 20
Min_P: 0
Quants
GGUFs
iMatrix
These are reccommended.
- bartowski - https://huggingface.co/bartowski/lemon07r_VellumMini-0.1-Qwen3-14B-GGUF
- mradermacher - https://huggingface.co/mradermacher/VellumMini-0.1-Qwen3-14B-i1-GGUF
Static
- mradermacher - https://huggingface.co/mradermacher/VellumMini-0.1-Qwen3-14B-GGUF
- Q4_K_M Only - https://huggingface.co/lemon07r/VellumMini-0.1-Qwen3-14B-Q4_K_M-GGUF
Special Thanks
Big thanks to everyone over at the KoboldAI discord. The members there have helped me a ton with various things over the long while I've been there.
Training Details
Parent Model
https://huggingface.co/Qwen/Qwen3-14B
Training Method
Full fine-tune - SFT
Dataset(s)
https://huggingface.co/datasets/N8Programs/CreativeGPT
Training Hyperparameters
Batch size
4
Learning rate
0.00001
Number of epochs
3
Warmup ratio
0.05
Weight decay
0.02
Max gradient norm
1
Packing
No
Training Results
- Downloads last month
- 27
