VellumMini-0.1-Qwen3-14B

Just a sneak peek of what I'm cooking in a little project called Vellum. This model was made to evaluate the quality of the CreativeGPT dataset, and how well Qwen3 trains on it. This is just one of many datasets that the final model will be trained on (which will also be using a different base model).

This got pretty good results compared to the regular instruct in my testing so thought I would share. I trained for 3 epochs, but both checkpoints at 2 epoch and 3 epoch were too overbaked. This checkpoint, at 1 epoch performed best.

I'm pretty surprised how decent this came out since Qwen models aren't that great at writing, especially at this size.

Usage

Use with thinking/chain-of-thought disabled. Use ChatML prompt format.

Qwen suggested sampler settings are recommended.

Temperature: 0.7

Top_P: 0.8

Top_K: 20

Min_P: 0

Quants

GGUFs

iMatrix

These are reccommended.

bartowski - https://huggingface.co/bartowski/lemon07r_VellumMini-0.1-Qwen3-14B-GGUF
mradermacher - https://huggingface.co/mradermacher/VellumMini-0.1-Qwen3-14B-i1-GGUF

Static

mradermacher - https://huggingface.co/mradermacher/VellumMini-0.1-Qwen3-14B-GGUF
Q4_K_M Only - https://huggingface.co/lemon07r/VellumMini-0.1-Qwen3-14B-Q4_K_M-GGUF

Special Thanks

Big thanks to everyone over at the KoboldAI discord. The members there have helped me a ton with various things over the long while I've been there.

Training Details

Parent Model

https://huggingface.co/Qwen/Qwen3-14B

Training Method

Full fine-tune - SFT

Dataset(s)

https://huggingface.co/datasets/N8Programs/CreativeGPT

Training Hyperparameters

Batch size
4

Learning rate
0.00001

Number of epochs
3

Warmup ratio
0.05

Weight decay
0.02

Max gradient norm
1

Packing
No

Training Results

Downloads last month: 27

Safetensors

Model size

15B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for lemon07r/VellumMini-0.1-Qwen3-14B

Base model

Qwen/Qwen3-14B-Base

Finetuned

Qwen/Qwen3-14B