---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
widget:
- text: >-
    flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke
    kaikai|hara id 21|yoneyama mai|fuzichoco],  1girl, hoshimachi suisei,
    virtual youtuber, blue hair, side ponytail, cowboy shot, black shirt, star
    print, off shoulder, outdoors, starry sky, wariza, looking up, half-closed
    eyes, black sky,  live2d animation, upper body, high quality cinematic video
    of a woman sitting under the starry night sky. The Camera is steady, This is
    a cowboy shot. The animation is smooth and fluid.
  parameters:
    negative_prompt: >-
      bad quality
      video,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走
  output:
    url: images/ComfyUI_00455_.webp
- text: >-
    flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke
    kaikai|hara id 21|yoneyama mai|fuzichoco],  1girl, sakura miko, pink hair,
    cowboy shot, white shirt, floral print, off shoulder, outdoors, cherry
    blossom, tree shade, wariza, looking up, falling petals, half-closed eyes,
    white sky, clouds,  live2d animation, upper body, high quality cinematic
    video of a woman sitting under a sakura tree. Dreamy and lonely, the camera
    close-ups on the face of the woman as she turns towards the viewer. The
    Camera is steady, This is a cowboy shot. The animation is smooth and fluid.
  parameters:
    negative_prompt: >-
      bad quality
      video,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走
  output:
    url: images/ComfyUI_00469_.webp
base_model: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
instance_prompt: flat color, no lineart
license: apache-2.0
---
# Flat Color - Style

<Gallery />

## Model description 

Flat Color - Style is a LoRA trained on images without visible lineart, with flat colors and little to no indication of depth.

Reprinted from CivitAI: https://civitai.com/models/1132089?modelVersionId=1525407

Text to Video previews generated with [ComfyUI_examples/wan/#text-to-video](https://comfyanonymous.github.io/ComfyUI_examples/wan/#text-to-video)

The LoRA is loaded with the LoraLoaderModelOnly node, using the fp16 1.3B checkpoint `wan2.1_t2v_1.3B_fp16.safetensors`.
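
If you prefer the Diffusers pipeline over ComfyUI, a minimal sketch could look like the one below. The generation parameters and prompt are illustrative assumptions, not values taken from this card:

```python
# Sketch: loading this LoRA on top of the Wan2.1 1.3B Diffusers pipeline.
# Sampling parameters and the prompt below are assumptions for illustration.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)
# With a single LoRA .safetensors in the repo, no explicit weight_name should be needed.
pipe.load_lora_weights("motimalu/wan-flat-color-1.3b-v2")
pipe.to("cuda")

prompt = "flat color, no lineart, 1girl, looking up at a starry night sky"
frames = pipe(
    prompt=prompt,
    negative_prompt="bad quality video",
    num_frames=81,       # assumed frame count
    guidance_scale=5.0,  # assumed CFG scale
).frames[0]
export_to_video(frames, "flat_color_preview.mp4", fps=16)
```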

## Trigger words

You should use `flat color` and `no lineart` to trigger the image generation.


## Download model

Weights for this model are available in Safetensors format.

[Download](/motimalu/wan-flat-color-1.3b-v2/tree/main) them in the Files & versions tab.
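
To fetch the weights programmatically instead of through the web UI, one option (not part of the original card) is `huggingface_hub`:

```python
# Sketch: download the repo contents locally with huggingface_hub.
# snapshot_download pulls every file, including the LoRA .safetensors and preview images.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="motimalu/wan-flat-color-1.3b-v2")
print(local_dir)  # local path containing the downloaded files
```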


## Training Config

Trained with [diffusion-pipe](https://github.com/tdrussell/diffusion-pipe)

### dataset.toml
```
# Resolution settings.
resolutions = [512]

# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7

# Frame buckets (1 is for images)
frame_buckets = [1]

[[directory]] # IMAGES
# Path to the directory containing images and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/images'
num_repeats = 5
resolutions = [720]
frame_buckets = [1] # Use 1 frame for images.

[[directory]] # VIDEOS
# Path to the directory containing videos and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/videos'
num_repeats = 5
resolutions = [512] # Video resolution bucket.
frame_buckets = [6, 28, 31, 32, 36, 42, 43, 48, 50, 53]
```
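
As the comments in `dataset.toml` note, each image or video is expected to sit next to a caption file with the same base name. A quick sanity check over the directories from the config above might look like this (an illustrative sketch, not part of diffusion-pipe itself):

```python
# Sketch: verify every image/video in a training directory has a matching .txt caption.
# The directory paths mirror dataset.toml above; adjust them to your setup.
from pathlib import Path

MEDIA_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".mp4", ".webm"}

def missing_captions(directory: str) -> list[Path]:
    root = Path(directory)
    media = [p for p in root.iterdir() if p.suffix.lower() in MEDIA_EXTS]
    return [p for p in media if not p.with_suffix(".txt").exists()]

for d in ["/mnt/d/huanvideo/training_data/images",
          "/mnt/d/huanvideo/training_data/videos"]:
    print(d, "missing captions:", missing_captions(d))
```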

### config.toml

```
# Output path and dataset config file.
output_dir = '/mnt/d/wan/training_output'
dataset = 'dataset.toml'

# Training settings
epochs = 50
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 4
gradient_clipping = 1.0
warmup_steps = 100

# eval settings
eval_every_n_epochs = 5
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1

# misc settings
save_every_n_epochs = 5
checkpoint_every_n_minutes = 30
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_middle'

[model]
type = 'wan'
ckpt_path = '../Wan2.1-T2V-1.3B'
dtype = 'bfloat16'
# You can use fp8 for the transformer when training LoRA.
transformer_dtype = 'float8'
timestep_sample_method = 'logit_normal'

[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'

[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.02
eps = 1e-8
```
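
For reference, the effective batch size implied by this config is `micro_batch_size_per_gpu × gradient_accumulation_steps × num_gpus`. A tiny sketch of that arithmetic follows; the GPU count and dataset size are assumptions, not values from this card:

```python
# Sketch: effective batch size and rough optimizer steps per epoch for the config above.
# num_gpus and num_examples are assumptions for illustration only.
micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 4
num_gpus = 1        # assumed single-GPU run
num_examples = 500  # assumed example count after num_repeats expansion

effective_batch = micro_batch_size_per_gpu * gradient_accumulation_steps * num_gpus
steps_per_epoch = num_examples // effective_batch
print(f"effective batch size: {effective_batch}")          # 4
print(f"optimizer steps per epoch: ~{steps_per_epoch}")    # ~125 under these assumptions
```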