---
library_name: diffusers
tags:
- modular_diffusers
---

# Modular ChronoEdit

Modular implementation of [`nvidia/ChronoEdit-14B-Diffusers`](https://hf.co/nvidia/ChronoEdit-14B-Diffusers).

## Code

<details>
<summary>Unfold</summary>

```py
"""
Mimicked from https://huggingface.co/spaces/nvidia/ChronoEdit/blob/main/app.py
"""

from diffusers.modular_pipelines import WanModularPipeline, ModularPipelineBlocks
from diffusers.utils import load_image
from diffusers import UniPCMultistepScheduler
import torch
from PIL import Image

repo_id = "diffusers-internal-dev/chronoedit-modular"
blocks = ModularPipelineBlocks.from_pretrained(repo_id, trust_remote_code=True)
pipe = WanModularPipeline(blocks, repo_id)
pipe.load_components(
    trust_remote_code=True,
    device_map="cuda",
    # keep the image encoder in float32; run all other components in bfloat16
    torch_dtype={"default": torch.bfloat16, "image_encoder": torch.float32},
)
# match the scheduler configuration used in the reference Space
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=2.0)
# the distillation LoRA enables few-step inference (8 steps below)
pipe.load_lora_weights("nvidia/ChronoEdit-14B-Diffusers", weight_name="lora/chronoedit_distill_lora.safetensors")
pipe.fuse_lora(lora_scale=1.0)

image = load_image("https://huggingface.co/spaces/nvidia/ChronoEdit/resolve/main/examples/3.png")
prompt = "Transform the image so that inside the floral teacup of steaming tea, a small, cute mouse is sitting and taking a bath; the mouse should look relaxed and cheerful, with a tiny white bath towel draped over its head as if enjoying a spa moment, while the steam rises gently around it, blending seamlessly with the warm and cozy atmosphere."

# the image is resized within the pipeline, unlike https://huggingface.co/spaces/nvidia/ChronoEdit/blob/main/app.py#L151;
# see `ChronoEditImageInputStep` for details.
out = pipe(
    image=image,
    prompt=prompt,  # todo: enhance prompt
    num_inference_steps=8,  # todo: implement temporal reasoning
    num_frames=5,  # https://huggingface.co/spaces/nvidia/ChronoEdit/blob/main/app.py#L152
    output_type="np",
    generator=torch.manual_seed(0),
)
frames = out.values["videos"][0]
Image.fromarray((frames[-1] * 255).clip(0, 255).astype("uint8")).save("demo.png")
```

</details>

The same script is also available in [example.py](./example.py).
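The script's last line rescales a float frame in `[0, 1]` to an 8-bit image. A self-contained sketch of just that conversion, using a random array in place of the pipeline output:

```py
import numpy as np
from PIL import Image

# dummy stand-in for the pipeline's (num_frames, H, W, 3) float output in [0, 1]
frames = np.random.rand(5, 64, 64, 3).astype("float32")

# take the last frame, rescale to [0, 255], and convert to an 8-bit RGB image
last = (frames[-1] * 255).clip(0, 255).astype("uint8")
img = Image.fromarray(last)
img.save("frame.png")
```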

> [!TIP]
> Make sure `diffusers` is installed from source: `pip install git+https://github.com/huggingface/diffusers`.

## Results

<table>
  <tr>
    <td><img src="https://huggingface.co/spaces/nvidia/ChronoEdit/resolve/main/examples/3.png" alt="First Image"></td>
    <td><img src="./demo.png" alt="Edited Image"></td>
  </tr>
  <caption><i>Transform the image so that inside the floral teacup of steaming tea, a small, cute mouse is sitting and taking a bath; the mouse should look relaxed and cheerful, with a tiny white bath towel draped over its head as if enjoying a spa moment, while the steam rises gently around it, blending seamlessly with the warm and cozy atmosphere</i>.</caption>
</table>
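To build a side-by-side comparison like the one above locally, the input and edited images can be pasted onto a single canvas. A minimal PIL sketch with placeholder images (swap in the real input image and `demo.png`):

```py
from PIL import Image

# hypothetical stand-ins for the input image and the edited result
left = Image.new("RGB", (512, 512), "white")
right = Image.new("RGB", (512, 512), "black")

# paste both images onto one canvas, side by side
canvas = Image.new("RGB", (left.width + right.width, max(left.height, right.height)))
canvas.paste(left, (0, 0))
canvas.paste(right, (left.width, 0))
canvas.save("comparison.png")
```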

## Notes

1. This implementation doesn't perform the temporal-reasoning step of the original pipeline.
2. It doesn't use a separate prompt-enhancer model; the prompt is passed to the pipeline as-is.