|
|
<!DOCTYPE html> |
|
|
<html lang="en"> |
|
|
<head> |
|
|
<meta charset="UTF-8"> |
|
|
<title>Animate3D: Animating Any 3D Model with Multi-view Video Diffusion</title> |
|
|
|
|
|
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300&display=swap" rel="stylesheet"> |
|
|
<link rel="stylesheet" href="styles/styles.css"> |
|
|
|
|
|
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css"> |
|
|
</head> |
|
|
<body> |
|
|
<div class="container"> |
|
|
|
|
|
<section id="title"> |
|
|
<video id="background-video" autoplay muted loop> |
|
|
<source src="videos/bg_video/bg.mp4" type="video/mp4"> |
|
|
Your browser does not support HTML5 video. |
|
|
</video> |
|
|
<div class="title-content"> |
|
|
<h1>Animate3D: Animating Any 3D Model with Multi-view Video Diffusion</h1> |
|
|
<p class="authors"> |
|
|
<span class="author-name">Yanqin Jiang<sup>1*</sup>,</span> |
|
|
<span class="author-name">Chaohui Yu<sup>2*</sup>,</span> |
|
|
<span class="author-name">Chenjie Cao<sup>2</sup>,</span> |
|
|
<span class="author-name">Fan Wang<sup>2</sup>,</span> |
|
|
<span class="author-name">Weiming Hu<sup>1</sup>,</span> |
|
|
<span class="author-name">Jin Gao<sup>1</sup></span> |
|
|
</p> |
|
|
<p class="institute"> <sup>1</sup>CASIA<br> |
|
|
<sup>2</sup>DAMO Academy, Alibaba Group</p> |
|
|
<p class="accept">NeurIPS 2024</p> |
|
|
<div class="links"> |
|
|
<a href="https://arxiv.org/abs/2407.11398" target="_blank"> |
|
|
<i class="fas fa-file-pdf icon"></i> arXiv |
|
|
</a> |
|
|
<a href="https://youtu.be/qkaeeGzLnY8" target="_blank"> |
|
|
<i class="fas fa-video icon"></i> Video |
|
|
</a> |
|
|
<a href="https://huggingface.co/datasets/yanqinJiang/MV-Video" target="_blank"> |
|
|
<i class="fas fa-database icon"></i> Data |
|
|
</a> |
|
|
<a href="https://github.com/yanqinJiang/Animate3D" target="_blank"> |
|
|
<i class="fab fa-github icon"></i> Code |
|
|
</a> |
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="abstract"> |
|
|
<h2>Abstract</h2> |
|
|
<div class="content-center-abstract"> |
|
|
<p> |
|
|
Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. |
|
|
It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. |
|
|
In this work, we present Animate3D, a novel framework for animating any static 3D model. |
|
|
The core idea is two-fold: |
|
|
1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video). |
|
|
2) Based on MV-VDM, we introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. |
|
|
Specifically, for MV-VDM, we design a new spatiotemporal attention module to enhance spatial and temporal consistency by integrating 3D and video diffusion models. |
|
|
Additionally, we leverage the static 3D model's multi-view renderings as conditions to preserve its identity. |
|
|
For animating 3D models, an effective two-stage pipeline is proposed: we first reconstruct motions directly from generated multi-view videos, followed by the introduced 4D-SDS to refine both appearance and motion. |
|
|
Benefiting from accurate motion learning, we can achieve straightforward mesh animation. |
|
|
Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches. |
|
|
Data, code, and models will be publicly released. |
|
|
</p> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="youtube-video"> |
|
|
<h2>Video</h2> |
|
|
<div class="content-center"> |
|
|
<p>The video is best viewed in <span class="highlight"> 4K mode</span>.</p> |
|
|
</div> |
|
|
<div class="video-container"> |
|
|
<div class="video-wrapper-16-9"> |
|
|
<iframe |
|
|
src="https://www.youtube.com/embed/qkaeeGzLnY8?si=ahBAiCBjfeKLeptj" |
|
|
title="YouTube video player" |
|
|
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" |
|
|
referrerpolicy="strict-origin-when-cross-origin" |
|
|
allowfullscreen> |
|
|
</iframe> |
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="animate-generated"> |
|
|
<h2>Animate Generated 3D Mesh</h2> |
|
|
<div class="content-center"> |
|
|
<p>We animate <span class="highlight">6 mesh assets</span>. Models are generated using commercial 3D generation tools (<span class="highlight"><a href="https://hyperhuman.deemos.com/rodin">Rodin Gen-1</a></span>, <span class="highlight"><a href="https://www.meshy.ai/">Meshy</a></span>, <span class="highlight"><a href="https://www.tripo3d.ai/">Tripo3D</a></span>). |
|
|
Each model has <span class="highlight">multiple animations</span>, and you can <span class="highlight">switch between different animations by clicking the thumbnails below the video</span>. |
|
|
Click the thumbnail above the video to see the input 3D model. When you hover your mouse over the video, a <span class="highlight">full screen button</span> will appear in the bottom right corner. |
|
|
Click it to watch the video at 2048×1024 resolution.</p> |
|
|
</div> |
|
|
<div class="video-container"> |
|
|
<img id="reference-thumbnail-group4" class="thumbnail" onclick="playReference('group4')"> |
|
|
<div class="video-wrapper-other"> |
|
|
<div class="controls"> |
|
|
<div class="button left" onclick="switchGroup(-1, 'group4')"></div> |
|
|
<div class="button right" onclick="switchGroup(1, 'group4')"></div> |
|
|
</div> |
|
|
<video id="main-video-group4" autoplay muted controls preload="auto" loop> |
|
|
<source id="main-video-source-group4" src="index.html" type="video/mp4"> |
|
|
Your browser does not support playback of this video. |
|
|
</video> |
|
|
</div> |
|
|
<div class="video-thumbnails" id="thumbnails-group4"> |
|
|
|
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="animate-reconstructed"> |
|
|
<h2>Animate Reconstructed 3D Model</h2> |
|
|
<div class="content-center"> |
|
|
<p>We animate <span class="highlight">40 reconstructed 3D models</span>. Some models have more than one animation result, and you can <span class="highlight">switch between different animations by clicking the thumbnails below the video</span>. |
|
|
Click the thumbnail above the video to see the input 3D model. When you hover your mouse over the video, a <span class="highlight">full screen button</span> will appear in the bottom right corner. |
|
|
Click it to watch the video in 1024 resolution.</p> |
|
|
</div> |
|
|
<div class="video-container"> |
|
|
<img id="reference-thumbnail-group1" class="thumbnail" onclick="playReference('group1')"> |
|
|
<div class="video-wrapper-1-1"> |
|
|
<div class="controls"> |
|
|
<div class="button left" onclick="switchGroup(-1, 'group1')"></div> |
|
|
<div class="button right" onclick="switchGroup(1, 'group1')"></div> |
|
|
</div> |
|
|
<video id="main-video-group1" autoplay muted controls preload="auto" loop> |
|
|
<source id="main-video-source-group1" src="index.html" type="video/mp4"> |
|
|
Your browser does not support playback of this video. |
|
|
</video> |
|
|
</div> |
|
|
<div class="video-thumbnails" id="thumbnails-group1"> |
|
|
|
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="animate-real-world"> |
|
|
<h2>Animate Real-world 3D Scan</h2> |
|
|
<div class="content-center"> |
|
|
<p>We animate <span class="highlight">10 real-world 3D scans</span>. Some models have more than one animation result, and you can <span class="highlight">switch between different animations by clicking the thumbnails below the video</span>. |
|
|
Click the thumbnail above the video to see the input 3D model. When you hover your mouse over the video, a <span class="highlight">full screen button</span> will appear in the bottom right corner. |
|
|
Click it to watch the video in 1024 resolution.</p> |
|
|
</div> |
|
|
<div class="video-container"> |
|
|
<img id="reference-thumbnail-group2" class="thumbnail" onclick="playReference('group2')"> |
|
|
<div class="video-wrapper-1-1"> |
|
|
<div class="controls"> |
|
|
<div class="button left" onclick="switchGroup(-1, 'group2')"></div> |
|
|
<div class="button right" onclick="switchGroup(1, 'group2')"></div> |
|
|
</div> |
|
|
<video id="main-video-group2" autoplay muted controls preload="auto" loop> |
|
|
<source id="main-video-source-group2" src="index.html" type="video/mp4"> |
|
|
Your browser does not support playback of this video. |
|
|
</video> |
|
|
</div> |
|
|
<div class="video-thumbnails" id="thumbnails-group2"> |
|
|
|
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="animate-generated"> |
|
|
<h2>Animate Generated 3D Model</h2> |
|
|
<div class="content-center"> |
|
|
<p>We animate <span class="highlight"> 10 generated models</span>. |
|
|
Models are generated using commercial 3D generation tools (<span class="highlight"><a href="https://hyperhuman.deemos.com/rodin">Rodin Gen-1</a></span>, <span class="highlight"><a href="https://www.meshy.ai/">Meshy</a></span>, <span class="highlight"><a href="https://www.tripo3d.ai/">Tripo3D</a></span>). |
|
|
Some models have more than one animation result, and you can <span class="highlight">switch between different animations by clicking the thumbnails below the video</span>. |
|
|
Click the thumbnail above the video to see the input 3D model. When you hover your mouse over the video, a <span class="highlight">full screen button</span> will appear in the bottom right corner. |
|
|
Click it to watch the video in 1024 resolution.</p> |
|
|
</div> |
|
|
<div class="video-container"> |
|
|
<img id="reference-thumbnail-group3" class="thumbnail" onclick="playReference('group3')"> |
|
|
<div class="video-wrapper-1-1"> |
|
|
<div class="controls"> |
|
|
<div class="button left" onclick="switchGroup(-1, 'group3')"></div> |
|
|
<div class="button right" onclick="switchGroup(1, 'group3')"></div> |
|
|
</div> |
|
|
<video id="main-video-group3" autoplay muted controls preload="auto" loop> |
|
|
<source id="main-video-source-group3" src="index.html" type="video/mp4"> |
|
|
Your browser does not support playback of this video. |
|
|
</video> |
|
|
</div> |
|
|
<div class="video-thumbnails" id="thumbnails-group3"> |
|
|
|
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="other"> |
|
|
<h2>Ablation for 4D-SDS</h2> |
|
|
<div class="content-center"> |
|
|
<p>Below, we compare our <span class="highlight">motion reconstruction results (left)</span> with those <span class="highlight">refined with 4D-SDS (right)</span>. Best viewed in <span class="highlight">full screen</span>.</p> |
|
|
</div> |
|
|
<div class="video-container"> |
|
|
<div class="video-wrapper-other"> |
|
|
<div class="controls"> |
|
|
<div class="button left" onclick="switchVideo(-1)"></div> |
|
|
<div class="button right" onclick="switchVideo(1)"></div> |
|
|
</div> |
|
|
<video id="other-video" autoplay muted controls preload="auto" loop> |
|
|
<source id="other-video-source" src="index.html" type="video/mp4"> |
|
|
Your browser does not support playback of this video. |
|
|
</video> |
|
|
</div> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="training-data"> |
|
|
<h2>Training Data</h2> |
|
|
<div class="content-center"> |
|
|
<p> |
|
|
Our training dataset, <span class="highlight">MV-Video</span>, comprises <span class="highlight">115K animations</span> available under a public license, covering about <span class="highlight">53K animated 3D objects</span> in total, |
|
|
which are rendered into over <span class="highlight">1.8M multi-view videos</span>. <br> |
|
|
Notably, our training data is manually selected and of <span class="highlight">high quality</span>. It includes <span class="highlight">the highest-quality part of <a href="https://objaverse.allenai.org/objaverse-1.0/" target="_blank">Objaverse</a> (around 29K animated 3D objects)</span>, while the rest <span class="highlight">(around 24K animated 3D objects)</span> were collected by ourselves. |
|
|
</p> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="relevant-work"> |
|
|
<h2>Related Work</h2> |
|
|
<div class="content-center"> |
|
|
<p> |
|
|
[1] <a href="https://sc4d.github.io/" target="_blank">SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer</a> (ECCV 2024) <br> |
|
|
[2] <a href="https://nju-3dv.github.io/projects/STAG4D/" target="_blank">STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians</a> (ECCV 2024) <br> |
|
|
[3] <a href="https://consistent4d.github.io/" target="_blank">Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video</a> (ICLR 2024) |
|
|
</p> |
|
|
</div> |
|
|
</section> |
|
|
|
|
|
|
|
|
<section id="acknowledgements"> |
|
|
<h2>Acknowledgements</h2> |
|
|
<div class="content-center"> |
|
|
<p>Some 3D assets used for animation were downloaded from <span class="highlight"><a href="https://sketchfab.com/" target="_blank">Sketchfab</a></span> under the <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC Attribution</a> and <a href="https://creativecommons.org/licenses/by-nc/4.0/" target="_blank">CC Attribution-NonCommercial</a> licenses. |
|
|
We would like to thank <span class="highlight"><a href="https://animate3d.github.io/assets/acknowledgements.txt">the creators</a></span> for sharing great 3D assets. |
|
|
</p> |
|
|
</div> |
|
|
</section> |
|
|
</div> |
|
|
|
|
|
<script src="scripts/scripts.js"></script> |
|
|
</body> |
|
|
</html> |
|
|
|