<!DOCTYPE html>
<html lang = "en-us">

<head>
<meta charset="utf-8">
<meta name = "description" content="A studio that shows videos produced by generative AI">
<meta name = "keywords" content="Generative AI,video,github">
<meta name = "viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=yes">
<title> HKUST AIGC Studio</title>

<link rel = "stylesheet" href = "/css/reset.css">
<link rel = "stylesheet" href = "/css/pageStyle.css">
<link rel = "stylesheet" href = "/css/topbar.css">
<link rel = "stylesheet" href = "/css/tutorial.css">
<link rel = "stylesheet" href = "/css/videoSytle.css">
</head>

<body>

<!--top navigation-->

<div class = "topbar">

<div class = "container clearfix">
<!--logo-->

<div class = "topbar-icon leftfix">
<a href = "/"> <img class = "logo" src="/images/logo.png" alt="HKUST AIGC logo"> </a>
</div>

<!--navigation-->

<div class = "topbar-navigation rightfix">
<ul class = "list clearfix">
<li>
<a href= "/explore"> Explore</a>
</li>
<li>
<a href="/showcase" > Showcase </a>
</li>
</ul>

<div class = "Creator">
<a href="/videoCreators">
<button> Create </button>
</a>
</div>

<div class = "searchBox">
<form action = "/search/creations/">
<input type = "text" name = "searchText" placeholder = "Search HKUST AIGC Studio">
<button> search </button>
</form>
</div>

</div>

</div>
</div>

<div class = "tutorial">

<article>
<h2 class = "Title"> <span> Rerender A Video </span></h2>
<h3 class = "Author"> <span>FEI, Yang</span> </h3>

<h3 class = "Subtitle">
<span> Introduction </span>
</h3>
<p class = "Text">
<span>
Text-to-image diffusion models are capable of generating high-quality images. However, when these models are applied to the video domain,
ensuring temporal consistency across video frames remains a formidable challenge. Rerender A Video is a novel zero-shot text-guided
video-to-video translation framework that adapts image models to videos. It can also customize a specific subject with LoRA and
introduce extra spatial guidance with ControlNet, rendering high-quality, temporally coherent videos that don't flicker.
</span>
<span>
In this article, I will introduce the different parameters of this AI-powered tool to guide you through creating your own video.
To make the steps easier to follow, I will demonstrate them using the Hugging Face Space at
<a href = "https://huggingface.co/spaces/Anonymous-sub/Rerender"> https://huggingface.co/spaces/Anonymous-sub/Rerender</a>.
</span>
</p>

<br>


<h3 class = "Subtitle">
<span> Limitations </span>
</h3>
<p class = "Text">
<span>
At the time of writing, only the Hugging Face demo is available. It is very limited and only showcases the keyframe generation ability of
Rerender A Video. The Run Propagation button does nothing for now; it is meant to generate the whole video once the
full code is released. The demo is also limited to at most 8 keyframes and a maximum frame resolution of 512x768.
</span>
<span>
The second notable limitation is that videos with large or quick motions are unstable. Because the framework relies on optical flow, which is only an estimate
of movement, the result may be wrong if the subject in the video changes position too much or too quickly.
</span>
<span>
The third limitation, common to almost any kind of video generation, is that it takes a long time. The running time for
a 512x640 video is about 1 minute per keyframe on a T4 GPU. In my experience, Run All took 2-3 minutes for 8 keyframes,
and under a minute for the first keyframe alone.
</span>
<span>
I will use the videos below to showcase the limitations of rerendering.
</span>

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/improved cropped breakdance.mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/women original.mp4" alt="Video 2" controls></video>
</div>
</div>
</p>
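<p class = "Text">
<span>
To get a feel for what the 8-keyframe limit means, the sketch below picks keyframe indices at a fixed interval. It assumes that keyframes are
sampled at a constant interval and that the demo simply stops after 8 of them; <code>keyframe_indices</code> and <code>frames_covered</code>
are hypothetical helpers for illustration, not part of the demo's code.
</span>
</p>

<pre><code class = "language-python">
```python
# Hypothetical helpers: pick keyframe indices at a fixed interval.
# Assumption: keyframes are sampled uniformly every `interval` frames,
# and the demo caps their number (8 in the Hugging Face Space).

def keyframe_indices(num_frames, interval, max_keyframes=8):
    """Return the frame indices that would be rendered as keyframes."""
    indices = list(range(0, num_frames, interval))
    return indices[:max_keyframes]


def frames_covered(interval, max_keyframes=8):
    """Number of source frames spanned by the capped set of keyframes."""
    return (max_keyframes - 1) * interval + 1


if __name__ == "__main__":
    # A 300-frame clip sampled every 10 frames would have 30 keyframes,
    # but only the first 8 fit within the demo's limit.
    print(keyframe_indices(300, 10))   # [0, 10, 20, 30, 40, 50, 60, 70]
    print(frames_covered(10))          # 71
```
</code></pre>

<p class = "Text">
<span>
With an interval of 10 frames, the 8 keyframes only reach frame 70, so covering a longer clip would need a larger interval and leave more
in-between frames for propagation to fill in.
</span>
</p>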

<p class = "Text">
<span>
The first video shows a person breakdancing, rendered with the prompt "man dancing in CG style"; the second shows a person looking at the camera
without much movement, rendered with the prompt "a woman in CG style".
</span>

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/breakdance fail motion (3).mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a woman in CG style seed 0.mp4" alt="Video 2" controls></video>
</div>
</div>
</p>
<p class = "Text">
<span>
The breakdancing video fails even though the model understands that the subject is breakdancing, while the woman in the other video is rendered
correctly because she barely moves.
</span>
</p>

<h3 class = "Subtitle">
<span> Prompt </span>
</h3>
<p class = "Text">
<span>
The prompt can change the input video a lot, and getting it right may take a lot of trial and error.
</span>
<span>
The model may understand some prompts better than others even if they mean the same thing.
</span>
<span>
In the following examples, the prompts I used are "a one eyed man in CG style", "a man with one eye in CG style" and "a man with an eyepatch in CG style".
</span>
</p>

<p class = "Text">

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a one eyed man in CG style Controlnet 0.2.mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man with one eye in CG style ControlNet 0.2.mp4" alt="Video 2" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.2.mp4" alt="Video 3" controls></video>
</div>
</div>
</p>

<p class="Text">
<span>
The model seems to understand what "one eyed man" and "man with one eye" mean, as it shows the man wearing an eyepatch in one of the keyframes. However, it
does not add the eyepatch in the first keyframe, which is the most important one because the later keyframes are generated to stay consistent with it.
So if your prompt doesn't give you what you want, consider rewording it.
</span>
</p>

<p class = "Text">
<span>
Negative prompts are also very helpful for removing unwanted content. The negative prompt can be found in the Advanced options for the 1st frame translation.
</span>
<span>
I encountered a bug where the subject would be placed in a room for no apparent reason, even though the prompt "a beautiful woman in CG style"
makes no mention of a room.
Adding "room" to the negative prompt quickly fixed the problem.
</span>

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/room 3.mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/room 2.mp4" alt="Video 2" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a beautiful woman in CG style negative prompt room.mp4" alt="Video 3" controls></video>
</div>
</div>

<span>
The first time it happened, I was confused and thought it was just random. The second time, I realized it might be a bug, so the third time around I fixed it by adding "room" to
the negative prompt.
</span>

</p>
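<p class = "Text">
<span>
In standard Stable Diffusion pipelines, a negative prompt works through classifier-free guidance: the model predicts the noise once for the prompt
and once for the negative prompt (an empty prompt when none is given), and the final prediction is pushed away from the negative one. The sketch
below assumes the demo works the same way; <code>guided_noise</code> is a hypothetical helper and the numbers are toy values, not real model outputs.
</span>
</p>

<pre><code class = "language-python">
```python
# Minimal sketch of classifier-free guidance with a negative prompt.
# Assumption: the demo follows the standard Stable Diffusion approach,
# where the negative prompt replaces the empty "unconditional" prompt.
# All numbers are toy values, not real noise predictions.

def guided_noise(eps_positive, eps_negative, guidance_scale):
    """Combine two noise predictions element-wise.

    Starts from the negative-prompt prediction and moves `guidance_scale`
    times along the direction toward the positive-prompt prediction,
    i.e. away from what the negative prompt describes.
    """
    return [
        neg + guidance_scale * (pos - neg)
        for pos, neg in zip(eps_positive, eps_negative)
    ]


if __name__ == "__main__":
    eps_pos = [0.2, -0.1, 0.5]   # prediction for "a beautiful woman in CG style"
    eps_neg = [0.0, 0.3, 0.5]    # prediction for the negative prompt "room"
    print(guided_noise(eps_pos, eps_neg, 7.5))
```
</code></pre>

<p class = "Text">
<span>
Where the two predictions agree (the last value), guidance changes nothing; where they differ, the result is pushed further from the negative
prompt, which is why adding "room" can steer the output away from rooms.
</span>
</p>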

<br>

<h3 class = "Subtitle">
<span> Advanced Options </span>
</h3>

<p class = "Text">
<span>
Rerender A Video provides some advanced options to fine-tune generation of the first keyframe. I will discuss a few key ones:
</span>
<span>
ControlNet Strength - ControlNet improves temporal consistency but might limit the effect of the prompt. I suggest
lowering the ControlNet strength to 0.2 if the first keyframe doesn't give you the result you want. In the following example,
I used the prompt "a man with an eyepatch in CG style".
</span>

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.7.mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.5.mp4" alt="Video 2" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.2.mp4" alt="Video 3" controls></video>
</div>
</div>
<span>
The first video, with ControlNet strength 0.7, shows an eyepatch in only one keyframe, and so does the second video, with ControlNet strength
0.5. Only in the last video, with ControlNet strength 0.2, is the eyepatch present in all keyframes.
</span>
</p>
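<p class = "Text">
<span>
One way to picture ControlNet strength is as a multiplier on the structural guidance. In common Stable Diffusion + ControlNet pipelines, ControlNet
turns the input frame into residual features that are scaled by this strength and added to the UNet's own features. The sketch below assumes the
demo behaves similarly; <code>apply_controlnet</code> is a hypothetical helper and the lists are toy values, not real activations.
</span>
</p>

<pre><code class = "language-python">
```python
# Minimal sketch of how a ControlNet strength can scale structural guidance.
# Assumption: as in common Stable Diffusion + ControlNet pipelines, ControlNet
# outputs residual features that are multiplied by the strength and added to
# the UNet's features. The lists below are toy numbers, not real activations.

def apply_controlnet(unet_features, control_residuals, strength):
    """Add ControlNet residuals to UNet features, scaled by `strength`."""
    return [
        feature + strength * residual
        for feature, residual in zip(unet_features, control_residuals)
    ]


if __name__ == "__main__":
    unet = [1.0, 1.0, 1.0]       # features driven by the prompt alone
    control = [2.0, -1.0, 0.0]   # structure extracted from the input frame
    for strength in (0.7, 0.5, 0.2):
        print(strength, apply_controlnet(unet, control, strength))
```
</code></pre>

<p class = "Text">
<span>
At strength 0.2 the features stay much closer to the prompt-only values, which is consistent with the eyepatch only surviving at the lowest setting.
</span>
</p>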

<p class = "Text">
<span>
Denoising strength - Image generation works by adding noise to the input and then removing it, so the more noise is added and removed, the more the input changes.
A denoising strength of 0 fully recovers the input (the output keyframes are the same as the input). A denoising strength of 1.05 fully rerenders the
input (the output keyframes are completely different from the input). In the following example, the prompt "beautiful woman in CG style" is used.
</span>

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Text2Video-Zero/a cat walking on grass/8frame.mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Text2Video-Zero/a cat walking on grass/12frame.mp4" alt="Video 2" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Text2Video-Zero/a cat walking on grass/8frame.mp4" alt="Video 3" controls></video>
</div>
</div>
<span>
The first video is the original video. The second video has a denoising strength of 0.2 and stays very similar to the original, while the third
has a denoising strength of 0.5 and is very different from the original.
</span>
</p>
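<p class = "Text">
<span>
The link between denoising strength and how much the input changes can be sketched with the standard diffusion forward process,
x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * noise, where a_t is the fraction of the original signal that survives t noising steps. This is a minimal
img2img-style sketch, not the demo's code. It assumes a linear noise schedule over 1000 steps and that strength maps linearly to t, and it clamps
strength to [0, 1] because how the demo handles its 1.05 maximum is not covered here. All function names are hypothetical.
</span>
</p>

<pre><code class = "language-python">
```python
import math

# Minimal img2img-style sketch of denoising strength (not the demo's code).
# Assumptions: a linear beta schedule over 1000 training steps, and strength
# mapping linearly to how far into that schedule the input is noised.
# Strength is clamped to [0, 1]; the demo's 1.05 maximum is not modeled.

NUM_TRAIN_STEPS = 1000
BETAS = [1e-4 + (0.02 - 1e-4) * i / (NUM_TRAIN_STEPS - 1)
         for i in range(NUM_TRAIN_STEPS)]


def clamp_strength(strength):
    """Keep strength inside [0, 1] for this sketch."""
    return min(max(strength, 0.0), 1.0)


def signal_kept(t):
    """a_t: share of the original signal's variance left after t noising steps."""
    product = 1.0
    for beta in BETAS[:t]:
        product *= 1.0 - beta
    return product


def noised_input(x0, noise, strength):
    """Return sqrt(a_t) * x0 + sqrt(1 - a_t) * noise, with t set by strength."""
    a_t = signal_kept(int(clamp_strength(strength) * NUM_TRAIN_STEPS))
    return [math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * n
            for x, n in zip(x0, noise)]


def steps_to_run(num_inference_steps, strength):
    """Denoising steps actually run, following the diffusers img2img convention."""
    return int(num_inference_steps * clamp_strength(strength))


if __name__ == "__main__":
    x0 = [0.5, -0.2, 0.8]      # toy "image" values
    noise = [1.0, 1.0, -1.0]   # toy noise sample
    for s in (0.0, 0.2, 0.5, 1.0):
        kept = signal_kept(int(s * NUM_TRAIN_STEPS))
        print(s, round(kept, 4), steps_to_run(50, s), noised_input(x0, noise, s))
```
</code></pre>

<p class = "Text">
<span>
At strength 0, a_t is 1 and the input comes back unchanged; as strength approaches 1, a_t approaches 0 and the input is almost entirely replaced
by noise, which the model then repaints according to the prompt.
</span>
</p>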

<p class = "Text">
<span>
Seed - The prompt is not the only thing that affects the output of the model. A seed in generative AI is a number that sets the random starting point from which an output is generated. With the same prompt and settings,
the same seed will yield the same result, while a different seed generally yields a different one. After several generations on Hugging Face, I found that the subject your prompt focuses on does not
change when different seeds are used. In the examples below, I used the prompt "a man in CG style" with different seeds.
</span>

<div class = "video-grid">
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man in CG style seed 0.mp4" alt="Video 1" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man in CG style seed 100.mp4" alt="Video 2" controls></video>
</div>
<div class ="video-item">
<video src="/videos/Rerender a Video/Rerender a video examples/a man in CG style seed 2000000000.mp4" alt="Video 3" controls></video>
</div>
</div>
<span>
As you can see, the man's face doesn't change but the background does. In the first example, the background has become a
room, even though in the original video the background is blurry and doesn't look like solid walls. In the second example,
part of the background even seems to morph into a wreath stuck to his head, while the rest looks like a curtain or cloth.
The third example goes back to a room with solid walls. This shows that elements the prompt doesn't describe can change from seed to seed,
and those changes may even be added onto the subject the prompt does describe.
</span>
</p>
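<p class = "Text">
<span>
The role of the seed can be shown with a tiny sketch: if the starting noise is drawn from a random number generator initialized with the seed,
the same seed always yields the same noise and, with the same prompt and settings, the same output. Python's <code>random</code> module stands in
for the framework's generator here, and <code>initial_noise</code> is a hypothetical helper.
</span>
</p>

<pre><code class = "language-python">
```python
import random

# Minimal sketch of why a fixed seed reproduces a result: the seed fixes the
# random starting noise. Assumption: Python's `random` module stands in for
# the framework's random number generator; `initial_noise` is hypothetical.

def initial_noise(seed, size=4):
    """Draw `size` standard-normal values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    # The three seeds used in the examples above give three different starts.
    for seed in (0, 100, 2000000000):
        print(seed, [round(v, 3) for v in initial_noise(seed)])
    # The same seed twice gives identical noise, hence an identical output.
    print(initial_noise(0) == initial_noise(0))   # True
```
</code></pre>

<p class = "Text">
<span>
The noise only fixes where generation starts; the prompt still steers the result, which may explain why the face described by the prompt stays
stable while the undescribed background varies between seeds.
</span>
</p>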

<h3 class = "Subtitle">
<span> Conclusion </span>
</h3>

<p class = "Text">
<span>
Rerender A Video is currently only a demo that can generate keyframes, but it shows great potential: the keyframes are very consistent, and it will be
a powerful tool if keyframe propagation works as intended.
</span>
</p>

<h3 class = "Subtitle">
<span> Reference </span>
</h3>

<p class = "Text">
<span>
Yang, S., Zhou, Y., Liu, Z., & Loy, C. C. (2023, June 14). Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation. arXiv.org. <a href = "https://arxiv.org/abs/2306.07954">https://arxiv.org/abs/2306.07954</a>
</span>
</p>

</article>

</div>
</body>
</html>