hkust-aigc · hspSean · Jul 29, 2023
diff --git a/videoCreators/tutorial 5/1 b/videoCreators/tutorial 5/1
@@ -0,0 +1 @@
+
diff --git a/videoCreators/tutorial 5/index.html b/videoCreators/tutorial 5/index.html
@@ -0,0 +1,316 @@
+<!DOCTYPE html>
+<html lang = "en-us">
+
+<head>
+  <meta charset="utf-8">
+  <meta name = "description" content="A studio that shows videos produced by generative AI">
+  <meta name = "keywords" content="Generative AI,video,github">
+  <meta name = "viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=yes">
+  <title> HKUST AIGC Studio</title>
+
+  <link rel = "stylesheet" href = "/css/reset.css">
+  <link rel = "stylesheet" href = "/css/pageStyle.css">
+  <link rel = "stylesheet" href = "/css/topbar.css">
+  <link rel = "stylesheet" href = "/css/tutorial.css">
+  <link rel = "stylesheet" href = "/css/videoSytle.css">
+</head>
+
+<body>
+
+  <!--top navigation-->
+
+  <div class = "topbar">
+
+    <div class = "container clearfix">
+      <!--logo-->
+
+      <div class = "topbar-icon leftfix">
+        <a href = "/"> <img class = "logo" src="/images/logo.png" alt="HKUST AIGC logo"> </a>
+      </div>
+
+      <!--navigation-->
+
+      <div class = "topbar-navigation rightfix">
+        <ul class = "list clearfix">
+          <li> 
+            <a href= "/explore"> Explore</a> 
+          </li>
+          <li> 
+            <a href="/showcase" > Showcase </a> 
+          </li>
+        </ul>
+
+        <div class = "Creator">
+          <a href="/videoCreators"> 
+            <button> Create </button>
+          </a> 
+        </div>
+
+        <div class = "searchBox">
+          <form action = "/search/creations/">
+            <input type = "text" name = "searchText" placeholder = "Search HKUST AIGC Studio">
+            <button> search </button>
+          </form> 
+        </div>
+
+      </div>
+
+    </div>
+  </div>
+
+  <div class = "tutorial">
+
+    <article> 
+      <h2 class = "Title"> <span> Rerender A Video </span></h2>
+      <h3 class = "Author"> <span>FEI, Yang</span> </h3>
+
+      <h3 class = "Subtitle"> 
+        <span> Introduction </span>
+      </h3>
+      <p class = "Text"> 
+        <span> 
+          Text-to-image diffusion models are capable of generating high-quality images. However, when applying these models to the video 
+          domain, ensuring temporal consistency across video frames remains a formidable challenge. Rerender A Video is a novel zero-shot 
+          text-guided video-to-video translation framework that adapts image models to videos. It can also customize a specific subject with 
+          LoRA and add extra spatial guidance with ControlNet, rendering high-quality, temporally coherent videos that don't flicker. 
+        </span>
+        <span>
+          In this article, I will introduce the different parameters of this AI-powered tool to guide you through creating your own video. 
+          To give a better understanding, I will be demonstrating the steps using Hugging Face Space at 
+          <a href = "https://huggingface.co/spaces/Anonymous-sub/Rerender"> https://huggingface.co/spaces/Anonymous-sub/Rerender</a>.
+        </span>
+      </p>
+
+      <br>
+
+
+      <h3 class = "Subtitle">
+        <span> Limitations </span>
+      </h3>
+      <p class = "Text">
+        <span>
+          Only the Hugging Face demo is available at the moment. It is very limited and only showcases the keyframe generation ability of 
+          Rerender A Video. The Run Propagation button does nothing for now; it is meant to generate the whole video once the 
+          full code is released. The demo also allows at most 8 keyframes and a maximum frame resolution of 512x768. 
+        </span>
+        <span>
+        The second limitation of note is that videos with large or quick motions are unstable. Because the method relies on optical flow, which is only an estimate
+        of movement, the result may be wrong if the subject in the video changes position too much or too quickly. 
+        </span>
+        <span>
+        The third limitation, which is common to any kind of video generation, is that it takes quite a long time. The running time for 
+        a video of size 512x640 is about 1 minute per keyframe on a T4 GPU. In my experience, it takes 2-3 minutes to Run All for 8 keyframes
+         and under a minute for the first keyframe. 
+        </span>   
+        <span>
+          I will use the videos below to showcase the limitations of rerendering.
+        </span>   
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/improved cropped breakdance.mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/women original.mp4" alt="Video 2" controls></video>
+          </div>
+        </div>
+      </p>
+
+      <p class = "Text">
+        <span>
+          The first video shows a person breakdancing, with the prompt "man dancing in CG style". The second shows a person looking at the camera 
+          without much movement, with the prompt "a woman in CG style".
+        </span>
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/breakdance fail motion (3).mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a woman in CG style seed 0.mp4" alt="Video 2" controls></video>
+          </div>
+        </div>
+      </p>
+      <p class = "Text">
+        <span>
+          The breakdancing video fails even though the model understands that the subject is breakdancing, while the woman in the other video is rendered
+          correctly because she doesn't move much.
+        </span>
+      </p>
+
+      <h3 class = "Subtitle">
+        <span> Prompt </span>
+      </h3>
+      <p class = "Text">
+        <span>
+          The prompt changes the input video a lot and may take a lot of trial and error to get right.
+        </span>
+        <span>
+          The model may understand some prompts better than others even if they mean the same thing.
+        </span>
+        <span>
+          In the following examples, the prompts I used are "a one eyed man in CG style", "a man with one eye in CG style" and "a man with an eyepatch in CG style".
+        </span>  
+      </p>
+
+      <p class = "Text">
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a one eyed man in CG style Controlnet 0.2.mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man with one eye in CG style ControlNet 0.2.mp4" alt="Video 2" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.2.mp4" alt="Video 3" controls></video>
+          </div>
+        </div>
+      </p>
+      <p class="Text">
+        <span>
+          The model seems to understand what "one eyed man" and "man with one eye" mean, as it shows the man wearing an eyepatch in one of the keyframes. However, it
+          doesn't add the eyepatch in the first keyframe, which is the most important one. So if your prompt doesn't give you what you want, consider wording your 
+          prompt differently.
+        </span>
+      </p>
+
+      <p class = "Text">
+        <span>
+          Negative prompts are also very helpful for solving issues. The negative prompt can be accessed in the Advanced options for the 1st frame translation. 
+        </span>  
+        <span>
+          I encountered a bug where the subject would be placed in a room for some reason, despite the prompt "a beautiful woman in CG style", which 
+          makes no mention of a room. 
+          So I added "room" to the negative prompt, which quickly fixed the problem. 
+        </span>  
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/room 3.mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/room 2.mp4" alt="Video 2" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a beautiful woman in CG style negative prompt room.mp4" alt="Video 3" controls></video>
+          </div>
+        </div>
+
+        <span>
+          The first time it happened I was confused and thought it was just random. The second time I realized it might be a bug, so the third time around I fixed it by adding "room" to
+           the negative prompt.
+        </span>  
+
+      </p>
+
+      <br>
+
+      <h3 class = "Subtitle">
+        <span> Advanced Options </span>
+      </h3>
+
+      <p class = "Text">
+        <span>
+          Rerender A Video provides some advanced options to fine-tune generation of the first keyframe. I will discuss a few key ones:
+        </span>
+        <span>
+          ControlNet Strength - ControlNet improves temporal consistency but might limit the effect of the prompt. I suggest 
+          lowering the ControlNet strength to 0.2 if the first keyframe doesn't give you the result you want. In the following example, 
+          I used the prompt "a man with an eyepatch in CG style".
+        </span>
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.7.mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.5.mp4" alt="Video 2" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man with an eyepatch in CG style Controlnet 0.2.mp4" alt="Video 3" controls></video>
+          </div>
+        </div>
+        <span>
+          The first video has ControlNet strength 0.7 and shows an eyepatch in only one keyframe, as does the second video, which has ControlNet strength
+           0.5. Only the last video, which has ControlNet strength 0.2, shows the eyepatch in all keyframes.
+        </span>
+      </p>
+
+      <p class = "Text">
+        <span>
+          Denoising strength - As image generation works by adding and removing noise, the more noise you add and remove, the more the input is changed. 
+          A denoising strength of 0 fully recovers the input (the output keyframes are the same as the input). A denoising strength of 1.05 fully rerenders the 
+          input (the output keyframes are completely 
+          different from the input). In the following example, the prompt "beautiful woman in CG style" is used.
+        </span>
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Text2Video-Zero/a cat walking on grass/8frame.mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Text2Video-Zero/a cat walking on grass/12frame.mp4" alt="Video 2" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Text2Video-Zero/a cat walking on grass/8frame.mp4" alt="Video 3" controls></video>
+          </div>
+        </div>
+        <span>
+          The first video is the original video. The second has a denoising strength of 0.2 and is very similar to the original video, while the third 
+          has a denoising strength of 0.5 and is very different from the original video.
+        </span>
+      </p>
+
+        <p class = "Text">
+        <span>
+          Seed - The prompt is not all that affects the output of the model. A seed in generative AI is the starting value for the random number generator used to produce an output. With the same prompt, using
+         the same seed will yield the same results, while a different seed will give different results. After several generations on Hugging Face, I have found that the subjects you focus on in your prompt do not 
+          change when different seeds are used. In the examples below, I used the prompt "a man in CG style" with different seeds.
+        </span>
+
+        <div class = "video-grid">
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man in CG style seed 0.mp4" alt="Video 1" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man in CG style seed 100.mp4" alt="Video 2" controls></video>
+          </div>
+          <div class ="video-item">
+            <video src="/videos/Rerender a Video/Rerender a video examples/a man in CG style seed 2000000000.mp4" alt="Video 3" controls></video>
+          </div>
+        </div>
+        <span>
+        As you can see, the man's face doesn't change but the background does. In the first example, the background has become a 
+          room even though, in the original video, the background is blurry and doesn't look like solid walls. In the second example, 
+          the background even seems to morph into a wreath glued to his head, and the rest appears to become a curtain or cloth. 
+          The third example goes back to a room with solid walls. This shows that things the prompt doesn't describe will
+          change between seeds, and those changes may even attach themselves to the subject the prompt does describe.  
+        </span>
+      </p>
+
+      <h3 class = "Subtitle">
+        <span> Conclusion </span>
+      </h3>
+
+      <p class = "Text">
+        <span>
+          Rerender A Video is currently just a demo that can only generate keyframes, but it shows great potential: the keyframes are very consistent, and it will be a 
+          powerful tool if the keyframe propagation works as intended.
+        </span>
+      </p>
+
+      <h3 class = "Subtitle">
+        <span> Reference </span>
+      </h3>
+
+      <p class = "Text">
+        <span>
+          Yang, S., Zhou, Y., Liu, Z., &amp; Loy, C. C. (2023, June 14). Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation. arXiv.org. <a href = "https://arxiv.org/abs/2306.07954">https://arxiv.org/abs/2306.07954</a>
+        </span>
+      </p>
+
+    </article>
+
+  </div>
+</body>