    Researchers from China Introduce Make-Your-Video: A Video Transformation Method by Employing Textual and Structural Guidance


    Videos are a widely used digital medium, prized for their ability to present vivid and engaging visual experiences. With smartphones and digital cameras now ubiquitous, recording live events on camera has become easy. However, the process gets significantly harder and more expensive when the goal is to produce a video that represents an idea visually. This usually requires professional experience in computer graphics, modeling, and animation. Fortunately, recent developments in text-to-video have made it possible to streamline this process using only text prompts.

    Figure 1 shows how the model can produce temporally coherent videos that adhere to the guidance intents when given text descriptions and motion structure as inputs. By constructing structure guidance from various sources, the authors demonstrate video generation results in several applications, including (top) real-world scene setup to video, (middle) dynamic 3D scene modeling to video, and (bottom) video re-rendering.

    They contend that while language is a familiar and flexible description tool, it is less effective at giving precise control. Instead, it excels at communicating abstract global context. This motivates them to investigate customized video generation that uses text to describe the setting and a motion structure to give concrete direction. Because frame-wise depth maps are 3D-aware 2D data well suited to the video generation task, they are specifically chosen to describe the motion structure. The structure guidance in their method can be relatively rough, so that non-experts can readily prepare it.

    This design gives the generative model the freedom to generate realistic content without relying on meticulously produced input. For instance, creating a photorealistic outdoor environment can be guided by a scene setup using objects found in an office (Figure 1 (top)). The physical objects may be substituted with specific geometric elements or any readily available 3D asset using 3D modeling software (Figure 1 (middle)). Using the estimated depth from existing recordings is another option (Figure 1 (bottom)). The combination of textual and structural guidance gives users both the flexibility and the control to customize their videos as intended.

    To do this, researchers from CUHK, Tencent AI Lab, and HKUST use a Latent Diffusion Model (LDM), which runs the diffusion model in a compact, lower-dimensional latent space to reduce processing costs. They propose separating the training of spatial modules (for image synthesis) and temporal modules (for temporal coherence) for an open-world video generation model. This design is based on two main factors: (i) training the model components separately reduces computational resource requirements, which is especially important for resource-intensive tasks; and (ii) because image datasets cover a much wider variety of concepts than existing video datasets, pre-training the model for image synthesis helps it inherit diverse visual concepts and transfer them to video generation.
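
    The split between spatial and temporal modules can be pictured with a small sketch. It is only an illustration of the idea of freezing image-pretrained parts and training the rest; the `Block` class, the block names, and the parameter counts are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    kind: str            # "spatial" or "temporal"
    num_params: int      # hypothetical parameter count
    trainable: bool = True


def freeze_spatial(blocks):
    """Freeze spatial blocks; leave temporal blocks trainable."""
    for block in blocks:
        block.trainable = block.kind == "temporal"
    return blocks


def trainable_params(blocks):
    """Count the parameters that would still receive gradients."""
    return sum(b.num_params for b in blocks if b.trainable)


# Hypothetical model: names and sizes are made up for illustration.
model = freeze_spatial([
    Block("spatial_conv", "spatial", 800),
    Block("spatial_attn", "spatial", 200),
    Block("temporal_conv", "temporal", 150),
    Block("temporal_attn", "temporal", 50),
])
total = sum(b.num_params for b in model)
print(trainable_params(model), "of", total, "parameters receive gradients")
```

    In this toy setup only the temporal blocks are updated during video training, which is one way to see why separate training lowers the compute budget.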

    Achieving temporal coherence is a major challenge. Starting from a pre-trained image LDM, they keep its modules as frozen spatial blocks and introduce temporal blocks designed to learn inter-frame coherence from the video dataset. Notably, they combine spatial and temporal convolutions, which increases the flexibility of the pre-trained modules and improves temporal stability. Additionally, they use a simple but effective causal attention mask strategy to enable longer video synthesis (i.e., four times the training length), considerably reducing the risk of quality degradation.
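
    The article does not spell out the exact form of the causal attention mask. As a minimal sketch, assuming the standard lower-triangular mask over frame indices (each frame attends only to itself and earlier frames), temporal attention weights could be computed as follows. The function names and the uniform toy scores are illustrative assumptions.

```python
import math


def causal_mask(num_frames):
    """mask[i][j] is True when frame i may attend to frame j (j <= i)."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


def masked_softmax(scores, mask):
    """Row-wise softmax over allowed positions; masked positions get weight 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy example: uniform attention scores across 4 frames.
scores = [[0.0] * 4 for _ in range(4)]
attn = masked_softmax(scores, causal_mask(4))
for row in attn:
    print(row)
```

    Under this assumption, the first frame attends only to itself and no frame places weight on a later frame, so each frame's temporal context does not depend on how many frames follow it.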

    Qualitative and quantitative evaluations show that the proposed technique outperforms the baselines, particularly in terms of temporal coherence and faithfulness to user instructions. Ablation experiments support the effectiveness of the proposed designs, which are essential to the method. The authors also demonstrate several interesting applications enabled by their method, and the results illustrate its potential for real-world use.

    The following is a summary of their contributions:
    • They present an effective method for producing customized videos with textual and structural guidance. Their approach achieves the best results in both quantitative and qualitative terms for controllable text-to-video generation.
    • They present a technique for using pre-trained image LDMs to generate videos that inherit rich visual concepts and have good temporal coherence.
    • They include a temporal masking technique that extends the duration of video synthesis while minimizing quality loss.


    Check out the Paper, Project, and GitHub. Don’t forget to join our 23k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

    Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

