    Meet MovieChat: An Innovative Video Understanding System that Integrates Video Foundation Models and Large Language Models


    Large Language Models (LLMs) have recently made considerable strides in the Natural Language Processing (NLP) field. Adding multimodality to LLMs and transforming them into Multimodal Large Language Models (MLLMs), which can perform multimodal perception and understanding, is a logical next step. As a possible step toward Artificial General Intelligence (AGI), MLLMs have demonstrated impressive emergent abilities in various multimodal tasks such as perception (e.g., existence, count, location, OCR), commonsense reasoning, and code reasoning. Compared with LLMs and other task-specific models, MLLMs offer a more human-like perspective of the environment, a user-friendly interface for interaction, and a wider range of task-solving skills.

    Existing vision-centric MLLMs use a Q-former or a basic projection layer, pre-trained LLMs, a visual encoder, and extra learnable modules. A different paradigm combines existing visual perception tools (such as tracking and classification) with LLMs through APIs to construct a system without training. Some earlier studies in the video field developed video MLLMs using this paradigm. However, no model or system had been investigated for long videos (those lasting longer than a minute), and there was no established benchmark against which to measure the effectiveness of such systems.

    In this study, researchers from Zhejiang University, University of Washington, Microsoft Research Asia, and Hong Kong University introduce MovieChat, a novel framework for long video understanding tasks that combines vision models with LLMs. According to them, the remaining difficulties for long video understanding include computational complexity, memory cost, and long-term temporal connection. To address these, they propose a memory mechanism based on the Atkinson-Shiffrin memory model, which involves a rapidly updated short-term memory and a compact, long-lasting memory.

    This framework combines vision models with LLMs and, according to the researchers, is the first to enable long video understanding tasks. The work is summarised as follows. They undertake rigorous quantitative evaluations and case studies to assess both understanding capability and inference cost, and they propose a memory mechanism that reduces computational complexity and memory cost while strengthening long-term temporal connection. The research concludes by presenting a novel approach to video understanding that combines large language models with video foundation models.

    The system addresses the difficulties of analyzing long videos by incorporating a memory mechanism inspired by the Atkinson-Shiffrin model, consisting of short-term and long-term memory represented by tokens in Transformers. The proposed system, MovieChat, achieves state-of-the-art performance in long video understanding, outperforming earlier algorithms that can only process videos containing a few frames. This method handles long-term temporal relationships while reducing memory use and computational complexity. The work highlights the role of memory mechanisms in video understanding, which allow the model to store and recall relevant information over long periods. MovieChat has practical implications for industries including content analysis, video recommendation systems, and video surveillance. Future studies may look into ways to strengthen the memory mechanism and use additional modalities, including audio, to improve video understanding. This study opens up possibilities for applications that require a thorough understanding of visual data. Their website has several demos.
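
    The two-stage memory described above can be illustrated with a small sketch. This is not the authors' implementation: the article does not say how short-term memory is compacted into long-term memory, so the rule used here (repeatedly averaging the most similar pair of adjacent frame features) is an assumption chosen for illustration, and the names `TwoStageMemory`, `consolidate`, `short_capacity`, and `merged_len` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consolidate(frames: List[Vector], target_len: int) -> List[Vector]:
    """Assumed compaction rule: average the most similar adjacent pair
    of frame features until only target_len features remain."""
    frames = [list(f) for f in frames]
    while len(frames) > max(target_len, 1):
        sims = [cosine(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        merged = [(x + y) / 2 for x, y in zip(frames[i], frames[i + 1])]
        frames[i:i + 2] = [merged]
    return frames


class TwoStageMemory:
    """Hypothetical short-term buffer plus compact long-term store."""

    def __init__(self, short_capacity: int, merged_len: int) -> None:
        self.short_capacity = short_capacity  # frames held before compaction
        self.merged_len = merged_len          # features kept per compaction
        self.short_term: List[Vector] = []
        self.long_term: List[Vector] = []

    def add_frame(self, feature: Vector) -> None:
        """Append a frame feature; compact the buffer once it is full."""
        self.short_term.append(feature)
        if len(self.short_term) >= self.short_capacity:
            self.long_term.extend(consolidate(self.short_term, self.merged_len))
            self.short_term = []

    def tokens(self) -> List[Vector]:
        """Features that would be passed on to the language model."""
        return self.long_term + self.short_term


if __name__ == "__main__":
    memory = TwoStageMemory(short_capacity=4, merged_len=2)
    for t in range(10):
        memory.add_frame([1.0, float(t)])  # toy 2-D "frame features"
    print(len(memory.long_term), len(memory.short_term))  # prints "4 2"
```

    In this toy run, ten frames are reduced to six stored features. That is the general trade-off the article describes: the short-term buffer is updated on every frame, while the long-term store grows more slowly than the video does.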


    Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers on this project.


    Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

