Ztoog

    Meet MovieChat: An Innovative Video Understanding System that Integrates Video Foundation Models and Large Language Models


    Large Language Models (LLMs) have recently made considerable strides in the Natural Language Processing (NLP) field. Adding multi-modality to LLMs and turning them into Multimodal Large Language Models (MLLMs), which can perform multimodal perception and interpretation, is a logical next step. As a possible step towards Artificial General Intelligence (AGI), MLLMs have demonstrated striking emergent skills in various multimodal tasks such as perception (e.g., existence, count, location, OCR), commonsense reasoning, and code reasoning. Compared with LLMs and other task-specific models, MLLMs offer a more human-like perspective of the environment, a user-friendly interface for interaction, and a wider range of task-solving skills.

    Existing vision-centric MLLMs use a Q-Former or a basic projection layer, pre-trained LLMs, a visual encoder, and additional learnable modules. A different paradigm combines existing visual perception tools (such as tracking and classification) with LLMs through APIs to build a system without training. Some earlier studies in the video domain developed video MLLMs using this paradigm. However, there had never been any investigation of a model or system built for long videos (those lasting longer than a minute), and there had never been established criteria against which to measure the effectiveness of these systems.

    In this study, researchers from Zhejiang University, University of Washington, Microsoft Research Asia, and Hong Kong University introduce MovieChat, a novel framework for long video understanding tasks that combines vision models with LLMs. According to them, the remaining challenges for long video understanding include computational complexity, memory cost, and long-term temporal connection. To address these, they propose a memory mechanism based on the Atkinson-Shiffrin memory model, which comprises a rapidly updated short-term memory and a compact, long-lasting memory.

    According to the researchers, this framework is the first to enable long video understanding tasks. Their work is summarised as follows. They propose a memory mechanism to minimise computational complexity and memory cost while strengthening the long-term temporal connection, and they conduct rigorous quantitative assessments and case studies to evaluate both understanding capability and inference cost. The research concludes by presenting a novel approach to video understanding that combines large language models with video foundation models.

    The system addresses the difficulties of analysing long videos by including a memory mechanism inspired by the Atkinson-Shiffrin model, consisting of short-term and long-term memory represented by tokens in Transformers. The proposed system, MovieChat, achieves state-of-the-art performance in long video understanding, outperforming earlier algorithms that can only process videos containing a few frames. The method handles long-term temporal relationships while reducing memory use and computational complexity. The work highlights the role of memory mechanisms in video understanding, which allow the model to store and recall pertinent information over extended periods. MovieChat has practical implications for areas including content analysis, video recommendation systems, and video monitoring. Future studies might look into ways to strengthen the memory mechanism and use additional modalities, including audio, to improve video understanding. This study opens up possibilities for applications needing a thorough understanding of visual data. The project website has several demos.
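To make the two-tier memory idea concrete, here is a minimal Python sketch of a short-term buffer that is periodically compressed into a long-term store. It is an illustration under stated assumptions, not the researchers' implementation: the class name `VideoMemory`, the parameters `short_capacity` and `merged_len`, and the merge rule (averaging the most similar adjacent pair of frame features by cosine similarity) are all hypothetical choices made for this example, and frame features are plain lists of floats rather than Transformer tokens.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate(frames, target_len):
    """Shrink a list of frame features to target_len entries by repeatedly
    averaging the most similar adjacent pair (an assumed merge rule)."""
    frames = [list(f) for f in frames]
    while len(frames) > max(target_len, 1):
        sims = [cosine(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        merged = [(x + y) / 2 for x, y in zip(frames[i], frames[i + 1])]
        frames[i:i + 2] = [merged]
    return frames


class VideoMemory:
    """Hypothetical two-tier memory: a small, quickly updated short-term
    buffer and a compact long-term store that keeps growing slowly."""

    def __init__(self, short_capacity=4, merged_len=2):
        self.short_capacity = short_capacity
        self.merged_len = merged_len
        self.short_term = []
        self.long_term = []

    def add_frame(self, feature):
        self.short_term.append(list(feature))
        # When the short-term buffer fills up, compress it and move the
        # result into long-term memory, then start a fresh buffer.
        if len(self.short_term) == self.short_capacity:
            self.long_term.extend(consolidate(self.short_term, self.merged_len))
            self.short_term.clear()

    def context(self):
        """Everything that would be handed to the language model."""
        return self.long_term + self.short_term


memory = VideoMemory(short_capacity=4, merged_len=2)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0]]
for frame_feature in stream:
    memory.add_frame(frame_feature)
print(len(memory.long_term), len(memory.short_term))  # prints: 2 1
```

In this toy run, four incoming frames are compressed into two long-term entries, so the stored context grows more slowly than the video itself. That is the property the article attributes to the memory mechanism: lower memory use while older content stays available.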


    Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers on this project. Also, don't forget to join our 27k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.


    Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

