Ztoog
    Researchers from UT Austin Introduce MUTEX: A Leap Towards Multimodal Robot Instruction with Cross-Modal Reasoning


    Researchers have introduced a framework called MUTEX, short for “MUltimodal Task specification for robot EXecution,” aimed at significantly advancing the capabilities of robots in assisting humans. The main problem they address is a limitation of existing robot policy learning methods, which typically focus on a single modality for task specification. The result is robots that are proficient with one kind of instruction but struggle to handle diverse communication methods.

    MUTEX unifies policy learning across modalities, allowing robots to understand and execute tasks based on instructions conveyed through speech, text, images, videos, and more. This holistic approach is a step toward making robots versatile collaborators in human-robot teams.

    The framework’s training process is a two-stage procedure that combines masked modeling and cross-modal matching objectives. In the first stage, masked modeling encourages cross-modal interactions by masking certain tokens or features within each modality and requiring the model to predict them using information from the other modalities. This ensures that the framework can effectively leverage information from multiple sources.
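    The masking step described above can be illustrated with a minimal sketch. It only shows how masked inputs and reconstruction targets might be prepared for each modality; the `<mask>` placeholder, the 25% mask ratio, the string tokens, and the function names are all assumptions for illustration and do not come from the MUTEX paper or code.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from MUTEX


def mask_tokens(tokens, mask_ratio, rng):
    """Hide a random subset of tokens.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token a model would be asked to predict.
    """
    if not tokens:
        return [], {}
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


def build_masked_batch(spec, mask_ratio=0.25, seed=0):
    """Mask every modality of one task specification independently."""
    rng = random.Random(seed)
    masked_spec, targets = {}, {}
    for modality, tokens in spec.items():
        masked_spec[modality], targets[modality] = mask_tokens(
            tokens, mask_ratio, rng
        )
    return masked_spec, targets


# One task described in three modalities (toy string tokens).
spec = {
    "text": ["open", "the", "top", "drawer"],
    "speech": ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"],
    "video": ["v0", "v1", "v2", "v3"],
}
masked_spec, targets = build_masked_batch(spec)
```

    Because a masked token in one modality describes the same task as the visible tokens in the others, a model trained to fill in `targets` has an incentive to read across modalities rather than within a single one.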

    In the second stage, cross-modal matching enriches the representations of each modality by associating them with the features of the most information-dense modality, which is video demonstrations in this case. This step ensures that the framework learns a shared embedding space that improves the representation of task specifications across different modalities.
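    One way to picture cross-modal matching is as a loss that pulls every modality's embedding toward the video embedding of the same task. The sketch below assumes a mean-squared-error objective and one fixed-size embedding per modality; the article does not state the exact loss, so treat the formula and the function names as illustrative assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_modal_matching_loss(embeddings, anchor="video"):
    """Average distance of every non-anchor modality to the anchor.

    `embeddings` maps modality name -> embedding vector for one task.
    The anchor is the information-dense modality (video in the article).
    """
    target = embeddings[anchor]
    others = [m for m in embeddings if m != anchor]
    if not others:
        raise ValueError("need at least one modality besides the anchor")
    return sum(mse(embeddings[m], target) for m in others) / len(others)


# Toy embeddings: text already matches video, speech does not.
embeddings = {
    "video": [1.0, 0.0],
    "text": [1.0, 0.0],
    "speech": [0.0, 0.0],
}
loss = cross_modal_matching_loss(embeddings)
```

    Minimizing a loss of this shape moves the text, speech, and image representations of a task toward the video representation, which is one way to arrive at the shared embedding space the paragraph describes.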

    MUTEX’s architecture consists of modality-specific encoders, a projection layer, a policy encoder, and a policy decoder. The modality-specific encoders extract meaningful tokens from the input task specifications. These tokens are then processed by a projection layer before being passed to the policy encoder. The policy encoder, a transformer-based architecture with cross- and self-attention layers, fuses information from the various task specification modalities and robot observations. Its output is then sent to the policy decoder, which uses a Perceiver Decoder architecture to generate features for action prediction and masked token queries. Separate MLPs predict continuous action values and token values for the masked tokens.
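    The fusion step in the policy encoder can be sketched with a single, unparameterized scaled dot-product cross-attention, in which robot observation tokens query the pooled task-specification tokens. This is a deliberately reduced illustration: the learned projections, self-attention layers, Perceiver Decoder, and MLP heads are omitted, and the toy vectors and names are assumptions rather than the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention with no learned weights.

    Each query (an observation token) becomes a weighted average of the
    key/value tokens (task-specification tokens from any modality).
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values))
             for d in range(dim)]
        )
    return fused


# Tokens assumed to be already encoded and projected to a common size.
spec_tokens = {
    "text": [[1.0, 0.0]],
    "video": [[0.0, 1.0], [1.0, 1.0]],
}
obs_tokens = [[1.0, 0.0], [0.0, 1.0]]

# Pool tokens from every available modality into one sequence.
all_spec = [tok for tokens in spec_tokens.values() for tok in tokens]
fused = cross_attention(obs_tokens, all_spec)
```

    Pooling all specification tokens into one sequence means the same fusion code runs whether a task is given in one modality or several, which matches the unified-policy idea in the paragraph above.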

    To evaluate MUTEX, the researchers created a comprehensive dataset with 100 tasks in a simulated environment and 50 tasks in the real world, each annotated with multiple instances of task specifications in different modalities. The results of their experiments were promising, showing substantial performance improvements over methods trained solely for single modalities. This underscores the value of cross-modal learning in enhancing a robot’s ability to understand and execute tasks. Text Goal and Speech Goal, Text Goal and Image Goal, and Speech Instructions and Video Demonstration obtained success rates of 50.1, 59.2, and 59.6, respectively.

    In summary, MUTEX addresses the limitations of existing robot policy learning methods by enabling robots to understand and execute tasks specified through various modalities. It shows promising potential for more effective human-robot collaboration, although it has some limitations that need further exploration and refinement. Future work will focus on addressing these limitations and advancing the framework’s capabilities.


    Check out the Paper and Code. All credit for this research goes to the researchers on this project.



    Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in the scope of software and data science applications. She is always reading about developments in different fields of AI and ML.


    © 2025 Ztoog.
