Ztoog

    This AI Paper Proposes an Interactive Agent Foundation Model that Uses a Novel Multi-Task Agent Training Paradigm for Training AI Agents Across a Wide Range of Domains, Datasets, and Tasks


    AI development is shifting from static, task-centric models to dynamic, adaptable agent-based systems suited to diverse applications. Building AI systems that gather sensory data and interact effectively with their environments is a longstanding research goal. Developing generalist AI offers advantages, including training a single neural model across multiple tasks and data types. This approach scales well with data, computational resources, and model parameters.

    Recent work highlights the advantages of developing generalist AI systems by training a single neural model across diverse tasks and data types, offering scalability through data, compute, and model parameters. However, challenges persist: large foundation models often produce hallucinations and infer incorrect information because they are insufficiently grounded in their training environments. Current multimodal approaches, which rely on frozen pre-trained models for each modality, may perpetuate errors because they lack cross-modal pre-training.

    Researchers from Stanford University, Microsoft Research, Redmond, and the University of California, Los Angeles, have proposed the Interactive Agent Foundation Model, which introduces a unified pre-training framework for processing text, visual data, and actions, treating each as separate tokens. It uses pre-trained language and visual-language models to predict masked tokens across all modalities. It enables interaction with humans and environments, incorporating visual-language understanding. With 277M parameters jointly pre-trained across diverse domains, it engages effectively in multi-modal settings across various virtual environments.
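As a rough illustration of the unified token stream described above, the following Python sketch tags each token with its modality, masks a random subset across all modalities, and records the targets a model would be trained to predict. The names (`Token`, `mask_tokens`, `MASK`), the mask rates, and the example tokens are assumptions made for illustration only; the article does not describe the paper's actual tokenization or masking schedule.

```python
import random
from dataclasses import dataclass

MASK = "<mask>"  # hypothetical placeholder symbol, not taken from the paper


@dataclass(frozen=True)
class Token:
    modality: str  # "text", "visual", or "action"
    value: str


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Mask a random subset of tokens regardless of modality.

    Returns (masked_sequence, targets), where targets maps each masked
    index to the original token the model should reconstruct.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            # Keep the modality tag so the model knows what kind of token to predict.
            masked.append(Token(tok.modality, MASK))
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


# Illustrative mixed-modality sequence: instruction text, image patches, actions.
sequence = [
    Token("text", "pick"), Token("text", "up"),
    Token("text", "the"), Token("text", "block"),
    Token("visual", "patch_0"), Token("visual", "patch_1"),
    Token("action", "move_arm"), Token("action", "close_gripper"),
]
masked, targets = mask_tokens(sequence, mask_rate=0.5, seed=1)
```

In a real training loop the `targets` dictionary would feed a prediction loss; here it only shows that text, visual, and action tokens share a single masking procedure.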

    The Interactive Agent Foundation Model initializes its architecture with pre-trained CLIP ViT-B16 for visual encoding and OPT-125M for action and language modeling. It shares information across modalities through a linear layer transformation. Because of memory constraints, only a sliding window of previous actions and visual frames is included as input. Sinusoidal positional embeddings are used for predicting masked visual tokens. Unlike prior models that rely on frozen submodules, the entire model is trained jointly during pre-training.
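Two of the mechanisms mentioned above, the sliding window over past frames and actions and sinusoidal positional embeddings, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `SlidingContext`, the window size of 3, and the embedding dimension are hypothetical, and the formula is the standard Transformer formulation with base 10000; the article does not say which variant or window length the paper uses.

```python
import math
from collections import deque


def sinusoidal_embedding(position, dim):
    """Standard sinusoidal encoding: sine on even indices, cosine on odd indices."""
    emb = []
    for i in range(dim):
        # Each sin/cos pair shares one frequency, which decreases with the index.
        angle = position / (10000 ** (2 * (i // 2) / dim))
        emb.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return emb


class SlidingContext:
    """Keeps only the most recent `window` (frame, action) steps as model input."""

    def __init__(self, window):
        # A bounded deque silently drops the oldest step once it is full.
        self.steps = deque(maxlen=window)

    def push(self, frame, action):
        self.steps.append((frame, action))

    def as_input(self):
        return list(self.steps)


# Assumed window of 3 steps; after 5 pushes only steps 2, 3, and 4 remain.
ctx = SlidingContext(window=3)
for t in range(5):
    ctx.push(f"frame_{t}", f"action_{t}")

# One positional embedding per retained step (dimension 8 is an arbitrary choice).
positions = [sinusoidal_embedding(p, 8) for p in range(len(ctx.as_input()))]
```

Bounding the context this way caps memory use at the cost of discarding older history, which is the trade-off the paragraph above attributes to memory constraints.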

    Evaluation across robotics, gaming, and healthcare tasks shows promising results. Although other models outperform it on certain tasks because it was pre-trained on less data, the method delivers competitive performance, especially in robotics, where it significantly surpasses a comparative model. Fine-tuning the pre-trained model proves notably more effective in gaming tasks than training from scratch. In healthcare applications, the method outperforms several baselines that use CLIP and OPT for initialization, demonstrating the efficacy of its diverse pre-training approach.

    In conclusion, the researchers proposed the Interactive Agent Foundation Model, which is adept at processing text, action, and visual inputs and demonstrates effectiveness across diverse domains. Pre-training on a mixture of robotics and gaming data enables the model to model actions proficiently, and it even exhibits positive transfer to healthcare tasks during fine-tuning. Its broad applicability across decision-making contexts suggests potential for generalist agents in multimodal systems, unlocking new opportunities for AI advancement.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.

