    AI

    Meet WebVoyager: An Innovative Large Multimodal Model (LMM) Powered Web Agent that can Complete User Instructions End-to-End by Interacting with Real-World Websites


    Existing web agents face limitations because they usually rely on a single input modality and are evaluated in controlled environments, such as web simulators or static snapshots, which do not accurately reflect the complexity and dynamic nature of real-world web interactions. This significantly restricts their applicability and effectiveness in real-world situations that require dynamic interaction with web content. It also creates a gap in their practical utility: they cannot effectively navigate and interact with the diverse, ever-evolving content found on actual websites.

    Previous work on web agents has focused on autonomous navigation of and interaction with web environments. Key developments include WebGPT and WebAgent, which leverage GPT-3 and T5 models for text-based web browsing and HTML snippet extraction. There is also growing interest in multimodal web agents, such as WebGUM, which combines T5 with Vision Transformers, and PIX2ACT, which works from web screenshots. These efforts contrast with earlier single-modality or simplified-environment approaches and move toward more realistic and dynamic web interactions. Concurrently, large multimodal models (LMMs) such as GPT-4V have shown strong multimodal comprehension, laying the groundwork for more sophisticated web agents.

    Researchers from Zhejiang University, Tencent AI Lab, and Westlake University have proposed WebVoyager, an LMM-powered web agent that can complete user instructions end-to-end by interacting with real-world websites. They also propose a new evaluation protocol that leverages the strong multimodal comprehension capabilities of GPT-4V and includes a benchmark of real-world tasks from 15 widely used websites. The paper demonstrates the agent's interaction with the Apple website step by step, showing an optimal path without redundant actions.
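    To make "end-to-end" concrete, the sketch below shows the generic observe-decide-act loop that a web agent of this kind runs: observe the current page, ask a policy for the next action, execute it, and stop when the policy answers or a step budget runs out. It is a minimal illustration under stated assumptions, not WebVoyager's implementation. The names Action, Observation, ToyBrowser, run_agent, and scripted_policy are hypothetical, the two action kinds ("click", "answer") are assumptions, the default step budget is arbitrary, and the toy in-memory "browser" plus scripted policy stand in for a real browser and the LMM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Action:
    kind: str                     # hypothetical action kinds: "click" or "answer"
    target: Optional[str] = None  # link label to click
    text: Optional[str] = None    # final answer text


@dataclass
class Observation:
    url: str
    screenshot: bytes  # a real multimodal agent would send this image to the model
    page_text: str


class ToyBrowser:
    """Hypothetical in-memory stand-in for a real browser session."""

    def __init__(self, pages: Dict[str, Tuple[str, Dict[str, str]]], start: str):
        self.pages = pages  # url -> (visible text, {link label: target url})
        self.url = start

    def observe(self) -> Observation:
        text, _links = self.pages[self.url]
        return Observation(url=self.url, screenshot=b"", page_text=text)

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            _text, links = self.pages[self.url]
            self.url = links[action.target]
        # any other action kind is a no-op in this toy environment


# A policy maps (task, observation history) to the next action. In an
# LMM-powered agent this role is played by the model; here it is any callable.
Policy = Callable[[str, List[Observation]], Action]


def run_agent(task: str, browser: ToyBrowser, policy: Policy,
              max_steps: int = 10) -> Optional[str]:
    history: List[Observation] = []
    for _ in range(max_steps):
        obs = browser.observe()
        history.append(obs)
        action = policy(task, history)
        if action.kind == "answer":
            return action.text
        browser.execute(action)
    return None  # step budget exhausted without an answer


def scripted_policy(task: str, history: List[Observation]) -> Action:
    """Toy policy: open the 'Products' link, then answer with that page's text."""
    obs = history[-1]
    if obs.url == "home":
        return Action(kind="click", target="Products")
    return Action(kind="answer", text=obs.page_text)


pages = {
    "home": ("Example home page", {"Products": "products"}),
    "products": ("Products: A, B, C", {}),
}
print(run_agent("List the products", ToyBrowser(pages, "home"), scripted_policy))
```

    Keeping the policy as a plain callable separates the control loop from the model, so the same loop can be exercised with a scripted policy in tests and later driven by a real LMM.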

    The evaluation set is built using a combination of self-instruct and human verification. Tasks are sampled and rewritten from various websites to ensure high quality and relevance. Humans then validate the generated tasks and confirm that the answers can be found on the corresponding websites. Human evaluation is the primary metric: expert annotators judge task success based on the agent's interaction with the web. Notably, the authors also use GPT-4V for automatic evaluation, aiming to reduce reliance on human evaluators and lower experiment costs.
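    Whether the judge is a human annotator or GPT-4V, each task ends up with a binary success judgment, and the headline metric is the fraction of successful tasks. The sketch below aggregates such judgments into an overall task success rate and a per-website breakdown. It is an illustrative helper, not code from the paper; the function name success_rates and the (website, succeeded) input format are assumptions, and the example site names and judgments are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rates(judgments: Iterable[Tuple[str, bool]]) -> Tuple[float, Dict[str, float]]:
    """Aggregate (website, succeeded) pairs, one per task, into success rates."""
    per_site = defaultdict(lambda: [0, 0])  # website -> [successes, total tasks]
    for site, ok in judgments:
        per_site[site][0] += int(ok)
        per_site[site][1] += 1

    total_ok = sum(s for s, _ in per_site.values())
    total = sum(n for _, n in per_site.values())
    overall = total_ok / total if total else 0.0
    return overall, {site: s / n for site, (s, n) in per_site.items()}


# Made-up judgments for three hypothetical websites.
overall, per_site = success_rates([
    ("site-a", True), ("site-a", False),
    ("site-b", False),
    ("site-c", True),
])
print(overall)   # 0.5
print(per_site)  # {'site-a': 0.5, 'site-b': 0.0, 'site-c': 1.0}
```

    A per-website breakdown of this kind is what surfaces weak spots such as the text-heavy sites mentioned in the results.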

    WebVoyager achieved a 55.7% task success rate, outperforming GPT-4 and a text-only variant of WebVoyager. The automatic evaluation protocol using GPT-4V aligned closely with human judgment, showing an 85.3% agreement rate. Despite strong performance on most websites, WebVoyager struggled with text-heavy sites such as Cambridge Dictionary and Wolfram Alpha. The automatic evaluator's consistency improved as it was given more information, reaching a Kappa score of 0.7, on par with human agreement levels, which highlights GPT-4V's potential for efficient, large-scale evaluation of web agents.
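    The two alignment figures above measure different things. The agreement rate is the raw fraction of tasks on which the automatic judge and the human give the same verdict. Cohen's kappa additionally corrects for the agreement expected by chance: with observed agreement p_o and chance agreement p_e, kappa = (p_o - p_e) / (1 - p_e). The sketch below computes both for binary success labels. It assumes the standard two-rater Cohen's kappa; the article does not say which kappa variant the paper reports, and the function name and example labels are hypothetical.

```python
import math
from typing import Sequence, Tuple


def agreement_and_kappa(human: Sequence[bool], auto: Sequence[bool]) -> Tuple[float, float]:
    """Raw agreement rate and Cohen's kappa between two binary raters."""
    if len(human) != len(auto) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)

    # Observed agreement: fraction of tasks where both raters give the same verdict.
    p_o = sum(h == a for h, a in zip(human, auto)) / n

    # Chance agreement: both say "success" by chance, plus both say "failure" by chance.
    p_h = sum(human) / n
    p_a = sum(auto) / n
    p_e = p_h * p_a + (1 - p_h) * (1 - p_a)

    if p_e == 1.0:
        # Both raters gave one identical constant label; kappa is undefined (0/0).
        return p_o, math.nan
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical verdicts on eight tasks (True = task judged successful).
human_labels = [True, True, False, False, True, False, True, True]
auto_labels = [True, False, False, False, True, False, True, True]
print(agreement_and_kappa(human_labels, auto_labels))  # (0.875, 0.75)
```

    In this example the raters agree on 7 of 8 tasks (p_o = 0.875), but because half of that agreement would be expected by chance (p_e = 0.5), kappa is lower at 0.75. This is why kappa is the stricter of the two alignment measures.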

    In conclusion, WebVoyager is an LMM-powered web agent designed for end-to-end completion of web tasks, with a 55.7% task success rate. There is still room for improvement, as the comprehensive error analysis in the paper shows. The researchers suggest that future work should focus on better methods for integrating visual and textual information and on building multimodal web agents with open-source LMMs.


    Check out the Paper. All credit for this research goes to the researchers of this project.



    Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new developments and creating opportunities to contribute.

