Ztoog
    Meet WebVoyager: An Innovative Large Multimodal Model (LMM) Powered Web Agent that can Complete User Instructions End-to-End by Interacting with Real-World Websites


    Existing web agents are limited because they typically rely on a single input modality and are tested in controlled environments, such as web simulators or static snapshots, which do not accurately reflect the complexity and dynamic nature of real-world web interactions. This significantly restricts their applicability in real-world scenarios that require dynamic interaction with web content, leaving a gap in practical use: such agents cannot effectively navigate and interact with the diverse, ever-evolving content found on actual websites.

    Previous work on web agents has focused on autonomous navigation and interaction with web environments. Key developments include WebGPT and WebAgent, which leverage GPT-3 and T5 models for text-based web browsing and HTML snippet extraction. There is also growing interest in multimodal web agents, such as WebGUM, which combines T5 with Vision Transformers, and PIX2ACT, which uses web screenshots. These efforts contrast with earlier single-modality or simplified web environment approaches, moving toward more realistic and dynamic web interactions. Concurrently, large multimodal models (LMMs) like GPT-4V have shown robust multimodal comprehension, laying the groundwork for more sophisticated web agents.

    Researchers from Zhejiang University, Tencent AI Lab, and Westlake University have proposed WebVoyager, an LMM-powered web agent that can complete user instructions end-to-end by interacting with real-world websites. They also propose a new evaluation protocol that leverages the robust multimodal comprehension of GPT-4V and includes a benchmark of real-world tasks from 15 widely used websites. The agent's interaction with the Apple website is demonstrated step by step, showing an optimal path without redundant actions.
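To make the idea of completing a task "end-to-end" concrete, here is a minimal sketch of an observe-decide-act loop. It is not WebVoyager's implementation: `ToySite`, `stub_policy`, `run_agent`, and the action names are hypothetical, the "website" is an in-memory dictionary, and the policy is a hard-coded stand-in for the multimodal model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    screenshot: bytes  # rendered page image (empty placeholder in this toy)
    page_text: str     # text that accompanies the screenshot


@dataclass
class Action:
    kind: str          # "click" or "answer" in this toy example
    argument: str = ""


class ToySite:
    """Hypothetical two-page site standing in for a live browser session."""

    def __init__(self) -> None:
        self.pages = {
            "home": "Welcome. Links: [products]",
            "products": "Laptop price: $999",
        }
        self.current = "home"

    def observe(self) -> Observation:
        return Observation(screenshot=b"", page_text=self.pages[self.current])

    def step(self, action: Action) -> None:
        # Only clicks on known page names change the state.
        if action.kind == "click" and action.argument in self.pages:
            self.current = action.argument


def stub_policy(task: str, obs: Observation, history: List[Action]) -> Action:
    """Assumed placeholder for the model call; uses fixed rules only."""
    if "price" in obs.page_text:
        return Action("answer", obs.page_text.split(": ")[1])
    return Action("click", "products")


def run_agent(
    task: str,
    site: ToySite,
    policy: Callable[[str, Observation, List[Action]], Action],
    max_steps: int = 5,
) -> Optional[str]:
    """Observe, choose an action, apply it; stop on an answer or step limit."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = site.observe()
        action = policy(task, obs, history)
        history.append(action)
        if action.kind == "answer":
            return action.argument
        site.step(action)
    return None  # step budget exhausted without an answer


result = run_agent("Find the laptop price", ToySite(), stub_policy)
```

In a real agent, the policy would send the screenshot and task to a multimodal model and the site would be a browser session; the step limit is one simple way to bound a run that never reaches an answer.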

    The evaluation set is built using a combination of self-instruct and human verification methods. Tasks are sampled and rewritten from various websites to ensure high quality and relevance. Human validation is performed to verify the generated tasks and to ensure the answers can be found on the corresponding websites. Human evaluation is the main metric: expert annotators judge task success based on the agent's interaction with the web. The authors also use GPT-4V for automatic evaluation, aiming to reduce the reliance on human evaluators and the cost of experiments.

    WebVoyager achieved a 55.7% task success rate, outperforming both GPT-4 (All Tools) and a text-only variant of WebVoyager. The automatic evaluation protocol using GPT-4V aligned closely with human judgment, showing an 85.3% agreement rate. Despite its strong performance on most website tasks, WebVoyager encountered challenges with text-heavy sites like Cambridge Dictionary and Wolfram Alpha. The automatic evaluator's consistency improved as it was given more information, reaching a Kappa score of 0.7, which matches human agreement levels and highlights GPT-4V's potential for efficient, large-scale evaluation of web agents.
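For readers unfamiliar with the metrics quoted above, the sketch below shows how a task success rate, a raw agreement rate, and Cohen's kappa can be computed from two lists of pass/fail labels. The labels are made up for illustration and are not data from the paper; the function names are my own, and the article does not say which kappa variant the authors used, so the standard two-rater Cohen's kappa is an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def agreement_rate(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two raters give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = agreement_rate(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label
    # if each labels independently with their own marginal frequencies.
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters always use the same single label
    return (p_o - p_e) / (1 - p_e)


# Illustrative labels only (1 = task judged successful, 0 = failed).
human = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
auto = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

success_rate = sum(human) / len(human)
agreement = agreement_rate(human, auto)
kappa = cohens_kappa(human, auto)
```

Kappa is lower than raw agreement because some matches would occur by chance alone, which is why the article reports both numbers.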

    In conclusion, WebVoyager is an LMM-powered web agent designed for end-to-end web task resolution, with a 55.7% task success rate. There is still room for improvement, as the comprehensive error analysis in the paper shows. The researchers suggest that future work should focus on better methods for integrating visual and textual information and on building multimodal web agents with open-source LMMs.


    Check out the Paper. All credit for this research goes to the researchers of this project.



    Nikhil is a consulting intern at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Materials Science, he is exploring new developments and creating opportunities to contribute.

