Ztoog
    Meet WebVoyager: An Innovative Large Multimodal Model (LMM) Powered Web Agent that can Complete User Instructions End-to-End by Interacting with Real-World Websites


    Existing web agents are limited because they usually rely on a single input modality and are tested in controlled environments, such as web simulators or static snapshots, which do not accurately reflect the complexity and dynamic nature of real-world web interactions. This significantly restricts their applicability and effectiveness in real-world scenarios that require dynamic interaction with web content. The result is a gap in practical utility: these agents cannot effectively navigate and interact with the diverse, ever-evolving content found on actual websites.

    Previous work on web agents has focused on autonomous navigation and interaction with web environments. Key developments include WebGPT and WebAgent, which leverage GPT-3 and T5 models for text-based web browsing and HTML snippet extraction. There is also growing interest in multimodal web agents, such as WebGUM, which combines T5 with Vision Transformers, and PIX2ACT, which uses web screenshots. These efforts contrast with earlier single-modality or simplified web environment approaches, moving toward more realistic and dynamic web interactions. Concurrently, large multimodal models (LMMs) like GPT-4V have shown robust multimodal comprehension, laying the groundwork for more sophisticated web agents.

    Researchers from Zhejiang University, Tencent AI Lab, and Westlake University have proposed WebVoyager, an LMM-powered web agent that can complete user instructions end-to-end by interacting with real-world websites. They have also proposed a new evaluation protocol that leverages the robust multimodal comprehension capabilities of GPT-4V and includes a benchmark of real-world tasks from 15 widely used websites. The agent's interaction with the Apple website is demonstrated step by step, showing an optimal path without redundant actions.

    The evaluation set is constructed using a combination of self-instruct and human verification methods. Tasks are sampled and rewritten from various websites to ensure high quality and relevance. Human validation is then performed to verify the generated tasks and to ensure the answers can be found on the corresponding websites. Human evaluation is the main metric: expert annotators judge task success based on the agent's interaction with the web. Interestingly, the protocol also uses GPT-4V for automatic evaluation, aiming to reduce the reliance on human evaluators and the cost of experiments.
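    To make the main metric concrete, here is a minimal sketch of how per-task human judgments could be tallied into an overall and a per-website task success rate. The records, site selection, and function name are hypothetical and purely illustrative; they are not taken from the paper's data or code.

```python
from collections import defaultdict

# Hypothetical human judgments: (website, task_succeeded).
# These values are invented for illustration, not results from the paper.
judgments = [
    ("Apple", True),
    ("Apple", True),
    ("Apple", False),
    ("Wolfram Alpha", False),
    ("Wolfram Alpha", True),
    ("Cambridge Dictionary", False),
]

def success_rates(records):
    """Return (overall_rate, per_site_rates) from (site, success) pairs."""
    totals = defaultdict(int)  # tasks attempted per site
    wins = defaultdict(int)    # tasks judged successful per site
    for site, ok in records:
        totals[site] += 1
        wins[site] += int(ok)
    per_site = {site: wins[site] / totals[site] for site in totals}
    overall = sum(wins.values()) / sum(totals.values())
    return overall, per_site

overall, per_site = success_rates(judgments)
print(f"overall: {overall:.1%}")
for site, rate in per_site.items():
    print(f"{site}: {rate:.1%}")
```

    Breaking the rate down by website is what makes site-specific weaknesses visible, which a single overall number would hide.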

    WebVoyager achieved a 55.7% task success rate, outperforming both GPT-4 and a text-only variant of the agent. The automatic evaluation protocol using GPT-4V aligned closely with human judgment, showing an 85.3% agreement rate. Despite its strong performance on most website tasks, WebVoyager encountered challenges with text-heavy sites like Cambridge Dictionary and Wolfram Alpha. The automatic evaluator's consistency with human judgment improved as it was given more information, reaching a Kappa score of 0.7, which matches human agreement levels and highlights GPT-4V's potential for efficient, large-scale evaluation of web agents.
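    The agreement rate and Kappa score quoted above measure different things: raw agreement counts matching verdicts, while kappa discounts the agreement expected by chance. The sketch below assumes Cohen's kappa for two raters (the article does not say which kappa variant is used) and runs on made-up success/failure labels, not the paper's data.

```python
def agreement_rate(a, b):
    """Fraction of items on which two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = agreement_rate(a, b)
    labels = set(a) | set(b)
    # Chance agreement: probability both raters pick the same label
    # if each labelled independently at their own observed frequencies.
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical verdicts (1 = task succeeded, 0 = task failed).
human = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
auto = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]

print(f"agreement: {agreement_rate(human, auto):.2f}")
print(f"kappa:     {cohens_kappa(human, auto):.2f}")
```

    Because kappa subtracts chance agreement, it is always at or below the raw agreement rate when raters agree more often than chance, which is why an 85.3% agreement rate and a Kappa of 0.7 can describe similarly good alignment.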

    In conclusion, WebVoyager is an LMM-powered web agent designed for end-to-end web task resolution, with a 55.7% task success rate. There is still room for improvement, as indicated by the comprehensive error analysis provided in the paper. The researchers suggest that future work should focus on better methods for integrating visual and textual information and on building multimodal web agents with open-source LMMs.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.


    © 2025 Ztoog.
