Ztoog
    AI

    CMU Researchers Introduce VisualWebArena: An AI Benchmark Designed to Evaluate the Performance of Multimodal Web Agents on Realistic and Visually Stimulating Challenges


    The field of Artificial Intelligence (AI) has long aimed to automate everyday computer operations using autonomous agents. Web-based autonomous agents that can reason, plan, and act are a promising way to automate a range of these operations. The principal obstacle is building agents that can operate computers with ease, process textual and visual inputs, understand complex natural language commands, and carry out actions to accomplish predetermined goals. Most existing benchmarks in this area have concentrated on text-based agents.

    To address these challenges, a team of researchers from Carnegie Mellon University has introduced VisualWebArena, a benchmark designed to evaluate the performance of multimodal web agents on realistic and visually stimulating challenges. The benchmark comprises a wide range of complex web-based tasks that assess several aspects of autonomous multimodal agents' abilities.

    In VisualWebArena, agents must accurately process image-text inputs, interpret natural language instructions, and perform actions on websites to accomplish user-defined goals. The researchers carried out a comprehensive evaluation of state-of-the-art Large Language Model (LLM)-based autonomous agents, including several multimodal models. Both quantitative and qualitative analysis show that text-only LLM agents have certain limitations. The evaluation also exposes gaps in the capabilities of the most advanced multimodal language agents.

    The team reports that VisualWebArena consists of 910 realistic tasks in three different web environments: Reddit, Shopping, and Classifieds. The Shopping and Reddit environments are carried over from WebArena, while the Classifieds environment is a new addition built on real-world data. Unlike WebArena, which has no such visual requirement, all tasks in VisualWebArena are visually grounded and require a thorough understanding of the content to solve. Because images are used as input, about 25.2% of the tasks require understanding interleaved image and text content.

    The study thoroughly compares current state-of-the-art Large Language Models and Vision-Language Models (VLMs) as autonomous agents. The results show that strong VLMs outperform text-based LLMs on VisualWebArena tasks. The best-performing VLM agents achieve a success rate of 16.4%, far below the human performance of 88.7%.
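The success rates quoted above are percentages of tasks completed. As a rough illustration of how such a figure is computed, here is a minimal sketch; the function names and the toy outcomes are hypothetical and are not taken from the benchmark's code or results. Only the 16.4% and 88.7% figures come from the article.

```python
from collections import defaultdict


def success_rate(outcomes):
    """Percentage of tasks completed successfully; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


def success_rate_by_environment(results):
    """results: iterable of (environment, succeeded) pairs -> {environment: percent}."""
    grouped = defaultdict(list)
    for environment, succeeded in results:
        grouped[environment].append(succeeded)
    return {env: success_rate(vals) for env, vals in grouped.items()}


# Made-up outcomes for illustration only; these are not results from the paper.
toy_results = [
    ("classifieds", True), ("classifieds", False),
    ("reddit", False), ("reddit", False),
    ("shopping", True), ("shopping", False),
    ("shopping", False), ("shopping", False),
]
overall = success_rate([ok for _, ok in toy_results])  # 2 of 8 tasks succeed
per_env = success_rate_by_environment(toy_results)

# Gap between the reported human and best-agent success rates, in percentage points.
reported_gap = round(88.7 - 16.4, 1)
```

On the reported numbers, the best agents trail human performance by about 72 percentage points.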

    The study also found a significant gap between open-source and API-based VLM agents, highlighting the need for thorough evaluation metrics. In addition, the researchers propose a new VLM agent inspired by the Set-of-Marks prompting method. This approach shows significant performance gains, particularly on visually complex web pages, by simplifying the action space. By addressing the shortcomings of LLM agents, this VLM agent offers a possible way to improve the capabilities of autonomous agents in visually complex web contexts.
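To make the idea of a simplified action space concrete, the sketch below labels each interactable element with a small integer mark so that a model can answer with something like "click [2]" instead of raw coordinates or selectors. This is a loose illustration of the general idea only: the `Element` type, the reading-order numbering, the text format, and the action grammar are all assumptions, not the agent described in the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """A hypothetical interactable page element (not the benchmark's own type)."""
    tag: str
    text: str
    bbox: tuple  # (x, y, width, height) in pixels


def assign_marks(elements):
    """Give each element an integer mark in reading order (top-to-bottom, left-to-right)."""
    ordered = sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0]))
    return {mark: el for mark, el in enumerate(ordered, start=1)}


def describe_marks(marks):
    """Render a text listing of the marked elements, one line per mark."""
    return "\n".join(f"[{m}] [{el.tag}] [{el.text}]" for m, el in marks.items())


# Assumed action grammar: a verb followed by a bracketed mark, e.g. "click [2]".
ACTION_RE = re.compile(r"^(click|hover)\s+\[(\d+)\]$")


def resolve_action(action, marks):
    """Map a model output such as 'click [2]' back to the element it refers to."""
    match = ACTION_RE.match(action.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {action!r}")
    verb, mark = match.group(1), int(match.group(2))
    if mark not in marks:
        raise ValueError(f"no element with mark {mark}")
    return verb, marks[mark]
```

In a full pipeline the same marks would presumably also be drawn onto the screenshot, so that the image and the text listing refer to elements by the same IDs; that rendering step is omitted here.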

    In conclusion, VisualWebArena provides a framework for assessing multimodal autonomous language agents and offers insights that can be applied to building more powerful autonomous agents for web tasks.


    Check out the Paper and Github. All credit for this research goes to the researchers of this project.



    Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
    She is a Data Science enthusiast with good analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

