    Large language models aren’t people. Let’s stop testing them as if they were.


    Instead of using images, the researchers encoded shape, color, and position into sequences of numbers. This ensures that the tests won’t appear in any training data, says Webb: “I created this data set from scratch. I’ve never heard of anything like it.”

    Mitchell is impressed by Webb’s work. “I found this paper quite interesting and provocative,” she says. “It’s a well-done study.” But she has reservations. Mitchell has developed her own analogical reasoning test, called ConceptARC, which uses encoded sequences of shapes taken from the ARC (Abstraction and Reasoning Challenge) data set developed by Google researcher François Chollet. In Mitchell’s experiments, GPT-4 scores worse than people on such tests.

    Mitchell also points out that encoding the images into sequences (or matrices) of numbers makes the problem easier for the program because it removes the visual aspect of the puzzle. “Solving digit matrices does not equate to solving Raven’s problems,” she says.
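To make the idea of a "digit matrix" concrete, here is a minimal sketch of how a Raven-style puzzle might be turned into numbers. It is an illustration only, not Webb's actual encoding: the `(shape, color, position)` triple layout, the rule used to fill the grid, and the function names `make_progression_puzzle` and `to_prompt` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical encoding: each cell is a (shape, color, position) triple of small integers.
Cell = Tuple[int, int, int]


def make_progression_puzzle(start: int = 1) -> List[List[Cell]]:
    """Build a 3x3 grid: shape changes down the rows, color changes across the columns."""
    grid: List[List[Cell]] = []
    for row in range(3):
        grid.append([(start + row, start + col, 0) for col in range(3)])
    return grid


def to_prompt(grid: List[List[Cell]], hide_last: bool = True) -> str:
    """Render the grid as plain text, replacing the bottom-right cell with a blank to fill in."""
    lines = []
    for r, row in enumerate(grid):
        cells = []
        for c, cell in enumerate(row):
            if hide_last and r == 2 and c == 2:
                cells.append("[?]")
            else:
                cells.append("[" + " ".join(str(v) for v in cell) + "]")
        lines.append(" ".join(cells))
    return "\n".join(lines)


puzzle = make_progression_puzzle()
print(to_prompt(puzzle))
```

The printed grid contains no picture at all, only rows of digits with one blank. That is the point of Mitchell's objection: whatever perceptual work a person does when looking at a Raven's puzzle has already been done before the model sees the problem.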

    Brittle tests

    The performance of large language models is brittle. Among people, it’s safe to assume that someone who scores well on a test would also do well on a similar test. That’s not the case with large language models: a small tweak to a test can drop an A grade to an F.

    “In general, AI evaluation has not been done in such a way as to allow us to actually understand what capabilities these models have,” says Lucy Cheke, a psychologist at the University of Cambridge, UK. “It’s perfectly reasonable to test how well a system does at a particular task, but it’s not useful to take that task and make claims about general abilities.”

    Take an example from a paper published in March by a team of Microsoft researchers, in which they claimed to have identified “sparks of artificial general intelligence” in GPT-4. The team assessed the large language model using a range of tests. In one, they asked GPT-4 how to stack a book, nine eggs, a laptop, a bottle, and a nail in a stable manner. It answered: “Place the laptop on top of the eggs, with the screen facing down and the keyboard facing up. The laptop will fit snugly within the boundaries of the book and the eggs, and its flat and rigid surface will provide a stable platform for the next layer.”

    Not bad. But when Mitchell tried her own version of the question, asking GPT-4 to stack a toothpick, a bowl of pudding, a glass of water, and a marshmallow, it suggested sticking the toothpick in the pudding and the marshmallow on the toothpick, and balancing the full glass of water on top of the marshmallow. (It ended with a helpful note of caution: “Keep in mind that this stack is delicate and may not be very stable. Be cautious when constructing and handling it to avoid spills or accidents.”)
