    Large language models aren’t people. Let’s stop testing them as if they were.


    Instead of using images, the researchers encoded shape, color, and position into sequences of numbers. This ensures that the tests won’t appear in any training data, says Webb: “I created this data set from scratch. I’ve never heard of anything like it.”

    Mitchell is impressed by Webb’s work. “I found this paper quite interesting and provocative,” she says. “It’s a well-done study.” But she has reservations. Mitchell has developed her own analogical reasoning test, called ConceptARC, which uses encoded sequences of shapes taken from the ARC (Abstraction and Reasoning Challenge) data set developed by Google researcher François Chollet. In Mitchell’s experiments, GPT-4 scores worse than people on such tests.

    Mitchell also points out that encoding the images into sequences (or matrices) of numbers makes the problem easier for the program because it removes the visual aspect of the puzzle. “Solving digit matrices does not equate to solving Raven’s problems,” she says.

    Brittle tests

    The performance of large language models is brittle. Among people, it’s safe to assume that someone who scores well on one test would also do well on a similar test. That’s not the case with large language models: a small tweak to a test can drop an A grade to an F.

    “In general, AI evaluation has not been done in such a way as to allow us to actually understand what capabilities these models have,” says Lucy Cheke, a psychologist at the University of Cambridge, UK. “It’s perfectly reasonable to test how well a system does at a particular task, but it’s not useful to take that task and make claims about general abilities.”

    Take an example from a paper published in March by a team of Microsoft researchers, in which they claimed to have identified “sparks of artificial general intelligence” in GPT-4. The team assessed the large language model using a range of tests. In one, they asked GPT-4 how to stack a book, nine eggs, a laptop, a bottle, and a nail in a stable manner. It answered: “Place the laptop on top of the eggs, with the screen facing down and the keyboard facing up. The laptop will fit snugly within the boundaries of the book and the eggs, and its flat and rigid surface will provide a stable platform for the next layer.”

    Not bad. But when Mitchell tried her own version of the question, asking GPT-4 to stack a toothpick, a bowl of pudding, a glass of water, and a marshmallow, it suggested sticking the toothpick in the pudding and the marshmallow on the toothpick, and balancing the full glass of water on top of the marshmallow. (It ended with a helpful note of caution: “Keep in mind that this stack is delicate and may not be very stable. Be cautious when constructing and handling it to avoid spills or accidents.”)

    © 2025 Ztoog.