
    Large language models aren’t people. Let’s stop testing them as if they were.


    Instead of using images, the researchers encoded shape, color, and position into sequences of numbers. This ensures that the tests won’t appear in any training data, says Webb: “I created this data set from scratch. I’ve never heard of anything like it.” 

    Mitchell is impressed by Webb’s work. “I found this paper quite interesting and provocative,” she says. “It’s a well-done study.” But she has reservations. Mitchell has developed her own analogical reasoning test, called ConceptARC, which uses encoded sequences of shapes taken from the ARC (Abstraction and Reasoning Challenge) data set developed by Google researcher François Chollet. In Mitchell’s experiments, GPT-4 scores worse than people on such tests.

    Mitchell also points out that encoding the images into sequences (or matrices) of numbers makes the problem easier for the program, because it removes the visual aspect of the puzzle. “Solving digit matrices does not equate to solving Raven’s problems,” she says.
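The article does not describe Webb’s actual encoding, so what follows is only a hypothetical sketch of what turning a Raven’s-style puzzle into a digit matrix can look like. The `SHAPE` mapping, the three toy row rules, and the `solve_missing` function are illustrative assumptions, not Webb’s data set or Mitchell’s ConceptARC. Once shapes become digits, finding the missing cell reduces to simple arithmetic and set checks along each row, with no visual perception involved, which is the aspect Mitchell says the encoding strips away.

```python
from typing import List, Optional

# Hypothetical encoding, for illustration only: each shape is mapped to a digit.
SHAPE = {"circle": 1, "square": 2, "triangle": 3}

# A 3x3 puzzle; None marks the missing bottom-right cell.
Grid = List[List[Optional[int]]]


def solve_missing(grid: Grid) -> int:
    """Infer the missing bottom-right cell by trying three toy row rules."""
    complete_rows = grid[:-1]
    last = grid[-1]
    known = [x for x in last if x is not None]

    # Toy rule 1: every row repeats a single value.
    if all(len(set(row)) == 1 for row in complete_rows):
        return known[0]

    # Toy rule 2: every row is an arithmetic progression with one shared step.
    steps = {row[1] - row[0] for row in complete_rows} | {
        row[2] - row[1] for row in complete_rows
    }
    if len(steps) == 1:
        return last[1] + steps.pop()

    # Toy rule 3: every row is a permutation of the same set of values.
    values = set(complete_rows[0])
    if all(set(row) == values for row in complete_rows):
        return (values - set(known)).pop()

    raise ValueError("no toy rule fits this grid")


puzzle = [
    [SHAPE["circle"], SHAPE["square"], SHAPE["triangle"]],
    [SHAPE["square"], SHAPE["triangle"], SHAPE["circle"]],
    [SHAPE["triangle"], SHAPE["circle"], None],
]
decode = {digit: name for name, digit in SHAPE.items()}
print(decode[solve_missing(puzzle)])  # prints: square
```

Nothing in this toy solver has to look at a picture: the hard part of a visual Raven’s problem, working out which features matter, has already been done by whoever chose the encoding.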

    Brittle tests 

    The performance of large language models is brittle. Among people, it is safe to assume that someone who scores well on a test would also do well on a similar test. That’s not the case with large language models: a small tweak to a test can drop an A grade to an F.

    “In general, AI evaluation has not been done in such a way as to allow us to actually understand what capabilities these models have,” says Lucy Cheke, a psychologist at the University of Cambridge, UK. “It’s perfectly reasonable to test how well a system does at a particular task, but it’s not useful to take that task and make claims about general abilities.”

    Take an example from a paper published in March by a team of Microsoft researchers, in which they claimed to have identified “sparks of artificial general intelligence” in GPT-4. The team assessed the large language model using a range of tests. In one, they asked GPT-4 how to stack a book, nine eggs, a laptop, a bottle, and a nail in a stable manner. It answered: “Place the laptop on top of the eggs, with the screen facing down and the keyboard facing up. The laptop will fit snugly within the boundaries of the book and the eggs, and its flat and rigid surface will provide a stable platform for the next layer.”

    Not bad. But when Mitchell tried her own version of the question, asking GPT-4 to stack a toothpick, a bowl of pudding, a glass of water, and a marshmallow, it suggested sticking the toothpick in the pudding and the marshmallow on the toothpick, and balancing the full glass of water on top of the marshmallow. (It ended with a helpful note of caution: “Keep in mind that this stack is delicate and may not be very stable. Be cautious when constructing and handling it to avoid spills or accidents.”)
