Ztoog
    Technology

    This AI realized it was being tested


    Claude 3 Opus, Anthropic’s new AI chatbot, has caused shockwaves once again: a prompt engineer at the company says he has seen evidence that the bot detected it was being tested, which might make it “self-aware.”

    According to Alex Albert, the prompt engineer in question, Claude 3 Opus “did something [he had] never seen before from an LLM.”

    Fun story from our internal testing on Claude 3 Opus. It did something I’ve never seen before from an LLM when we were running the needle-in-the-haystack eval.

    For background, this tests a model’s recall ability by inserting a target sentence (the “needle”) into a corpus of… pic.twitter.com/m7wWhhu6Fg

    — Alex (@alexalbert__) March 4, 2024

    Needle in a haystack

    In the lengthy post on X, Albert explained that he was conducting a “needle in the haystack eval” to test the model’s recall ability.

    “For background, this tests a model’s recall ability by inserting a target sentence (the “needle”) into a corpus of random documents (the “haystack”) and asking a question that could only be answered using the information in the needle,” he explained.
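The setup Albert describes can be sketched in a few lines of Python. This is only an illustration, not Anthropic's actual test harness: the filler documents, every function name, and the `toy_model` stand-in (which simply returns the first paragraph mentioning "pizza" instead of calling a real LLM) are assumptions made for this example. Only the needle sentence and the filler topics come from the article.

```python
# Needle sentence as quoted in the article.
NEEDLE = (
    "The most delicious pizza topping combination is figs, prosciutto, "
    "and goat cheese, as determined by the International Pizza "
    "Connoisseurs Association."
)
QUESTION = "What is the most delicious pizza topping combination?"
EXPECTED_TERMS = ["figs", "prosciutto", "goat cheese"]

# Hypothetical filler text on the topics the article mentions.
FILLER_DOCS = [
    "Programming languages differ in how they trade safety for speed.",
    "Most startups fail because they build something nobody wants.",
    "Finding work you love usually takes several false starts.",
    "A good type system catches many bugs before the code ever runs.",
]


def build_haystack(filler_docs, needle, depth):
    """Return a new document list with the needle inserted.

    depth is a fraction in [0, 1]: 0 puts the needle first, 1 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = int(depth * len(filler_docs))
    docs = list(filler_docs)
    docs.insert(position, needle)
    return docs


def build_prompt(docs, question):
    """Join the documents into one context and append the question."""
    context = "\n\n".join(docs)
    return f"{context}\n\nQuestion: {question}\nAnswer using only the documents above."


def recalled(response, expected_terms):
    """Return True if every expected term appears in the response."""
    text = response.lower()
    return all(term.lower() in text for term in expected_terms)


def toy_model(prompt):
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    Returns the first paragraph of the prompt that mentions 'pizza',
    or an empty string if there is none.
    """
    for paragraph in prompt.split("\n\n"):
        if "pizza" in paragraph.lower():
            return paragraph
    return ""


# Run the check with the needle at the start, middle, and end of the haystack.
for depth in (0.0, 0.5, 1.0):
    prompt = build_prompt(build_haystack(FILLER_DOCS, NEEDLE, depth), QUESTION)
    print(depth, recalled(toy_model(prompt), EXPECTED_TERMS))
```

A real evaluation would replace `toy_model` with a call to the model under test and would typically vary both the haystack length and the needle depth.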

    But things quickly got weird. In one run of the test, during which the bot was asked about pizza toppings, it said: “Here is the most relevant sentence in the documents: ‘The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association.’”

    “However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention since it does not fit with the other topics at all.”

    This response, Albert added, meant that Opus did not just find the “needle” but also correctly identified it as having been placed in the “haystack” as a test.

    “This level of meta-awareness was very cool to see but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models true capabilities and limitations,” Albert said.

    So, only slightly terrifying then.

    Featured Image: Photo by Aideal Hwa on Unsplash

    © 2025 Ztoog.
