
    OpenAI’s o3 model aced a test of AI reasoning – but it’s still not AGI


    OpenAI announced a breakthrough achievement for its new o3 AI model

    Rokas Tenys / Alamy

    OpenAI’s new o3 artificial intelligence model has achieved a breakthrough high score on a prestigious AI reasoning test called the ARC Challenge, inspiring some AI fans to speculate that o3 has achieved artificial general intelligence (AGI). But even as ARC Challenge organisers described o3’s achievement as a major milestone, they also cautioned that it has not won the competition’s grand prize – and it is only one step on the path towards AGI, a term for hypothetical future AI with human-like intelligence.

    The o3 model is the latest in a line of AI releases that follow on from the large language models powering ChatGPT. “This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,” said François Chollet, an engineer at Google and the main creator of the ARC Challenge, in a blog post.

    What did OpenAI’s o3 model actually do?

    Chollet designed the Abstraction and Reasoning Corpus (ARC) Challenge in 2019 to test how well AIs can find the correct patterns linking pairs of coloured grids. Such visual puzzles are intended to make AIs demonstrate a form of general intelligence with basic reasoning capabilities. But throwing enough computing power at the puzzles could let even a non-reasoning program simply solve them through brute force. To prevent this, the competition also requires official score submissions to meet certain limits on computing power.
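    To make the brute-force concern concrete, here is a toy Python sketch. It assumes a task is a list of (input, output) pairs of grids whose cells are colour indices, and it simply tries every transformation in a tiny hand-written library until one fits all the training pairs. The transformation library and the function names are hypothetical illustrations, not part of the ARC benchmark or of any real solver; real ARC tasks call for far richer rules, and searching a much larger space of candidate programs would need correspondingly more compute.

```python
# Toy illustration only: an ARC-style task solved by brute-force search
# over a small, hypothetical library of grid transformations.
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # each cell holds a colour index


def flip_horizontal(g: Grid) -> Grid:
    # Mirror each row left-to-right.
    return [list(reversed(row)) for row in g]


def flip_vertical(g: Grid) -> Grid:
    # Reverse the order of the rows (top-to-bottom mirror).
    return [list(row) for row in reversed(g)]


def transpose(g: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*g)]


def rotate_90(g: Grid) -> Grid:
    # Clockwise rotation: reverse the rows, then transpose.
    return transpose(flip_vertical(g))


# Hypothetical candidate "programs" the brute-force search will try, in order.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [list(row) for row in g],
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
    "rotate_90": rotate_90,
}


def brute_force_solve(train: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the name of the first candidate that maps every training
    input to its expected output, or None if no candidate fits."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in train):
            return name
    return None


if __name__ == "__main__":
    train_pairs = [
        ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
        ([[3, 3, 0]], [[0, 3, 3]]),
    ]
    rule = brute_force_solve(train_pairs)
    print("rule found:", rule)
    if rule is not None:
        print("prediction:", CANDIDATES[rule]([[5, 0, 0]]))
```

    The search "solves" this task without any reasoning at all, which is exactly why the competition caps the compute an official submission may spend.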

    OpenAI’s newly announced o3 model – which is scheduled for release in early 2025 – achieved its official breakthrough score of 75.7 per cent on the ARC Challenge’s “semi-private” test, which is used for ranking competitors on a public leaderboard. The computing cost of its achievement was roughly $20 for each visual puzzle task, meeting the competition’s limit of less than $10,000 in total. However, the harder “private” test that is used to determine grand prize winners has an even more stringent computing power limit, equivalent to spending just 10 cents on each task, which OpenAI did not meet.

    The o3 model also achieved an unofficial score of 87.5 per cent by applying roughly 172 times more computing power than it used for the official score. For comparison, the typical human score is 84 per cent, and an 85 per cent score is enough to win the ARC Challenge’s $600,000 grand prize – if the model could keep its computing costs within the required limits.

    But to reach its unofficial score, o3’s cost soared to thousands of dollars spent solving each task. OpenAI requested that the challenge organisers not publish the exact computing costs.
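    The figures above can be checked against each other with some back-of-envelope arithmetic, sketched here in Python. It assumes that cost scales linearly with compute, which the article does not state and OpenAI has not confirmed, since the exact costs were withheld; the constant and function names are illustrative. Under that assumption, 172 × $20 comes to about $3,440 per task, in line with “thousands of dollars”, and even the official $20 per task is about 200 times the private test’s 10-cent budget.

```python
# Back-of-envelope consistency check using the figures quoted in the article.
# ASSUMPTION: cost scales linearly with compute. OpenAI did not publish the
# exact costs, so these numbers are rough estimates only.

OFFICIAL_COST_PER_TASK = 20.0        # dollars: "roughly $20" per task
PRIVATE_LIMIT_PER_TASK = 0.10        # dollars: "just 10 cents" per task
UNOFFICIAL_COMPUTE_MULTIPLIER = 172  # "roughly 172 times more computing power"


def estimated_unofficial_cost_per_task() -> float:
    """Per-task cost of the 87.5 per cent run, assuming linear cost scaling."""
    return OFFICIAL_COST_PER_TASK * UNOFFICIAL_COMPUTE_MULTIPLIER


def overshoot_vs_private_limit(cost_per_task: float) -> float:
    """How many times over the private test's per-task budget a cost is."""
    return cost_per_task / PRIVATE_LIMIT_PER_TASK


if __name__ == "__main__":
    high = estimated_unofficial_cost_per_task()
    print(f"estimated unofficial cost per task: ${high:,.0f}")
    print(f"official run vs private budget: "
          f"{overshoot_vs_private_limit(OFFICIAL_COST_PER_TASK):.0f}x")
    print(f"unofficial run vs private budget: "
          f"{overshoot_vs_private_limit(high):,.0f}x")
```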

    Does this o3 achievement show that AGI has been reached?

    No, the ARC Challenge organisers have specifically said they do not consider beating this competition benchmark to be an indicator of having achieved AGI.

    The o3 model also failed to solve more than 100 visual puzzle tasks, even when OpenAI applied a very large amount of computing power towards the unofficial score, said Mike Knoop, an ARC Challenge organiser at software company Zapier, in a social media post on X.

    In a social media post on Bluesky, Melanie Mitchell at the Santa Fe Institute in New Mexico said the following about o3’s progress on the ARC benchmark: “I think solving these tasks by brute-force compute defeats the original purpose”.

    “While the new model is very impressive and represents a big milestone on the way towards AGI, I don’t believe this is AGI – there’s still a fair number of very easy [ARC Challenge] tasks that o3 can’t solve,” said Chollet in another X post.

    However, Chollet described how we might know when human-level intelligence has been demonstrated by some form of AGI. “You’ll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible,” he said in the blog post.

    Thomas Dietterich at Oregon State University suggests another way to recognise AGI. “Those architectures claim to include all of the functional components required for human cognition,” he says. “By this measure, the commercial AI systems are missing episodic memory, planning, logical reasoning and, most importantly, meta-cognition.”

    So what does o3’s high score really mean?

    The o3 model’s high score comes as the tech industry and AI researchers have been reckoning with a slower pace of progress in the latest AI models of 2024, compared with the initial explosive developments of 2023.

    Although it did not win the ARC Challenge, o3’s high score indicates that AI models could beat the competition benchmark in the near future. Beyond o3’s unofficial high score, Chollet says many official low-compute submissions have already scored above 81 per cent on the private evaluation test set.

    Dietterich also thinks that “this is a very impressive leap in performance”. However, he cautions that, without knowing more about how OpenAI’s o1 and o3 models work, it is impossible to gauge just how impressive the high score is. For instance, if o3 was able to practise on the ARC problems in advance, that would have made its achievement easier. “We will need to await an open-source replication to understand the full significance of this,” says Dietterich.

    The ARC Challenge organisers are already looking to launch a second, harder set of benchmark tests sometime in 2025. They will also keep the ARC Prize 2025 challenge running until someone wins the grand prize and open-sources their solution.

    © 2025 Ztoog.