    Why it’s critical to move beyond overly aggregated machine-learning metrics


    MIT researchers have identified significant examples of machine-learning model failure when those models are applied to data other than what they were trained on, raising questions about the need to test whenever a model is deployed in a new setting.

    “We demonstrate that even when you train models on large amounts of data, and choose the best average model, in a new setting this ‘best model’ could be the worst model for 6-75 percent of the new data,” says Marzyeh Ghassemi, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS), a member of the Institute for Medical Engineering and Science, and principal investigator at the Laboratory for Information and Decision Systems.

    In a paper that was presented at the Neural Information Processing Systems (NeurIPS 2025) conference in December, the researchers point out that models trained to effectively diagnose illness in chest X-rays at one hospital, for example, may be considered effective in a different hospital, on average. The researchers’ performance assessment, however, revealed that some of the best-performing models at the first hospital were the worst-performing on up to 75 percent of patients at the second hospital. When all patients at the second hospital are aggregated, high average performance hides this failure.
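
    To make the averaging effect concrete, here is a minimal sketch in Python. The group names and counts are invented for illustration and are not taken from the paper; the helper `accuracy_by_group` is likewise hypothetical. It only shows how a high aggregate accuracy can coexist with a poorly served subgroup.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return (overall_accuracy, {group: accuracy}) for (group, correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group

# Hypothetical results at a second hospital (made-up numbers):
# 90 routine cases, mostly classified correctly, and 10 cases from a
# small subgroup on which the model is usually wrong.
records = (
    [("routine", True)] * 88 + [("routine", False)] * 2
    + [("subgroup", True)] * 2 + [("subgroup", False)] * 8
)

overall, per_group = accuracy_by_group(records)
print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"  {group}: {acc:.2f}")
```

    In this toy setup the aggregate figure looks healthy while the subgroup figure does not, which is the pattern the researchers warn about.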

    Their findings demonstrate that spurious correlations still occur and remain a risk to a model’s trustworthiness in new settings, even though they are thought to be mitigated by simply improving model performance on observed data. A simple example of a spurious correlation is a machine-learning system that, not having “seen” many cows pictured at the beach, classifies a photo of a beach-going cow as an orca simply because of its background. In many instances, including areas examined by the researchers such as chest X-rays, cancer histopathology images, and hate speech detection, such spurious correlations are much harder to detect.

    In the case of a medical diagnosis model trained on chest X-rays, for example, the model may have learned to correlate a specific and irrelevant marking on one hospital’s X-rays with a certain pathology. At another hospital where the marking is not used, that pathology could be missed.

    Previous research by Ghassemi’s group has shown that models can spuriously correlate such factors as age, gender, and race with medical findings. If, for instance, a model has been trained on more chest X-rays of older people who have pneumonia and hasn’t “seen” as many X-rays belonging to younger people, it might predict that only older patients have pneumonia.

    “We want models to learn how to look at the anatomical features of the patient and then make a decision based on that,” says Olawale Salaudeen, an MIT postdoc and the lead author of the paper, “but really anything that’s in the data that’s correlated with a decision can be used by the model. And those correlations might not actually be robust with changes in the environment, making the model predictions unreliable sources of decision-making.”

    Spurious correlations contribute to the risks of biased decision-making. In the NeurIPS conference paper, the researchers showed that, for example, chest X-ray models that improved overall diagnosis performance actually performed worse on patients with pleural conditions or enlarged cardiomediastinum, meaning enlargement of the heart or central chest cavity.

    Other authors of the paper include PhD students Haoran Zhang and Kumail Alhamoud, EECS Assistant Professor Sara Beery, and Ghassemi.

    Previous work has generally accepted that models ordered best-to-worst by performance will preserve that order when applied in new settings, a pattern called accuracy-on-the-line. The researchers, however, were able to demonstrate examples in which the best-performing models in one setting were the worst-performing in another.
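
    One way to check whether a model ranking carries over is a rank correlation between accuracies in the two settings. The sketch below assumes Spearman correlation as the measure and uses made-up accuracy values; neither is taken from the paper. A value near +1 means the ordering is preserved, and a value near -1 means it is reversed.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical accuracies for four models (illustrative numbers only).
id_acc = [0.80, 0.85, 0.90, 0.95]       # first setting
ood_full = [0.70, 0.74, 0.79, 0.83]     # second setting, all examples
ood_subset = [0.60, 0.50, 0.40, 0.30]   # second setting, one sub-population

rho_full = spearman(id_acc, ood_full)
rho_subset = spearman(id_acc, ood_subset)
print(rho_full, rho_subset)
```

    With these toy numbers the ordering holds on the full second-setting data but is fully reversed on the sub-population.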

    Salaudeen devised an algorithm called OODSelect to find examples where accuracy-on-the-line was broken. Basically, he trained thousands of models using in-distribution data, meaning the data were from the first setting, and calculated their accuracy. Then he applied the models to the data from the second setting. When the models with the highest accuracy on the first-setting data were wrong on a large percentage of examples in the second setting, this identified the problem subsets, or sub-populations. Salaudeen also emphasizes the dangers of aggregate statistics for evaluation, which can obscure more granular and consequential information about model performance.
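
    The following is a rough sketch of the idea described above, not the authors' OODSelect implementation, which is available in their released code. It assumes a simple heuristic: score each second-setting example by how its per-model correctness correlates with first-setting accuracy, then keep the most negatively correlated examples. The function names, the scoring rule, and the toy data are all assumptions made for illustration.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def select_problem_subset(id_acc, ood_correct, k):
    """Pick the k second-setting examples whose correctness is most
    negatively correlated with first-setting accuracy across models.

    id_acc[m]         : accuracy of model m in the first setting
    ood_correct[m][i] : 1 if model m is right on second-setting example i
    """
    n_examples = len(ood_correct[0])
    scores = []
    for i in range(n_examples):
        column = [float(row[i]) for row in ood_correct]
        scores.append((pearson(id_acc, column), i))
    scores.sort()  # most negative correlation first
    return sorted(i for _, i in scores[:k])

def accuracy_on(ood_correct, indices):
    """Per-model accuracy restricted to the given example indices."""
    return [sum(row[i] for i in indices) / len(indices) for row in ood_correct]

# Toy data: four models, ten second-setting examples (made up).
id_acc = [0.80, 0.85, 0.90, 0.95]
ood_correct = [
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],  # weakest model in the first setting
    [1, 1, 1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],  # strongest model in the first setting
]

subset = select_problem_subset(id_acc, ood_correct, k=2)
full_acc = accuracy_on(ood_correct, range(len(ood_correct[0])))
subset_acc = accuracy_on(ood_correct, subset)
print("subset:", subset)
print("accuracy on all examples:", full_acc)
print("accuracy on subset:", subset_acc)
```

    In this toy case the model ranking is preserved on the full second-setting data, yet on the selected subset the strongest first-setting model is the weakest. A real analysis would use many more models and examples and would need a held-out check that the selected subset is not a statistical accident.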

    In the course of their work, the researchers separated out the “most miscalculated examples” so as not to conflate spurious correlations within a dataset with situations that are simply difficult to classify.

    The NeurIPS paper releases the researchers’ code and some identified subsets for future work.

    Once a hospital, or any organization employing machine learning, identifies subsets on which a model is performing poorly, that information can be used to improve the model for its particular task and setting. The researchers recommend that future work adopt OODSelect in order to highlight targets for evaluation and to design approaches that improve performance more consistently.

    “We hope the released code and OODSelect subsets become a steppingstone,” the researchers write, “toward benchmarks and models that confront the adverse effects of spurious correlations.”

    © 2026 Ztoog.
