
    Researchers from the University of Toronto Unveil a Surprising Redundancy in Large Materials Datasets and the Power of Informative Data for Enhanced Machine Learning Performance


    Since the introduction of AI, its impact has been felt in every sphere of our lives. But AI needs data for training: its effectiveness depends heavily on the availability of training data.

    Conventionally, achieving accuracy when training AI models has been tied to the availability of large amounts of data. Addressing this challenge in the field involves navigating an extensive space of potential candidates. For example, the Open Catalyst Project uses more than 200 million data points related to potential catalyst materials.

    The computational resources required for analysis and model development on such datasets are a major obstacle. Analyzing the Open Catalyst datasets and developing models on them took 16,000 GPU days. Training budgets of that size are available to only a few researchers, so model development is often restricted to smaller datasets or a fraction of the available data.
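    To put the 16,000 GPU-day figure in perspective, the short calculation below converts it into wall-clock time. Only the 16,000 figure comes from the article; the cluster sizes are illustrative, and the function `wall_clock_days` is a hypothetical helper that optimistically assumes the work splits perfectly across GPUs, which real training jobs rarely do.

```python
# Convert a compute budget quoted in GPU-days into wall-clock time.
# Assumption: perfect parallel scaling across GPUs (an idealized best case).

def wall_clock_days(gpu_days: float, num_gpus: int) -> float:
    """Wall-clock days needed if gpu_days of work is split evenly over num_gpus."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_days / num_gpus


OPEN_CATALYST_GPU_DAYS = 16_000  # figure reported in the article

# Illustrative (hypothetical) cluster sizes.
for gpus in (1, 8, 64, 512):
    days = wall_clock_days(OPEN_CATALYST_GPU_DAYS, gpus)
    print(f"{gpus:>4} GPUs: {days:8.1f} days ({days / 365:.2f} years)")
```

    Under that idealized assumption, a single GPU would need roughly 43.8 years, and even 512 GPUs would need about a month.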

    A study by University of Toronto Engineering researchers, published in Nature Communications, suggests that the belief that deep learning models always require large amounts of training data may not hold.

    The researchers say there is a need to find ways to identify smaller datasets that can be used to train models. Dr. Kangming Li, a postdoctoral fellow in the Hattrick-Simpers lab, gave the example of a model that predicts students’ final grades: it performs best on the dataset of Canadian students it was trained on, but it may not be able to predict grades for students from other countries.

    One possible solution is to find subsets of data within extremely large datasets that address these issues. Such subsets should contain all the diversity and information of the original dataset while being easier to handle during processing.
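    The article does not describe how the study’s algorithm chooses its subsets. As one illustration of keeping diversity while dropping bulk, the sketch below uses greedy farthest-point (max-min) sampling, a generic technique swapped in here and not the study’s method. The function name `farthest_point_subset` and the toy 2-D data are hypothetical; in practice the points would be feature vectors describing materials.

```python
import math
import random
from typing import List, Sequence


def farthest_point_subset(points: Sequence[Sequence[float]], k: int, seed: int = 0) -> List[int]:
    """Greedily pick k indices whose points are spread as far apart as possible.

    Starts from a random point, then repeatedly adds the point whose distance to
    its nearest already-selected point is largest (max-min / farthest-point
    sampling), so dense, overrepresented regions contribute only a few points.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    selected = [rng.randrange(n)]
    chosen = set(selected)
    # min_dist[i]: distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[selected[0]]) for p in points]
    while len(selected) < k:
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical 2-D descriptors: 190 points crowded around one "overrepresented"
# composition plus 10 rare, spread-out points (indices 190-199).
rng = random.Random(42)
cluster = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(190)]
rare = [(5, 5), (-5, 5), (5, -5), (-5, -5), (0, 7), (7, 0), (-7, 0), (0, -7), (3, 3), (-3, -3)]
data = cluster + rare

subset = farthest_point_subset(data, k=len(data) // 20)  # keep 5% = 10 points
print("rare points kept:", sum(i >= 190 for i in subset), "of", len(subset))
```

    Because each pick maximizes the distance to everything already chosen, the 10-point (5%) subset ends up made almost entirely of the rare, spread-out points, whereas a uniform random 5% sample would be expected to contain only about half a rare point on average.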

    Li developed methods for locating high-quality subsets of data within publicly available materials datasets, such as JARVIS, the Materials Project, and the Open Quantum Materials Database. The goal was to gain more insight into how dataset properties affect the models trained on them.

    Using his computer program, Li compared the original dataset with a much smaller subset containing 95% fewer data points. The model trained on 5% of the data performed comparably to the model trained on the complete dataset when predicting the properties of materials within the dataset’s domain. This suggests that machine learning training can safely exclude up to 95% of the data with little to no effect on the accuracy of in-distribution predictions. The redundant data consist mostly of overrepresented materials.
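    The in-distribution comparison can be mimicked with a toy experiment. This is a hedged sketch, not the study’s setup: the data are synthetic (one hypothetical descriptor `x` and a property `y = 2x + 1` plus noise), the model is ordinary least squares rather than the models used in the paper, and the 5% subset is chosen at random. The test set is drawn from the same distribution as the training data, so it measures only in-distribution accuracy.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mean_abs_error(model, xs, ys):
    a, b = model
    return fmean(abs(a * x + b - y) for x, y in zip(xs, ys))


rng = random.Random(0)


def make_data(n):
    # Hypothetical data: one descriptor x in [0, 1], property y = 2x + 1 + Gaussian noise.
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


train_x, train_y = make_data(2000)
test_x, test_y = make_data(1000)  # same distribution, so this is an in-distribution test

# Drop 95% of the training set at random, keeping 5% (100 points).
keep = rng.sample(range(len(train_x)), k=len(train_x) // 20)
sub_x = [train_x[i] for i in keep]
sub_y = [train_y[i] for i in keep]

mae_full = mean_abs_error(fit_line(train_x, train_y), test_x, test_y)
mae_sub = mean_abs_error(fit_line(sub_x, sub_y), test_x, test_y)
print(f"test MAE, full data: {mae_full:.4f}; 5% subset: {mae_sub:.4f}")
```

    Because the synthetic data are highly redundant, both fits recover nearly the same line and the two errors come out close. The sketch says nothing about out-of-distribution predictions, such as the students from other countries in Li’s example.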

    According to Li, the study’s conclusions provide a way to gauge how redundant a dataset is: if adding more data does not improve model performance, that data is redundant and gives the models no new information to learn.
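    Li’s redundancy test can be phrased as a learning-curve check: train on growing amounts of data and find the point after which test error stops improving meaningfully. The sketch below assumes you already have (training size, test error) pairs. The function name `saturation_size`, its relative-tolerance rule, and the example curve are hypothetical and for illustration only; the numbers are not from the study.

```python
from typing import Optional, Sequence, Tuple


def saturation_size(curve: Sequence[Tuple[int, float]], rel_tol: float = 0.01) -> Optional[int]:
    """Return the smallest training-set size beyond which extra data barely helps.

    `curve` holds (training_size, test_error) pairs. Data beyond a given size is
    treated as redundant once the best error at any larger size improves on that
    size's error by at most `rel_tol` (relative). Returns None if the curve is
    still improving by more than the tolerance at its last point.
    """
    pts = sorted(curve)
    for i, (size, err) in enumerate(pts[:-1]):
        later_best = min(e for _, e in pts[i + 1:])
        if err - later_best <= rel_tol * err:
            return size
    return None


# Hypothetical learning curve (size, test MAE) for illustration only; not data from the study.
curve = [(500, 0.30), (1000, 0.22), (2000, 0.18), (5000, 0.175), (10000, 0.174), (20000, 0.174)]
print(saturation_size(curve, rel_tol=0.05))  # → 2000
```

    The tolerance is a judgment call: on this made-up curve, a 5% tolerance marks everything past 2,000 samples as redundant, while a stricter 1% tolerance moves that point to 5,000.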

    The study adds to a growing body of evidence among AI experts across multiple domains: models trained on relatively small datasets can perform well, provided the data quality is high.

    In conclusion, the study stresses the richness of the information in a dataset over the sheer quantity of data: information quality should take priority over collecting huge volumes of data.


    Check out the Paper. All credit for this research goes to the researchers of this project.



    Rachit Ranjan is a consulting intern at MarktechPost. He is currently pursuing his B.Tech at the Indian Institute of Technology (IIT) Patna. He is actively building his career in Artificial Intelligence and Data Science and is passionate about exploring these fields.

