    Learning the importance of training data under concept drift – Google Research Blog


    Posted by Nishant Jain, Pre-doctoral Researcher, and Pradeep Shenoy, Research Scientist, Google Research

    The continually changing nature of the world around us poses a significant challenge for the development of AI models. Often, models are trained on longitudinal data with the hope that the training data used will accurately represent the inputs the model may receive in the future. More generally, the default assumption that all training data are equally relevant often breaks down in practice. For example, the figure below shows images from the CLEAR nonstationary learning benchmark, and it illustrates how the visual features of objects evolve significantly over a 10-year span (a phenomenon we refer to as slow concept drift), posing a challenge for object categorization models.

    Sample images from the CLEAR benchmark. (Adapted from Lin et al.)

    Alternative approaches, such as online and continual learning, repeatedly update a model with small amounts of recent data in order to keep it current. This implicitly prioritizes recent data, as the learnings from past data are gradually erased by subsequent updates. However, in the real world, different kinds of data lose relevance at different rates, so there are two key issues: 1) By design, these methods focus only on the most recent data and lose any signal from older data that is erased. 2) Contributions from data instances decay uniformly over time irrespective of the contents of the data.

    In our recent work, “Instance-Conditional Timescales of Decay for Non-Stationary Learning”, we propose to assign each instance an importance score during training in order to maximize model performance on future data. To accomplish this, we employ an auxiliary model that produces these scores using the training instance as well as its age. This model is jointly learned with the primary model. We address both of the above challenges and achieve significant gains over other robust learning methods on a range of benchmark datasets for nonstationary learning. For instance, on a recent large-scale benchmark for nonstationary learning (~39M photos over a 10-year period), we show up to 15% relative accuracy gains through learned reweighting of the training data.

    The challenge of concept drift for supervised learning

    To gain quantitative insight into slow concept drift, we built classifiers on a recent photo categorization task comprising roughly 39M images sourced from social media websites over a 10-year period. We compared offline training, which iterated over all the training data multiple times in random order, and continual training, which iterated multiple times over each month of data in sequential (temporal) order. We measured model accuracy both during the training period and during a subsequent period in which both models were frozen, i.e., not updated further on new data (shown below). At the end of the training period (left panel, x-axis = 0), both approaches have seen the same amount of data but show a large performance gap. This is due to catastrophic forgetting, a problem in continual learning where a model's knowledge of data from early in the training sequence is diminished in an uncontrolled manner. On the other hand, forgetting has its advantages: over the test period (shown on the right), the continually trained model degrades much less rapidly than the offline model because it is less dependent on older data. The decay in both models' accuracy over the test period confirms that the data is indeed evolving over time, and both models become increasingly less relevant.

    Comparing offline and continually trained models on the photo classification task.
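    To make the comparison concrete, the sketch below contrasts the two training protocols on a generic classifier. It is a minimal illustration under assumed helpers (a PyTorch model, an optimizer, and pre-batched monthly data buckets), not the experimental code used in the paper.

```python
import random
import torch.nn.functional as F

def train_pass(model, optimizer, batches):
    """One SGD pass over an iterable of (inputs, labels) batches."""
    model.train()
    for x, y in batches:
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()

def offline_training(model, optimizer, monthly_buckets, epochs=5):
    """Offline training: iterate over all data, shuffled, multiple times."""
    all_batches = [b for bucket in monthly_buckets for b in bucket]
    for _ in range(epochs):
        random.shuffle(all_batches)
        train_pass(model, optimizer, all_batches)

def continual_training(model, optimizer, monthly_buckets, epochs=5):
    """Continual training: visit each month in temporal order; earlier months
    are never revisited, so their signal is gradually overwritten."""
    for bucket in monthly_buckets:  # chronological order
        for _ in range(epochs):
            train_pass(model, optimizer, bucket)

# Both models are then frozen and evaluated on later months to measure how
# quickly accuracy decays under slow concept drift.
```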

    Time-sensitive reweighting of training data

    We design a method that combines the benefits of offline learning (the ability to effectively reuse all available data) and continual learning (the ability to downplay older data) to address slow concept drift. We build upon offline learning, then add careful control over the influence of past data and an optimization objective, both designed to reduce model decay in the future.

    Suppose we wish to train a model, M, given some training data collected over time. We propose to also train a helper model that assigns a weight to each data point based on its contents and age. This weight scales the contribution of that data point to the training objective for M. The goal of the weights is to improve the performance of M on future data.
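    As a rough sketch of how such per-example weights enter the training objective for M (the helper model `scorer` and its exact inputs are illustrative assumptions here, not the paper's implementation):

```python
import torch.nn.functional as F

def weighted_loss(model, scorer, x, y, age):
    """Loss for the main model M, with each example scaled by the helper's weight.

    x:   batch of inputs
    y:   integer class labels
    age: time elapsed since each example was collected (e.g., in months)
    """
    per_example = F.cross_entropy(model(x), y, reduction="none")  # [batch]
    weights = scorer(x, age)  # helper model; could also act on features of x
    return (weights * per_example).mean()
```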

    In our work, we describe how the helper model can be meta-learned, i.e., learned alongside M in a manner that helps the learning of the model M itself. A key design choice for the helper model is that we separate out instance- and age-related contributions in a factored manner. Specifically, we set the weight by combining contributions from several different fixed timescales of decay, and learn an approximate “assignment” of a given instance to its most suited timescales. We find in our experiments that this form of the helper model outperforms many other alternatives we considered, ranging from unconstrained joint functions to a single timescale of decay (exponential or linear), due to its combination of simplicity and expressivity. Full details may be found in the paper.
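    A minimal sketch of that factored form: a small network predicts a per-instance soft assignment over a handful of fixed exponential decay timescales, and the instance weight is the resulting mixture. The layer sizes, the choice of timescales, and the use of precomputed features are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimescaleScorer(nn.Module):
    """weight(x, age) = sum_k p_k(x) * exp(-age / tau_k), where p_k(x) is a learned
    soft assignment of instance x to the k-th fixed decay timescale tau_k."""

    def __init__(self, feature_dim, timescales=(1.0, 6.0, 24.0, 120.0)):
        super().__init__()
        self.register_buffer("taus", torch.tensor(timescales))
        self.assign = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, len(timescales)),
        )

    def forward(self, features, age):
        p = torch.softmax(self.assign(features), dim=-1)   # [batch, K] assignment
        decay = torch.exp(-age.unsqueeze(-1) / self.taus)  # [batch, K] decay at this age
        return (p * decay).sum(dim=-1)                     # [batch] instance weights
```

    In the approach described above, this scorer is meta-learned so that training M on the reweighted data improves performance on more recent data; that bi-level optimization loop is not shown in the sketch.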

    Instance weight scoring

    The top figure below shows that our learned helper model indeed up-weights more modern-looking objects in the CLEAR object recognition challenge; older-looking objects are correspondingly down-weighted. On closer examination (bottom figure below, gradient-based feature importance analysis), we see that the helper model focuses on the main object within the image, as opposed to, e.g., background features that may be spuriously correlated with instance age.

    Sample images from the CLEAR benchmark (camera and laptop categories) assigned the highest and lowest weights, respectively, by our helper model.

    Feature importance analysis of our helper model on sample images from the CLEAR benchmark.
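    Feature importance maps of this kind can be produced with a standard input-gradient saliency computation. The generic sketch below is an assumption, not necessarily the exact analysis used in the paper; it highlights the pixels that most influence the helper model's weight for an image.

```python
def weight_saliency(scorer, feature_extractor, image, age):
    """Gradient of the instance weight with respect to the input pixels.

    `feature_extractor` is an assumed backbone feeding the scorer; bright
    regions of the returned map indicate pixels that most affect the weight.
    """
    image = image.detach().clone().requires_grad_(True)
    weight = scorer(feature_extractor(image), age).sum()
    weight.backward()
    return image.grad.abs().max(dim=1).values  # [batch, H, W], max over channels
```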

    Results

    Gains on large-scale data

    We first study the large-scale photo categorization task (PCAT) on the YFCC100M dataset discussed earlier, using the first five years of data for training and the subsequent five years as test data. Our method (shown in pink below) improves significantly over the no-reweighting baseline (black) as well as many other robust learning techniques. Interestingly, our method deliberately trades off accuracy on the distant past (training data unlikely to recur in the future) in exchange for marked improvements in the test period. Also, as desired, our method degrades less than the other baselines in the test period.

    Comparison of our method and relevant baselines on the PCAT dataset.

    Broad applicability

    We validated our findings on a wide range of nonstationary learning challenge datasets sourced from the academic literature (see 1, 2, 3, 4 for details) that span data sources and modalities (photos, satellite images, social media text, medical records, sensor readings, tabular data) and sizes (ranging from 10k to 39M instances). We report significant gains in the test period when compared to the nearest published benchmark method for each dataset (shown below). Note that the previous best-known method may be different for each dataset. These results showcase the broad applicability of our approach.

    Performance gain of our method on a range of tasks studying natural concept drift. Our reported gains are over the previous best-known method for each dataset.

    Extensions to continual learning

    Finally, we consider an interesting extension of our work. The work above described how offline learning can be extended to handle concept drift using ideas inspired by continual learning. However, offline learning is sometimes infeasible, for example, if the amount of training data available is too large to maintain or process. We adapted our approach to continual learning in a straightforward manner by applying temporal reweighting within the context of each bucket of data used to sequentially update the model. This proposal still retains some limitations of continual learning, e.g., model updates are performed only on the most recent data, and all optimization decisions (including our reweighting) are made only over that data. Nevertheless, our approach consistently beats regular continual learning as well as a wide range of other continual learning algorithms on the photo categorization benchmark (see below). Since our approach is complementary to the ideas behind many of the baselines compared here, we anticipate even larger gains when combined with them.

    Results of our method adapted to continual learning, compared to the latest baselines.
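    A rough sketch of that adaptation, under the same assumed helpers as the earlier sketches: the model is still updated one bucket at a time, but examples within each bucket are reweighted by the helper model according to their content and age.

```python
import torch.nn.functional as F

def continual_training_with_reweighting(model, scorer, optimizer, monthly_buckets):
    """Sequential (continual) updates with temporal reweighting inside each bucket."""
    for bucket in monthly_buckets:      # chronological order
        for x, y, age in bucket:        # age is measured relative to the current bucket
            optimizer.zero_grad()
            per_example = F.cross_entropy(model(x), y, reduction="none")
            loss = (scorer(x, age) * per_example).mean()
            loss.backward()
            optimizer.step()
```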

    Conclusion

    We addressed the challenge of data drift in learning by combining the strengths of earlier approaches: offline learning, with its effective reuse of data, and continual learning, with its emphasis on more recent data. We hope that our work helps improve model robustness to concept drift in practice, and generates increased interest and new ideas for addressing the ubiquitous problem of slow concept drift.

    Acknowledgements

    We thank Mike Mozer for many interesting discussions in the early phase of this work, as well as very helpful advice and feedback during its development.
