    Re-weighted gradient descent via distributionally robust optimization

    Ramnath Kumar, Pre-Doctoral Researcher, and Arun Sai Suggala, Research Scientist, Google Research

    Deep neural networks (DNNs) have become essential for solving a wide range of tasks, from standard supervised learning (e.g., image classification using ViT) to meta-learning. The most commonly used paradigm for learning DNNs is empirical risk minimization (ERM), which aims to identify a network that minimizes the average loss on the training data points. Several algorithms, including stochastic gradient descent (SGD), Adam, and Adagrad, have been proposed for solving ERM. However, a drawback of ERM is that it weights all samples equally, often ignoring the rare and more difficult samples and focusing on the easier, abundant ones. This leads to suboptimal performance on unseen data, especially when training data is scarce.
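    For reference, the ERM objective over n training points can be written as (notation ours):

        \[
        \min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n} \ell(\theta;\, x_i, y_i)
        \]

    Every sample carries the same weight 1/n regardless of its difficulty.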

    To overcome this issue, recent works have developed data re-weighting techniques for improving ERM performance. However, these approaches focus on specific learning tasks (such as classification) and/or require learning an additional meta model that predicts the weights of each data point. The presence of an additional model significantly increases the complexity of training and makes such methods unwieldy in practice.

    In “Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization”, we introduce a variant of the classical SGD algorithm that re-weights data points during each optimization step based on their difficulty. Stochastic Re-weighted Gradient Descent (RGD) is a lightweight algorithm that comes with a simple closed-form expression and can be applied to solve any learning task using just two lines of code. At any stage of the learning process, RGD simply reweights a data point by the exponential of its loss. We empirically demonstrate that the RGD reweighting algorithm improves the performance of numerous learning algorithms across various tasks, ranging from supervised learning to meta-learning. Notably, we show improvements over state-of-the-art methods on DomainBed and tabular classification. Moreover, the RGD algorithm also boosts performance for BERT on the GLUE benchmark and for ViT on ImageNet-1K.
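    As a concrete, hedged reading of the two-line claim, the change amounts to replacing the usual mean loss with an exponentially weighted one. A minimal PyTorch-style sketch (variable names are ours, not the authors' code):

        # per_sample_loss has shape [batch_size] (criterion with reduction='none')
        weights = per_sample_loss.detach().exp()                  # weight each point by exp(loss)
        loss = (weights * per_sample_loss).sum() / weights.sum()  # weighted average replaces the plain mean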

    Distributionally robust optimization

    Distributionally robust optimization (DRO) is an approach that assumes a “worst-case” data distribution shift may occur, which can hurt a model’s performance. If a model has focused on identifying a few spurious features for prediction, these “worst-case” data distribution shifts could lead to the misclassification of samples and, thus, a performance drop. DRO optimizes the loss for samples in that “worst-case” distribution, making the model robust to perturbations in the data distribution (e.g., removing a small fraction of points from a dataset, minor up/down-weighting of data points, etc.). In the context of classification, this forces the model to place less emphasis on noisy features and more emphasis on useful and predictive features. Consequently, models optimized using DRO tend to have better generalization guarantees and stronger performance on unseen samples.

    Inspired by these results, we develop the RGD algorithm as a technique for solving the DRO objective. Specifically, we focus on Kullback–Leibler divergence-based DRO, where one adds perturbations to create distributions that are close to the original data distribution in the KL divergence metric, enabling a model to perform well over all possible perturbations.
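    The exponential weights fall out of a standard duality result for KL-constrained DRO (notation ours: P is the training distribution, ρ the perturbation radius, τ > 0 a temperature):

        \[
        \max_{Q:\, \mathrm{KL}(Q \,\|\, P) \le \rho}\ \mathbb{E}_{Q}\big[\ell(\theta)\big]
        \;=\;
        \min_{\tau > 0}\ \tau \log \mathbb{E}_{P}\big[e^{\ell(\theta)/\tau}\big] + \tau\rho
        \]

    Differentiating the right-hand side with respect to θ at a fixed τ weights each sample's gradient in proportion to e^{ℓ(θ)/τ}, which is precisely the exponential reweighting RGD applies.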

    Figure illustrating DRO. In contrast to ERM, which learns a model that minimizes expected loss over the original data distribution, DRO learns a model that performs well on several perturbed versions of the original data distribution.

    Stochastic re-weighted gradient descent

    Consider a random subset of samples (called a mini-batch), where each data point has an associated loss Li. Traditional algorithms like SGD give equal importance to all samples in the mini-batch, and update the model's parameters by descending along the averaged gradients of the losses of those samples. With RGD, we reweight each sample in the mini-batch and give more importance to points that the model identifies as more difficult. To be precise, we use the loss as a proxy for the difficulty of a point, and reweight the point by the exponential of its loss. Finally, we update the model parameters by descending along the weighted average of the gradients of the samples.

    Due to stability considerations, in our experiments we clip and scale the loss before computing its exponential. Specifically, we clip the loss at some threshold T, and multiply it by a scalar that is inversely proportional to the threshold. An important aspect of RGD is its simplicity: it does not rely on a meta model to compute the weights of data points. Furthermore, it can be implemented with two lines of code and combined with any popular optimizer (such as SGD, Adam, and Adagrad).
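    Putting the pieces together, one plausible implementation of the clipped-and-scaled reweighting is sketched below. The function name, default threshold, and normalization are our assumptions based on the description above, not the authors' released code:

        import torch

        def rgd_reweight(per_sample_loss: torch.Tensor, T: float = 2.0) -> torch.Tensor:
            """RGD sketch: weight each sample by the exponential of its clipped, scaled loss."""
            # Clip the loss at threshold T and scale it by 1/T before exponentiating.
            stabilized = per_sample_loss.detach().clamp(max=T) / T
            weights = torch.exp(stabilized)           # higher-loss (harder) points get larger weights
            weights = weights / weights.sum()         # normalize so the update is a weighted average
            return (weights * per_sample_loss).sum()  # backprop descends along the weighted-average gradient

        # Usage with any optimizer (criterion must use reduction='none'):
        #   per_sample_loss = criterion(model(x), y)
        #   rgd_reweight(per_sample_loss).backward()
        #   optimizer.step()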

    Figure illustrating the intuitive idea behind RGD in a binary classification setting. Feature 1 and Feature 2 are the features available to the model for predicting the label of a data point. RGD upweights the high-loss data points that have been misclassified by the model.

    Results

    We present empirical results comparing RGD with state-of-the-art techniques on standard supervised learning and domain adaptation (refer to the paper for results on meta-learning). In all our experiments, we tune the clipping level and the learning rate of the optimizer using a held-out validation set.

    Supervised learning

    We evaluate RGD on several supervised learning tasks, including language, vision, and tabular classification. For the task of language classification, we apply RGD to the BERT model trained on the General Language Understanding Evaluation (GLUE) benchmark and show that RGD outperforms the BERT baseline by +1.94% with a standard deviation of 0.42%. To evaluate RGD's performance on vision classification, we apply RGD to the ViT-S model trained on the ImageNet-1K dataset, and show that RGD outperforms the ViT-S baseline by +1.01% with a standard deviation of 0.23%. Moreover, we perform hypothesis tests to confirm that these results are statistically significant, with a p-value of less than 0.05.

    RGD's performance on language and vision classification using the GLUE and ImageNet-1K benchmarks. Note that MNLI, QQP, QNLI, SST-2, MRPC, RTE, and CoLA are the datasets that make up the GLUE benchmark.

    For tabular classification, we use MET as our baseline and consider various binary and multi-class datasets from UC Irvine's machine learning repository. We show that applying RGD to the MET framework improves its performance by 1.51% and 1.27% on binary and multi-class tabular classification, respectively, achieving state-of-the-art performance in this domain.

    Performance of RGD for classification on various tabular datasets.

    Domain generalization

    To evaluate RGD's generalization capabilities, we use the standard DomainBed benchmark, which is commonly used to study a model's out-of-domain performance. We apply RGD to FRR, a recent approach that improved out-of-domain benchmark results, and show that RGD with FRR performs an average of 0.7% better than the FRR baseline. Furthermore, we confirm with hypothesis tests that most benchmark results (apart from Office Home) are statistically significant, with a p-value of less than 0.05.

    Performance of RGD on the DomainBed benchmark for distributional shifts.

    Class imbalance and fairness

    To demonstrate that models learned using RGD perform well despite class imbalance, where certain classes in the dataset are underrepresented, we compare RGD's performance with ERM on long-tailed CIFAR-10. We report that RGD improves the accuracy of baseline ERM by an average of 2.55% with a standard deviation of 0.23%. Furthermore, we perform hypothesis tests and confirm that these results are statistically significant, with a p-value of less than 0.05.

    Performance of RGD on the long-tailed CIFAR-10 benchmark for the class-imbalance domain.

    Limitations

    The RGD algorithm was developed using popular research datasets, which have already been curated to remove corruptions (e.g., noise and incorrect labels). Therefore, RGD may not provide performance improvements in scenarios where the training data contains a high volume of corruptions. A potential approach for handling such scenarios is to apply an outlier-removal technique before running the RGD algorithm. This technique should be capable of filtering outliers out of the mini-batch and sending the remaining points to our algorithm, as in the sketch below.
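    One hedged way to realize that combination (the function name and drop fraction are hypothetical choices of ours, not from the post):

        import torch

        def filter_outliers(per_sample_loss: torch.Tensor, drop_frac: float = 0.1) -> torch.Tensor:
            """Return a boolean mask that drops the highest-loss fraction of a mini-batch."""
            k = int(per_sample_loss.numel() * (1.0 - drop_frac))  # number of points to keep
            keep = torch.zeros_like(per_sample_loss, dtype=torch.bool)
            keep[per_sample_loss.argsort()[:k]] = True            # keep the k lowest-loss points
            return keep

        # mask = filter_outliers(per_sample_loss)
        # loss = rgd_reweight(per_sample_loss[mask])  # apply RGD (sketched earlier) to the survivors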

    Conclusion

    RGD has been shown to be effective on a variety of tasks, including out-of-domain generalization, tabular representation learning, and class imbalance. It is simple to implement and can be seamlessly integrated into existing algorithms with just a two-line code change. Overall, RGD is a promising technique for boosting the performance of DNNs, and could help push the boundaries in various domains.

    Acknowledgements

    The paper described in this blog post was written by Ramnath Kumar, Arun Sai Suggala, Dheeraj Nagaraj, and Kushal Majmundar. We extend our sincere gratitude to the anonymous reviewers, Prateek Jain, Pradeep Shenoy, Anshul Nasery, Lovish Madaan, and the numerous dedicated members of the machine learning and optimization team at Google Research India for their invaluable feedback and contributions to this work.
