    Re-weighted gradient descent via distributionally robust optimization – Google Research Blog

    Ramnath Kumar, Pre-Doctoral Researcher, and Arun Sai Suggala, Research Scientist, Google Research

    Deep neural networks (DNNs) have become essential for solving a wide range of tasks, from standard supervised learning (image classification using ViT) to meta-learning. The most commonly used paradigm for learning DNNs is empirical risk minimization (ERM), which aims to identify a network that minimizes the average loss on training data points. Several algorithms, including stochastic gradient descent (SGD), Adam, and Adagrad, have been proposed for solving ERM. However, a drawback of ERM is that it weights all the samples equally, often ignoring the rare and more difficult samples and focusing on the easier and abundant ones. This leads to suboptimal performance on unseen data, especially when the training data is scarce.

    To overcome this challenge, recent works have developed data re-weighting techniques for improving ERM performance. However, these approaches focus on specific learning tasks (such as classification) and/or require learning an additional meta model that predicts the weights of each data point. The presence of an additional model significantly increases the complexity of training and makes these approaches unwieldy in practice.

    In “Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization” we introduce a variant of the classical SGD algorithm that re-weights data points during each optimization step based on their difficulty. Stochastic Re-weighted Gradient Descent (RGD) is a lightweight algorithm that comes with a simple closed-form expression, and can be applied to solve any learning task using just two lines of code. At any stage of the learning process, RGD simply reweights a data point as the exponential of its loss. We empirically demonstrate that the RGD reweighting algorithm improves the performance of numerous learning algorithms across various tasks, ranging from supervised learning to meta learning. Notably, we show improvements over state-of-the-art methods on DomainBed and tabular classification. Moreover, the RGD algorithm also boosts performance for BERT using the GLUE benchmarks and ViT on ImageNet-1K.

    Distributionally robust optimization

    Distributionally robust optimization (DRO) is an approach that assumes a “worst-case” data distribution shift may occur, which can harm a model’s performance. If a model has focused on identifying a few spurious features for prediction, these “worst-case” data distribution shifts could lead to the misclassification of samples and, thus, a performance drop. DRO optimizes the loss for samples in that “worst-case” distribution, making the model robust to perturbations in the data distribution (e.g., removing a small fraction of points from a dataset, minor up/down weighting of data points, etc.). In the context of classification, this forces the model to place less emphasis on noisy features and more emphasis on useful and predictive features. Consequently, models optimized using DRO tend to have better generalization guarantees and stronger performance on unseen samples.

    Inspired by these results, we develop the RGD algorithm as a technique for solving the DRO objective. Specifically, we focus on Kullback–Leibler divergence-based DRO, where one adds perturbations to create distributions that are close to the original data distribution in the KL divergence metric, enabling a model to perform well over all possible perturbations.
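
For reference, one standard way to write a KL-regularized DRO objective is sketched below. This is the textbook Gibbs variational identity, not notation taken from the post: the symbols P (original data distribution), Q (perturbed distribution), ℓ (per-sample loss), θ (model parameters), and γ > 0 (regularization strength) are all assumptions introduced here for illustration.

```latex
\min_{\theta} \; \max_{Q \ll P} \;
  \Big\{ \mathbb{E}_{z \sim Q}\big[\ell(\theta; z)\big] - \gamma \, \mathrm{KL}(Q \,\|\, P) \Big\}
  \;=\;
  \min_{\theta} \; \gamma \log \mathbb{E}_{z \sim P}\Big[\exp\big(\ell(\theta; z)/\gamma\big)\Big],
\qquad
\frac{dQ^{*}}{dP}(z) \;\propto\; \exp\big(\ell(\theta; z)/\gamma\big).
```

Under these assumptions, the worst-case distribution Q* tilts each data point by the exponential of its (scaled) loss, which is consistent with the exponential reweighting that RGD is described as using.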

    Figure illustrating DRO. In contrast to ERM, which learns a model that minimizes expected loss over the original data distribution, DRO learns a model that performs well on several perturbed versions of the original data distribution.

    Stochastic re-weighted gradient descent

    Consider a random subset of samples (called a mini-batch), where each data point has an associated loss L_i. Traditional algorithms like SGD give equal importance to all the samples in the mini-batch, and update the parameters of the model by descending along the averaged gradients of the loss of those samples. With RGD, we reweight each sample in the mini-batch and give more importance to points that the model identifies as more difficult. To be precise, we use the loss as a proxy for the difficulty of a point, and reweight it by the exponential of its loss. Finally, we update the model parameters by descending along the weighted average of the gradients of the samples.
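
The update described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the linear model, the logistic loss, the function names, and the choice to normalize the weights so they sum to one are all assumptions made here for the example.

```python
import numpy as np

def per_sample_logistic_loss(w, X, y):
    """Logistic loss of each sample; labels y are assumed to be in {-1, +1}."""
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins)  # log(1 + exp(-margin)), computed stably

def per_sample_gradients(w, X, y):
    """Gradient of each sample's logistic loss with respect to w, shape (n, d)."""
    margins = y * (X @ w)
    # d/dw log(1 + exp(-m)) = -y * x * sigmoid(-m), and sigmoid(-m) = 1 / (1 + exp(m))
    coeff = -y / (1.0 + np.exp(margins))
    return coeff[:, None] * X

def rgd_step(w, X, y, lr=0.1):
    """One re-weighted step on a mini-batch (X, y); returns new parameters and weights."""
    losses = per_sample_logistic_loss(w, X, y)
    weights = np.exp(losses)            # harder points (higher loss) get larger weight
    weights = weights / weights.sum()   # normalization is an assumption of this sketch
    grads = per_sample_gradients(w, X, y)
    # The weights are treated as constants: only the per-sample gradients are averaged.
    weighted_grad = (weights[:, None] * grads).sum(axis=0)
    return w - lr * weighted_grad, weights
```

When every sample has the same loss, the weights are uniform and the step coincides with an ordinary mini-batch gradient step; the two updates differ only once some points are harder than others.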

    Due to stability considerations, in our experiments we clip and scale the loss before computing its exponential. Specifically, we clip the loss at some threshold T, and multiply it by a scalar that’s inversely proportional to the threshold. An important aspect of RGD is its simplicity, as it doesn’t rely on a meta model to compute the weights of data points. Furthermore, it can be implemented with two lines of code, and combined with any popular optimizer (such as SGD, Adam, and Adagrad).
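
A hedged sketch of the clip-and-scale step is shown below. The function name `rgd_weights` and the parameter `scale_constant` are hypothetical; the post only states that the scalar is inversely proportional to the threshold T, so the exact constant used here (1/T by default) is an assumption.

```python
import numpy as np

def rgd_weights(losses, clip_threshold=1.0, scale_constant=1.0):
    """Per-sample weights exp(c * min(loss, T) / T) for a mini-batch of losses.

    clip_threshold is T from the text; scale_constant (c) is a hypothetical knob,
    so the scalar multiplying the clipped loss is c / T.
    """
    if clip_threshold <= 0:
        raise ValueError("clip_threshold must be positive")
    losses = np.asarray(losses, dtype=float)
    clipped = np.minimum(losses, clip_threshold)          # clip the loss at T
    return np.exp(scale_constant * clipped / clip_threshold)

# Usage: the re-weighted mini-batch objective is then a weighted mean of the losses.
batch_losses = np.array([0.1, 0.7, 3.5])
weights = rgd_weights(batch_losses, clip_threshold=2.0)
reweighted_objective = np.mean(weights * batch_losses)
```

For non-negative losses these weights stay between 1 and exp(c), which is what keeps the exponential from blowing up. In an automatic-differentiation framework the weights would typically need to be treated as constants (for example, with a stop-gradient) so that only the per-sample losses are differentiated; that detail is also an assumption of this sketch.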

    Figure illustrating the intuitive idea behind RGD in a binary classification setting. Feature 1 and Feature 2 are the features available to the model for predicting the label of a data point. RGD upweights the data points with high losses that have been misclassified by the model.

    Results

    We present empirical results comparing RGD with state-of-the-art techniques on standard supervised learning and domain adaptation (refer to the paper for results on meta learning). In all our experiments, we tune the clipping level and the learning rate of the optimizer using a held-out validation set.

    Supervised learning

    We evaluate RGD on several supervised learning tasks, including language, vision, and tabular classification. For the task of language classification, we apply RGD to the BERT model trained on the General Language Understanding Evaluation (GLUE) benchmark and show that RGD outperforms the BERT baseline by +1.94% with a standard deviation of 0.42%. To evaluate RGD’s performance on vision classification, we apply RGD to the ViT-S model trained on the ImageNet-1K dataset, and show that RGD outperforms the ViT-S baseline by +1.01% with a standard deviation of 0.23%. Moreover, we perform hypothesis tests to confirm that these results are statistically significant with a p-value that is less than 0.05.

    RGD’s performance on language and vision classification using the GLUE and ImageNet-1K benchmarks. Note that MNLI, QQP, QNLI, SST-2, MRPC, RTE and CoLA are the diverse datasets that make up the GLUE benchmark.

    For tabular classification, we use MET as our baseline, and consider various binary and multi-class datasets from UC Irvine’s machine learning repository. We show that applying RGD to the MET framework improves its performance by 1.51% and 1.27% on binary and multi-class tabular classification, respectively, achieving state-of-the-art performance in this domain.

    Performance of RGD for classification of various tabular datasets.

    Domain generalization

    To evaluate RGD’s generalization capabilities, we use the standard DomainBed benchmark, which is commonly used to study a model’s out-of-domain performance. We apply RGD to FRR, a recent approach that improved out-of-domain benchmarks, and show that RGD with FRR performs an average of 0.7% better than the FRR baseline. Furthermore, we confirm with hypothesis tests that most benchmark results (except for Office Home) are statistically significant with a p-value less than 0.05.

    Performance of RGD on the DomainBed benchmark for distributional shifts.

    Class imbalance and fairness

    To demonstrate that models learned using RGD perform well despite class imbalance, where certain classes in the dataset are underrepresented, we compare RGD’s performance with ERM on long-tailed CIFAR-10. We report that RGD improves the accuracy of baseline ERM by an average of 2.55% with a standard deviation of 0.23%. Furthermore, we perform hypothesis tests and confirm that these results are statistically significant with a p-value of less than 0.05.

    Performance of RGD on the long-tailed CIFAR-10 benchmark for the class imbalance domain.

    Limitations

    The RGD algorithm was developed using popular research datasets, which were already curated to remove corruptions (e.g., noise and incorrect labels). Therefore, RGD may not provide performance improvements in scenarios where training data has a high volume of corruptions. A potential approach to handle such scenarios is to apply an outlier removal technique to the RGD algorithm. This outlier removal technique should be capable of filtering out outliers from the mini-batch and sending the remaining points to our algorithm.
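
One possible shape for such a filter is sketched below. It is purely illustrative and not a method from the paper: it assumes that the highest-loss points in a mini-batch are the likely outliers and drops a fixed fraction of them before computing the exponential weights. The function name, `drop_fraction`, and the trimming rule are all hypothetical choices.

```python
import numpy as np

def filter_then_reweight(losses, drop_fraction=0.1, clip_threshold=1.0):
    """Drop the highest-loss points, then weight the rest by exp(clipped loss / T).

    Returns the indices of the kept points (in their original order) and
    their normalized weights.
    """
    losses = np.asarray(losses, dtype=float)
    n_drop = int(np.floor(drop_fraction * losses.size))
    order = np.argsort(losses)                         # ascending by loss
    keep = np.sort(order[: losses.size - n_drop])      # discard the n_drop largest
    kept_losses = losses[keep]
    weights = np.exp(np.minimum(kept_losses, clip_threshold) / clip_threshold)
    return keep, weights / weights.sum()
```

A fixed trimming fraction is a blunt rule: it also discards genuinely hard but clean examples, which are exactly the points RGD is meant to emphasize, so in practice the fraction would need to be tuned on held-out data.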

    Conclusion

    RGD has been shown to be effective on a variety of tasks, including out-of-domain generalization, tabular representation learning, and class imbalance. It is simple to implement and can be seamlessly integrated into existing algorithms with just two lines of code change. Overall, RGD is a promising technique for boosting the performance of DNNs, and could help push the boundaries in various domains.

    Acknowledgements

    The paper described in this blog post was written by Ramnath Kumar, Arun Sai Suggala, Dheeraj Nagaraj and Kushal Majmundar. We extend our sincere gratitude to the anonymous reviewers, Prateek Jain, Pradeep Shenoy, Anshul Nasery, Lovish Madaan, and the numerous dedicated members of the machine learning and optimization team at Google Research India for their invaluable feedback and contributions to this work.
