    Neural network pruning with combinatorial optimization – Google Research Blog

    Posted by Hussein Hazimeh, Research Scientist, Athena Team, and Riade Benbaki, Graduate Student at MIT

    Modern neural networks have achieved impressive performance across a wide range of applications, such as language, mathematical reasoning, and vision. However, these networks often use large architectures that require significant computational resources. This can make it impractical to serve such models to users, especially in resource-constrained environments like wearables and smartphones. A widely used approach to mitigate the inference costs of pre-trained networks is to prune them by removing some of their weights, in a way that doesn't significantly affect utility. In standard neural networks, each weight defines a connection between two neurons, so after weights are pruned, the input propagates through a smaller set of connections and thus requires fewer computational resources.

    Original network vs. a pruned network.

    Pruning methods can be applied at different stages of the network's training process: after, during, or before training (i.e., immediately after weight initialization). In this post, we focus on the post-training setting: given a pre-trained network, how can we determine which weights should be pruned? One popular method is magnitude pruning, which removes weights with the smallest magnitude. While efficient, this method doesn't directly consider the effect of removing weights on the network's performance. Another popular paradigm is optimization-based pruning, which removes weights based on how much their removal impacts the loss function. Although conceptually appealing, most existing optimization-based approaches seem to face a serious tradeoff between performance and computational requirements. Methods that make crude approximations (e.g., assuming a diagonal Hessian matrix) can scale well, but have relatively low performance. On the other hand, while methods that make fewer approximations tend to perform better, they appear to be much less scalable.
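
    To make the contrast concrete, here is a minimal NumPy sketch of magnitude pruning, the baseline just described (the toy weight vector and k are illustrative): it ranks weights by |w| alone and never consults the loss.

```python
import numpy as np

def magnitude_prune(weights, k):
    """Keep the k largest-magnitude weights and zero out the rest."""
    pruned = np.zeros_like(weights)
    keep = np.argsort(np.abs(weights))[-k:]   # indices of the k largest |w|
    pruned[keep] = weights[keep]
    return pruned

# Toy example: prune a 6-weight vector down to 3 nonzeros.
w = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.2])
print(magnitude_prune(w, k=3))   # -> [ 0.9  0.   0.4  0.  -0.7  0. ]
```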

    In “Fast as CHITA: Neural Network Pruning with Combinatorial Optimization”, presented at ICML 2023, we describe how we developed an optimization-based approach for pruning pre-trained neural networks at scale. CHITA (which stands for “Combinatorial Hessian-free Iterative Thresholding Algorithm”) outperforms existing pruning methods in terms of scalability and performance tradeoffs, and it does so by leveraging advances from several fields, including high-dimensional statistics, combinatorial optimization, and neural network pruning. For instance, CHITA can be 20x to 1000x faster than state-of-the-art methods for pruning ResNet and improves accuracy by over 10% in many settings.

    Overview of contributions

    CHITA has two notable technical improvements over popular methods:

    • Efficient use of second-order information: Pruning methods that use second-order information (i.e., relating to second derivatives) achieve the state of the art in many settings. In the literature, this information is typically used by computing the Hessian matrix or its inverse, an operation that is very difficult to scale because the Hessian size is quadratic in the number of weights. Through careful reformulation, CHITA uses second-order information without having to compute or store the Hessian matrix explicitly, thus allowing for more scalability.
    • Combinatorial optimization: Popular optimization-based methods use a simple optimization technique that prunes weights in isolation, i.e., when deciding to prune a certain weight they do not consider whether other weights have been pruned. This could lead to pruning important weights, because weights deemed unimportant in isolation may become important once other weights are pruned. CHITA avoids this issue by using a more advanced, combinatorial optimization algorithm that takes into account how pruning one weight affects others.

    In the sections below, we discuss CHITA's pruning formulation and algorithms.

    A computation-friendly pruning formulation

    There are many possible pruning candidates, which are obtained by retaining only a subset of the weights from the original network. Let k be a user-specified parameter that denotes the number of weights to retain. Pruning can be naturally formulated as a best-subset selection (BSS) problem: among all possible pruning candidates (i.e., subsets of weights) with only k weights retained, the candidate that has the smallest loss is selected.
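
    In symbols, with w the pre-trained weights, L the loss, and 1_S the binary mask that zeroes out every weight outside a subset S, one standard way to write the BSS problem is:

```latex
\min_{S \subseteq \{1,\dots,p\},\; |S| = k} \; \mathcal{L}\left(w \odot \mathbf{1}_S\right)
```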

    Pruning as a BSS problem: among all possible pruning candidates with the same total number of weights, the best candidate is defined as the one with the least loss. This illustration shows four candidates, but this number is generally much larger.

    Solving the pruning BSS problem on the original loss function is generally computationally intractable. Thus, similar to prior work, such as OBD and OBS, we approximate the loss with a quadratic function by using a second-order Taylor series, where the Hessian is estimated with the empirical Fisher information matrix. While gradients can typically be computed efficiently, computing and storing the Hessian matrix is prohibitively expensive due to its sheer size. In the literature, it is common to deal with this challenge by making restrictive assumptions on the Hessian (e.g., diagonal matrix) and also on the algorithm (e.g., pruning weights in isolation).

    CHITA uses an efficient reformulation of the pruning problem (BSS using the quadratic loss) that avoids explicitly computing the Hessian matrix, while still using all the information from this matrix. This is made possible by exploiting the low-rank structure of the empirical Fisher information matrix. This reformulation can be viewed as a sparse linear regression problem, where each regression coefficient corresponds to a certain weight in the neural network. After obtaining a solution to this regression problem, coefficients set to zero will correspond to weights that should be pruned. Our regression data matrix is (n x p), where n is the batch (sub-sample) size and p is the number of weights in the original network. Typically n << p, so storing and operating with this data matrix is much more scalable than common pruning approaches that operate with the (p x p) Hessian.
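
    To illustrate why the low-rank structure helps, here is a minimal NumPy sketch (shapes and variable names are hypothetical, with per-sample gradients assumed to be given as the rows of G): the empirical Fisher matrix equals G.T @ G / n, so the Fisher quadratic form can be evaluated through the (n x p) matrix G alone and the (p x p) matrix is never materialized.

```python
import numpy as np

# Hypothetical sizes: n sub-samples, p weights, n << p.
n, p = 128, 10_000
rng = np.random.default_rng(0)
G = rng.normal(size=(n, p))   # row i = gradient of the loss on sample i (assumed given)
w = rng.normal(size=p)        # pre-trained weights
g = G.mean(axis=0)            # average gradient at w

def quadratic_proxy(w_new):
    """Second-order Taylor proxy of the loss around w, with the Hessian
    estimated by the empirical Fisher matrix G.T @ G / n. Cost is O(n * p)
    per call; the (p x p) Fisher matrix itself is never formed."""
    d = w_new - w
    return g @ d + 0.5 * (G @ d) @ (G @ d) / n
```

    CHITA goes one step further and rewrites this proxy as a least-squares objective whose design matrix is built from G and is therefore also (n x p), which is what turns pruning into the sparse regression problem described above.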

    CHITA reformulates the quadratic loss approximation, which requires an expensive Hessian matrix, as a linear regression (LR) problem. The LR's data matrix is linear in p, which makes the reformulation more scalable than the original quadratic approximation.

    Scalable optimization algorithms

    CHITA reduces pruning to a linear regression problem under the following sparsity constraint: at most k regression coefficients can be nonzero. To obtain a solution to this problem, we consider a modification of the well-known iterative hard thresholding (IHT) algorithm. IHT performs gradient descent where, after each update, the following post-processing step is applied: all regression coefficients outside the Top-k (i.e., the k coefficients with the largest magnitude) are set to zero. IHT typically delivers a good solution to the problem, and it does so by iteratively exploring different pruning candidates and jointly optimizing over the weights.
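
    Here is a minimal sketch of vanilla IHT on a generic least-squares problem (constant learning rate; A, b, and the hyperparameters are illustrative, and CHITA's actual algorithm adds the refinements described next):

```python
import numpy as np

def hard_threshold(w, k):
    """Project onto the Top-k: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    out[keep] = w[keep]
    return out

def iht(A, b, k, lr=1e-3, n_iters=500):
    """Vanilla IHT for: minimize 0.5 * ||A w - b||^2  subject to  ||w||_0 <= k.
    Each iteration takes a gradient step, then hard-thresholds to the Top-k."""
    w = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ w - b)   # gradient of the least-squares loss
        w = hard_threshold(w - lr * grad, k)
    return w
```

    Because the Top-k support can change from one iteration to the next, IHT explores different pruning candidates while jointly updating the retained weights.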

    Due to the scale of the problem, standard IHT with a constant learning rate can suffer from very slow convergence. For faster convergence, we developed a new line-search method that exploits the problem structure to find a suitable learning rate, i.e., one that leads to a sufficiently large decrease in the loss. We also employed several computational schemes to improve CHITA's efficiency and the quality of the second-order approximation, leading to an improved version that we call CHITA++.
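
    CHITA's line search exploits the least-squares structure of the problem; as a generic stand-in, a simple backtracking rule (a hypothetical sketch reusing hard_threshold from above, not the paper's exact scheme) shrinks the step until the thresholded update actually decreases the loss:

```python
def iht_step_with_backtracking(w, grad, loss_fn, k, lr0=1.0, beta=0.5, max_tries=20):
    """One IHT step with backtracking: shrink the step size geometrically until
    the hard-thresholded update strictly decreases loss_fn."""
    current = loss_fn(w)
    lr = lr0
    for _ in range(max_tries):
        w_new = hard_threshold(w - lr * grad, k)
        if loss_fn(w_new) < current:
            return w_new
        lr *= beta
    return w  # no improving step found; keep the current iterate
```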

    Experiments

    We compare CHITA's run time and accuracy with several state-of-the-art pruning methods using different architectures, including ResNet and MobileNet.

    Run time: CHITA is much more scalable than comparable methods that perform joint optimization (as opposed to pruning weights in isolation). For example, CHITA's speed-up can reach over 1000x when pruning ResNet.

    Post-pruning accuracy: Below, we compare the performance of CHITA and CHITA++ with magnitude pruning (MP), WoodFisher (WF), and Combinatorial Brain Surgeon (CBS), for pruning 70% of the model weights. Overall, we see good improvements from CHITA and CHITA++.

    Post-pruning accuracy of various methods on ResNet20. Results are reported for pruning 70% of the model weights.
    Post-pruning accuracy of various methods on MobileNet. Results are reported for pruning 70% of the model weights.

    Next, we report results for pruning a larger network: ResNet50 (on this network, some of the methods listed in the ResNet20 figure could not scale). Here we compare with magnitude pruning and M-FAC. The figure below shows that CHITA achieves better test accuracy across a wide range of sparsity levels.

    Test accuracy of pruned networks, obtained using different methods.

    Conclusion, limitations, and future work

    We presented CHITA, an optimization-based approach for pruning pre-trained neural networks. CHITA offers scalability and competitive performance by efficiently using second-order information and by drawing on ideas from combinatorial optimization and high-dimensional statistics.

    CHITA is designed for unstructured pruning, in which any weight can be removed. In theory, unstructured pruning can significantly reduce computational requirements. However, realizing these reductions in practice requires special software (and possibly hardware) that supports sparse computations. In contrast, structured pruning, which removes entire structures like neurons, may offer improvements that are easier to attain on general-purpose software and hardware. It would be interesting to extend CHITA to structured pruning.

    Acknowledgements

    This work is part of a research collaboration between Google and MIT. Thanks to Rahul Mazumder, Natalia Ponomareva, Wenyu Chen, Xiang Meng, Zhe Zhao, and Sergei Vassilvitskii for their help in preparing this post and the paper. Also thanks to John Guilyard for creating the graphics in this post.
