    AI

    This AI Research Introduces Atom: A Low-Bit Quantization Technique for Efficient and Accurate Large Language Model (LLM) Serving

    Large language models (LLMs) are the latest arrivals in the artificial intelligence community and have taken the world by storm. Thanks to their remarkable capabilities, these models are being used by everyone, from researchers and scientists to students. With their human-like ability to answer questions, generate content, summarize text, complete code, and so on, these models have come a long way.

    LLMs are needed across diverse domains, including sentiment analysis, intelligent chatbots, and content creation. These models consume a great deal of computational power, so GPU resources are used carefully to maximize throughput. This is done by batching multiple user requests, and LLM quantization techniques are applied to further improve memory efficiency and computing capacity. However, existing quantization approaches, such as 8-bit weight-activation quantization, do not really take advantage of what newer GPUs can do: since these GPUs offer 4-bit integer operators, the current quantization techniques are not designed for maximum efficiency.
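To make the precision trade-off concrete, here is a minimal NumPy sketch (not Atom's implementation, and not a GPU kernel) of symmetric round-to-nearest quantization at 8 and 4 bits. The function names and the single per-tensor scale are illustrative assumptions:

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Symmetric round-to-nearest quantization of a float tensor to n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1            # 127 for INT8, 7 for INT4
    scale = np.abs(x).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=4096).astype(np.float32)

for bits in (8, 4):
    q, s = quantize_symmetric(x, bits)
    err = np.abs(x - dequantize(q, s)).mean()
    print(f"INT{bits}: mean abs error = {err:.4f}")
```

Dropping from 8 to 4 bits halves memory and doubles integer-op throughput, but with a naive per-tensor scale the rounding error grows sharply, which is the accuracy gap Atom's finer-grained techniques are designed to close.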

    To address this issue, a team of researchers has introduced Atom, a new method that maximizes the serving throughput of LLMs. Atom is a low-bit quantization technique designed to increase throughput significantly without sacrificing precision. It achieves this by using low-bit operators and low-bit quantization to reduce memory usage, and it retains excellent accuracy through a special combination of fine-grained and mixed-precision quantization.
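The two ideas named here can be sketched together: quantize in small groups (each with its own scale) and keep a small fraction of large-magnitude "outlier" values in full precision. This is a hypothetical NumPy illustration under assumed parameters (group size 128, top 1% outliers); Atom's actual outlier selection and kernel layout differ:

```python
import numpy as np

def quantize_group_mixed(x, group_size=128, n_bits=4, outlier_frac=0.01):
    """Per-group INT quantization, splicing the largest-magnitude values back in FP.
    Assumes x.size is a multiple of group_size."""
    qmax = 2 ** (n_bits - 1) - 1
    flat = x.ravel().astype(np.float32)
    k = max(1, int(outlier_frac * flat.size))
    mask = np.zeros(flat.size, dtype=bool)
    mask[np.argsort(np.abs(flat))[-k:]] = True   # largest-magnitude entries
    body = np.where(mask, 0.0, flat)

    # Each group of 128 values gets its own scale (fine-grained quantization).
    groups = body.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0
    q = np.clip(np.round(groups / scales), -qmax, qmax)

    recon = (q * scales).ravel()
    recon[mask] = flat[mask]                     # outliers stay full precision
    return recon.reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=4096).astype(np.float32)
x[::512] *= 20.0                                 # inject a few outliers

qmax = 7                                         # per-tensor INT4 baseline
scale = np.abs(x).max() / qmax
naive = np.clip(np.round(x / scale), -qmax, qmax) * scale
print(np.abs(x - quantize_group_mixed(x)).mean(), np.abs(x - naive).mean())
```

Because a handful of outliers no longer inflate every scale, the group-wise mixed-precision reconstruction is far closer to the original than naive per-tensor INT4.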

    The team reports that Atom was evaluated on 4-bit weight-activation quantization configurations during serving. The results showed that Atom keeps latency within the same target range while improving end-to-end throughput by up to 7.73x compared with the typical 16-bit floating-point (FP16) approach and 2.53x compared with 8-bit integer (INT8) quantization. This makes Atom a viable solution for meeting the growing demand for LLM services, since it maintains the desired response time while greatly increasing the speed at which LLMs can process requests.

    The researchers summarize their primary contributions as follows.

    1. LLM serving was thoroughly analyzed as the first step in the study's performance evaluation, identifying the significant performance benefits of low-bit weight-activation quantization approaches.
    2. A unique and precise low-bit weight-activation quantization technique called Atom has been presented.
    3. Atom employs a variety of techniques to ensure peak performance. It uses mixed precision, keeping full precision for the key activations and weights while using reduced precision for the rest, and fine-grained group quantization to reduce errors during the quantization process.
    4. Atom employs dynamic activation quantization, which reduces quantization errors by adapting to the distinct distribution of each input. To further improve overall performance, the method also handles quantization of the KV-cache.
    5. The research also proposes an integrated framework for large language model (LLM) serving. The team co-designed an efficient inference system, building low-bit GPU kernels and demonstrating Atom's end-to-end throughput and latency benefits in a real deployment.
    6. Atom's performance has been thoroughly assessed, showing that it greatly increases LLM serving throughput, with gains of up to 7.7x, at the cost of a negligible loss of accuracy.
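The dynamic activation quantization mentioned above can be sketched as computing a scale per token at inference time, rather than from offline calibration. A minimal NumPy illustration under assumed shapes (one row per token); Atom's real implementation fuses this into GPU kernels:

```python
import numpy as np

def quantize_per_token(acts, n_bits=4):
    """Dynamic activation quantization: each token (row) gets its own scale,
    computed on the fly from that input's actual range."""
    qmax = 2 ** (n_bits - 1) - 1
    scales = np.abs(acts).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0
    q = np.clip(np.round(acts / scales), -qmax, qmax).astype(np.int8)
    return q, scales

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 512)).astype(np.float32)
acts[-1] *= 20.0                     # one token with a much wider range

q, s = quantize_per_token(acts)
recon = q.astype(np.float32) * s
```

With a single static scale, the one wide-range token would dominate and degrade every other token's precision; per-token scales confine that damage to the row that caused it, which is why adapting to each input's distribution reduces quantization error.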

    Check out the Paper. All credit for this research goes to the researchers of this project.



    Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
    She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading teams, and managing work in an organized manner.

