    This AI Research Introduces Atom: A Low-Bit Quantization Technique for Efficient and Accurate Large Language Model (LLM) Serving


    Large Language Models (LLMs) are the latest introduction in the Artificial Intelligence community and have taken the world by storm. Thanks to their remarkable capabilities, these models are widely used by researchers, scientists and students alike. With their human-imitating ability to answer questions, generate content, summarise text, complete code and so on, these models have come a long way.

    LLMs are in demand across a range of domains, including sentiment analysis, intelligent chatbots, and content creation. These models consume a lot of computational power, so GPU resources must be used effectively to increase throughput. This is done by batching multiple user requests, and LLM quantisation techniques are used to further improve memory efficiency and computing capacity. However, existing quantisation approaches, like 8-bit weight-activation quantisation, don't take full advantage of what newer GPUs can do. Because these GPUs provide 4-bit integer operators, the existing quantisation techniques aren't designed for maximum efficiency.
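    To make the memory argument concrete, here is a minimal back-of-the-envelope sketch of how weight storage shrinks with bit width. The 7-billion-parameter model size is a hypothetical example, not a figure from the article, and the estimate ignores quantisation scales, activations and the KV-cache.

```python
# Rough weight-memory estimate at different bit widths.
# NUM_PARAMS is a hypothetical model size chosen for illustration only.

def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Weight storage in gigabytes (10^9 bytes), ignoring scales and other metadata."""
    return num_params * bits_per_weight / 8 / 1e9

NUM_PARAMS = 7_000_000_000  # assumption: a 7B-parameter model

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

    Under these assumptions, each halving of the bit width halves the weight footprint, which leaves more GPU memory for batching additional requests.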

    To address this issue, a team of researchers has introduced Atom, a new method that maximises the serving throughput of LLMs. Atom is a low-bit quantisation technique created to increase throughput significantly without sacrificing accuracy. To achieve this, it uses low-bit operators and low-bit quantisation to reduce memory usage, and it combines fine-grained and mixed-precision quantisation to retain high accuracy.

    The team has shared that Atom has been evaluated with 4-bit weight-activation quantisation configurations when serving. The results showed that Atom can keep latency within the same target range while improving end-to-end throughput by up to 7.73 times compared with the typical 16-bit floating-point (FP16) approach and 2.53 times compared with 8-bit integer (INT8) quantisation. This makes Atom a viable solution for meeting the growing demand for LLM services, because it maintains the desired response time while greatly increasing the speed at which LLMs can process requests.

    The researchers have summarised their primary contributions as follows.

    1. As the first step, the study thoroughly analyses LLM serving performance and identifies the significant performance benefits of low-bit weight-activation quantisation approaches.
    2. A novel and accurate low-bit weight-activation quantisation technique called Atom has been presented.
    3. The team has shared that Atom employs a variety of techniques to ensure peak performance. It uses mixed precision, which keeps a small set of key activations and weights at higher precision to preserve accuracy while using reduced precision for the rest. Fine-grained group quantisation has been used to reduce errors during the quantisation process.
    4. Atom employs dynamic activation quantisation, which reduces quantisation errors by adapting to the unique distribution of each input. To further improve overall performance, the method also quantises the KV-cache.
    5. The research has also proposed an integrated framework for large language model (LLM) serving. The team has co-designed an efficient inference system, building low-bit GPU kernels and demonstrating Atom's end-to-end throughput and latency benefits in a realistic setting.
    6. Atom's performance has been thoroughly evaluated, showing that Atom greatly increases LLM serving throughput, with gains of up to 7.7x at the cost of a minuscule loss of accuracy.
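    The quantisation ideas in the list above can be illustrated with a small pure-Python sketch. This is not Atom's implementation: the function names, the sample values, the group size, and the rule of picking outliers by magnitude are all assumptions made for illustration. It compares one scale for a whole row, fine-grained groups with a scale computed from each group's own values at run time, and a mixed-precision variant that keeps the largest value in 8 bits.

```python
from typing import List, Tuple

def quantise_group(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric quantisation of one group; the scale comes from the group's own values."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def roundtrip(values: List[float], group_size: int, bits: int) -> List[float]:
    """Quantise in groups of `group_size`, then dequantise back to floats."""
    out: List[float] = []
    for start in range(0, len(values), group_size):
        q, scale = quantise_group(values[start:start + group_size], bits)
        out.extend(x * scale for x in q)
    return out

def mixed_precision_roundtrip(values: List[float], num_outliers: int,
                              group_size: int) -> List[float]:
    """Keep the largest-magnitude entries in 8 bits and the rest in 4-bit groups.

    Choosing outliers by magnitude per input is a simplification for this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    split = len(values) - num_outliers
    normal_idx, outlier_idx = order[:split], order[split:]
    out = [0.0] * len(values)
    if normal_idx:
        restored = roundtrip([values[i] for i in normal_idx], group_size, 4)
        for i, v in zip(normal_idx, restored):
            out[i] = v
    if outlier_idx:
        restored = roundtrip([values[i] for i in outlier_idx], len(outlier_idx), 8)
        for i, v in zip(outlier_idx, restored):
            out[i] = v
    return out

def mean_abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical activation row: mostly small values plus one large outlier.
row = [0.1, -0.2, 0.3, 0.05, -0.15, 0.25, -0.3, 12.0]

err_coarse = mean_abs_error(row, roundtrip(row, group_size=8, bits=4))
err_fine = mean_abs_error(row, roundtrip(row, group_size=4, bits=4))
err_mixed = mean_abs_error(row, mixed_precision_roundtrip(row, num_outliers=1, group_size=4))

print(f"one scale per row : {err_coarse:.4f}")
print(f"groups of 4       : {err_fine:.4f}")
print(f"groups + outlier  : {err_mixed:.4f}")
```

    In this toy example the single outlier stretches the shared scale so far that the small values collapse to zero. Smaller groups limit that damage to the group containing the outlier, and handling the outlier separately at higher precision removes most of the remaining error.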

    Check out the Paper. All credit for this research goes to the researchers of this project.



    Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
    She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

