    This AI Research Introduces Atom: A Low-Bit Quantization Technique for Efficient and Accurate Large Language Model (LLM) Serving


Large Language Models (LLMs) are a recent arrival in the Artificial Intelligence community that has taken the world by storm. Thanks to their remarkable capabilities, these models are used by everyone: researchers, scientists, and even students. With their human-like ability to answer questions, generate content, summarize text, complete code, and so on, these models have come a long way.

LLMs are in demand across domains, including sentiment analysis, intelligent chatbots, and content creation. These models consume a great deal of computational power, so GPU resources are used carefully to increase throughput. This is done by batching multiple user requests, and, to further improve memory efficiency and computing capacity, LLM quantization techniques are applied. However, existing quantization approaches, such as 8-bit weight-activation quantization, do not fully exploit what newer GPUs can do. Since these GPUs also offer 4-bit integer operators, existing quantization methods are not designed for maximum efficiency.
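To make the precision trade-off concrete, here is a minimal NumPy sketch of uniform symmetric quantization, which is the basic building block behind both INT8 and INT4 schemes. This is an illustrative toy, not Atom's actual GPU kernels: all function names here are our own.

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Uniform symmetric quantization: map floats to n-bit signed integers
    using a single scale for the whole tensor."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for INT8, 7 for INT4
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximation of the original floats."""
    return q * scale

np.random.seed(0)
x = np.random.randn(1024).astype(np.float32)
q8, s8 = quantize_symmetric(x, 8)
q4, s4 = quantize_symmetric(x, 4)
err8 = np.abs(dequantize(q8, s8) - x).mean()
err4 = np.abs(dequantize(q4, s4) - x).mean()
print(f"INT8 mean abs error: {err8:.4f}, INT4 mean abs error: {err4:.4f}")
```

The sketch shows why naive 4-bit quantization is harder than 8-bit: with only 15 levels instead of 255, the rounding error grows sharply, which is exactly the accuracy gap that techniques like Atom are designed to close.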

To address this issue, a team of researchers has introduced Atom, a new method that maximizes the serving throughput of LLMs. Atom is a low-bit quantization technique created to increase throughput significantly without sacrificing precision. It uses low-bit operators and low-bit quantization to reduce memory usage, and it relies on a special combination of fine-grained and mixed-precision quantization to retain high accuracy.
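The fine-grained idea can be sketched as group quantization: instead of one scale per tensor, each small group of values gets its own scale, so a single large activation only degrades its own group. The sketch below (our own illustration, under assumed group sizes, not Atom's implementation) compares the two:

```python
import numpy as np

def quantize_per_tensor(x, n_bits=4):
    """One scale for the entire tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(x).max() / qmax, 1e-8)
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def quantize_per_group(x, n_bits=4, group_size=128):
    """Fine-grained: one scale per group of `group_size` consecutive values,
    so an outlier only affects the group it lives in."""
    qmax = 2 ** (n_bits - 1) - 1
    groups = x.reshape(-1, group_size)
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / qmax, 1e-8)
    q = np.clip(np.round(groups / scales), -qmax, qmax)
    return (q * scales).reshape(x.shape)

np.random.seed(0)
x = np.random.randn(4096).astype(np.float32)
x[:4] *= 50.0   # inject a few outliers, as LLM activations often contain
err_tensor = np.abs(quantize_per_tensor(x) - x).mean()
err_group = np.abs(quantize_per_group(x) - x).mean()
print(f"per-tensor error: {err_tensor:.4f}, per-group error: {err_group:.4f}")
```

With a single per-tensor scale, the outliers inflate the scale and the rounding error for every value; per-group scales confine that damage, which is the intuition behind Atom's fine-grained quantization.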

The team reports that Atom was evaluated with 4-bit weight-activation quantization configurations during serving. The results showed that Atom can keep latency within the same target range while improving end-to-end throughput by up to 7.73x compared with the typical 16-bit floating-point (FP16) approach, and 2.53x compared with 8-bit integer (INT8) quantization. This makes Atom a viable solution for meeting the growing demand for LLM services, since it maintains the desired response time while greatly increasing the speed at which LLMs can process requests.

The researchers summarize their primary contributions as follows.

1. LLM serving was thoroughly analyzed as the first step of the study's performance evaluation, identifying the significant performance benefits of low-bit weight-activation quantization approaches.
2. A unique and precise low-bit weight-activation quantization technique called Atom has been introduced.
3. Atom employs several techniques to ensure peak performance. It uses mixed precision, keeping key activations and weights at higher precision while quantizing the rest at low precision, and fine-grained group quantization to reduce errors introduced during quantization.
4. Atom uses dynamic activation quantization, which reduces quantization error by adapting to the distinct distribution of each input. The method also quantizes the KV-cache to further improve overall performance.
5. The research also proposes an integrated framework for LLM serving. The team codesigned an efficient inference system, building low-bit GPU kernels and demonstrating Atom's end-to-end throughput and latency gains in a real deployment.
6. Atom's performance has been thoroughly assessed, showing that it greatly increases LLM serving throughput, with gains of up to 7.7x at the expense of a negligible loss of accuracy.
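The dynamic activation quantization mentioned above can be illustrated with a short sketch: a static scale is fixed at calibration time, while a dynamic scale is recomputed from each incoming activation tensor. This is our own toy comparison under an assumed 4-bit format, not the paper's code:

```python
import numpy as np

QMAX = 7  # 4-bit signed integer range: [-7, 7]

def quantize_with_scale(x, scale):
    """Quantize to 4-bit levels with a given scale, then reconstruct."""
    return np.clip(np.round(x / scale), -QMAX, QMAX) * scale

np.random.seed(0)
calib = np.random.randn(1024)
static_scale = np.abs(calib).max() / QMAX   # fixed once, at calibration time

wide = 5.0 * np.random.randn(1024)          # a later input with a wider spread
dynamic_scale = np.abs(wide).max() / QMAX   # recomputed for this input
err_static = np.abs(quantize_with_scale(wide, static_scale) - wide).mean()
err_dynamic = np.abs(quantize_with_scale(wide, dynamic_scale) - wide).mean()
print(f"static-scale error: {err_static:.4f}, dynamic-scale error: {err_dynamic:.4f}")
```

When an input's distribution differs from the calibration data, the static scale clips it heavily, while the per-input dynamic scale adapts, which is why dynamic quantization helps accuracy at low bit widths.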

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our 32k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.



Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading teams, and managing work in an organized manner.


