    AI model speeds up high-resolution computer vision


    An autonomous vehicle must rapidly and accurately recognize objects that it encounters, from an idling delivery truck parked at the corner to a cyclist whizzing toward an approaching intersection.

    To do this, the vehicle might use a powerful computer vision model to categorize every pixel in a high-resolution image of the scene, so it doesn’t lose sight of objects that might be obscured in a lower-quality image. But this task, known as semantic segmentation, is complex and requires a huge amount of computation when the image has high resolution.
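    As a rough illustration of what "categorizing every pixel" means, the sketch below turns per-pixel class scores into a label map. The class names, the tiny 2x3 "image", and the `segment` helper are hypothetical and chosen only for the example; a real model would produce the scores.

```python
import numpy as np

def segment(logits: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores of shape (H, W, C) into a label map (H, W)."""
    return logits.argmax(axis=-1)

# Hypothetical scores for a 2x3 image and 3 classes:
# 0 = road, 1 = truck, 2 = cyclist (labels invented for this sketch).
logits = np.array([
    [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.2, 1.7, 0.0]],
    [[1.9, 0.2, 0.1], [1.1, 0.3, 0.9], [0.1, 0.2, 2.2]],
])
labels = segment(logits)
print(labels)
# [[0 1 1]
#  [0 0 2]]
```

    Every pixel gets exactly one label, so the output has the same height and width as the input image.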

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a more efficient computer vision model that vastly reduces the computational complexity of this task. Their model can perform semantic segmentation accurately in real time on a device with limited hardware resources, such as the on-board computers that enable an autonomous vehicle to make split-second decisions.

    Recent state-of-the-art semantic segmentation models directly learn the interaction between each pair of pixels in an image, so their calculations grow quadratically as image resolution increases. Because of this, while these models are accurate, they are too slow to process high-resolution images in real time on an edge device like a sensor or mobile phone.

    The MIT researchers designed a new building block for semantic segmentation models that achieves the same abilities as these state-of-the-art models, but with only linear computational complexity and hardware-efficient operations.

    The result is a new model series for high-resolution computer vision that performs up to nine times faster than prior models when deployed on a mobile device. Importantly, this new model series exhibited the same or better accuracy than these alternatives.

    Not only could this technique help autonomous vehicles make decisions in real time, it could also improve the efficiency of other high-resolution computer vision tasks, such as medical image segmentation.

    “While researchers have been using traditional vision transformers for quite a long time, and they give amazing results, we want people to also pay attention to the efficiency aspect of these models. Our work shows that it is possible to drastically reduce the computation so this real-time image segmentation can happen locally on a device,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and senior author of the paper describing the new model.

    He is joined on the paper by lead author Han Cai, an EECS graduate student; Junyan Li, an undergraduate at Zhejiang University; Muyan Hu, an undergraduate student at Tsinghua University; and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the International Conference on Computer Vision.

    A simplified solution

    Categorizing every pixel in a high-resolution image that may have millions of pixels is a difficult task for a machine-learning model. A powerful new type of model, known as a vision transformer, has recently been used effectively for this task.

    Transformers were originally developed for natural language processing. In that context, they encode each word in a sentence as a token and then generate an attention map, which captures each token’s relationships with all other tokens. This attention map helps the model understand context when it makes predictions.

    Using the same concept, a vision transformer chops an image into patches of pixels and encodes each small patch into a token before generating an attention map. In generating this attention map, the model uses a similarity function that directly learns the interaction between each pair of pixels. In this way, the model develops what is known as a global receptive field, which means it can access all the relevant parts of the image.
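    The patch-and-attend idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the researchers' code: the helper names `patchify` and `softmax_attention`, the 32x32 image, and the 8-pixel patch size are assumptions, and the learned query, key, and value projections of a real vision transformer are omitted so the raw patches stand in for all three.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patch tokens."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)          # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)    # (num_tokens, token_dim)

def softmax_attention(q, k, v):
    """Standard softmax attention; returns the output and the (N, N) attention map."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # hypothetical input image
tokens = patchify(image, patch=8)        # 16 tokens of dimension 192
out, attn = softmax_attention(tokens, tokens, tokens)
print(tokens.shape, attn.shape)          # (16, 192) (16, 16)
```

    Each row of `attn` holds one token's weights over every other token, which is what gives the model its global receptive field.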

    Since a high-resolution image may contain millions of pixels, chunked into thousands of patches, the attention map quickly becomes enormous. Because of this, the amount of computation grows quadratically as the resolution of the image increases.
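    A quick back-of-the-envelope calculation shows how fast the attention map grows. The 16-pixel patch size and the square image sizes are assumptions picked for illustration; the article does not give specific figures.

```python
def attention_map_entries(height: int, width: int, patch: int):
    """Return (number of tokens, number of entries in the N x N attention map)."""
    tokens = (height // patch) * (width // patch)
    return tokens, tokens * tokens

for side in (512, 1024, 2048):
    tokens, entries = attention_map_entries(side, side, patch=16)
    print(f"{side}x{side}: {tokens:,} tokens, {entries:,} attention entries")
# 512x512: 1,024 tokens, 1,048,576 attention entries
# 1024x1024: 4,096 tokens, 16,777,216 attention entries
# 2048x2048: 16,384 tokens, 268,435,456 attention entries
```

    Doubling the image side quadruples the number of tokens and multiplies the size of the attention map by 16.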

    In their new model series, called EfficientViT, the MIT researchers used a simpler mechanism to build the attention map: they replaced the nonlinear similarity function with a linear similarity function. This lets them rearrange the order of operations to reduce total calculations without changing functionality or losing the global receptive field. With their model, the amount of computation needed for a prediction grows linearly as the image resolution grows.
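    The reordering trick can be shown with a minimal sketch. The article only says the similarity function is linear; the ReLU feature map, the normalization, and the function names below are assumptions made for illustration and should not be read as the exact EfficientViT formulation. Because matrix multiplication is associative, (QKᵀ)V can be computed as Q(KᵀV), which avoids ever forming the N x N attention map.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def linear_attention_quadratic(q, k, v, eps=1e-6):
    """Reference version: explicitly builds the (N, N) similarity matrix."""
    sim = relu(q) @ relu(k).T                      # (N, N), cost grows as N^2
    return (sim @ v) / (sim.sum(axis=-1, keepdims=True) + eps)

def linear_attention_linear(q, k, v, eps=1e-6):
    """Same result, reordered so the cost grows linearly with N."""
    fq, fk = relu(q), relu(k)
    kv = fk.T @ v                                  # (d, d_v), independent of N
    k_sum = fk.sum(axis=0)                         # (d,)
    return (fq @ kv) / ((fq @ k_sum)[:, None] + eps)

rng = np.random.default_rng(0)
n_tokens, dim = 64, 16                             # hypothetical sizes
q = rng.standard_normal((n_tokens, dim))
k = rng.standard_normal((n_tokens, dim))
v = rng.standard_normal((n_tokens, dim))

slow = linear_attention_quadratic(q, k, v)
fast = linear_attention_linear(q, k, v)
print(np.allclose(slow, fast))
```

    Both functions still let every token draw on every other token, so the global receptive field is preserved; only the order of the multiplications changes.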

    “But there is no free lunch. The linear attention only captures global context about the image, losing local information, which makes the accuracy worse,” Han says.

    To compensate for that accuracy loss, the researchers included two extra components in their model, each of which adds only a small amount of computation.

    One of those components helps the model capture local feature interactions, mitigating the linear function’s weakness in local information extraction. The second, a module that enables multiscale learning, helps the model recognize both large and small objects.
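    The article does not describe how the local component is built. One common, inexpensive way to mix information between neighboring positions is a depthwise 3x3 convolution, sketched below purely as an assumption; the function name and shapes are hypothetical. Its cost is nine multiply-adds per pixel per channel, so it grows only linearly with resolution.

```python
import numpy as np

def depthwise_conv3x3(feat: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Per-channel 3x3 filtering with zero padding.

    feat:    (H, W, C) feature map
    kernels: (3, 3, C), one 3x3 filter per channel
    """
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(feat, dtype=float)
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the input, weighted by that kernel tap.
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx, :]
    return out

# Averaging filter: each output mixes a pixel with its eight neighbors.
feat = np.ones((6, 6, 1))
mean_kernel = np.full((3, 3, 1), 1.0 / 9.0)
local = depthwise_conv3x3(feat, mean_kernel)
print(local.shape)
```

    Running such filters at more than one kernel size or resolution is one way a multiscale module could be approximated, though that is likewise an assumption rather than a detail from the article.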

    “The most critical part here is that we need to carefully balance the performance and the efficiency,” Cai says.

    They designed EfficientViT with a hardware-friendly architecture, so it could be easier to run on different types of devices, such as virtual reality headsets or the edge computers on autonomous vehicles. Their model could also be applied to other computer vision tasks, like image classification.

    Streamlining semantic segmentation

    When they tested their model on datasets used for semantic segmentation, they found that it performed up to nine times faster on an Nvidia graphics processing unit (GPU) than other popular vision transformer models, with the same or better accuracy.

    “Now, we can get the best of both worlds and reduce the computing to make it fast enough that we can run it on mobile and cloud devices,” Han says.

    Building on these results, the researchers want to apply this technique to speed up generative machine-learning models, such as those used to generate new images. They also want to continue scaling up EfficientViT for other vision tasks.

    “Efficient transformer models, pioneered by Professor Song Han’s team, now form the backbone of cutting-edge techniques in diverse computer vision tasks, including detection and segmentation,” says Lu Tian, senior director of AI algorithms at AMD, Inc., who was not involved with this paper. “Their research not only showcases the efficiency and capability of transformers, but also reveals their immense potential for real-world applications, such as enhancing image quality in video games.”

    “Model compression and light-weight model design are crucial research topics toward efficient AI computing, especially in the context of large foundation models. Professor Song Han’s group has shown remarkable progress compressing and accelerating modern deep learning models, particularly vision transformers,” adds Jay Jackson, global vice president of artificial intelligence and machine learning at Oracle, who was not involved with this research. “Oracle Cloud Infrastructure has been supporting his team to advance this line of impactful research toward efficient and green AI.”
