    Computer vision system marries image recognition and generation


    Computers possess two remarkable capabilities with respect to images: They can both identify them and generate them anew. Historically, these functions have stood separate, akin to the disparate acts of a chef who is good at creating dishes (generation), and a connoisseur who is good at tasting dishes (recognition).

    Yet, one can’t help but wonder: What would it take to orchestrate a harmonious union between these two distinctive capacities? Both chef and connoisseur share a common understanding of the taste of the food. Similarly, a unified vision system requires a deep understanding of the visual world.

    Now, researchers in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have trained a system to infer the missing parts of an image, a task that requires deep comprehension of the image’s content. In successfully filling in the blanks, the system, known as the Masked Generative Encoder (MAGE), achieves two goals at the same time: accurately identifying images and creating new ones with a striking resemblance to reality.

    This dual-purpose system enables myriad potential applications, like object identification and classification within images, swift learning from minimal examples, the creation of images under specific conditions like text or class, and enhancing existing images.

    Unlike other techniques, MAGE doesn’t work with raw pixels. Instead, it converts images into what are called “semantic tokens,” which are compact, yet abstracted, versions of an image section. Think of these tokens as mini jigsaw puzzle pieces, each representing a 16×16 patch of the original image. Just as words form sentences, these tokens create an abstracted version of an image that can be used for complex processing tasks, while preserving the information in the original image. Such a tokenization step can be trained within a self-supervised framework, allowing it to pre-train on large image datasets without labels.
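To make the jigsaw analogy concrete, here is a minimal sketch of turning an image into one integer token per 16×16 patch. It is a toy stand-in, not MAGE's tokenizer: the article describes a learned tokenization step, whereas this sketch simply assigns each raw patch to its nearest entry in a random codebook. The function names, the 64×64 image size, and the 8-entry codebook are all hypothetical choices for illustration.

```python
import random

PATCH = 16  # patch side length in pixels, as described in the article


def split_into_patches(image, patch=PATCH):
    """Split a square grayscale image (list of rows) into flattened patches."""
    size = len(image)
    assert size % patch == 0, "image side must be a multiple of the patch size"
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def nearest_code(patch_vec, codebook):
    """Return the index of the codebook entry closest to the patch (squared L2)."""
    def dist(code):
        return sum((p - q) ** 2 for p, q in zip(patch_vec, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def tokenize(image, codebook, patch=PATCH):
    """Map every patch to one integer token id."""
    return [nearest_code(p, codebook) for p in split_into_patches(image, patch)]


rng = random.Random(0)
# Hypothetical 64x64 image and 8-entry codebook, both filled with random values.
image = [[rng.random() for _ in range(64)] for _ in range(64)]
codebook = [[rng.random() for _ in range(PATCH * PATCH)] for _ in range(8)]
tokens = tokenize(image, codebook)
print(len(tokens))  # 16 tokens: a 4x4 grid of 16x16 patches
```

The point of the sketch is the change of representation: 4,096 pixel values become 16 small integers, which is the kind of compact sequence the later modeling steps operate on.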

    Now, the magic begins when MAGE uses “masked token modeling.” It randomly hides some of these tokens, creating an incomplete puzzle, and then trains a neural network to fill in the gaps. This way, it learns both to understand the patterns in an image (image recognition) and to generate new ones (image generation).
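The hide-and-fill idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' training code: `MASK_ID`, `mask_tokens`, and `masked_cross_entropy` are hypothetical names, the loss is assumed to be cross-entropy scored only on the hidden positions, and a uniform guesser stands in for the neural network.

```python
import math
import random

MASK_ID = -1  # hypothetical sentinel marking a hidden token


def mask_tokens(tokens, mask_ratio, rng):
    """Hide a random subset of tokens; return the masked sequence and hidden positions."""
    n_hide = max(1, round(mask_ratio * len(tokens)))
    hidden = sorted(rng.sample(range(len(tokens)), n_hide))
    masked = list(tokens)
    for i in hidden:
        masked[i] = MASK_ID
    return masked, hidden


def masked_cross_entropy(predicted_probs, targets, hidden):
    """Average negative log-likelihood of the true token, over hidden positions only."""
    return -sum(math.log(predicted_probs[i][targets[i]]) for i in hidden) / len(hidden)


def uniform_predictor(masked, vocab_size):
    """Placeholder for the neural network: a uniform guess at every position."""
    return [[1.0 / vocab_size] * vocab_size for _ in masked]


rng = random.Random(0)
vocab_size = 8
tokens = [rng.randrange(vocab_size) for _ in range(16)]
masked, hidden = mask_tokens(tokens, mask_ratio=0.75, rng=rng)
loss = masked_cross_entropy(uniform_predictor(masked, vocab_size), tokens, hidden)
```

With a uniform guess over 8 possible tokens, the loss equals ln 8; a trained network would be expected to push it lower by using the visible tokens as context.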

    “One remarkable part of MAGE is its variable masking strategy during pre-training, allowing it to train for either task, image generation or recognition, within the same system,” says Tianhong Li, a PhD student in electrical engineering and computer science at MIT, a CSAIL affiliate, and the lead author on a paper about the research. “MAGE’s ability to work in the ‘token space’ rather than ‘pixel space’ results in clear, detailed, and high-quality image generation, as well as semantically rich image representations. This could hopefully pave the way for advanced and integrated computer vision models.”
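A variable masking strategy means the fraction of hidden tokens changes from one training example to the next instead of being fixed. The article does not say how that fraction is chosen, so the sketch below assumes a Gaussian truncated to a fixed range; the function name and every parameter value are illustrative assumptions, not figures from the article.

```python
import random


def sample_mask_ratio(rng, mean=0.55, std=0.25, low=0.5, high=1.0):
    """Draw a masking ratio from a Gaussian truncated to [low, high] by rejection.

    All default values are assumptions chosen for illustration.
    """
    while True:
        ratio = rng.gauss(mean, std)
        if low <= ratio <= high:
            return ratio


rng = random.Random(0)
ratios = [sample_mask_ratio(rng) for _ in range(1000)]
```

One plausible reading of the quote is that very high ratios resemble generation, where nearly the whole image must be produced, while more moderate ratios leave enough context for the network to learn useful representations.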

    Apart from its ability to generate realistic images from scratch, MAGE also allows for conditional image generation. Users can specify certain criteria for the images they want MAGE to generate, and the tool will cook up the appropriate image. It’s also capable of image editing tasks, such as removing elements from an image while maintaining a realistic appearance.

    Recognition tasks are another strong suit for MAGE. With its ability to pre-train on large unlabeled datasets, it can classify images using only the learned representations. Moreover, it excels at few-shot learning, achieving impressive results on large image datasets like ImageNet with only a handful of labeled examples.

    The validation of MAGE’s performance has been impressive. On one hand, it set new records in image generation, outperforming previous models by a significant margin. On the other hand, MAGE also led on recognition tasks, achieving 80.9 percent accuracy in linear probing and 71.9 percent 10-shot accuracy on ImageNet (this means it correctly identified images in 71.9 percent of cases where it had only 10 labeled examples from each class).
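Linear probing measures how good learned representations are by freezing them and training only a linear classifier on top; the 10-shot setting restricts that classifier to 10 labeled examples per class. The sketch below illustrates the protocol only. It assumes synthetic three-dimensional "features" in place of real encoder outputs and uses a multi-class perceptron in place of whatever linear classifier the authors trained; all names and sizes are hypothetical.

```python
import random


def make_features(rng, n_classes, per_class, noise=0.05):
    """Synthetic stand-in for frozen encoder outputs: one noisy cluster per class."""
    data = []
    for label in range(n_classes):
        for _ in range(per_class):
            vec = [(1.0 if d == label else 0.0) + rng.uniform(-noise, noise)
                   for d in range(n_classes)]
            data.append((vec, label))
    return data


def predict(weights, vec):
    """Return the class whose weight row gives the highest linear score."""
    scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return scores.index(max(scores))


def train_linear_probe(train, n_classes, epochs=20):
    """Fit only a linear layer (multi-class perceptron); the features stay fixed."""
    dim = len(train[0][0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for vec, label in train:
            guess = predict(weights, vec)
            if guess != label:  # update only on mistakes
                for d in range(dim):
                    weights[label][d] += vec[d]
                    weights[guess][d] -= vec[d]
    return weights


def accuracy(weights, data):
    """Fraction of examples whose predicted class matches the label."""
    return sum(predict(weights, v) == y for v, y in data) / len(data)


rng = random.Random(0)
train = make_features(rng, n_classes=3, per_class=10)  # 10 labeled examples per class
test = make_features(rng, n_classes=3, per_class=50)
probe = train_linear_probe(train, n_classes=3)
test_accuracy = accuracy(probe, test)
```

Because the encoder is never updated, any accuracy the probe reaches reflects how linearly separable the frozen features already are, which is why the metric is used to compare representation quality.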

    Despite its strengths, the research team acknowledges that MAGE is a work in progress. The process of converting images into tokens inevitably leads to some loss of information. They are keen to explore ways to compress images without losing important details in future work. The team also intends to test MAGE on larger datasets. Future exploration might include training MAGE on larger unlabeled datasets, potentially leading to even better performance.

    “It has been a long dream to achieve image generation and image recognition in one single system. MAGE is a groundbreaking research which successfully harnesses the synergy of these two tasks and achieves the state-of-the-art of them in one single system,” says Huisheng Wang, senior staff software engineer of humans and interactions in the Research and Machine Intelligence division at Google, who was not involved in the work. “This innovative system has wide-ranging applications, and has the potential to inspire many future works in the field of computer vision.”

    Li wrote the paper along with Dina Katabi, the Thuan and Nicole Pham Professor in the MIT Department of Electrical Engineering and Computer Science and a CSAIL principal investigator; Huiwen Chang, a senior research scientist at Google; Shlok Kumar Mishra, a University of Maryland PhD student and Google Research intern; Han Zhang, a senior research scientist at Google; and Dilip Krishnan, a staff research scientist at Google. Computational resources were provided by Google Cloud Platform and the MIT-IBM Watson Research Collaboration. The team’s research was presented at the 2023 Conference on Computer Vision and Pattern Recognition.
