Ztoog

    Researchers from Shanghai AI Lab and SenseTime Propose MM-Grounding-DINO: An Open and Comprehensive Pipeline for Unified Object Grounding and Detection


    Object detection plays a vital role in multi-modal understanding systems, where images are fed into models that generate proposals aligned with text. This process is crucial for state-of-the-art models that handle Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). OVD models are trained on base categories but, in zero-shot scenarios, must predict both base and novel categories within a broad vocabulary. In PG, the model is given a phrase describing candidate categories and outputs the corresponding boxes, while in REC it identifies a single target from the text and marks its position with a bounding box. Grounding-DINO addresses OVD, PG, and REC, and has been widely adopted for diverse applications.

    https://arxiv.org/abs/2401.02361v2

    Researchers from Shanghai AI Lab and SenseTime Research have developed MM-Grounding-DINO, a user-friendly, open-source pipeline built with the MMDetection toolbox. It uses diverse vision datasets for pre-training and a range of detection and grounding datasets for fine-tuning. The authors provide a comprehensive analysis of the reported results along with detailed settings for reproducibility. In extensive benchmark experiments, MM-Grounding-DINO-Tiny surpasses the performance of the Grounding-DINO-Tiny baseline.


    MM-Grounding-DINO builds on the foundation of Grounding-DINO. It works by aligning textual descriptions with the corresponding generated bounding boxes in images of varied shapes. The main components of MM-Grounding-DINO are a text backbone that extracts features from text, an image backbone that extracts features from images, a feature enhancer that thoroughly fuses image and text features, a language-guided query selection module that initializes queries, and a cross-modality decoder that refines bounding boxes.
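
    To make the language-guided query selection step more concrete, here is a minimal sketch of one way such a module could pick initial queries: score each image token by its best dot-product match against any text token, then keep the top-scoring tokens. The function name `select_queries`, the plain-list feature format, and the tie-breaking rule are assumptions made for illustration; this is not the authors' implementation.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most similar to any text token.

    image_feats: list of image-token feature vectors (lists of floats)
    text_feats:  list of text-token feature vectors of the same dimension
    num_queries: how many image tokens to keep as initial decoder queries
    (All names and the list-based format are hypothetical.)
    """
    scored = []
    for idx, img in enumerate(image_feats):
        # Best dot-product similarity between this image token and any text token.
        best = max(sum(a * b for a, b in zip(img, txt)) for txt in text_feats)
        scored.append((best, idx))
    # Highest similarity first; ties are broken by the lower token index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:num_queries]]


image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.2, 0.1]]
selected = select_queries(image_feats, text_feats, num_queries=2)
```

    In this toy input, the image tokens that line up best with the text are chosen first, which is the intuition behind letting the language guide where the decoder starts looking.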

    Given an image-text pair, MM-Grounding-DINO uses the image backbone to extract features from the image at several scales. At the same time, the text backbone extracts features from the accompanying text. Both sets of features are passed to the feature enhancer module for cross-modality fusion. Within this module, text and image features are fused by a Bi-Attention Block made up of text-to-image and image-to-text cross-attention layers. The fused features are then further enhanced by vanilla self-attention and deformable self-attention layers, followed by a Feedforward Network (FFN) layer.
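
    The two-way fusion in the Bi-Attention Block can be sketched in a few lines. The example below is a deliberately simplified illustration, assuming single-head scaled dot-product attention with identity projections (no learned weights), a residual connection, and plain Python lists as features. The self-attention, deformable self-attention, and FFN layers that follow in the real model are omitted, and all function names are hypothetical.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query token attends over all tokens of the other modality."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out


def add(a, b):
    # Element-wise residual addition of two token lists.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bi_attention(image_feats, text_feats):
    # Both directions read the same, not-yet-updated inputs.
    image_update = cross_attention(image_feats, text_feats)  # image tokens attend to text
    text_update = cross_attention(text_feats, image_feats)   # text tokens attend to image
    return add(image_feats, image_update), add(text_feats, text_update)


new_image, new_text = bi_attention([[1.0, 0.0], [0.0, 1.0]], [[2.0, 3.0]])
```

    With a single text token, every image token simply absorbs that token's features, while the text token receives a softmax-weighted mix of the image tokens. Token counts and feature dimensions are unchanged, so the outputs can flow into later layers.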

    The study presents an open, comprehensive pipeline for unified object grounding and detection that covers the OVD, PG, and REC tasks. The model's performance is examined through a visualization-based analysis, which reveals inaccuracies in the ground-truth annotations of the evaluation dataset. The MM-Grounding-DINO model achieves state-of-the-art performance in zero-shot settings on COCO, with a mean average precision (mAP) of 52.5. After fine-tuning, it also outperforms other fine-tuned models in various domains, including marine objects, brain tumor detection, urban street scenes, and people in paintings, setting new benchmarks for mAP.
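
    As background for the mAP figures, detection metrics of this kind rest on intersection over union (IoU) between a predicted box and a ground-truth box, and COCO-style mAP averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that matching criterion, assuming boxes in (x1, y1, x2, y2) format; the helper names are hypothetical, and the full per-category precision-recall computation is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """List the IoU thresholds at which the prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in COCO_THRESHOLDS if overlap >= t]


passed = thresholds_passed((0, 0, 100, 77), (0, 0, 100, 100))
```

    A prediction that overlaps the ground truth loosely only counts as correct at the lower thresholds, which is why tighter boxes translate into a higher mAP.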


    In conclusion, the study introduces a comprehensive and open pipeline for unified object grounding and detection that addresses tasks such as OVD, PG, and REC. With fine-tuning, the model shows notable improvements in mAP across various datasets, such as COCO and LVIS. For some objects, the model's predictions are more precise than the existing annotations. The authors also propose a detailed evaluation framework that enables systematic assessment across diverse datasets, including COCO, LVIS, RefCOCOg, Flickr30k Entities, ODinW13/35, and Description Detection Dataset (D3).


    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.



    Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.



    © 2025 Ztoog.
