Ztoog
    Researchers from Shanghai AI Lab and SenseTime Propose MM-Grounding-DINO: An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

    Object detection plays a significant role in multi-modal understanding systems, where images are fed into models to generate proposals aligned with text. This process is crucial for state-of-the-art models handling Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). OVD models are trained on base categories but, in zero-shot scenarios, must predict both base and novel categories within a broad vocabulary. PG takes a phrase describing candidate categories and outputs the corresponding boxes, while REC identifies a single target from text and outlines its position using a bounding box. Grounding-DINO addresses OVD, PG, and REC, and has been widely adopted for diverse applications.
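All three tasks end in bounding boxes, and a predicted box is usually compared with a reference box through intersection over union (IoU). The article does not spell this out, so the following is only a minimal sketch of the standard formula; the `(x_min, y_min, x_max, y_max)` box layout and the function name `box_iou` are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Degenerate (zero-area) inputs yield 0.0 rather than dividing by zero.
    return intersection / union if union > 0 else 0.0
```

A prediction is then typically counted as correct when its IoU with the reference box exceeds a chosen threshold.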

    https://arxiv.org/abs/2401.02361v2

    Researchers from Shanghai AI Lab and SenseTime Research have developed MM-Grounding-DINO, a user-friendly, open-source pipeline built with the MMDetection toolbox. It uses a variety of vision datasets for pre-training and a range of detection and grounding datasets for fine-tuning. The authors provide a comprehensive analysis of the reported results, along with detailed settings for reproducibility. In extensive benchmark experiments, MM-Grounding-DINO-Tiny surpasses the performance of the Grounding-DINO-Tiny baseline.

    MM-Grounding-DINO builds on the foundation of Grounding-DINO. It operates by aligning textual descriptions with generated bounding boxes in images of varied shapes. Its main components are a text backbone that extracts features from text, an image backbone that extracts features from images, a feature enhancer that thoroughly fuses image and text features, a language-guided query selection module that initializes queries, and a cross-modality decoder that refines bounding boxes.
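The article only names the language-guided query selection module, so the details below are an assumption: one plausible reading is that each image token is scored by its best similarity to any text token, and the highest-scoring image tokens are used to initialize the decoder queries. The function name, the dot-product similarity, and the toy shapes are all hypothetical, and this sketch is not taken from the MMDetection code.

```python
import numpy as np


def language_guided_query_selection(image_feats, text_feats, num_queries):
    """Pick the image tokens most related to the text (illustrative only).

    image_feats: array of shape (num_image_tokens, dim)
    text_feats:  array of shape (num_text_tokens, dim)
    Returns the indices of the selected image tokens, best first.
    """
    # Dot-product similarity between every image token and every text token.
    similarity = image_feats @ text_feats.T  # (num_image_tokens, num_text_tokens)

    # Score each image token by its best-matching text token.
    best_match = similarity.max(axis=1)

    # Stable sort keeps the original order among tied scores.
    ranked = np.argsort(-best_match, kind="stable")
    return ranked[:num_queries]


# Toy example: four image tokens, one text token, two queries requested.
toy_image = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]])
toy_text = np.array([[1.0, 0.0]])
selected = language_guided_query_selection(toy_image, toy_text, num_queries=2)
```

The selected indices would then point at the image features used as starting positions for the cross-modality decoder.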

    When presented with an image-text pair, MM-Grounding-DINO uses the image backbone to extract features from the image at multiple scales. At the same time, the text backbone extracts features from the accompanying text. These features are fed into the feature enhancer module, which performs cross-modality fusion. Within this module, text and image features are fused through a Bi-Attention Block comprising text-to-image and image-to-text cross-attention layers. The fused features are then further enhanced by vanilla self-attention and deformable self-attention layers, followed by a Feedforward Network (FFN) layer.
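The two-way fusion described above can be sketched with plain scaled dot-product attention. This is a simplified illustration, not the model's implementation: it uses a single head, omits the learned projection matrices, layer normalization, and the subsequent self-attention and FFN layers, and adds a residual connection as an assumption. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Each query token gathers a weighted mix of the other modality's tokens."""
    dim = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(dim), axis=-1)
    return weights @ keys_values


def bi_attention_block(image_feats, text_feats):
    """Fuse both modalities in both directions (residual form is assumed)."""
    # Image tokens attend to text tokens.
    image_out = image_feats + cross_attention(image_feats, text_feats)
    # Text tokens attend to image tokens.
    text_out = text_feats + cross_attention(text_feats, image_feats)
    return image_out, text_out


# Toy inputs: 6 image tokens and 3 text tokens with 8-dimensional features.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(6, 8))
text_feats = rng.normal(size=(3, 8))
fused_image, fused_text = bi_attention_block(image_feats, text_feats)
```

Both outputs keep the shape of their inputs, so the fused image and text features can be passed on to later layers unchanged in layout.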

    The study presents an open, comprehensive pipeline for unified object grounding and detection covering the OVD, PG, and REC tasks. The model's performance is also examined through a visualization-based analysis, which reveals inaccuracies in the ground-truth annotations of the evaluation dataset. The MM-Grounding-DINO model achieves state-of-the-art performance in zero-shot settings on COCO, with a mean average precision (mAP) of 52.5. It also outperforms fine-tuned models in various domains, including marine objects, brain tumor detection, urban street scenes, and people in paintings, setting new benchmarks for mAP.
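For readers unfamiliar with the metric, average precision (AP) for a single class summarizes the precision-recall curve of score-ranked detections, and mAP averages AP over classes (COCO additionally averages over several IoU thresholds). The sketch below uses all-point interpolation for brevity, which is a simplification and not the exact COCO protocol; the function name and input layout are assumptions for illustration.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scores:            confidence of each detection
    is_true_positive:  whether each detection matched a ground-truth box
    num_ground_truth:  total number of ground-truth boxes for the class
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    false_pos = 0
    precisions = []
    recalls = []
    for i in order:
        if is_true_positive[i]:
            true_pos += 1
        else:
            false_pos += 1
        precisions.append(true_pos / (true_pos + false_pos))
        recalls.append(true_pos / num_ground_truth)

    # Make precision non-increasing from left to right (interpolation step).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap
```

Averaging this value over all classes gives a single-threshold mAP; a perfect ranking of all ground-truth objects yields 1.0.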

    In conclusion, the study introduces a comprehensive and open pipeline for unified object grounding and detection, addressing tasks such as OVD, PG, and REC. Through fine-tuning, the model shows notable improvements in mAP across various datasets, such as COCO and LVIS. For certain objects, the model's predictions are more precise than the existing annotations. The authors also propose a detailed evaluation framework that enables systematic assessment across diverse datasets, including COCO, LVIS, RefCOCOg, Flickr30k Entities, ODinW13/35, and the Description Detection Dataset (D3).


    Check out the Paper and Github. All credit for this research goes to the researchers of this project.

    Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.

