    Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks


    There has been a marked movement in the field of AGI systems toward pretrained, adaptable representations, valued for their task-agnostic benefits across applications. Natural language processing (NLP) is a clear example of this trend: more sophisticated models demonstrate adaptability by taking on new tasks and domains with only basic instructions. The success of NLP inspires a comparable approach in computer vision.

    One of the main obstacles to a universal representation for vision-related tasks is the need for broad perceptual ability. In contrast to NLP, computer vision works with complex visual data such as object locations, masked contours, and attributes, so a universal representation requires mastery of many distinct and challenging tasks. Two hurdles stand out. The first is the lack of comprehensive visual annotations, which prevents building a foundation model that captures the subtleties of spatial hierarchy and semantic granularity. The second is that computer vision currently lacks a unified pretraining framework that uses a single network architecture to integrate semantic granularity and spatial hierarchy seamlessly.

    A team of Microsoft researchers introduces Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. It addresses both problems, the missing consistent architecture and the limited comprehensive data, by using a single prompt-based representation for all vision tasks. Multitask learning requires annotated data of high quality and broad scale. To supply it, the team's data engine generates FLD-5B, a comprehensive visual dataset with a total of 5.4B annotations for 126M images, a significant improvement over labor-intensive manual annotation. The engine has two processing modules. Instead of relying on a single person to annotate each image, as was done in the past, the first module employs specialized models that annotate automatically and collaboratively. When numerous models work together to reach a consensus, the resulting image interpretation is more reliable and objective, an idea reminiscent of the wisdom of crowds.
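
    The consensus idea described above can be illustrated with a short sketch. The article does not say how the specialist models' outputs are combined, so the majority vote below, the `consensus_label` helper, and the `min_agreement` threshold are all assumptions made for illustration, not the Florence-2 data engine itself.

```python
from collections import Counter


def consensus_label(predictions, min_agreement=0.5):
    """Return the label that more than `min_agreement` of the models agree on.

    `predictions` is a list of labels, one per (hypothetical) specialist model.
    Returns None when the list is empty or no label reaches the threshold.
    """
    if not predictions:
        return None
    # most_common(1) yields the single (label, vote_count) pair with the most votes.
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes / len(predictions) > min_agreement else None


# Hypothetical outputs from three specialist models for the same image region.
agreed = consensus_label(["dog", "dog", "wolf"])      # two of three models agree
disputed = consensus_label(["dog", "cat", "wolf"])    # no majority, so no label
```

    Returning no label when the models disagree, rather than picking one arbitrarily, is one simple way to keep low-confidence annotations out of a dataset.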

    The Florence-2 model stands out for its design. It integrates an image encoder and a multi-modality encoder-decoder into a sequence-to-sequence (seq2seq) architecture, following the NLP community's goal of building versatile models with a consistent framework. This architecture can handle a variety of vision tasks without task-specific architectural changes. Because all annotations in the FLD-5B dataset are converted into textual outputs, the model can be trained with a unified multitask learning approach and consistent optimization, using the same loss function as the objective for every task. Florence-2 is a multi-purpose vision foundation model that can ground, caption, and detect objects using just one model and one standard set of parameters, activated by textual prompts.
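
    To make the idea of turning annotations into text more concrete, here is a minimal sketch of how a bounding box might be serialized as a string that a seq2seq model could learn to emit. The article does not describe the exact encoding, so the `<loc_N>` token format, the default of 1000 coordinate bins, and the `box_to_text` helper are assumptions for illustration only.

```python
def box_to_text(label, box, image_size, bins=1000):
    """Serialize one labeled box as text: the label followed by four location tokens.

    `box` is (x0, y0, x1, y1) in pixels and `image_size` is (width, height).
    Each coordinate is mapped to an integer bin in [0, bins - 1].
    The "<loc_N>" token format is an illustrative assumption.
    """
    width, height = image_size
    x0, y0, x1, y1 = box

    def quantize(value, extent):
        # Scale the pixel coordinate to a bin index, then clamp to the valid range.
        index = int(value * bins // extent)
        return min(max(index, 0), bins - 1)

    coords = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    return label + "".join(f"<loc_{c}>" for c in coords)


# A hypothetical detection target for a 640x480 image.
target = box_to_text("cat", (64, 48, 320, 240), (640, 480))
# target == "cat<loc_100><loc_100><loc_500><loc_500>"
```

    Once every annotation type is expressed as a string like this, captioning, detection, and grounding can all share a single text-generation loss.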

    Despite its compact size, Florence-2 competes with larger specialized models. After fine-tuning on publicly available human-annotated data, Florence-2 achieves new state-of-the-art results on the RefCOCO/+/g benchmarks. The pre-trained model also outperforms supervised and self-supervised models on downstream tasks, including ADE20K semantic segmentation and COCO object detection and instance segmentation. It shows improvements of 6.9, 5.5, and 5.9 points on the COCO and ADE20K datasets using the Mask-RCNN, DINO, and UperNet frameworks respectively, and its training efficiency is four times better than that of models pre-trained on ImageNet.

    With its pre-trained universal representation, Florence-2 has proven highly effective: the experimental results show that it improves a wide range of downstream tasks.


    Check out the Paper and Model Card. All credit for this research goes to the researchers of this project.



    Dhanshree Shenwai is a Computer Science Engineer with experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today's evolving world that make everyone's life easier.
