Ztoog
    Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks


    There has been a marked shift in the field of AGI systems toward pretrained, adaptable representations known for their task-agnostic advantages across applications. Natural language processing (NLP) is a clear example of this trend: more sophisticated models demonstrate adaptability by learning new tasks and domains from scratch with only basic instructions. The success of NLP inspires a similar strategy in computer vision.

    One of the main obstacles to a universal representation for vision-related tasks is the requirement for broad perceptual ability. In contrast to NLP, computer vision works with complex visual data such as object locations, masked contours, and attributes, and a universal representation requires mastery of many challenging tasks. Two hurdles stand out. The first is the lack of comprehensive visual annotations, which prevents building a foundation model that captures the subtleties of spatial hierarchy and semantic granularity. The second is that computer vision currently lacks a unified pretraining framework that uses a single network architecture to integrate semantic granularity and spatial hierarchy seamlessly.

    A team of Microsoft researchers introduces Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. It addresses both problems, the need for a consistent architecture and the shortage of comprehensive data, by using a single prompt-based representation for all vision tasks. Multitask learning requires high-quality annotated data at broad scale. The data engine generates FLD-5B, a comprehensive visual dataset with a total of 5.4B annotations for 126M images, a significant improvement over labor-intensive manual annotation. The engine has two efficient processing modules. Instead of relying on a single person to annotate each image, as was done in the past, the first module uses specialized models to annotate images automatically and collaboratively. When many models collaborate to reach a consensus, the resulting image interpretation is more reliable and objective, an idea reminiscent of the wisdom of crowds.
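
    The consensus idea can be illustrated with a small sketch. The article does not say how the data engine actually merges the outputs of its specialized models, so the majority-vote rule, the `consensus_label` function, and the `min_agreement` threshold below are all assumptions made purely for illustration.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.5):
    """Return the label chosen by more than `min_agreement` of the voters.

    `votes` holds one predicted label per annotating model (hypothetical
    setup). Returns None when there are no votes or no label is chosen
    by a large enough share of them.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Hypothetical predictions from three specialist models for one image region.
votes = ["dog", "dog", "wolf"]
print(consensus_label(votes))       # two of three agree
print(consensus_label(["a", "b"]))  # a tie yields no consensus
```

    Dropping regions where the models disagree, instead of guessing, is one simple way such a pipeline could trade coverage for annotation reliability.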

    The Florence-2 model stands out for its design. It integrates an image encoder and a multi-modality encoder-decoder into a sequence-to-sequence (seq2seq) architecture, following the NLP community's goal of building versatile models with a consistent framework. This architecture can handle a variety of vision tasks without task-specific architectural changes. Because all annotations in the FLD-5B dataset are converted into a uniform textual output format, the model can be trained with a unified multitask learning approach and consistent optimization, using the same loss function as the objective for every task. Florence-2 is a multi-purpose vision foundation model that can perform grounding, captioning, and object detection using just one model and one set of parameters, activated by textual prompts.
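
    To make the idea of turning annotations into text concrete, here is a minimal sketch of how a bounding box could be serialized into a target string. The number of bins, the `<loc_N>` token spelling, the `<OD>` prompt string, and all function names are illustrative assumptions, not details taken from the article.

```python
NUM_BINS = 1000  # assumed number of coordinate bins per axis


def quantize(value, size, num_bins=NUM_BINS):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    bin_index = int(value * num_bins / size)
    return min(max(bin_index, 0), num_bins - 1)


def box_to_text(label, box, width, height):
    """Serialize one labeled box (x0, y0, x1, y1) as label plus four location tokens."""
    x0, y0, x1, y1 = box
    bins = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    return label + "".join(f"<loc_{b}>" for b in bins)


def build_example(task_prompt, annotations, width, height):
    """Pair a textual task prompt with the text target the decoder would be trained to emit."""
    target = "".join(box_to_text(lbl, box, width, height) for lbl, box in annotations)
    return {"prompt": task_prompt, "target": target}


# Hypothetical detection annotation on a 640x480 image.
example = build_example("<OD>", [("cat", (64, 48, 320, 240))], 640, 480)
print(example["target"])  # cat<loc_100><loc_100><loc_500><loc_500>
```

    Once every task's labels are plain token sequences like this, captioning, detection, and grounding can all be trained with the same token-level objective, which is what lets one set of weights serve all of them.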

    Despite its compact size, Florence-2 competes with larger specialized models. After fine-tuning on publicly available human-annotated data, it achieves new state-of-the-art results on the RefCOCO/+/g benchmarks. As a pre-trained backbone, it outperforms supervised and self-supervised models on downstream tasks, including ADE20K semantic segmentation and COCO object detection and instance segmentation. It shows improvements of 6.9, 5.5, and 5.9 points on the COCO and ADE20K datasets using the Mask-RCNN, DINO, and UperNet frameworks respectively, and its training efficiency is four times that of models pre-trained on ImageNet.

    With its pre-trained universal representation, Florence-2 has proven highly effective: the experimental results show that it improves a wide range of downstream tasks.


    Check out the Paper and Model Card. All credit for this research goes to the researchers of this project.



    Dhanshree Shenwai is a Computer Science Engineer with experience at FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today's evolving world that make everyone's life easier.
