Ztoog

    Breaking New Grounds in AI: How Multimodal Large Language Models are Reshaping Age and Gender Estimation


    The rapid growth of multimodal large language models (MLLMs) has been noteworthy, particularly those integrating language and vision modalities (LVMs). Their growth is attributed to high accuracy, generalization capability, reasoning skills, and robust performance, and these models are adept at handling unforeseen tasks beyond their initial training scope. MLLMs are revolutionizing various fields, prompting a re-evaluation of specialized models. Their swift evolution sparks interest in using them for computer vision tasks like object segmentation and integrating them into intricate pipelines like instruction-based image editing.

    While models like ShareGPT4V have their uses in tasks like data annotation, their practicality in production is limited by their high cost. In contrast, specialized models like MiVOLO offer a cost-effective solution. This paper compares the best general-purpose MLLMs with specialized models like MiVOLO to understand whether the former can replace the latter. Results indicate significant differences in computational cost and speed for some tasks, such as labeling new data or filtering old datasets.

    The team of researchers from SaluteDevices has introduced MiVOLOv2, a model that outperforms not only specialized models such as CNN, ResNet34, and GoogLeNet but also the first version of MiVOLO. This second version, the state-of-the-art model for gender and age determination, is evaluated with metrics such as Mean Absolute Error (MAE) for age estimation, accuracy for gender prediction, and Cumulative Score at 5 (CS@5) for age estimation. The team also ran experiments comparing the best general-purpose MLLMs with specialized models, aiming to measure SOTA MLLMs such as LLaVA 1.5, LLaVA-NeXT, ShareGPT4V, and ChatGPT4V.
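
    The three metrics named above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' evaluation code. It assumes the usual definition of CS@5 as the percentage of samples whose absolute age error is at most 5 years; the function names and the sample values are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float], pred_ages: Sequence[float]) -> float:
    """Average absolute difference between true and predicted ages, in years."""
    return sum(abs(t - p) for t, p in zip(true_ages, pred_ages)) / len(true_ages)


def cumulative_score(true_ages: Sequence[float], pred_ages: Sequence[float],
                     threshold: float = 5.0) -> float:
    """Percentage of samples with absolute age error <= threshold (CS@5 by default).

    Assumption: this is the common definition of cumulative score.
    """
    hits = sum(1 for t, p in zip(true_ages, pred_ages) if abs(t - p) <= threshold)
    return 100.0 * hits / len(true_ages)


def accuracy(true_labels: Sequence[str], pred_labels: Sequence[str]) -> float:
    """Percentage of gender labels predicted correctly."""
    correct = sum(1 for t, p in zip(true_labels, pred_labels) if t == p)
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    ages_true = [20, 30, 40, 50]
    ages_pred = [22, 36, 40, 45]
    print(mean_absolute_error(ages_true, ages_pred))
    print(cumulative_score(ages_true, ages_pred))
    print(accuracy(["male", "female", "female"], ["male", "male", "female"]))
```

    Lower MAE is better, while higher CS@5 and accuracy are better, so the three numbers are usually reported side by side.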

    MiVOLO uses face and body crops for predictions, while the other models make predictions based on prompts and images of body crops. It employs a transformer to estimate age and gender from these inputs. Additionally, the authors fine-tune an MLLM for gender and age estimation, contrasting it with a specialized model. They also explore the capabilities of multimodal ChatGPT (ChatGPT4V), evaluating its proficiency in predicting facial attributes and performing face recognition tasks. With zero training, the model outperformed a specialized age-recognition model but performed less effectively in gender classification.
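
    Because a general-purpose MLLM answers in free text, a prompt-based pipeline needs a step that turns the reply into an age and a gender label. The sketch below shows one way that step might look. The prompt wording, the answer format, and the `parse_answer` helper are all hypothetical; the article does not describe the prompts or parsing used in the paper, and no real model API is called here.

```python
import re
from typing import Optional, Tuple

# Hypothetical prompt: the wording used in the paper is not given in the article.
PROMPT = (
    "Estimate the age and gender of the person in the image. "
    "Answer in the form: age: <number>, gender: <male|female>."
)

# "age" followed by an optional separator and a 1-3 digit number.
_AGE = re.compile(r"age\s*[:=]?\s*(\d{1,3})", re.IGNORECASE)
# Whole-word match, so "male" is not picked up from inside "female".
_GENDER = re.compile(r"\b(female|male)\b", re.IGNORECASE)


def parse_answer(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Extract (age, gender) from a model reply; each is None if not found."""
    age_match = _AGE.search(text)
    gender_match = _GENDER.search(text)
    age = int(age_match.group(1)) if age_match else None
    gender = gender_match.group(1).lower() if gender_match else None
    return age, gender


if __name__ == "__main__":
    print(parse_answer("Age: 34, gender: Female"))
    print(parse_answer("I cannot tell from this image."))
```

    Returning `None` for unparseable replies keeps refusals and off-format answers visible, so they can be counted separately instead of being silently scored as wrong predictions.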

    For MiVOLOv2, the training dataset is extended by 40% over the data used for MiVOLO, and it now contains 807,694 samples: 390,730 male and 416,964 female. Most of the new images were selected from cases where MiVOLOv1 made significant errors. They were drawn mainly from production pipelines and some open-source data, such as LAION-5B. Of the two datasets, LAGENDA is chosen over IMDB. This minimizes the chance that MLLMs would give correct answers not through age and gender estimation but because of their familiarity with well-known individuals, famous movies, and so on. Despite lacking ground truths, LAGENDA presents reduced risk, and MiVOLOv2 surpasses all general-purpose MLLMs in age estimation. However, LLaVA-NeXT 34B leads in this area among open-source options, making fine-tuned specialized versions of LLaVA more effective.
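
    Selecting images on which an earlier model made significant errors is a form of hard-example mining. The sketch below illustrates the idea under stated assumptions: the `Sample` record, the `select_hard_examples` helper, and the 10-year error threshold are hypothetical, since the article does not say how "significant" was defined or how the selection was implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    image_id: str
    true_age: float
    predicted_age: float  # prediction from the earlier model version


def select_hard_examples(samples: List[Sample], min_error: float = 10.0) -> List[Sample]:
    """Keep samples whose absolute age error is at least min_error years.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [s for s in samples if abs(s.true_age - s.predicted_age) >= min_error]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    pool = [
        Sample("a", true_age=25, predicted_age=27),
        Sample("b", true_age=60, predicted_age=41),
        Sample("c", true_age=8, predicted_age=19),
    ]
    print([s.image_id for s in select_hard_examples(pool)])
```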

    In conclusion, this paper aimed to evaluate the efficacy of MiVOLOv2 compared with MLLMs for age and gender estimation tasks. MiVOLOv2 surpasses all general-purpose MLLMs in age estimation and succeeds in processing images of people. The results encouraged a comprehensive evaluation of neural networks' potential, including LLaVA and ShareGPT4V.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    Sajjad Ansari is a final-year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.


