
    Meta AI Launches Massively Multilingual Speech (MMS) Project: Introducing Speech-To-Text, Text-To-Speech, And More For 1,000+ Languages


    Speech technology has advanced significantly over the past decade, allowing it to be built into many consumer products. Training a good machine learning model for such tasks takes a lot of labeled data, in this case many thousands of hours of audio with transcriptions. This data exists only for some languages. For instance, of the 7,000+ languages in use today, only about 100 are supported by current speech recognition systems.

    Recently, self-supervised speech representations have drastically reduced the amount of labeled data needed to build speech systems. Despite this progress, major current efforts still cover only around 100 languages.

    To address some of these obstacles, Meta AI's Massively Multilingual Speech (MMS) project combines wav2vec 2.0 with a new dataset that contains labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages. According to the researchers' findings, the Massively Multilingual Speech models outperform state-of-the-art methods and support ten times as many languages.


    Since the largest available speech datasets cover at most 100 languages, the team's initial goal was to collect audio data for hundreds of languages. As a result, they turned to religious texts such as the Bible, which have been translated into many languages and whose translations have been widely studied in text-based machine translation research. People have recorded themselves reading these translations and made the audio files available online. This research compiled a collection of New Testament readings in over 1,100 languages, yielding an average of 32 hours of data per language.

    Their investigation shows that the proposed models perform equally well for male and female voices, even though this data comes from a specific domain and is typically read by male speakers. Although the recordings are religious, the research indicates that this does not unduly bias the model toward producing more religious language. According to the researchers, this is because they use a Connectionist Temporal Classification (CTC) approach, which is more constrained than large language models (LLMs) or sequence-to-sequence models for speech recognition.
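To illustrate why CTC output is tightly tied to the audio, here is a minimal sketch of the standard CTC greedy collapse rule: the model emits one label per audio frame, repeated labels are merged, and a blank symbol is dropped. This is not the MMS code; the blank symbol `_`, the function name, and the toy frame sequence are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real models use a reserved token index


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence into an output string.

    Applies the standard CTC rule: merge consecutive duplicates,
    then remove blanks.
    """
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop the blank symbol, which only separates repeated characters.
    return "".join(label for label in merged if label != blank)


# Toy example: thirteen frames of per-frame predictions.
frames = list("hh_e_ll_ll_oo")
print(ctc_greedy_collapse(frames))  # hello
```

Because each output character must be backed by at least one audio frame, a decoder of this kind has less freedom to generate fluent text that is not supported by the recording than a sequence-to-sequence model does.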

    The team preprocessed the data by combining a highly efficient forced alignment algorithm, which can handle recordings of 20 minutes or longer, with an alignment model trained on data from over 100 different languages. To remove potentially misaligned data, they applied several rounds of this process plus a cross-validation filtering step based on model accuracy. They integrated the alignment algorithm into PyTorch and released the alignment model publicly so that other researchers can use it to create new speech datasets.

    Thirty-two hours of data per language is not enough to train conventional supervised speech recognition models. The team relied on wav2vec 2.0 to train effective systems, drastically reducing the amount of labeled data previously required. Specifically, they trained self-supervised models on about 500,000 hours of speech data in over 1,400 languages, roughly five times more languages than any previous effort.

    The researchers used existing benchmark datasets such as FLEURS to evaluate models trained on the Massively Multilingual Speech data. Using a 1B-parameter wav2vec 2.0 model, they trained a multilingual speech recognition system on over 1,100 languages. Performance degrades only slightly as the number of languages grows: the character error rate rises by only about 0.4% from 61 to 1,107 languages, while language coverage grows by nearly 18 times.
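For readers unfamiliar with the metric, the sketch below shows how a character error rate is commonly computed: the Levenshtein edit distance between the reference transcript and the model output, divided by the reference length. It also checks the coverage arithmetic quoted above. The function names and sample strings are illustrative assumptions, not part of the MMS evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance over characters, normalized by the reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One dropped character out of twelve reference characters.
cer = character_error_rate("multilingual", "multilingal")
print(f"CER: {cer:.3f}")

# Coverage growth quoted in the article: 61 -> 1,107 languages.
print(f"Coverage ratio: {1107 / 61:.1f}x")
```

The same function yields a word error rate if the inputs are lists of words instead of strings.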

    Comparing the Massively Multilingual Speech models with OpenAI's Whisper, the researchers found that models trained on the MMS data achieve half the word error rate while covering 11 times as many languages. This illustrates that the model can compete favorably with the state of the art in speech recognition.

    The team also used their datasets, together with publicly available datasets such as FLEURS and CommonVoice, to train a language identification (LID) model for more than 4,000 languages, and then evaluated it on the FLEURS LID task. The findings show that performance remains very good even when 40 times as many languages are supported. They also built speech synthesis systems for more than 1,100 languages. Most current text-to-speech models are trained on single-speaker speech datasets.

    The team envisions a world where one model can handle many speech tasks across all languages. While they trained separate models for each task (recognition, synthesis, and language identification), they believe that in the future a single model will be able to handle all of these tasks and more, improving performance in every area.



    Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence in various fields. She is passionate about exploring new advancements in technology and their real-life applications.

