    Meta AI Launches Massively Multilingual Speech (MMS) Project: Introducing Speech-To-Text, Text-To-Speech, And More For 1,000+ Languages


    Speech technology has advanced significantly over the past decade, allowing it to be built into a wide range of consumer products. Training a good machine learning model for such tasks takes a large amount of labeled data, in this case many thousands of hours of audio with transcriptions. That data exists for only some languages. For instance, of the more than 7,000 languages in use today, current speech recognition algorithms support only about 100.

    Recently, self-supervised speech representations have drastically reduced the amount of labeled data needed to build speech systems. Despite this progress, major existing efforts still cover only around 100 languages.

    Facebook’s Massively Multilingual Speech (MMS) project tackles some of these obstacles by combining wav2vec 2.0 with a new dataset that contains labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages. According to the researchers’ findings, the MMS models outperform state-of-the-art methods while supporting ten times as many languages.


    Since the largest available speech datasets cover at most 100 languages, the team’s first goal was to collect audio data for hundreds of languages. To do so, they turned to religious texts such as the Bible, which have been translated into many languages and whose translations have been studied extensively in text-based language translation research. People have recorded themselves reading these translations and made the audio files available online. From these, the researchers compiled a collection of New Testament readings in over 1,100 languages, yielding an average of 32 hours of data per language.

    Their analysis shows that the resulting models perform equally well for male and female voices, even though the data comes from a specific domain and is mostly read by male speakers. And although the recordings are religious, the analysis indicates that this does not unduly bias the model toward producing religious language. According to the researchers, this is because they use a Connectionist Temporal Classification (CTC) approach, which is far more constrained than large language models (LLMs) or sequence-to-sequence models for speech recognition.
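    To make the CTC point concrete, here is a minimal sketch of greedy CTC decoding in plain Python. It assumes an acoustic model has already produced one row of log-probabilities per audio frame over a small vocabulary in which index 0 is the blank token; that vocabulary layout and the name `ctc_greedy_decode` are illustrative assumptions, not details of the MMS release.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(log_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Greedy CTC decoding (illustrative sketch; blank index 0 is an assumption)."""
    # 1. Pick the most likely token for each audio frame independently.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    # 2. Collapse runs of the same token, then 3. drop the blank tokens.
    return [tok for tok, _ in groupby(best) if tok != blank]
```

    Each token is chosen from its own frame's scores, with no learned model of which word tends to follow which, and a blank between two identical tokens is what lets a genuine double letter survive the collapse step. That lack of an autoregressive language model is the sense in which CTC is more constrained than an LLM or sequence-to-sequence decoder.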

    The team preprocessed the data by combining a highly efficient forced alignment algorithm, which can handle recordings 20 minutes or longer, with an alignment model trained on data from over 100 different languages. To remove potentially misaligned data, they ran several rounds of this process and added a cross-validation filtering step based on model accuracy. They integrated the alignment algorithm into PyTorch and released the alignment model publicly so that other researchers can use it to build new speech datasets.
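    The article only outlines the alignment step. As a rough illustration of what CTC forced alignment computes, here is a minimal Viterbi sketch in plain Python. It assumes per-frame log-probabilities from some CTC acoustic model (index 0 as the blank) and a known transcript given as token ids, and finds the single most likely frame-by-frame path that spells out exactly that transcript. The name `ctc_forced_align` is hypothetical, and this simple version keeps the whole T × (2L+1) table in memory, unlike the team's PyTorch implementation built for long recordings.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     targets: Sequence[int],
                     blank: int = 0) -> List[int]:
    """Return the token id (blank or target) assigned to each frame by the
    most likely CTC path that spells out `targets` exactly (sketch only)."""
    T = len(log_probs)
    # Interleave blanks: [blank, y1, blank, y2, ..., yL, blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)
    # score[t][s]: best log-prob of any valid path ending in state s at frame t
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay in s, advance by one, or skip over a blank
            # (the skip is forbidden between two identical tokens).
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            if best > NEG_INF:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev
    # A valid path ends on the last target token or the trailing blank.
    last = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        last = S - 2
    if score[T - 1][last] == NEG_INF:
        raise ValueError("too few frames to align this transcript")
    # Backtrack from the final state to recover one state per frame.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [ext[s] for s in path]
```

    The frames where the returned ids switch from one token to the next mark approximate boundaries in the audio; cutting long recordings at such boundaries is one way to turn them into shorter utterance and transcript pairs.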

    With only 32 hours of data per language, there is not enough data to train conventional supervised speech recognition models. The team therefore relied on wav2vec 2.0, which drastically reduces the amount of labeled data required. Specifically, they trained self-supervised models on over 500,000 hours of speech data in over 1,400 unique languages, roughly five times more languages than any previous effort.
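    The paragraph relies on wav2vec 2.0 without saying how it learns from unlabeled audio. In the original wav2vec 2.0 formulation, parts of the latent speech representation are masked, and for each masked frame the model must identify the true quantized latent among distractors using a contrastive loss over cosine similarities scaled by a temperature. The sketch below computes that loss for a single masked frame with plain Python lists; the function names, the toy vectors, and the default temperature of 0.1 are illustrative assumptions, and the real objective also includes a codebook diversity term and operates on learned vectors in batches.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(context: Sequence[float],
                     positive: Sequence[float],
                     distractors: Sequence[Sequence[float]],
                     temperature: float = 0.1) -> float:
    """-log softmax score of the true latent among {positive} + distractors (sketch)."""
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

    The loss is near zero when the context vector points toward the true latent and large when it points toward a distractor, so minimizing it teaches the network to predict masked speech content without any transcriptions.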

    The researchers used existing benchmark datasets such as FLEURS to evaluate models trained on the MMS data. Using a 1B-parameter wav2vec 2.0 model, they trained a multilingual speech recognition system on over 1,100 languages. Performance degrades only slightly as the number of languages grows: the character error rate rises by only about 0.4% when going from 61 to 1,107 languages, while language coverage increases roughly 18-fold.
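    Character error rate (CER) is the character-level edit distance (substitutions, deletions, and insertions) between a model's transcript and the reference, divided by the number of reference characters; word error rate (WER) applies the same computation to words. A minimal sketch of both metrics in plain Python follows; the function names are illustrative, and benchmark toolkits usually add text normalization (casing, punctuation) that is omitted here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance, keeping only one previous DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    # max(..., 1) avoids dividing by zero for an empty reference.
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

    Because both metrics share one edit-distance routine, CER and WER differ only in the unit being compared, which is why CER is often preferred for languages without clear word boundaries.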

    Comparing models trained on the MMS data with OpenAI’s Whisper, the researchers found that the MMS models achieve half the word error rate, while MMS covers 11 times as many languages. This shows that the model compares favorably with the state of the art in speech recognition.

    The team also combined their datasets with publicly available ones such as FLEURS and CommonVoice to train a language identification (LID) model for more than 4,000 languages, and then evaluated it on the FLEURS LID task. The results show that performance remains excellent even when 40 times as many languages are supported. They also built speech synthesis systems for more than 1,100 languages. Most existing text-to-speech models are trained on single-speaker voice datasets.

    The team envisions a world in which one model can handle many speech tasks across all languages. Although they trained a separate model for each task (speech recognition, speech synthesis, and language identification), they believe that a single model will eventually be able to handle all of these tasks and more, improving performance in every area.


    Check out the Paper, Blog, and GitHub link.



    Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in applications of artificial intelligence across various fields, and she is passionate about exploring new technological developments and their real-life applications.


