    This Paper from NYU and Google Explains How Joint Speech-Text Encoders Overcome Sequence-Length Mismatch in Cross-Modal Representations


    It is becoming increasingly evident that very large models trained on massive unsupervised corpora in a single modality can achieve remarkable results. This has been shown in two domains. In audio, a single model has been shown to adapt to a surprisingly wide range of acoustic tasks. In text, language models have attained exceptional zero-shot capabilities. These achievements have prompted the question of how to apply similar techniques to settings that combine two modalities, which have historically relied on manually paired data.

    One interesting approach is to train a single large encoder on both modalities. Either modality can then be presented as an unpaired example, and the encoder learns to map the two to similar locations in representation space. In the image/text domain, this kind of representation has been shown to be feasible. There, a single model has reached state-of-the-art performance on numerous image and text comprehension tasks.
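    To make the shared-encoder idea concrete, here is a toy sketch in Python with NumPy. It is not the architecture used in the paper. It assumes a design in which each modality has its own input projection into a common embedding width, and both then pass through the same encoder weights. All sizes and names (D_MODEL, N_MELS, VOCAB, encode_speech, encode_text) are hypothetical, and a single tanh layer stands in for a real encoder stack.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16  # hypothetical shared embedding width
N_MELS = 80   # hypothetical number of filterbank features per speech frame
VOCAB = 32    # hypothetical text vocabulary size

# Modality-specific front ends: each maps its input into the shared D_MODEL space.
speech_proj = rng.standard_normal((N_MELS, D_MODEL)) * 0.1
text_embed = rng.standard_normal((VOCAB, D_MODEL)) * 0.1

# One shared layer used for both modalities (a stand-in for a deep encoder stack).
shared_w = rng.standard_normal((D_MODEL, D_MODEL)) * 0.1


def shared_encoder(x: np.ndarray) -> np.ndarray:
    """Apply the shared weights to a (T, D_MODEL) sequence."""
    return np.tanh(x @ shared_w)


def encode_speech(frames: np.ndarray) -> np.ndarray:
    """frames: (T_speech, N_MELS) acoustic features -> (T_speech, D_MODEL)."""
    return shared_encoder(frames @ speech_proj)


def encode_text(token_ids: np.ndarray) -> np.ndarray:
    """token_ids: (T_text,) integer ids -> (T_text, D_MODEL)."""
    return shared_encoder(text_embed[token_ids])


speech = encode_speech(rng.standard_normal((120, N_MELS)))  # 120 acoustic frames
text = encode_text(np.array([3, 7, 1, 9, 4]))                # 5 text tokens
print(speech.shape, text.shape)  # (120, 16) (5, 16)
```

    Both outputs live in the same 16-dimensional space, but the speech sequence is far longer than the text sequence. That is the length mismatch joint speech-text encoders have to cope with.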

    New research from New York University and Google asks a specific question about explicit alignments, which have been shown to deliver performance gains. It investigates whether those gains can also be achieved by applying consistency regularization to the implicit alignments learned by upsampling systems. To do this, the team develops a technique, motivated by dynamic time warping, that optimally aligns the encoder's representations of a speech example and a text example. The team demonstrates that, even without an explicit alignment model, the optimal alignment is learned during training. It also improves as one progresses through the network's layers.
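    The article does not spell out the paper's exact algorithm. The following is therefore a generic sketch of classic dynamic time warping (DTW), the technique the method is described as being motivated by. It assumes squared Euclidean distance between per-position encoder outputs. It finds the lowest-cost monotonic alignment between a long "speech" sequence and a short "text" sequence. The function name dtw_align and the toy vectors are hypothetical.

```python
import numpy as np


def dtw_align(speech: np.ndarray, text: np.ndarray):
    """Classic dynamic time warping between two (T, D) sequences.

    Returns the minimal cumulative squared-Euclidean cost and the
    monotonic alignment path as a list of (speech_idx, text_idx) pairs.
    """
    n, m = len(speech), len(text)
    # Pairwise squared distances between every speech frame and text position.
    dist = ((speech[:, None, :] - text[None, :, :]) ** 2).sum(axis=-1)

    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each step may advance speech, text, or both (monotonic alignment).
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j],      # next speech frame, same text position
                cost[i, j - 1],      # next text position, same speech frame
                cost[i - 1, j - 1],  # advance both
            )

    # Backtrack from the end to recover the optimal path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = {
            (i - 1, j - 1): cost[i - 1, j - 1],
            (i - 1, j): cost[i - 1, j],
            (i, j - 1): cost[i, j - 1],
        }
        i, j = min(moves, key=moves.get)
    path.reverse()
    return cost[n, m], path


# Toy example: the "speech" lingers on each "token" for a different number of frames.
text = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
speech = np.repeat(text, [2, 3, 1], axis=0)
total, path = dtw_align(speech, text)
print(total)  # 0.0
print(path)   # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1), (5, 2)]
```

    The speech sequence is a time-stretched copy of the text, so DTW finds a zero-cost alignment. It maps several speech frames onto each text position, with no alignment model involved. This quadratic-time version is only illustrative. A training loss would need a batched, differentiable variant.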

    In speech recognition, there has been a recent trend toward models with a joint speech and text encoder, which makes it possible to pretrain on unpaired speech and text data. Speech recognition involves two sequence modalities, and speech is represented by a much longer sequence than text. This poses a unique difficulty. It becomes harder to compare an encoder's speech representation with its text representation frame by frame, even though both modalities are represented in the same embedding space.
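    A short NumPy sketch shows why the length gap matters. A direct frame-wise loss is only defined when the two sequences have the same length. Stretching the text by uniform repetition still leaves some frames compared against the wrong token. Uniform repetition here is a hypothetical stand-in for an upsampling step; real systems may instead use explicit, learned durations. The function names framewise_mse and upsample_by_repetition and the toy vectors are illustrative assumptions.

```python
import numpy as np


def framewise_mse(speech: np.ndarray, text: np.ndarray) -> float:
    """Direct frame-by-frame comparison; only defined for equal lengths."""
    if speech.shape != text.shape:
        raise ValueError(f"length mismatch: {speech.shape} vs {text.shape}")
    return float(((speech - text) ** 2).sum(axis=-1).mean())


def upsample_by_repetition(text: np.ndarray, target_len: int) -> np.ndarray:
    """Naively stretch a (T_text, D) sequence to target_len by repeating
    each position a near-equal number of times (no alignment model)."""
    idx = np.floor(np.arange(target_len) * len(text) / target_len).astype(int)
    return text[idx]


text = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
speech = np.repeat(text, [2, 3, 1], axis=0)  # 6 "speech" frames, 3 "tokens"

try:
    framewise_mse(speech, text)
except ValueError as err:
    print(err)  # length mismatch: (6, 2) vs (3, 2)

upsampled = upsample_by_repetition(text, len(speech))
print(round(framewise_mse(speech, upsampled), 4))  # 0.1667
```

    The speech dwells on the second token for three frames, but uniform repetition assigns that token only two. One frame is therefore penalized even though both sequences carry the same content.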

    Finally, the work changes the consistency-regularization criterion so that it encourages consistency under some alignment rather than requiring a direct frame-wise match. With that change, and without any learned alignment model, it achieves significant WER improvements over strong semi-supervised baselines in both monolingual and multilingual settings. Their findings suggest that tolerating misalignment is all that is needed to enforce consistency in cross-modal representations.
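    The contrast between the two criteria can be written out as follows. This is an illustrative formulation assumed here, not the paper's exact objective. Let s_1, ..., s_T be the encoder outputs for a speech utterance and t_1, ..., t_U those for its transcript, with U < T. A frame-wise loss first stretches the text to a length-T sequence \tilde{t}. An alignment-tolerant loss instead takes the cheapest monotonic path \pi from the set \mathcal{A}(T, U) of DTW-style alignments:

```latex
\mathcal{L}_{\text{frame}} = \sum_{k=1}^{T} \lVert s_k - \tilde{t}_k \rVert_2^2,
\qquad
\mathcal{L}_{\text{align}} = \min_{\pi \in \mathcal{A}(T,U)} \sum_{(i,j) \in \pi} \lVert s_i - t_j \rVert_2^2
```

    Suppose \tilde{t} is obtained by repeating each text position one or more times, in order. The fixed upsampling is then itself one path in \mathcal{A}(T, U), so \mathcal{L}_{\text{align}} \le \mathcal{L}_{\text{frame}}. The alignment-tolerant criterion therefore never penalizes a speech-text pair more than the frame-wise one does.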


    Check out the Paper. All credit for this research goes to the researchers on this project.


    Dhanshree Shenwai is a Computer Science Engineer with solid experience at FinTech companies covering the financial, cards and payments, and banking domains, and she has a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today's evolving world that make everyone's life easier.

