    This Paper from NYU and Google Explains How Joint Speech-Text Encoders Overcome Sequence-Length Mismatch in Cross-Modal Representations


    It is becoming increasingly apparent that very large models trained on huge unsupervised corpora in a single modality can achieve remarkable results. This has been shown both in the audio domain, where a single model has been shown to adapt to a surprisingly wide range of acoustic tasks, and in the text domain, where language models have attained exceptional zero-shot capabilities. These achievements have prompted inquiry into how to use similar techniques for situations combining two modalities, which have traditionally relied on manually paired data.

    One interesting approach is to train a large encoder on both modalities so that either one can be presented as an unpaired example and the encoder learns to map the two to similar locations in representation space. In the image/text domain, such a representation has been shown to be feasible and capable of state-of-the-art performance on numerous image and text comprehension tasks using a single model.

    New research from New York University and Google investigates whether the performance gains found with explicit alignments can also be achieved by applying consistency regularization to the implicit alignments learned in upsampling methods. The researchers do this by developing a method, motivated by dynamic time warping, that optimally aligns the encoder's representations of a speech example and a text example. In the absence of an explicit alignment model, the team demonstrates that the optimal alignment is not only acquired during training but also improves as one progresses through the network's layers.
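
    To make the idea of dynamic time warping concrete, here is a minimal sketch of the classical algorithm applied to a long "speech" sequence and a short "text" sequence of embedding vectors. This is an illustration only, not the paper's method: the function names `sq_dist` and `dtw_align`, the squared Euclidean distance, and the toy one-dimensional embeddings are all assumptions made for the example.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_align(speech, text):
    """Classical dynamic time warping between two sequences of vectors.

    Returns (total_cost, path). The path is a list of (speech_idx, text_idx)
    pairs that moves monotonically and covers every element of both sequences.
    """
    n, m = len(speech), len(text)
    # cost[i][j] = cheapest alignment of speech[:i] with text[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sq_dist(speech[i - 1], text[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # next speech frame, same text position
                cost[i][j - 1],      # same speech frame, next text position
                cost[i - 1][j - 1],  # advance both
            )

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy 1-D "embeddings": five speech frames, three text positions.
speech = [[0.0], [0.1], [1.0], [1.1], [2.0]]
text = [[0.0], [1.0], [2.0]]
total_cost, path = dtw_align(speech, text)
print(path)  # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
```

    In this toy case the first two speech frames are matched to the first text position, the next two to the second, and the last frame to the third, which is the kind of many-to-one mapping a longer speech sequence needs.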

    To facilitate pretraining on unpaired speech and text data, there has been a recent trend in speech recognition toward models with a joint speech and text encoder. Speech recognition involves two sequence modalities, and the longer sequence used to represent speech poses a particular difficulty. Because of this length mismatch, comparing an encoder's speech representation to its text representation frame by frame becomes harder, even though both modalities are represented in the same embedding space.

    Finally, the work demonstrates that, in both monolingual and multilingual settings, significant word error rate (WER) improvements can be achieved over strong semi-supervised baselines without any learned alignment model, by modifying the criterion of the consistency regularization to encourage consistency under some alignment rather than a direct frame-wise comparison. Based on their findings, it appears that tolerating misalignment is all that is needed to enforce consistency in cross-modal representations.
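
    The difference between a direct frame-wise comparison and consistency "under some alignment" can be sketched as follows. This is a hedged illustration, not the loss used in the paper: the helper names (`upsample_repeat`, `framewise_loss`, `best_alignment_loss`), the fixed-factor upsampling, the mean squared distance, and the toy data are all assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def upsample_repeat(text, factor):
    """Repeat each text embedding `factor` times (fixed-duration upsampling)."""
    return [vec for vec in text for _ in range(factor)]


def framewise_loss(speech, text_up):
    """Mean squared distance comparing frame t of speech to frame t of text."""
    if len(speech) != len(text_up):
        raise ValueError("frame-wise comparison needs equal lengths")
    return sum(sq_dist(s, t) for s, t in zip(speech, text_up)) / len(speech)


def best_alignment_loss(speech, text):
    """Mean squared distance under the best monotonic alignment.

    Every speech frame is matched to exactly one text position; text
    positions are visited in order and none is skipped.
    """
    n, m = len(speech), len(text)
    if n < m:
        raise ValueError("speech must be at least as long as text")
    # prev[j] = cheapest cost of aligning the frames seen so far,
    # with the latest frame matched to text position j.
    prev = [math.inf] * m
    prev[0] = sq_dist(speech[0], text[0])
    for i in range(1, n):
        cur = [math.inf] * m
        for j in range(m):
            stay = prev[j]
            advance = prev[j - 1] if j > 0 else math.inf
            cur[j] = sq_dist(speech[i], text[j]) + min(stay, advance)
        prev = cur
    return prev[m - 1] / n


# Two text positions; in the speech, the first is short and the second long.
text = [[0.0], [1.0]]
speech = [[0.0], [1.0], [1.0], [1.0], [1.0], [1.0]]

fw = framewise_loss(speech, upsample_repeat(text, 3))
ba = best_alignment_loss(speech, text)
print(fw, ba)
```

    Here the speech frames match the text perfectly in content but not in timing. The frame-wise loss penalizes the two frames that fall on the wrong side of the fixed boundary, while the alignment-tolerant loss is zero, because uniform upsampling is only one of the alignments it is allowed to choose from.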


    Check out the Paper. All credit for this research goes to the researchers on this project.


    Dhanshree Shenwai is a computer science engineer with experience in FinTech companies covering the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier in today's evolving world.

