
    Meet MeLoDy: An Efficient Text-to-Audio Diffusion Model For Music Synthesis


    Music is an art composed of harmony, melody, and rhythm that permeates every aspect of human life. With the blossoming of deep generative models, music generation has drawn much attention in recent years. As a prominent class of generative models, language models (LMs) have shown extraordinary capability in modeling complex relationships across long-term contexts. In light of this, AudioLM and many follow-up works successfully applied LMs to audio synthesis. Concurrent with the LM-based approaches, diffusion probabilistic models (DPMs), another competitive class of generative models, have also demonstrated exceptional abilities in synthesizing speech, sounds, and music.

    However, generating music from free-form text remains challenging because the permissible music descriptions can be diverse and relate to genres, instruments, tempo, scenarios, or even some subjective feelings.

    Traditional text-to-music generation models often focus on specific properties such as audio continuation or fast sampling, while some models prioritize robust testing, which is occasionally conducted by experts in the field, such as music producers. Furthermore, most are trained on large-scale music datasets and have demonstrated state-of-the-art generative performance with high fidelity and adherence to various aspects of text prompts.

    Yet the success of these methods, such as MusicLM or Noise2Music, comes with high computational costs, which can severely impede their practicality. In comparison, other approaches built upon DPMs made efficient sampling of high-quality music possible. Nevertheless, their demonstrated cases were comparatively small and showed limited in-sample dynamics. For a feasible music creation tool, high efficiency of the generative model is essential because it facilitates interactive creation that takes human feedback into account, as in a previous study.

    While LMs and DPMs both showed promising results, the relevant question is not whether one should be preferred over the other but whether it is possible to leverage the advantages of both approaches simultaneously.

    Following this motivation, an approach termed MeLoDy has been developed. An overview of the approach is presented in the figure below.

    After analyzing the success of MusicLM, the authors leverage the highest-level LM in MusicLM, termed the semantic LM, to model the semantic structure of music, determining the overall arrangement of melody, rhythm, dynamics, timbre, and tempo. Conditional on this semantic LM, they exploit the non-autoregressive nature of DPMs to model the acoustics efficiently and effectively with the help of a successful sampling acceleration technique.
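
    The division of labor described above can be illustrated with a toy sketch. It is not the MeLoDy implementation: `semantic_lm_step` and `denoise_step` are hypothetical stand-ins for real networks, and the token vocabulary, frame count, and step count are assumptions chosen only for illustration. The sketch shows why the second stage is cheap in sequential passes: the autoregressive stage needs one pass per token, while the diffusion-style stage updates every position at once for a fixed number of steps.

```python
import random

def semantic_lm_step(prefix):
    """Hypothetical stand-in for one LM forward pass: returns the next token."""
    return (sum(prefix) + len(prefix)) % 8

def generate_semantic_tokens(prompt_tokens, num_new):
    """Autoregressive stage: one sequential pass per generated token."""
    tokens = list(prompt_tokens)
    calls = 0
    for _ in range(num_new):
        tokens.append(semantic_lm_step(tokens))
        calls += 1
    return tokens[len(prompt_tokens):], calls

def denoise_step(latent, condition, step, total_steps):
    """Hypothetical stand-in for one diffusion pass over all positions at once."""
    weight = 1.0 / (total_steps - step)
    return [x + weight * (c - x) for x, c in zip(latent, condition)]

def generate_acoustics(semantic_tokens, frames_per_token, num_steps, seed=0):
    """Non-autoregressive stage: a fixed number of passes, whatever the length."""
    rng = random.Random(seed)
    # Each semantic token conditions several acoustic frames (assumed layout).
    condition = [float(t) for t in semantic_tokens for _ in range(frames_per_token)]
    latent = [rng.gauss(0.0, 1.0) for _ in condition]  # start from noise
    calls = 0
    for step in range(num_steps):
        latent = denoise_step(latent, condition, step, num_steps)
        calls += 1
    return latent, calls

tokens, lm_calls = generate_semantic_tokens([1, 2, 3], num_new=50)
latent, dpm_calls = generate_acoustics(tokens, frames_per_token=4, num_steps=5)
print(f"LM passes: {lm_calls}, diffusion passes: {dpm_calls}, frames: {len(latent)}")
```

    In this toy setting the semantic stage runs once per token, while the acoustic stage produces four times as many frames in a handful of passes. The real model replaces both stand-ins with trained networks.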

    Furthermore, the authors propose the so-called dual-path diffusion (DPD) model instead of adopting the classic diffusion process. Indeed, working on the raw data would greatly increase the computational cost. The proposed solution is to reduce the raw data to a low-dimensional latent representation. Reducing the dimensionality of the data limits its impact on the operations and, hence, decreases the model's running time. Afterward, the raw data can be reconstructed from the latent representation through a pre-trained autoencoder.
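
    To make the dimensionality argument concrete, the following minimal sketch compresses a waveform, would let a model operate on the shorter sequence, and then reconstructs the original length. The block-averaging `encode` and repeating `decode` functions are hypothetical toys, not the pre-trained autoencoder used by the authors, and the sample rate and compression factor are assumed values.

```python
import math

def encode(samples, hop):
    """Toy encoder: average each block of `hop` samples into one latent value."""
    latent = []
    for start in range(0, len(samples), hop):
        block = samples[start:start + hop]
        latent.append(sum(block) / len(block))
    return latent

def decode(latent, hop, length):
    """Toy decoder: repeat each latent value `hop` times, trimmed to `length`."""
    out = []
    for value in latent:
        out.extend([value] * hop)
    return out[:length]

sample_rate = 24_000  # assumed sample rate in Hz
seconds = 10          # assumed clip length
hop = 320             # assumed compression factor

# A slow sine wave stands in for real audio.
raw = [math.sin(2 * math.pi * 2 * t / sample_rate)
       for t in range(sample_rate * seconds)]

latent = encode(raw, hop)
reconstructed = decode(latent, hop, len(raw))

print(f"raw samples:    {len(raw)}")
print(f"latent values:  {len(latent)}")
print(f"reconstructed:  {len(reconstructed)}")
```

    Any generative step that runs on `latent` rather than `raw` touches 320 times fewer values per pass in this toy setup, which is the source of the running-time savings the paragraph describes.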

    Some output samples produced by the model are available at the following link: https://efficient-melody.github.io/. The code has yet to be made available, which means that, for the moment, it is not possible to try it out, either online or locally.

    This was the summary of MeLoDy, an efficient LM-guided diffusion model that generates music audio of state-of-the-art quality. If you are interested, you can learn more about this technique in the links below.


    Check out the Paper.



    Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA, and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.

