Ztoog
    This AI Paper from CMU and Apple Unveils WRAP: A Game-Changer for Pre-training Language Models with Synthetic Data


    Large Language Models (LLMs) have attracted a great deal of attention and popularity in the Artificial Intelligence (AI) community in recent months. These models have demonstrated strong capabilities in tasks including text summarization, question answering, code completion, and content generation.

    LLMs are frequently trained on low-quality web-scraped data. Most of the time, this data is noisy, unstructured, and not necessarily expressed clearly. Current scaling laws indicate that as model size increases, compute and data quantity should increase proportionately, and meeting that requirement is a challenge.
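To make the proportional growth concrete, here is a minimal sketch. The article does not name a specific scaling law, so two assumptions are used purely for illustration: a rule of thumb of about 20 training tokens per parameter, and the common approximation that training costs about 6 FLOPs per parameter per token. The function name `training_budget` is hypothetical.

```python
from typing import Tuple


def training_budget(n_params: float, tokens_per_param: float = 20.0) -> Tuple[float, float]:
    """Estimate (training tokens, training FLOPs) for a model size.

    Assumptions (not from the article): ~20 tokens per parameter and
    ~6 FLOPs per parameter per token.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    tokens = n_params * tokens_per_param
    flops = 6.0 * n_params * tokens
    return tokens, flops


if __name__ == "__main__":
    for size in (1e9, 2e9, 4e9):
        tokens, flops = training_budget(size)
        print(f"{size:.0e} params: {tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under these assumptions, doubling the parameter count doubles the data requirement and quadruples the compute, which is why both cost and data scarcity become pressing at scale.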

    There are two main limitations. First, pre-training involves significant computational cost and time. Second, high-quality data available on the Internet is expected to become scarce. In recent research, a team of researchers from Apple and Carnegie Mellon University has addressed these issues by introducing Web Rephrase Augmented Pre-training (WRAP).

    WRAP is a method that uses an existing, instruction-tuned LLM to paraphrase web pages into particular styles, such as mimicking the tone of Wikipedia or converting text into a question-answer format. The main goal of WRAP is to improve LLM pre-training by training on both real and synthetically rephrased data.
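The rephrase-then-mix idea can be sketched roughly as follows. This is not the authors' implementation: the prompt wording in `STYLE_INSTRUCTIONS`, the function names, and the `echo_rephraser` stand-in (used instead of a real instruction-tuned LLM call) are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical style instructions; the paper's actual prompts may differ.
STYLE_INSTRUCTIONS: Dict[str, str] = {
    "wikipedia": "Rewrite the following text in clear, high-quality English "
                 "like that found on Wikipedia.",
    "qa": "Convert the following text into a series of questions and answers.",
}


def build_rephrase_prompt(text: str, style: str) -> str:
    """Combine a style instruction with the web text to be paraphrased."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unknown style: {style}")
    return f"{STYLE_INSTRUCTIONS[style]}\n\nText:\n{text}\n\nRewritten text:"


def build_mixed_corpus(
    web_docs: Sequence[str],
    rephrase: Callable[[str], str],
    styles: Sequence[str],
    seed: int = 0,
) -> List[str]:
    """Return the real documents plus one rephrased copy per style, shuffled."""
    corpus = list(web_docs)  # keep the real data
    for doc in web_docs:
        for style in styles:
            corpus.append(rephrase(build_rephrase_prompt(doc, style)))
    random.Random(seed).shuffle(corpus)  # deterministic shuffle for the demo
    return corpus


def echo_rephraser(prompt: str) -> str:
    """Stand-in for an instruction-tuned LLM call; it only tags the prompt."""
    return "[REPHRASED] " + prompt


if __name__ == "__main__":
    docs = ["cheap flights!!! click here", "the cat sat on the mat"]
    mixed = build_mixed_corpus(docs, echo_rephraser, ["wikipedia", "qa"])
    print(len(mixed), "documents in the mixed corpus")
```

In practice `rephrase` would wrap a call to whatever instruction-tuned model is available. Passing it in as a callable keeps the mixing logic independent of any particular model API.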

    The main features of WRAP are as follows:

    1. Pre-training Efficiency: Applying WRAP to the noisy C4 dataset speeds up pre-training by roughly three times. This efficiency matters because it reduces the high cost and time usually associated with LLM training.
    2. Enhancement of Model Performance: WRAP improves model performance within the same computational budget. Across different subsets of the Pile, a large-scale dataset used for training and evaluating LLMs, it reduces perplexity by more than 10%. It also improves zero-shot question-answering accuracy by over 2% across 13 different tasks.
    3. Rephrasing Web Documents: WRAP uses a medium-sized LLM to paraphrase documents from the web into multiple styles. This differs from generating entirely new data because it improves existing content while preserving the quality and diversity of the original information.
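For readers unfamiliar with how figures such as "perplexity reduced by more than 10%" or "three times faster" are computed, the following sketch shows the arithmetic. The helper names are hypothetical and the numbers in the demo are made up for illustration; they are not results from the paper.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop from baseline to improved (0.10 means 10% lower)."""
    return (baseline - improved) / baseline


def speedup(baseline_steps: int, new_steps: int) -> float:
    """How many times fewer steps are needed to reach the same quality."""
    return baseline_steps / new_steps


if __name__ == "__main__":
    # Made-up per-token losses for two hypothetical models on the same text.
    base_ppl = perplexity([3.0, 3.2, 2.8])
    new_ppl = perplexity([2.9, 3.0, 2.7])
    print(f"perplexity reduction: {relative_reduction(base_ppl, new_ppl):.1%}")
    print(f"speedup: {speedup(300_000, 100_000):.1f}x")
```

Lower perplexity means the model assigns higher probability to held-out text, so a relative reduction is an improvement.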

    The synthetic data produced by WRAP has two main benefits. First, it covers a range of styles that reflect the diversity of language used in downstream applications. With this diversity, the LLM is better prepared for a wider variety of real-world situations. Second, the rephrased synthetic data is of higher quality than the raw web-scraped data. This quality improvement comes from language that is more ordered and cohesive, which promotes more efficient model learning.

    In conclusion, WRAP is a significant advance in LLM pre-training. By using higher-quality synthetic data in varied styles, WRAP not only speeds up training but also improves the overall performance of LLMs. Given the abundance of low-quality web data and the resource-intensive nature of conventional LLM training approaches, this method offers a possible way forward.


    Check out the Paper. All credit for this research goes to the researchers of this project.



    Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
    She is a Data Science enthusiast with good analytical and critical thinking, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.

