    Exploring New Frontiers in AI: Google DeepMind’s Research on Advancing Machine Learning with ReSTEM Self-Training Beyond Human-Generated Data


    Large Language Models (LLMs) are transforming deep learning by demonstrating a remarkable ability to produce human-quality text and perform a wide range of language tasks. Supervised fine-tuning (SFT) on human-collected data further improves their performance on tasks of interest, but obtaining high-quality human data is a major bottleneck. The cost is especially high for intricate problem-solving tasks, which require substantial resources and specialized knowledge. Model-generated synthetic data is a promising alternative that is scalable and affordable, provided its quality can be assured.

    In this study, researchers from Google DeepMind and Mila note that LLMs can self-evaluate generated data, but they investigate a simpler setting in which an external scalar feedback signal serves as a quality indicator for each generated sample. The research team proposes a simple yet effective self-training technique for language models that requires only two capabilities: 1) generating samples from the model and 2) evaluating those samples with a scoring mechanism. This setup makes it possible to study training on data the model itself produces. For uniformity and clarity, the team adopts the nomenclature of Reinforced Self-Training and refers to the technique as ReST (ReSTEM in the headline). The team shows that ReST can be viewed as applying expectation maximization to reinforcement learning.
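
    The expectation-maximization view mentioned above can be sketched as follows. This is a generic derivation under stated assumptions, not a transcription of the paper: the symbols x (input), y (sampled output), r(x, y) (reward), O (a binary "success" variable), q (a sampling distribution) and p_theta (the model) are notation introduced here for illustration.

```latex
% Assumption: success probability is tied to the reward, and with a
% binary reward we can take p(O = 1 \mid x, y) = r(x, y) \in \{0, 1\}.

% Jensen's inequality gives a lower bound on the log-probability of success:
\log p_\theta(O = 1 \mid x)
  = \log \sum_{y} p(O = 1 \mid x, y)\, p_\theta(y \mid x)
  \;\ge\; \mathbb{E}_{q(y \mid x)}\big[\log p(O = 1 \mid x, y)\big]
        - \mathrm{KL}\big(q(y \mid x) \,\|\, p_\theta(y \mid x)\big)

% E-step (maximize the bound over q with \theta^{t} fixed):
q^{t+1}(y \mid x) \;\propto\; p(O = 1 \mid x, y)\, p_{\theta^{t}}(y \mid x)

% M-step (maximize the bound over \theta with q^{t+1} fixed):
\theta^{t+1} = \arg\max_{\theta}\;
  \mathbb{E}_{q^{t+1}(y \mid x)}\big[\log p_\theta(y \mid x)\big]
```

    Under the binary-reward assumption, the E-step distribution is simply the current model restricted to samples with reward 1, which is what sampling followed by filtering produces, and the M-step is ordinary maximum-likelihood fine-tuning on those kept samples.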

    Specifically, ReST alternates between an expectation step and a maximization step:

    1. Generate (E-step): For each input context, the language model produces multiple output samples. The research team then builds the training dataset by filtering these samples with a binary reward.
    2. Improve (M-step): The original language model is fine-tuned with supervision on the training dataset from the preceding Generate step. The fine-tuned model is then used in the next Generate step.

    ReST and its variants have proven effective at improving language models in many areas, such as machine translation, semantic parsing, and preference alignment.
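
    The Generate and Improve steps described above can be sketched as a toy loop. This is a minimal illustration under loud assumptions, not the authors' implementation: the "model" is a hypothetical lookup table of answer probabilities per problem, and the `improve` function is a stand-in for supervised fine-tuning that adds the kept samples as counts on top of the base model. All names here (`generate`, `filter_by_reward`, `improve`, `rest`) are invented for this example.

```python
import random
from collections import Counter


def generate(policy, problems, n_samples, rng):
    """Generate (E-step): draw several candidate outputs per input from the current policy."""
    samples = []
    for x in problems:
        answers, weights = zip(*sorted(policy[x].items()))
        for y in rng.choices(answers, weights=weights, k=n_samples):
            samples.append((x, y))
    return samples


def filter_by_reward(samples, reward):
    """Keep only the samples whose binary reward is 1."""
    return [(x, y) for x, y in samples if reward(x, y) == 1]


def improve(base_policy, dataset):
    """Improve (M-step): toy stand-in for supervised fine-tuning.

    Always starts from the ORIGINAL base policy, as the text describes,
    and shifts probability mass toward the kept (x, y) pairs.
    """
    counts = Counter(dataset)
    new_policy = {}
    for x, dist in base_policy.items():
        scores = {y: p + counts[(x, y)] for y, p in dist.items()}
        total = sum(scores.values())
        new_policy[x] = {y: s / total for y, s in scores.items()}
    return new_policy


def rest(base_policy, problems, reward, iterations=3, n_samples=32, seed=0):
    """Alternate Generate and Improve for a fixed number of iterations."""
    rng = random.Random(seed)
    policy = base_policy
    for _ in range(iterations):
        kept = filter_by_reward(generate(policy, problems, n_samples, rng), reward)
        policy = improve(base_policy, kept)  # fine-tune the base model, not the last one
    return policy


# Hypothetical toy task: two arithmetic problems, four candidate answers each.
SOLUTIONS = {"2+3": "5", "7*6": "42"}
BASE_POLICY = {
    "2+3": {"4": 0.25, "5": 0.25, "6": 0.25, "23": 0.25},
    "7*6": {"13": 0.25, "36": 0.25, "42": 0.25, "76": 0.25},
}


def exact_match_reward(x, y):
    """Binary reward: 1 if the answer is correct, else 0."""
    return 1 if SOLUTIONS[x] == y else 0
```

    Calling `rest(BASE_POLICY, list(SOLUTIONS), exact_match_reward)` returns a policy that places more probability on the correct answers than the uniform base policy does. A real run would replace the lookup table with LLM sampling and the count update with gradient-based fine-tuning.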

    Earlier studies mostly applied ReST to relatively small language models (up to 7B parameters), with limited scalability for larger models. The new work aims to complement those efforts by comparing the scalability and effectiveness of model-generated synthetic data against human-provided data in two challenging but understudied domains: code generation (APPS) and competition-level mathematical problem-solving (MATH). The findings show that applying ReST to PaLM 2 models of various sizes significantly improves their mathematical reasoning and code generation skills.

    Surprisingly, models fine-tuned on synthetic data produced by the model outperform those trained on human-provided data by a large margin. The improvement diminishes after a few iterations of ReST, however, which points to possible overfitting on a limited number of training problems. Models fine-tuned with ReST also improve pass@k and majority-voting performance. In addition, these models show enhanced performance on related but distinct benchmarks, including Big-Bench Hard tasks, coding (HumanEval), and math problems (GSM8K and Hungarian HS finals). Finally, ablation studies examine how the number of training problems, the number of iterations, and the quantity of model-generated solutions affect ReST fine-tuning.
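
    For readers unfamiliar with the two metrics named above, here is a minimal sketch of how they are commonly computed. The article does not say which estimator the paper uses; the `pass_at_k` function below is the widely used unbiased estimator from code-generation evaluation, and treating it as the paper's choice is an assumption. The function names are illustrative.

```python
from collections import Counter
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Equals the probability that at least one of k samples drawn without
    replacement from the n generated samples is correct.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def majority_vote(answers):
    """Return the most frequent final answer; ties go to the earliest-seen answer."""
    return Counter(answers).most_common(1)[0][0]
```

    With these definitions, a problem for which 3 of 10 sampled solutions are correct contributes `pass_at_k(10, 3, 1)` to the pass@1 average, while majority voting scores the problem as solved only if the most common final answer among the samples is the correct one.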


    Check out the Paper. All credit for this research goes to the researchers of this project.



    Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.


