    Researchers From Stanford And DeepMind Come Up With The Idea of Using Large Language Models (LLMs) as a Proxy Reward Function


    As computing power and data grow, autonomous agents are becoming more capable. This makes it all the more important for humans to have a say over the policies agents learn and to verify that those policies align with their goals.

    Currently, users either 1) write reward functions for the desired behavior or 2) provide extensive labeled data. Both strategies are difficult to apply in practice. Because agents are vulnerable to reward hacking, it is hard to design reward functions that balance competing objectives. A reward function can instead be learned from annotated examples, but capturing the subtleties of individual users’ tastes and goals requires enormous amounts of labeled data, which has proven expensive. Furthermore, for a new user population with different goals, the reward function must be redesigned or the dataset re-collected.

    New research from Stanford University and DeepMind aims to make it simpler for users to communicate their preferences, through an interface that is more natural than writing a reward function and a cost-effective way to define those preferences using only a few examples. The work uses large language models (LLMs), which have been trained on massive amounts of text from the internet and have proven adept at learning in context from zero or very few examples. According to the researchers, LLMs are excellent in-context learners because they have been trained on a dataset large enough to encode important commonsense priors about human behavior.

    The researchers investigate how to use a prompted LLM as a proxy reward function for training RL agents, using information provided by the end user. In the proposed method, the user specifies an objective through a conversational interface. The objective can be described with a few examples for a concept such as “versatility,” or with a single sentence if the concept is common knowledge. The prompt and the LLM together define the reward function used to train an RL agent: the trajectory of an RL episode and the user’s prompt are fed into the LLM, and the LLM’s answer on whether the trajectory satisfies the user’s objective (e.g., “No,” which corresponds to 0) is returned to the RL agent as an integer reward. One benefit of using an LLM as a proxy reward function is that users can specify their preferences intuitively through language rather than having to provide dozens of examples of desirable behavior.
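The reward pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the prompt layout, the function names (`build_prompt`, `parse_reward`, `proxy_reward`), and the `stub_llm` stand-in are all hypothetical, and a real setup would replace `stub_llm` with a call to an actual language model.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to the model's text reply.
LLM = Callable[[str], str]


def build_prompt(task_description: str, user_objective: str, episode: List[str]) -> str:
    """Combine the task, the user's stated objective, and a text rendering of the episode.

    The layout is an assumption made for this sketch.
    """
    steps = "\n".join(episode)
    return (
        f"{task_description}\n\n"
        f"Objective: {user_objective}\n\n"
        f"Episode:\n{steps}\n\n"
        "Does this episode satisfy the objective? Answer Yes or No."
    )


def parse_reward(reply: str) -> int:
    """Map the model's text reply to a binary integer reward (1 for Yes/1, otherwise 0)."""
    answer = reply.strip().lower()
    return 1 if answer.startswith(("yes", "1")) else 0


def proxy_reward(llm: LLM, task_description: str, user_objective: str, episode: List[str]) -> int:
    """Query the LLM about one finished episode and return the reward for the RL agent."""
    return parse_reward(llm(build_prompt(task_description, user_objective, episode)))


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real model (assumption): says Yes if the episode ends in an accepted deal."""
    episode_text = prompt.split("Episode:\n", 1)[1]
    return "Yes" if "accepts" in episode_text else "No"


if __name__ == "__main__":
    task = "Alice and Bob negotiate how to split a set of items."
    objective = "Alice should be a versatile negotiator."
    episode = ["Alice proposes a split.", "Bob accepts."]
    # The returned integer would be handed to the RL algorithm as the episode's reward.
    print(proxy_reward(stub_llm, task, objective, episode))
```

Keeping the parser separate from the model call means the same integer reward interface works whether the model answers with a word or a digit, and any standard RL algorithm can consume the result without knowing an LLM produced it.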

    Users report that the resulting agent is much more in line with their objective than an agent trained with a different objective. By drawing on its prior knowledge of common objectives, the LLM increases the proportion of objective-aligned reward signals generated under zero-shot prompting by an average of 48% for a regular ordering of matrix game outcomes and by 36% for a scrambled ordering. In the Ultimatum Game, the DEALORNODEAL negotiation task, and the Matrix Games, the team uses only a few prompts to guide players through the process. The pilot study involved ten human participants.

    An LLM can recognize common objectives and send reward signals that align with those objectives, even in a one-shot setting. As a result, objective-aligned RL agents can be trained using LLMs that only detect one of two correct outcomes. The resulting RL agents are more likely to be accurate than those trained using labels because they only need to learn a single correct outcome.


    Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.


    Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence in various fields. She is passionate about exploring new advancements in technology and their real-life applications.

