    Language to rewards for robotic skill synthesis – Google Research Blog


    Posted by Wenhao Yu and Fei Xia, Research Scientists, Google

    Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. Recent advances in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents.

    While these methods impart new modes of compositional generalization, they focus on using language to link together new behaviors from an existing library of control primitives that are either manually engineered or learned a priori. Despite having internal knowledge about robot motions, LLMs struggle to directly output low-level robot commands due to the limited availability of relevant training data. As a result, the expressiveness of these methods is bottlenecked by the breadth of the available primitives, the design of which often requires extensive expert knowledge or massive data collection.

    In “Language to Rewards for Robotic Skill Synthesis”, we propose an approach to enable users to teach robots novel actions through natural language input. To do so, we leverage reward functions as an interface that bridges the gap between language and low-level robot actions. We posit that reward functions provide an ideal interface for such tasks given their richness in semantics, modularity, and interpretability. They also provide a direct connection to low-level policies through black-box optimization or reinforcement learning (RL). We developed a language-to-reward system that leverages LLMs to translate natural language user instructions into reward-specifying code and then applies MuJoCo MPC to find optimal low-level robot actions that maximize the generated reward function. We demonstrate our language-to-reward system on a variety of robotic control tasks in simulation using a quadruped robot and a dexterous manipulator robot. We further validate our method on a physical robot manipulator.

    The language-to-reward system consists of two core components: (1) a Reward Translator, and (2) a Motion Controller. The Reward Translator maps natural language instructions from users to reward functions represented as Python code. The Motion Controller optimizes the given reward function using receding horizon optimization to find the optimal low-level robot actions, such as the amount of torque that should be applied to each robot motor.

    LLMs cannot directly generate low-level robot actions due to the lack of such data in their pre-training datasets. We propose to use reward functions to bridge the gap between language and low-level robot actions, and to enable novel, complex robot motions from natural language instructions.

    Reward Translator: Translating user instructions to reward functions

    The Reward Translator module was built with the goal of mapping natural language user instructions to reward functions. Reward tuning is highly domain-specific and requires expert knowledge, so it was not surprising to us when we found that LLMs trained on generic language datasets are unable to directly generate a reward function for specific hardware. To address this, we apply the in-context learning ability of LLMs. Furthermore, we split the Reward Translator into two sub-modules: Motion Descriptor and Reward Coder.

    Motion Descriptor

    First, we design a Motion Descriptor that interprets input from a user and expands it into a natural language description of the desired robot motion following a predefined template. This Motion Descriptor turns potentially ambiguous or vague user instructions into more specific and descriptive robot motions, making the reward coding task more stable. Moreover, users interact with the system through the motion description field, so this also provides a more interpretable interface for users compared to directly showing the reward function.

    To create the Motion Descriptor, we use an LLM to translate the user input into a detailed description of the desired robot motion. We design prompts that guide the LLM to output the motion description with the right amount of detail and in the right format. By translating a vague user instruction into a more detailed description, we are able to more reliably generate the reward function with our system. This idea can also potentially be applied more generally beyond robotics tasks, and is relevant to Inner-Monologue and chain-of-thought prompting.
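
    As a rough illustration of the template idea, the sketch below assembles a Motion Descriptor prompt and parses a reply. The template wording, the marker strings, and the names build_prompt and extract_description are all hypothetical, since the authors' actual prompt is not reproduced in this post. No LLM is called here.

```python
from string import Template
from typing import List

# Hypothetical markers that delimit the structured description.
START, END = "[start of description]", "[end of description]"

# Hypothetical template; the authors' actual prompt is not shown in this post.
MOTION_TEMPLATE = Template(
    "Describe the motion of a $robot by filling in this form.\n"
    + START + "\n"
    "* The torso of the robot should be at a height of [NUM: 0.3] meters.\n"
    "* The robot should move forward at a speed of [NUM: 0.0] m/s.\n"
    + END + "\n"
    "Rules:\n"
    "1. Replace every [NUM: default] with a number.\n"
    "2. Do not add lines that are not in the form.\n"
    "User instruction: $instruction\n"
)


def build_prompt(instruction: str, robot: str = "quadruped robot") -> str:
    """Fill the template with the user's (possibly vague) instruction."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    return MOTION_TEMPLATE.substitute(robot=robot, instruction=instruction)


def extract_description(llm_output: str) -> List[str]:
    """Return the non-empty lines between the start and end markers."""
    if START not in llm_output or END not in llm_output:
        raise ValueError("missing description markers")
    body = llm_output.split(START, 1)[1].split(END, 1)[0]
    return [line.strip() for line in body.splitlines() if line.strip()]


print(build_prompt("make the robot stand still"))
```

    Constraining the reply to a fixed form is what makes the later reward coding step easier to keep stable: each line of the description corresponds to a quantity that a reward term can target.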

    Reward Coder

    In the second stage, we use the same LLM from the Motion Descriptor for the Reward Coder, which translates the generated motion description into the reward function. Reward functions are represented using Python code to benefit from the LLM's knowledge of reward, coding, and code structure.

    Ideally, we would like to use an LLM to directly generate a reward function R(s, t) that maps the robot state s and time t into a scalar reward value. However, generating the correct reward function from scratch is still a challenging problem for LLMs, and correcting the errors requires the user to understand the generated code in order to provide the right feedback. As such, we pre-define a set of reward terms that are commonly used for the robot of interest and allow the LLM to compose different reward terms to formulate the final reward function. To achieve this, we design a prompt that specifies the reward terms and guides the LLM to generate the correct reward function for the task.
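
    The composition of pre-defined reward terms can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' reward API: the RewardBuilder class, its setter names, the state keys, and the quadratic penalty form are all hypothetical. The point is that the LLM only needs to emit a few setter calls instead of writing R(s, t) from scratch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical flat state: named scalar features of the robot.
State = Dict[str, float]
# A reward term maps (state s, time t) to a scalar, like R(s, t) in the text.
RewardTerm = Callable[[State, float], float]


@dataclass
class RewardBuilder:
    """Collects pre-defined reward terms and sums them into R(s, t)."""

    terms: List[RewardTerm] = field(default_factory=list)

    def set_torso_height(self, target: float, weight: float = 1.0) -> None:
        # Penalize squared deviation of torso height from the target (meters).
        self.terms.append(
            lambda s, t: -weight * (s["torso_height"] - target) ** 2
        )

    def set_forward_speed(self, target: float, weight: float = 1.0) -> None:
        # Penalize squared deviation of forward speed from the target (m/s).
        self.terms.append(
            lambda s, t: -weight * (s["forward_speed"] - target) ** 2
        )

    def reward(self, s: State, t: float) -> float:
        # The final reward is the sum of all selected terms.
        return float(sum(term(s, t) for term in self.terms))


# What Reward Coder output might look like for "stand still, torso at 0.3 m"
# (hypothetical generated code, written by hand here):
builder = RewardBuilder()
builder.set_torso_height(0.3, weight=2.0)
builder.set_forward_speed(0.0)

state = {"torso_height": 0.25, "forward_speed": 0.1}
print(builder.reward(state, t=0.0))
```

    Because every term is a named setter with numeric arguments, a user can correct the behavior by adjusting the description rather than debugging free-form reward code.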

    The internal structure of the Reward Translator, which is tasked with mapping user inputs to reward functions.

    Motion Controller: Translating reward functions to robot actions

    The Motion Controller takes the reward function generated by the Reward Translator and synthesizes a controller that maps robot observations to low-level robot actions. To do this, we formulate the controller synthesis problem as a Markov decision process (MDP), which can be solved using different strategies, including RL, offline trajectory optimization, or model predictive control (MPC). Specifically, we use an open-source implementation based on MuJoCo MPC (MJPC).

    MJPC has demonstrated the interactive creation of diverse behaviors, such as legged locomotion, grasping, and finger-gaiting, while supporting multiple planning algorithms, such as iterative linear–quadratic–Gaussian (iLQG) and predictive sampling. More importantly, the frequent re-planning in MJPC makes it robust to uncertainties in the system and enables an interactive motion synthesis and correction system when combined with LLMs.
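
    To make the re-planning loop concrete, here is a simplified sampling-based receding horizon controller in the spirit of predictive sampling. It is a toy sketch, not MJPC's implementation or API: the 1-D point-mass dynamics, the constants DT and A_MAX, and all function names are assumptions chosen for illustration. At each step it perturbs the current plan with Gaussian noise, keeps the best rollout, executes only the first action, and shifts the plan forward.

```python
import random
from typing import Callable, List, Tuple

State = Tuple[float, float]               # (position, velocity) of a toy 1-D point mass
Reward = Callable[[State, float], float]  # R(s, t)
DT = 0.05                                 # integration step in seconds (assumed)
A_MAX = 5.0                               # action (force) limit (assumed)


def step(state: State, action: float) -> State:
    """Semi-implicit Euler step of a unit point mass."""
    pos, vel = state
    vel += action * DT
    pos += vel * DT
    return (pos, vel)


def rollout_return(state: State, plan: List[float], reward: Reward, t0: float) -> float:
    """Sum of rewards obtained by applying the plan from the given state."""
    total = 0.0
    for k, action in enumerate(plan):
        state = step(state, action)
        total += reward(state, t0 + (k + 1) * DT)
    return total


def predictive_sampling_step(state: State, nominal: List[float], reward: Reward,
                             t0: float, rng: random.Random,
                             num_samples: int = 32, noise_std: float = 1.0) -> List[float]:
    """Return the best of the nominal plan and its noisy perturbations."""
    best_plan = list(nominal)
    best_return = rollout_return(state, best_plan, reward, t0)
    for _ in range(num_samples):
        candidate = [max(-A_MAX, min(A_MAX, a + rng.gauss(0.0, noise_std)))
                     for a in nominal]
        ret = rollout_return(state, candidate, reward, t0)
        if ret > best_return:
            best_plan, best_return = candidate, ret
    return best_plan


def run_mpc(reward: Reward, steps: int = 200, horizon: int = 20, seed: int = 0) -> State:
    """Re-plan at every step, execute the first action, then shift the plan."""
    rng = random.Random(seed)
    state: State = (0.0, 0.0)
    plan = [0.0] * horizon
    for i in range(steps):
        plan = predictive_sampling_step(state, plan, reward, i * DT, rng)
        state = step(state, plan[0])   # execute only the first action
        plan = plan[1:] + [0.0]        # warm start for the next re-plan
    return state
```

    A reward such as `lambda s, t: -(s[0] - 1.0) ** 2 - 0.1 * s[1] ** 2` can be passed to run_mpc to pull the point mass towards position 1.0. Since the plan is recomputed at every step, swapping in a different reward function changes the behavior immediately, which is the property that makes interactive correction possible.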

    Examples

    Robot dog

    In the first example, we apply the language-to-reward system to a simulated quadruped robot and teach it to perform various skills. For each skill, the user provides a concise instruction to the system, which then synthesizes the robot motion by using reward functions as an intermediate interface.

    Dexterous manipulator

    We then apply the language-to-reward system to a dexterous manipulator robot to perform a variety of manipulation tasks. The dexterous manipulator has 27 degrees of freedom, which makes it very challenging to control. Many of these tasks require manipulation skills beyond grasping, making it difficult for pre-designed primitives to work. We also include an example where the user can interactively instruct the robot to place an apple inside a drawer.

    Validation on real robots

    We also validate the language-to-reward method using a real-world manipulation robot to perform tasks such as picking up objects and opening a drawer. To perform the optimization in the Motion Controller, we use AprilTag, a fiducial marker system, and F-VLM, an open-vocabulary object detection tool, to identify the position of the table and the objects being manipulated.

    Conclusion

    In this work, we describe a new paradigm for interfacing an LLM with a robot through reward functions, powered by a low-level model predictive control tool, MuJoCo MPC. Using reward functions as the interface enables LLMs to work in a semantic-rich space that plays to the strengths of LLMs, while ensuring the expressiveness of the resulting controller. To further improve the performance of the system, we propose to use a structured motion description template to better extract internal knowledge about robot motions from LLMs. We demonstrate our proposed system on two simulated robot platforms and one real robot for both locomotion and manipulation tasks.

    Acknowledgements

    We would like to thank our co-authors Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, and Yuval Tassa for their help and support in various aspects of the project. We would also like to acknowledge Ken Caluwaerts, Kristian Hartikainen, Steven Bohez, Carolina Parada, Marc Toussaint, and the teams at Google DeepMind for their feedback and contributions.
