
    World scale inverse reinforcement learning in Google Maps – Google Research Blog


    Posted by Matt Barnes, Software Engineer, Google Research

    Routing in Google Maps remains one of our most useful and frequently used features. Determining the best route from A to B requires making complex trade-offs between factors including the estimated time of arrival (ETA), tolls, directness, surface conditions (e.g., paved vs. unpaved roads), and user preferences, which vary across transportation mode and local geography. Often, the most natural window we have into travelers' preferences is to analyze real-world travel patterns.

    Learning preferences from observed sequential decision-making behavior is a classic application of inverse reinforcement learning (IRL). Given a Markov decision process (MDP), a formalization of the road network, and a set of demonstration trajectories (the traveled routes), the goal of IRL is to recover the users' latent reward function. Although past research has developed increasingly general IRL solutions, these have not been successfully scaled to world-sized MDPs. Scaling IRL algorithms is challenging because they typically require solving an RL subroutine at every update step. At first glance, even attempting to fit a world-scale MDP into memory to compute a single gradient step appears infeasible due to the large number of road segments and limited high-bandwidth memory. When applying IRL to routing, one must consider all reasonable routes between each demonstration's origin and destination. This implies that any attempt to break the world-scale MDP into smaller components cannot consider components smaller than a metropolitan area.
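
To make the scaling challenge concrete, here is a minimal sketch of a MaxEnt-style IRL update on a toy two-route graph. The features, learning rate, and route set are all illustrative, not the production system; the point is that every gradient step must re-solve for the policy's path distribution, the RL subroutine that dominates cost at scale.

```python
import numpy as np

# Toy road graph: 4 states, edges with 2 features each (e.g. length, is_toll).
# Reward is linear in edge features: r(e) = theta . phi(e).
# A MaxEnt-style IRL gradient matches feature expectations:
#   grad = E_demo[phi] - E_policy[phi]
# Computing E_policy[phi] requires solving for the policy's path
# distribution -- the "RL subroutine" that makes IRL hard to scale.

edges = {  # (src, dst): feature vector phi(e)
    (0, 1): np.array([1.0, 0.0]),
    (1, 3): np.array([1.0, 1.0]),
    (0, 2): np.array([2.0, 0.0]),
    (2, 3): np.array([1.0, 0.0]),
}
paths = [[(0, 1), (1, 3)], [(0, 2), (2, 3)]]  # the two routes from 0 to 3
demo = paths[1]                               # the observed (toll-free) route

def path_features(path):
    return sum(edges[e] for e in path)

def irl_gradient(theta):
    # "RL subroutine": soft (MaxEnt) distribution over all candidate paths.
    scores = np.array([theta @ path_features(p) for p in paths])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    expected = sum(p_i * path_features(p) for p_i, p in zip(probs, paths))
    return path_features(demo) - expected

theta = np.zeros(2)
for _ in range(200):
    theta += 0.1 * irl_gradient(theta)

# The learned reward penalizes the toll feature, so the demonstrated
# route now scores highest under the recovered reward.
best = max(paths, key=lambda p: theta @ path_features(p))
```

On a world-sized graph the set of reasonable paths per demonstration is astronomically larger, which is exactly why the naive version of this loop does not scale.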

    To this end, in “Massively Scalable Inverse Reinforcement Learning in Google Maps”, we share the result of a multi-year collaboration among Google Research, Maps, and Google DeepMind to surpass this IRL scalability limitation. We revisit classic algorithms in this space and introduce advances in graph compression and parallelization, along with a new IRL algorithm called Receding Horizon Inverse Planning (RHIP) that provides fine-grained control over performance trade-offs. The final RHIP policy achieves a 16–24% relative improvement in global route match rate, i.e., the percentage of de-identified traveled routes that exactly match the suggested route in Google Maps. To the best of our knowledge, this represents the largest instance of IRL in a real-world setting to date.
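
For clarity, route match rate is a simple exact-match fraction over de-identified trips. The helper below is a toy illustration of the metric, not the production pipeline; routes are represented here as hypothetical lists of road-segment ids.

```python
def route_match_rate(traveled, suggested):
    """Fraction of traveled routes that exactly match the suggested route.

    Each route is a sequence of road-segment ids (illustrative)."""
    matches = sum(t == s for t, s in zip(traveled, suggested))
    return matches / len(traveled)

traveled = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
suggested = [["a", "b", "c"], ["a", "b"], ["e", "f"]]
rate = route_match_rate(traveled, suggested)  # 2 of 3 routes match
```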

    Google Maps improvements in route match rate relative to the existing baseline, when using the RHIP inverse reinforcement learning policy.

    The benefits of IRL

    A subtle but crucial detail about the routing problem is that it is goal conditioned, meaning that every destination state induces a slightly different MDP (specifically, the destination is a terminal, zero-reward state). IRL approaches are well suited to these types of problems because the learned reward function transfers across MDPs, and only the destination state is modified. This is in contrast to approaches that directly learn a policy, which typically require an extra factor of S parameters, where S is the number of MDP states.

    Once the reward function is learned via IRL, we take advantage of a powerful inference-time trick. First, we evaluate the entire graph's rewards once in an offline batch setting. This computation is performed entirely on servers without access to individual trips, and operates only over batches of road segments in the graph. Then, we save the results to an in-memory database and use a fast online graph search algorithm to find the highest-reward path for routing requests between any origin and destination. This circumvents the need to perform online inference of a deeply parameterized model or policy, and vastly improves serving costs and latency.
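
This serving trick can be sketched in a few lines: score every edge once offline, cache the results, and answer queries with a standard graph search. In this sketch the rewards are negative, so negating them yields non-negative costs and Dijkstra's algorithm applies; the reward model, graph, and all names are illustrative stand-ins.

```python
import heapq

# Illustrative edge features: (length, is_toll) per road segment.
edge_features = {
    ("A", "B"): (1.0, 0), ("B", "D"): (1.0, 1),
    ("A", "C"): (2.0, 0), ("C", "D"): (1.5, 0),
}

def reward_model(edge):
    # Stub for the learned reward model; rewards are negative "costs",
    # so -reward >= 0 and Dijkstra applies.
    length, is_toll = edge_features[edge]
    return -(length + 5.0 * is_toll)

# Offline batch step: score every segment once, cache in an in-memory table.
reward_table = {e: reward_model(e) for e in edge_features}

def best_path(origin, dest):
    # Online step: highest-reward path = shortest path under cost -r(e).
    frontier = [(0.0, origin, [origin])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == dest:
            return path
        if node in seen:
            continue
        seen.add(node)
        for (u, v), r in reward_table.items():
            if u == node:
                heapq.heappush(frontier, (cost - r, v, path + [v]))
    return None

route = best_path("A", "D")  # avoids the tolled B-D segment
```

Because the cached rewards are destination independent, the same table serves requests between any origin and destination, which is the goal-conditioning property discussed above.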

    Reward model deployment using batch inference and fast online planners.

    Receding Horizon Inverse Planning

    To scale IRL to the world MDP, we compress the graph and shard the global MDP using a sparse Mixture of Experts (MoE) based on geographic regions. We then apply classic IRL algorithms to solve the local MDPs, estimate the loss, and send gradients back to the MoE. The global reward graph is computed by decompressing the final MoE reward model. To provide more control over performance characteristics, we introduce a new generalized IRL algorithm called Receding Horizon Inverse Planning (RHIP).
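
A minimal sketch of the geographic sharding idea follows, under the simplifying assumptions of one linear expert per region and a stand-in for the local IRL solve; every name, shape, and update rule here is illustrative rather than the paper's formulation.

```python
import numpy as np

# Shard the global graph by geographic region; each expert owns the
# road segments inside its region (all ids illustrative).
regions = {"london": ["e1", "e2"], "paris": ["e3"]}
edge_region = {e: r for r, es in regions.items() for e in es}

# One linear reward expert per region, over a 2-d feature space.
experts = {r: np.zeros(2) for r in regions}

def local_irl_gradient(theta, demo_features):
    # Stand-in for solving the local MDP with a classic IRL algorithm:
    # here simply a feature-matching direction toward the demonstration.
    return demo_features - theta

def training_step(demo_edge, demo_features, lr=0.5):
    # Sparse routing: only the expert owning this edge's region updates,
    # so a demonstration never touches the rest of the world graph.
    region = edge_region[demo_edge]
    experts[region] += lr * local_irl_gradient(experts[region], demo_features)

for _ in range(20):
    training_step("e1", np.array([1.0, -2.0]))
# Only the london expert moved; paris stayed untouched (sparsity).
```

The decompressed union of the per-region experts then plays the role of the global reward graph used at serving time.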

    IRL reward model training using MoE parallelization, graph compression, and RHIP.

    RHIP is inspired by people's tendency to perform detailed local planning (“What am I doing for the next hour?”) alongside approximate long-term planning (“What will my life look like in five years?”). To take advantage of this insight, RHIP uses robust yet expensive stochastic policies in the local region surrounding the demonstration path, and switches to cheaper deterministic planners beyond some horizon. Adjusting the horizon H allows controlling computational costs, and often allows the discovery of the performance sweet spot. Interestingly, RHIP generalizes many classic IRL algorithms and provides the novel insight that they can be viewed along a stochastic vs. deterministic spectrum (specifically, for H=∞ it reduces to MaxEnt, for H=1 it reduces to BIRL, and for H=0 it reduces to MMP).
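
The horizon switch at the heart of RHIP can be sketched as a single action-selection rule: stochastic (softmax) inside the horizon, deterministic (greedy) beyond it. The q-values and horizon below are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rhip_action(q_values, steps_taken, horizon):
    """Pick the next road segment: stochastic (softmax over illustrative
    rewards-to-go) inside the horizon, deterministic greedy beyond it."""
    if steps_taken < horizon:
        p = np.exp(q_values - q_values.max())
        p /= p.sum()
        return rng.choice(len(q_values), p=p)   # robust but expensive
    return int(np.argmax(q_values))             # cheap and deterministic

q = np.array([1.0, 3.0, 2.0])
inside = rhip_action(q, steps_taken=0, horizon=5)   # sampled stochastically
beyond = rhip_action(q, steps_taken=7, horizon=5)   # always the argmax

# horizon = 0 gives a fully deterministic planner (the MMP-like limit);
# horizon -> infinity gives a fully stochastic policy (the MaxEnt-like limit).
```

Tuning the single parameter `horizon` thus interpolates between the classic algorithms on the stochastic-to-deterministic spectrum.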

    Given a demonstration from s_o to s_d, (1) RHIP follows a robust yet expensive stochastic policy in the local region surrounding the demonstration (blue region). (2) Beyond some horizon H, RHIP switches to following a cheaper deterministic planner (red lines). Adjusting the horizon enables fine-grained control over performance and computational costs.

    Routing wins

    The RHIP policy provides a 15.9% and 24.1% lift in global route match rate for driving and two-wheelers (e.g., scooters, motorcycles, mopeds), respectively, relative to the well-tuned Maps baseline. We're especially excited about the benefits to more sustainable transportation modes, where factors beyond travel time play a substantial role. By tuning RHIP's horizon H, we are able to achieve a policy that is both more accurate than all other IRL policies and 70% faster than MaxEnt.

    Our 360M-parameter reward model provides intuitive wins for Google Maps users in live A/B experiments. Examining road segments with a large absolute difference between the learned rewards and the baseline rewards can help improve certain Google Maps routes. For example:

    Nottingham, UK. The preferred route (blue) was previously marked as private property due to the presence of a large gate, which indicated to our systems that the road may be closed at times and would not be ideal for drivers. As a result, Google Maps routed drivers through a longer, alternate detour instead (red). However, because real-world driving patterns showed that users regularly take the preferred route without issue (as the gate is almost never closed), IRL now learns to route drivers along the preferred route by placing a large positive reward on this road segment.

    Conclusion

    Increasing performance via increased scale, both in terms of dataset size and model complexity, has proven to be a persistent trend in machine learning. Similar gains for inverse reinforcement learning problems have historically remained elusive, largely due to the challenges of handling practically sized MDPs. By introducing scalability advancements to classic IRL algorithms, we are now able to train reward models on problems with hundreds of millions of states, demonstration trajectories, and model parameters, respectively. To the best of our knowledge, this is the largest instance of IRL in a real-world setting to date. See the paper to learn more about this work.

    Acknowledgements

    This work is a collaboration across multiple teams at Google. Contributors to the project include Matthew Abueg, Oliver Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier, Shawn O’Banion, Ryan Epp, Renaud Hartert, Rui Song, Thomas Sharp, Rémi Robert, Zoltan Szego, Beth Luan, Brit Larabee, and Agnieszka Madurska.

    We’d also like to extend our thanks to Arno Eigenwillig, Jacob Moorman, Jonathan Spencer, Remi Munos, Michael Bloesch, and Arun Ahuja for helpful discussions and suggestions.
