    A more effective way to train machines for uncertain, real-world situations


    Someone learning to play tennis might hire a teacher to help them learn faster. Because this teacher is (hopefully) a great tennis player, there are times when trying to exactly mimic the teacher won’t help the student learn. Perhaps the teacher leaps high into the air to deftly return a volley. The student, unable to copy that, might instead try a few other moves on her own until she has mastered the skills she needs to return volleys.

    Computer scientists can also use “teacher” systems to train another machine to complete a task. But just as with human learning, the student machine faces the dilemma of knowing when to follow the teacher and when to explore on its own. To this end, researchers from MIT and Technion, the Israel Institute of Technology, have developed an algorithm that automatically and independently determines when the student should mimic the teacher (known as imitation learning) and when it should instead learn through trial and error (known as reinforcement learning).

    Their dynamic approach allows the student to diverge from copying the teacher when the teacher is either too good or not good enough, but then return to following the teacher at a later point in the training process if doing so would achieve better results and faster learning.

    When the researchers tested this approach in simulations, they found that their combination of trial-and-error learning and imitation learning enabled students to learn tasks more effectively than methods that used only one type of learning.

    This method could help researchers improve the training process for machines that will be deployed in uncertain real-world situations, like a robot being trained to navigate inside a building it has never seen before.

    “This combination of learning by trial-and-error and following a teacher is very powerful. It gives our algorithm the ability to solve very difficult tasks that cannot be solved by using either technique individually,” says Idan Shenfeld, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

    Shenfeld wrote the paper with coauthors Zhang-Wei Hong, an EECS graduate student; Aviv Tamar, assistant professor of electrical engineering and computer science at Technion; and senior author Pulkit Agrawal, director of the Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory. The research will be presented at the International Conference on Machine Learning.

    Striking a balance

    Many existing methods that seek to strike a balance between imitation learning and reinforcement learning do so through brute-force trial and error. Researchers pick a weighted combination of the two learning methods, run the entire training procedure, and then repeat the process until they find the optimal balance. This is inefficient and often so computationally expensive that it isn’t even feasible.
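
    The brute-force procedure described above can be sketched roughly as follows. This is an illustrative outline only, not code from the paper: the function names, the convex mixing form of the loss, and the toy evaluator are all assumptions made for the example. The point it tries to show is that every candidate weight costs one complete training run.

```python
from typing import Callable, Optional, Sequence, Tuple


def combined_loss(rl_loss: float, imitation_loss: float, imitation_weight: float) -> float:
    """Fixed-weight mix of the two objectives (assumed convex combination)."""
    if not 0.0 <= imitation_weight <= 1.0:
        raise ValueError("imitation_weight must lie in [0, 1]")
    return (1.0 - imitation_weight) * rl_loss + imitation_weight * imitation_loss


def brute_force_weight_search(
    train_and_evaluate: Callable[[float], float],
    candidate_weights: Sequence[float],
) -> Tuple[Optional[float], float]:
    """Try each weight in turn and keep the one with the highest score.

    `train_and_evaluate` is a hypothetical stand-in for an entire training
    run followed by evaluation, which is why this search is so expensive.
    """
    best_weight: Optional[float] = None
    best_score = float("-inf")
    for weight in candidate_weights:
        score = train_and_evaluate(weight)  # one full training run per weight
        if score > best_score:
            best_weight, best_score = weight, score
    return best_weight, best_score


def toy_train_and_evaluate(imitation_weight: float) -> float:
    """Toy evaluator (hypothetical): pretends the best weight is 0.25."""
    return -((imitation_weight - 0.25) ** 2)
```

    With the toy evaluator, `brute_force_weight_search(toy_train_and_evaluate, [0.0, 0.25, 0.5, 0.75, 1.0])` selects the weight 0.25, but only after five simulated "training runs". In a real setting each of those calls could take hours or days.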

    “We want algorithms that are principled, involve tuning of as few knobs as possible, and achieve high performance — these principles have driven our research,” says Agrawal.

    To achieve this, the team approached the problem differently than prior work. Their solution involves training two students: one with a weighted combination of reinforcement learning and imitation learning, and a second that can only use reinforcement learning to learn the same task.

    The main idea is to automatically and dynamically adjust the weighting of the reinforcement and imitation learning objectives of the first student. Here is where the second student comes into play. The researchers’ algorithm continually compares the two students. If the one using the teacher is doing better, the algorithm puts more weight on imitation learning to train the student, but if the one using only trial and error is starting to get better results, it will focus more on learning from reinforcement learning.
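
    The comparison between the two students can be illustrated with a small sketch. The article does not give the actual update rule, so the proportional step, the clamping to [0, 1], and all names below are assumptions chosen for illustration rather than the researchers' method.

```python
def update_imitation_weight(
    imitation_weight: float,
    guided_return: float,
    rl_only_return: float,
    step_size: float = 0.1,
) -> float:
    """Shift weight toward whichever student is currently doing better.

    guided_return:  recent performance of the student that uses the teacher
    rl_only_return: recent performance of the student that uses only
                    trial and error
    The proportional update and the clamp are illustrative assumptions.
    """
    gap = guided_return - rl_only_return
    new_weight = imitation_weight + step_size * gap
    # Keep the weight a valid mixing coefficient.
    return min(1.0, max(0.0, new_weight))


def run_schedule(initial_weight, return_pairs, step_size=0.1):
    """Apply the update over a sequence of (guided, rl_only) return pairs."""
    weights = [initial_weight]
    for guided_return, rl_only_return in return_pairs:
        weights.append(
            update_imitation_weight(weights[-1], guided_return, rl_only_return, step_size)
        )
    return weights
```

    In this sketch the weight rises while the teacher-guided student is ahead and falls once the reinforcement-only student overtakes it, so no separate training run per candidate weight is needed. The returns fed to `run_schedule` would come from periodically evaluating both students during a single training process.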

    By dynamically determining which method achieves better results, the algorithm is adaptive and can pick the best technique throughout the training process. Thanks to this innovation, it is able to teach students more effectively than other methods that aren’t adaptive, Shenfeld says.

    “One of the main challenges in developing this algorithm was that it took us some time to realize that we should not train the two students independently. It became clear that we needed to connect the agents to make them share information, and then find the right way to technically ground this intuition,” Shenfeld says.

    Solving tough problems

    To test their approach, the researchers set up many simulated teacher-student training experiments, such as navigating through a maze of lava to reach the other corner of a grid. In this case, the teacher has a map of the entire grid while the student can only see a patch in front of it. Their algorithm achieved an almost perfect success rate across all testing environments, and was much faster than other methods.
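
    The difference between what the teacher and the student observe in the lava maze can be pictured with a toy grid. The layout, the cell symbols, and the choice of a square window centered on the agent (instead of a forward-facing patch) are simplifying assumptions for this sketch and are not taken from the researchers' environments.

```python
from typing import List

WALL = "#"  # symbol used to pad cells that fall outside the grid (assumed)


def teacher_observation(grid: List[str]) -> List[str]:
    """The teacher sees the full map of the grid."""
    return list(grid)


def student_observation(grid: List[str], row: int, col: int, radius: int = 1) -> List[str]:
    """The student sees only a small square patch around its position."""
    patch = []
    for r in range(row - radius, row + radius + 1):
        line = ""
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            line += grid[r][c] if inside else WALL
        patch.append(line)
    return patch


# Toy layout: S = start, L = lava, G = goal, . = free cell
TOY_GRID = [
    "S..L",
    ".L.L",
    "...G",
]
```

    Here `teacher_observation(TOY_GRID)` returns all three rows, while `student_observation(TOY_GRID, 0, 0)` returns only `["###", "#S.", "#.L"]`. The goal cell is invisible to the student from the start position, which is why simply copying a teacher that plans with the full map may not work.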

    To give their algorithm an even more difficult test, they set up a simulation involving a robotic hand with touch sensors but no vision that must reorient a pen to the correct pose. The teacher had access to the actual orientation of the pen, while the student could only use touch sensors to determine the pen’s orientation.

    Their method outperformed others that used either only imitation learning or only reinforcement learning.

    Reorienting objects is one among many manipulation tasks that a future home robot would need to perform, a vision that the Improbable AI Lab is working toward, Agrawal adds.

    Teacher-student learning has successfully been applied to train robots to perform complex object manipulation and locomotion in simulation and then transfer the learned skills into the real world. In these methods, the teacher has privileged information accessible from the simulation that the student won’t have when it is deployed in the real world. For example, the teacher will know the detailed map of a building that the student robot is being trained to navigate using only images captured by its camera.

    “Current methods for student-teacher learning in robotics don’t account for the inability of the student to mimic the teacher and thus are performance-limited. The new method paves a path for building superior robots,” says Agrawal.

    Apart from better robots, the researchers believe their algorithm has the potential to improve performance in diverse applications where imitation or reinforcement learning is being used. For example, large language models such as GPT-4 are very good at accomplishing a wide range of tasks, so perhaps one could use the large model as a teacher to train a smaller, student model to be even “better” at one particular task. Another exciting direction is to investigate the similarities and differences between machines and humans learning from their respective teachers. Such analysis might help improve the learning experience, the researchers say.

    “What’s interesting about [this method] compared to related methods is how robust it seems to various parameter choices, and the variety of domains it shows promising results in,” says Abhishek Gupta, an assistant professor at the University of Washington, who was not involved with this work. “While the current set of results are largely in simulation, I am very excited about the future possibilities of applying this work to problems involving memory and reasoning with different modalities such as tactile sensing.”

    “This work presents an interesting approach to reuse prior computational work in reinforcement learning. Particularly, their proposed method can leverage suboptimal teacher policies as a guide while avoiding careful hyperparameter schedules required by prior methods for balancing the objectives of mimicking the teacher versus optimizing the task reward,” adds Rishabh Agarwal, a senior research scientist at Google Brain, who was also not involved in this research. “Hopefully, this work would make reincarnating reinforcement learning with learned policies less cumbersome.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab, Hyundai Motor Company, the DARPA Machine Common Sense Program, and the Office of Naval Research.
