    NVIDIA Releases DreamDojo: An Open-Source Robot World Model Trained on 44,711 Hours of Real-World Human Video Data

    Building simulators for robots has been a long-standing problem. Traditional engines require manual coding of physics and precise 3D models. NVIDIA is changing this with DreamDojo, a fully open-source, generalizable robot world model. Instead of using a physics engine, DreamDojo ‘dreams’ the results of robot actions directly in pixels.

    https://arxiv.org/pdf/2602.06949

    Scaling Robotics with 44k+ Hours of Human Experience

    The biggest hurdle for AI in robotics is data. Collecting robot-specific data is expensive and slow. DreamDojo addresses this by learning from 44k+ hours of egocentric human videos. This dataset, called DreamDojo-HV, is the largest of its kind for world model pretraining.

    • It features 6,015 unique tasks across 1M+ trajectories.
    • The data covers 9,869 unique scenes and 43,237 unique objects.
    • Pretraining used 100,000 NVIDIA H100 GPU hours to build 2B and 14B model variants.

    Humans have already mastered complex physics, such as pouring liquids or folding clothes. DreamDojo uses this human data to give robots a ‘common sense’ understanding of how the world works.

    Bridging the Gap with Latent Actions

    Human videos do not contain robot motor commands. To make these videos ‘robot-readable,’ NVIDIA’s research team introduced continuous latent actions. This approach uses a spatiotemporal Transformer VAE to extract actions directly from pixels.

    • The VAE encoder takes 2 consecutive frames and outputs a 32-dimensional latent vector.
    • This vector represents the most critical motion between frames.
    • The design creates an information bottleneck that disentangles action from visual context.
    • This allows the model to learn physics from humans and apply it to different robot bodies.
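
    The encoder interface described above (two frames in, one 32-dimensional latent action out) can be sketched as follows. This is a toy illustration only: a pair of random linear projections stands in for the spatiotemporal Transformer, and the names make_encoder, encode, and kl_to_standard_normal are hypothetical, not taken from the DreamDojo code. The KL term is the standard VAE regularizer; it is assumed here to be what produces the information bottleneck.

```python
import math
import random

LATENT_DIM = 32  # latent action size stated in the article


def make_encoder(frame_size, seed=0):
    """Build a toy linear encoder (hypothetical stand-in for the Transformer VAE)."""
    rng = random.Random(seed)
    in_dim = 2 * frame_size  # two consecutive frames, flattened and concatenated
    scale = 1.0 / math.sqrt(in_dim)
    w_mu = [[rng.gauss(0, scale) for _ in range(in_dim)] for _ in range(LATENT_DIM)]
    w_logvar = [[rng.gauss(0, scale) for _ in range(in_dim)] for _ in range(LATENT_DIM)]
    return w_mu, w_logvar


def encode(frame_t, frame_t1, encoder, rng):
    """Map a frame pair to (sample, mean, log-variance) via the reparameterization trick."""
    w_mu, w_logvar = encoder
    x = list(frame_t) + list(frame_t1)
    mu = [sum(w * v for w, v in zip(row, x)) for row in w_mu]
    logvar = [sum(w * v for w, v in zip(row, x)) for row in w_logvar]
    z = [m + math.exp(0.5 * lv) * rng.gauss(0, 1) for m, lv in zip(mu, logvar)]
    return z, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)), the usual VAE penalty that limits what z can carry."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


# Usage: two made-up 8x8 grayscale frames, flattened to 64 values each.
rng = random.Random(1)
frame_a = [rng.random() for _ in range(64)]
frame_b = [rng.random() for _ in range(64)]
encoder = make_encoder(frame_size=64)
z, mu, logvar = encode(frame_a, frame_b, encoder, rng)
kl = kl_to_standard_normal(mu, logvar)
print(len(z), round(kl, 3))
```

    Because the latent is far smaller than the frame pair it summarizes, it has room for little more than the change between frames, which is the intuition behind separating action from visual context.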

    Better Physics through Architecture

    DreamDojo is based on the Cosmos-Predict2.5 latent video diffusion model. It uses the WAN2.2 tokenizer, which has a temporal compression ratio of 4. The team improved the architecture with 3 key features:

    1. Relative Actions: The model uses joint deltas instead of absolute poses. This makes it easier for the model to generalize across different trajectories.
    2. Chunked Action Injection: It injects 4 consecutive actions into each latent frame. This aligns the actions with the tokenizer’s compression ratio and fixes causality confusion.
    3. Temporal Consistency Loss: A new loss function matches predicted frame velocities to ground-truth transitions. This reduces visual artifacts and keeps objects physically consistent.
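
    The three features above can be illustrated with a minimal sketch. The function names and the toy data are hypothetical, and the loss below is only one plausible reading of "matching frame velocities" (mean squared error on frame-to-frame differences); the paper's exact formulation may differ.

```python
CHUNK = 4  # actions per latent frame, matching the tokenizer's temporal compression ratio


def to_relative(poses):
    """Convert absolute joint poses into per-step joint deltas."""
    return [
        [curr - prev for prev, curr in zip(poses[t - 1], poses[t])]
        for t in range(1, len(poses))
    ]


def chunk_actions(actions, chunk=CHUNK):
    """Concatenate every `chunk` consecutive actions into one vector per latent frame."""
    if len(actions) % chunk != 0:
        raise ValueError("number of actions must be a multiple of the chunk size")
    return [
        [value for action in actions[i:i + chunk] for value in action]
        for i in range(0, len(actions), chunk)
    ]


def temporal_consistency_loss(pred_frames, true_frames):
    """Mean squared error between predicted and ground-truth frame-to-frame differences."""
    total, count = 0.0, 0
    for t in range(1, len(true_frames)):
        for p_prev, p_curr, g_prev, g_curr in zip(
            pred_frames[t - 1], pred_frames[t], true_frames[t - 1], true_frames[t]
        ):
            total += ((p_curr - p_prev) - (g_curr - g_prev)) ** 2
            count += 1
    return total / count


# Usage: 9 absolute poses of a made-up 2-joint arm give 8 deltas, i.e. 2 latent frames.
poses = [[0.1 * t, 1.0 - 0.1 * t] for t in range(9)]
deltas = to_relative(poses)
chunks = chunk_actions(deltas)
print(len(deltas), len(chunks), len(chunks[0]))

# A prediction that is offset by a constant still has the correct velocities.
frames = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
shifted = [[v + 0.5 for v in frame] for frame in frames]
print(temporal_consistency_loss(shifted, frames))
```

    Chunking by 4 means each latent frame receives exactly the actions that produced the 4 video frames it compresses, so no action is attached to a frame it did not influence.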

    Distillation for 10.81 FPS Real-Time Interaction

    A simulator is only useful if it is fast. Standard diffusion models require too many denoising steps for real-time use. The NVIDIA team used a Self Forcing distillation pipeline to solve this.

    • The distillation training was conducted on 64 NVIDIA H100 GPUs.
    • The ‘student’ model reduces denoising from 35 steps down to 4 steps.
    • The final model achieves a real-time speed of 10.81 FPS.
    • It is stable for continuous rollouts of 60 seconds (600 frames).
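
    As a quick back-of-the-envelope check on the figures above, the snippet below derives the reduction in denoising passes and the time available per generated frame. It uses only the numbers quoted in the list; the variable names are illustrative.

```python
TEACHER_STEPS = 35  # denoising steps before distillation (from the article)
STUDENT_STEPS = 4   # denoising steps after distillation (from the article)
FPS = 10.81         # reported speed of the distilled model

step_reduction = TEACHER_STEPS / STUDENT_STEPS  # how many times fewer denoising passes
frame_budget_ms = 1000.0 / FPS                  # average wall-clock time per frame

print(f"{step_reduction:.2f}x fewer denoising steps")
print(f"{frame_budget_ms:.1f} ms per frame on average")
```

    The step count is not the whole story, since other parts of the pipeline also take time, but it shows why cutting denoising passes is the main lever for interactive rates.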

    Unlocking Downstream Applications

    DreamDojo’s speed and accuracy enable several advanced applications for AI engineers.

    1. Reliable Policy Evaluation

    Testing robots in the real world is risky. DreamDojo acts as a high-fidelity simulator for benchmarking.

    • Its simulated success rates show a Pearson correlation of 0.995 with real-world results.
    • The Mean Maximum Rank Violation (MMRV) is only 0.003.
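
    Both metrics above compare per-policy success rates measured in simulation against those measured on real hardware. The sketch below computes them on made-up numbers. The MMRV definition follows the one commonly used in earlier sim-to-real evaluation work (for each policy, the largest real-world performance gap among pairs that the simulator ranks in the wrong order, averaged over policies); it is an assumption that DreamDojo uses exactly this form.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mmrv(sim, real):
    """Mean Maximum Rank Violation: 0 means the simulator never misorders two policies."""
    n = len(sim)
    worst = []
    for i in range(n):
        violations = [
            abs(real[i] - real[j])
            for j in range(n)
            if (sim[i] < sim[j]) != (real[i] < real[j])
        ]
        worst.append(max(violations, default=0.0))
    return sum(worst) / n


# Hypothetical success rates for four policies (not data from the paper).
sim_success = [0.20, 0.50, 0.80, 0.65]
real_success = [0.25, 0.45, 0.85, 0.60]
r = pearson(sim_success, real_success)
violation = mmrv(sim_success, real_success)
print(round(r, 3), violation)
```

    Pearson correlation checks that simulated and real success rates move together, while MMRV checks the practical question of whether the simulator would ever lead you to pick the worse of two policies, and by how much.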

    2. Model-Based Planning

    Robots can use DreamDojo to ‘look ahead.’ A robot can simulate multiple action sequences and pick the best one.

    • In a fruit-packing task, this improved real-world success rates by 17%.
    • Compared to random sampling, it provided a 2x increase in success.
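
    The ‘look ahead’ loop can be sketched as a simple sample-and-score planner. Everything here is a hypothetical stand-in: toy_world_model is a one-dimensional toy rather than a video model, score is a made-up objective, and candidates are drawn uniformly at random, whereas a real system would roll out DreamDojo and would likely draw candidates from a policy.

```python
import random


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model: 1-D position plus a step."""
    return state + action


def score(state, goal):
    """Higher is better: negative distance to the goal."""
    return -abs(goal - state)


def rollout(state, actions):
    """Imagine the outcome of an action sequence without touching the real robot."""
    for action in actions:
        state = toy_world_model(state, action)
    return state


def plan(state, goal, num_candidates=16, horizon=4, seed=0):
    """Sample candidate action sequences, imagine each one, and keep the best."""
    rng = random.Random(seed)
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(num_candidates)
    ]
    best = max(candidates, key=lambda seq: score(rollout(state, seq), goal))
    return best, candidates


# Usage: start at position 0.0 and try to reach 2.0 within 4 steps.
best, candidates = plan(state=0.0, goal=2.0)
print(round(rollout(0.0, best), 3))
```

    Only the chosen sequence is executed on the robot; the rejected candidates cost compute rather than real-world trials.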

    3. Live Teleoperation

    Developers can teleoperate virtual robots in real time. The NVIDIA team demonstrated this using a PICO VR controller and a local desktop with an NVIDIA RTX 5090. This allows for safe and rapid data collection.

    Summary of Model Performance

    Metric                 DREAMDOJO-2B    DREAMDOJO-14B
    Physics Correctness    62.50%          73.50%
    Action Following       63.45%          72.55%
    FPS (Distilled)        10.81           N/A

    NVIDIA has released all weights, training code, and evaluation benchmarks. This open-source release allows you to post-train DreamDojo on your own robot data today.

    Key Takeaways

    • Massive Scale and Diversity: DreamDojo is pretrained on DreamDojo-HV, the largest egocentric human video dataset to date, featuring 44,711 hours of footage across 6,015 unique tasks and 9,869 scenes.
    • Unified Latent Action Proxy: To overcome the lack of action labels in human videos, the model uses continuous latent actions extracted by a spatiotemporal Transformer VAE, which serves as a hardware-agnostic control interface.
    • Optimized Training and Architecture: The model achieves high-fidelity physics and precise controllability by using relative action transformations, chunked action injection, and a specialized temporal consistency loss.
    • Real-Time Performance through Distillation: Through a Self Forcing distillation pipeline, the model is accelerated to 10.81 FPS, enabling interactive applications like live teleoperation and stable, long-horizon simulations of over 1 minute.
    • Reliable for Downstream Tasks: DreamDojo functions as an accurate simulator for policy evaluation, showing a 0.995 Pearson correlation with real-world success rates, and can improve real-world performance by 17% when used for model-based planning.

    The post NVIDIA Releases DreamDojo: An Open-Source Robot World Model Trained on 44,711 Hours of Real-World Human Video Data appeared first on MarkTechPost.
