    Solving a machine-learning mystery

    Large language models like OpenAI’s GPT-3 are massive neural networks that can generate human-like text, from poetry to programming code. Trained using troves of internet data, these machine-learning models take a small bit of input text and then predict the text that is likely to come next.

    But that’s not all these models can do. Researchers are exploring a curious phenomenon known as in-context learning, in which a large language model learns to accomplish a task after seeing only a few examples, even though it wasn’t trained for that task. For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can supply the correct sentiment.
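
    As a minimal sketch of what such a few-shot prompt could look like (the review sentences and labels here are made up for illustration, and no particular model API is assumed):

        # Sketch of a few-shot sentiment prompt with hypothetical examples.
        # The model infers the task from the labeled pairs and completes the last line.
        prompt = """Review: The plot was gripping from start to finish.
        Sentiment: positive

        Review: I walked out halfway through.
        Sentiment: negative

        Review: The soundtrack alone was worth the ticket.
        Sentiment:"""

        # Fed to a large language model, the expected completion is "positive".
        # No parameter update occurs; the "learning" happens while reading the prompt.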

    Typically, a machine-learning model like GPT-3 would need to be retrained with new data to perform a new task. During this training process, the model updates its parameters as it processes new information to learn the task. But with in-context learning, the model’s parameters aren’t updated, so it seems as if the model learns a new task without learning anything at all.

    Scientists from MIT, Google Research, and Stanford University are striving to unravel this mystery. They studied models that are very similar to large language models to see how they can learn without updating parameters.

    The researchers’ theoretical results show that these massive neural network models are capable of containing smaller, simpler linear models buried inside them. The large model could then implement a simple learning algorithm to train this smaller linear model to complete a new task, using only information already contained within the larger model. Its parameters remain fixed.
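
    To make the phrase “a simple learning algorithm training a small linear model” concrete, here is a minimal NumPy sketch of gradient descent fitting a linear model to a handful of examples. This illustrates the kind of algorithm meant, not the paper’s actual construction; the data, learning rate, and step count are arbitrary choices:

        import numpy as np

        # Hypothetical in-context examples: pairs (x_i, y_i) with y = w_true . x
        rng = np.random.default_rng(0)
        w_true = rng.normal(size=4)
        X = rng.normal(size=(8, 4))            # 8 examples, 4 features
        y = X @ w_true

        # A "simple learning algorithm": a few steps of gradient descent
        # on the linear model's mean squared error.
        w = np.zeros(4)
        lr = 0.1
        for _ in range(50):
            grad = X.T @ (X @ w - y) / len(X)  # gradient of the squared error
            w -= lr * grad

        x_query = rng.normal(size=4)
        print(x_query @ w, x_query @ w_true)   # fitted prediction vs. ground truth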

    An important step toward understanding the mechanisms behind in-context learning, this research opens the door to more exploration of the learning algorithms these large models can implement, says Ekin Akyürek, a computer science graduate student and lead author of a paper exploring this phenomenon. With a better understanding of in-context learning, researchers could enable models to complete new tasks without the need for costly retraining.

    “Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering. But now we can just feed it an input, five examples, and it accomplishes what we want. So, in-context learning is an unreasonably efficient learning phenomenon that needs to be understood,” Akyürek says.

    Joining Akyürek on the paper are Dale Schuurmans, a research scientist at Google Brain and professor of computing science at the University of Alberta; as well as senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Danny Zhou, principal scientist and research director at Google Brain. The research will be presented at the International Conference on Learning Representations.

    A model within a model

    In the machine-learning research community, many scientists have come to believe that large language models can perform in-context learning because of how they are trained, Akyürek says.

    For instance, GPT-3 has hundreds of billions of parameters and was trained by reading huge swaths of text on the internet, from Wikipedia articles to Reddit posts. So, when someone shows the model examples of a new task, it has likely already seen something very similar, because its training dataset included text from billions of websites. It repeats patterns it has seen during training, rather than learning to perform new tasks.

    Akyürek hypothesized that in-context learners aren’t just matching previously seen patterns, but are instead actually learning to perform new tasks. He and others had experimented by giving these models prompts using synthetic data, which they could not have seen anywhere before, and found that the models could still learn from just a few examples. Akyürek and his colleagues thought that perhaps these neural network models have smaller machine-learning models inside them that the models can train to complete a new task.

    “That could explain almost all of the learning phenomena that we have seen with these large models,” he says.

    To test this hypothesis, the researchers used a neural network model called a transformer, which has the same architecture as GPT-3 but had been specifically trained for in-context learning.

    By exploring this transformer’s architecture, they theoretically proved that it can write a linear model within its hidden states. A neural network is composed of many layers of interconnected nodes that process data. The hidden states are the layers between the input and output layers.

    Their mathematical evaluations show that this linear model is written somewhere in the earliest layers of the transformer. The transformer can then update the linear model by implementing simple learning algorithms.

    In essence, the model simulates and trains a smaller version of itself.
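
    For intuition, here is a sketch of what one in-context linear-regression episode could look like from the outside: the model reads a sequence of example pairs followed by a query, and a transformer that has learned to do least-squares in context should match the closed-form solution. The sequence layout and shapes below are illustrative assumptions, not the paper’s exact encoding:

        import numpy as np

        # Sketch of an in-context regression episode (shapes are illustrative):
        # the transformer reads an alternating sequence x1, y1, ..., xn, yn, x_query
        # and must output y_query, effectively fitting a linear model on the fly.
        rng = np.random.default_rng(1)
        d, n = 4, 8
        w = rng.normal(size=d)                   # task weights, resampled per episode
        X = rng.normal(size=(n, d))
        y = X @ w

        tokens = []
        for x_i, y_i in zip(X, y):
            tokens.append(x_i)                             # input token
            tokens.append(np.r_[y_i, np.zeros(d - 1)])     # label token, padded to d dims
        x_query = rng.normal(size=d)
        tokens.append(x_query)
        sequence = np.stack(tokens)              # shape (2n + 1, d), fed to the transformer

        # A transformer doing least-squares in context should match this target:
        w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
        print(x_query @ w_ls)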

    Probing hidden layers

    The researchers explored this hypothesis using probing experiments, in which they looked inside the transformer’s hidden layers to try to recover a certain quantity.

    “In this case, we tried to recover the actual solution to the linear model, and we could show that the parameter is written in the hidden states. This means the linear model is in there somewhere,” he says.
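
    One way to read “the parameter is written in the hidden states” is as a linear probe: fit a simple regression from hidden activations to the linear model’s weights and check that it recovers them on held-out tasks. The sketch below uses synthetic stand-ins for real transformer activations, so the encoding matrix and noise level are assumptions made for illustration:

        import numpy as np

        # Sketch of a probing experiment (synthetic stand-in for real activations):
        # if hidden states linearly encode the task weights w, a probe trained on
        # (hidden_state, w) pairs from many tasks should recover w on new tasks.
        rng = np.random.default_rng(2)
        n_tasks, hid, d = 200, 64, 4
        W = rng.normal(size=(n_tasks, d))              # true per-task linear weights
        encode = rng.normal(size=(d, hid)) / np.sqrt(d)
        H = W @ encode + 0.01 * rng.normal(size=(n_tasks, hid))  # pretend hidden states

        # Fit a linear probe on the first 150 tasks, test on the rest.
        probe, *_ = np.linalg.lstsq(H[:150], W[:150], rcond=None)
        err = np.abs(H[150:] @ probe - W[150:]).mean()
        print(f"mean |recovered - true| on held-out tasks: {err:.4f}")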

    Building off this theoretical work, the researchers may be able to enable a transformer to perform in-context learning by adding just two layers to the neural network. There are still many technical details to work out before that would be possible, Akyürek cautions, but it could help engineers create models that can complete new tasks without the need for retraining with new data.

    “The paper sheds light on one of the most remarkable properties of modern large language models — their ability to learn from data given in their inputs, without explicit training. Using the simplified case of linear regression, the authors show theoretically how models can implement standard learning algorithms while reading their input, and empirically which learning algorithms best match their observed behavior,” says Mike Lewis, a research scientist at Facebook AI Research who was not involved with this work. “These results are a stepping stone to understanding how models can learn more complex tasks, and will help researchers design better training methods for language models to further improve their performance.”

    Moving forward, Akyürek plans to continue exploring in-context learning with functions that are more complex than the linear models they studied in this work. They could also apply these experiments to large language models to see whether their behaviors are also described by simple learning algorithms. In addition, he wants to dig deeper into the types of pretraining data that can enable in-context learning.

    “With this work, people can now visualize how these models can learn from exemplars. So, my hope is that it changes some people’s views about in-context learning,” Akyürek says. “These models are not as dumb as people think. They don’t just memorize these tasks. They can learn new tasks, and we have shown how that can be done.”
