    A decoder-only foundation model for time-series forecasting – Google Research Blog


    Posted by Rajat Sen and Yichen Zhou, Google Research

Time-series forecasting is ubiquitous in domains such as retail, finance, manufacturing, healthcare, and the natural sciences. In retail use cases, for example, improving demand forecasting accuracy can meaningfully reduce inventory costs and increase revenue. Deep learning (DL) models have emerged as a popular approach for forecasting rich, multivariate time-series data because they have proven to perform well in a variety of settings (e.g., DL models dominated the M5 competition leaderboard).

At the same time, there has been rapid progress in large foundation language models used for natural language processing (NLP) tasks such as translation, retrieval-augmented generation, and code completion. These models are trained on massive amounts of textual data derived from a variety of sources, like Common Crawl and open-source code, which enables them to identify patterns in language. This makes them very powerful zero-shot tools; for instance, when paired with retrieval, they can answer questions about and summarize current events.

Despite DL-based forecasters largely outperforming traditional methods, and despite progress in reducing training and inference costs, they face challenges: most DL architectures require long and involved training and validation cycles before a customer can test the model on a new time-series. A foundation model for time-series forecasting, in contrast, can provide decent out-of-the-box forecasts on unseen time-series data with no additional training, enabling users to focus on refining forecasts for the actual downstream task, such as retail demand planning.

To that end, in "A decoder-only foundation model for time-series forecasting", we introduce TimesFM, a single forecasting model pre-trained on a large time-series corpus of 100 billion real-world time-points. Compared to the latest large language models (LLMs), TimesFM is much smaller (200M parameters), yet we show that even at such scales, its zero-shot performance on a variety of unseen datasets of different domains and temporal granularities comes close to state-of-the-art supervised approaches trained explicitly on those datasets. Later this year we plan to make this model available for external customers in Google Cloud Vertex AI.

    A decoder-only foundation model for time-series forecasting

LLMs are usually trained in a decoder-only fashion that involves three steps. First, text is broken down into subwords called tokens. Then, the tokens are fed into stacked causal transformer layers that produce an output corresponding to each input token (a token cannot attend to future tokens). Finally, the output corresponding to the i-th token summarizes all the information from previous tokens and predicts the (i+1)-th token. During inference, the LLM generates the output one token at a time. For example, when prompted with "What is the capital of France?", it might generate the token "The", then condition on "What is the capital of France? The" to generate the next token "capital", and so on until it generates the complete answer: "The capital of France is Paris".
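The decoding loop above can be sketched as follows. This is a minimal illustration, not a real LLM: `next_token` is a toy stand-in for a causal transformer's prediction, hard-wired to complete the example answer.

```python
# Sketch of autoregressive decoding. `next_token` stands in for a causal
# transformer that predicts the (i+1)-th token from tokens 0..i.
def next_token(tokens):
    # Toy rule standing in for a trained model: emit the next word of a
    # fixed answer, based on how many tokens follow the "?" in the prompt.
    answer = ["The", "capital", "of", "France", "is", "Paris", "."]
    return answer[len(tokens) - tokens.index("?") - 1]

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)   # condition on everything generated so far
        tokens.append(tok)         # feed the output back in as input
        if tok == ".":
            break
    return tokens

prompt = ["What", "is", "the", "capital", "of", "France", "?"]
print(generate(prompt))
```

The key point is the feedback: each generated token is appended to the context before the next prediction, exactly the mechanism TimesFM reuses with patches in place of tokens.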

A foundation model for time-series forecasting should adapt to variable context (what we observe) and horizon (what we ask the model to forecast) lengths, while having enough capacity to encode all the patterns from a large pretraining dataset. Similar to LLMs, we use stacked transformer layers (self-attention and feedforward layers) as the main building blocks of the TimesFM model. In the context of time-series forecasting, we treat a patch (a group of contiguous time-points) as a token, an approach popularized by recent long-horizon forecasting work. The task is then to forecast the (i+1)-th patch of time-points given the i-th output at the end of the stacked transformer layers.
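Turning a series into patch tokens is just a reshape over contiguous windows. A minimal sketch (the patch length of 32 matches the example later in the post; the function name is ours, not from the TimesFM codebase):

```python
import numpy as np

def patchify(series, patch_len=32):
    """Split a 1-D time series into contiguous, non-overlapping patches,
    each of which is treated as one input token."""
    n = len(series) // patch_len * patch_len   # drop any ragged tail
    return series[:n].reshape(-1, patch_len)

series = np.arange(512, dtype=float)
patches = patchify(series)   # shape (16, 32): 16 tokens of 32 points each
print(patches.shape)
```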

However, there are several key differences from language models. First, we need a multilayer perceptron block with residual connections to convert a patch of time-series into a token that can be input to the transformer layers along with positional encodings (PE). For that, we use a residual block similar to our prior work on long-horizon forecasting. Second, at the other end, an output token from the stacked transformer can be used to predict a longer run of subsequent time-points than the input patch length; i.e., the output patch length can be larger than the input patch length.
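The residual MLP block that embeds a patch can be sketched as below. The shapes and weight names are illustrative assumptions, not the actual TimesFM implementation: a hidden layer plus a linear skip connection mapping a patch to a model-dimension token.

```python
import numpy as np

def residual_block(patch, w1, w2, w_skip):
    """Hypothetical residual MLP: one ReLU hidden layer plus a linear
    skip path, mapping an input patch to a model-dimension token."""
    hidden = np.maximum(patch @ w1, 0.0)   # ReLU hidden layer
    return hidden @ w2 + patch @ w_skip    # residual (skip) connection

rng = np.random.default_rng(0)
patch_len, hidden_dim, model_dim = 32, 64, 128   # illustrative sizes
w1 = rng.normal(size=(patch_len, hidden_dim))
w2 = rng.normal(size=(hidden_dim, model_dim))
w_skip = rng.normal(size=(patch_len, model_dim))

token = residual_block(rng.normal(size=patch_len), w1, w2, w_skip)
print(token.shape)   # one model-dimension token per input patch
```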

Consider a time-series of length 512 time-points being used to train a TimesFM model with input patch length 32 and output patch length 128. During training, the model is simultaneously trained to use the first 32 time-points to forecast the next 128 time-points, the first 64 time-points to forecast time-points 65 to 192, the first 96 time-points to forecast time-points 97 to 224, and so on. During inference, suppose the model is given a new time-series of length 256 and tasked with forecasting the next 256 time-points into the future. The model will first generate the future predictions for time-points 257 to 384, then condition on the initial 256-length input plus the generated output to generate time-points 385 to 512. On the other hand, if the output patch length were equal to the input patch length of 32, then for the same task we would have to go through eight generation steps instead of just the two above. This increases the chance of errors accumulating, and accordingly, in practice, we see that a longer output patch length yields better performance for long-horizon forecasting.
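The arithmetic behind the two-step vs. eight-step comparison is just a ceiling division of the horizon by the output patch length:

```python
import math

def generation_steps(horizon, output_patch_len):
    """Number of autoregressive generation steps needed to forecast
    `horizon` future time-points when each step emits
    `output_patch_len` points."""
    return math.ceil(horizon / output_patch_len)

# Forecasting 256 points: 2 steps with a 128-point output patch,
# 8 steps if the output patch were only 32 points.
print(generation_steps(256, 128))  # 2
print(generation_steps(256, 32))   # 8
```

Fewer steps means fewer opportunities for the model's own errors to be fed back in as context, which is why the longer output patch helps at long horizons.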

    TimesFM architecture.

    Pretraining data

Just as LLMs get better with more tokens, TimesFM requires a large volume of reliable time-series data to learn and improve. We have spent a great deal of time creating and assessing our training datasets, and the following is what we have found works best:

Synthetic data helps with the basics. Meaningful synthetic time-series data can be generated using statistical models or physical simulations. These basic temporal patterns can teach the model the grammar of time-series forecasting.
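As a toy illustration of such a statistical generator (the specific components and parameters here are our own assumptions, not the ones used for TimesFM), a synthetic series can combine a trend, a seasonal cycle, and noise:

```python
import numpy as np

def synthetic_series(n=512, period=24, seed=0):
    """Toy synthetic generator: linear trend + seasonal cycle + Gaussian
    noise. A stand-in for the statistical models mentioned above."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    trend = 0.05 * t
    seasonality = 2.0 * np.sin(2 * np.pi * t / period)
    noise = rng.normal(scale=0.5, size=n)
    return trend + seasonality + noise

series = synthetic_series()
print(series.shape)
```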

Real-world data adds real-world flavor. We comb through available public time-series datasets and selectively put together a large corpus of 100 billion time-points. Among these datasets are Google Trends and Wikipedia Pageviews, which track what people are interested in, and which nicely mirror trends and patterns in many other real-world time-series. This helps TimesFM understand the bigger picture and generalize better when provided with domain-specific contexts not seen during training.

    Zero-shot evaluation results

We evaluate TimesFM zero-shot on data not seen during training using popular time-series benchmarks. We observe that TimesFM performs better than most statistical methods like ARIMA and ETS, and can match or outperform powerful DL models like DeepAR and PatchTST that have been explicitly trained on the target time-series.

We used the Monash Forecasting Archive to evaluate TimesFM's out-of-the-box performance. This archive contains tens of thousands of time-series from various domains like traffic, weather, and demand forecasting, covering frequencies ranging from a few minutes to yearly data. Following existing literature, we compare the mean absolute error (MAE), appropriately scaled so that it can be averaged across datasets. We see that zero-shot (ZS) TimesFM is better than most supervised approaches, including recent deep learning models. We also compare TimesFM to GPT-3.5 for forecasting using a specific prompting technique proposed by llmtime(ZS). We demonstrate that TimesFM performs better than llmtime(ZS) despite being orders of magnitude smaller.
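The idea of scaling MAE for cross-dataset averaging can be sketched as follows. The specific scale used here (the mean absolute value of a training series) is an illustrative assumption; the post does not specify the exact scaling convention.

```python
import numpy as np

def scaled_mae(y_true, y_pred, scale):
    """MAE divided by a per-dataset scale so that scores from datasets
    with very different magnitudes can be averaged. The choice of scale
    (here: mean absolute value of the training series) is an assumption
    for illustration."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.mean(err) / scale

train = np.array([10.0, 12.0, 8.0, 10.0])
scale = np.mean(np.abs(train))                        # 10.0
print(scaled_mae([11.0, 9.0], [10.0, 10.0], scale))   # 0.1
```

Without such scaling, a dataset measured in millions would dominate the average over one measured in single digits.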

    Scaled MAE (lower is better) of TimesFM(ZS) against other supervised and zero-shot approaches on Monash datasets.

Most of the Monash datasets are short or medium horizon, i.e., the prediction length is not too long. We also test TimesFM on popular benchmarks for long-horizon forecasting against a recent state-of-the-art baseline, PatchTST (and other long-horizon forecasting baselines). In the next figure, we plot the MAE on the ETT datasets for the task of predicting 96 and 192 time-points into the future. The metric has been calculated on the last test window of each dataset (as done in the llmtime paper). We see that TimesFM not only surpasses the performance of llmtime(ZS) but also matches that of the supervised PatchTST model explicitly trained on the respective datasets.

    Last window MAE (lower is better) of TimesFM(ZS) against llmtime(ZS) and long-horizon forecasting baselines on ETT datasets.

    Conclusion

We train a decoder-only foundation model for time-series forecasting using a large pretraining corpus of 100B real-world time-points, the majority of which was search-interest time-series data derived from Google Trends and pageviews from Wikipedia. We show that even a relatively small 200M-parameter pretrained model that uses our TimesFM architecture displays impressive zero-shot performance on a variety of public benchmarks from different domains and granularities.

    Acknowledgements

This work is the result of a collaboration between several individuals across Google Research and Google Cloud, including (in alphabetical order): Abhimanyu Das, Weihao Kong, Andrew Leach, Mike Lawrence, Alex Martin, Rajat Sen, Yang Yang and Yichen Zhou.
