Probabilistic time series forecasting with compositional Bayesian neural networks

Posted by Urs Köster, Software Engineer, Google Research

Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data's patterns (the prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are widely used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components, and they may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and do not produce reliable confidence intervals.

To that end, we introduce AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.

AutoBNN

AutoBNN is based on a line of research that over the past decade has yielded improved predictive accuracy by modeling time series using GPs with learned kernel structures. The kernel function of a GP encodes assumptions about the function being modeled, such as the presence of trends, periodicity or noise. With learned GP kernels, the kernel function is defined compositionally: it is either a base kernel (such as Linear, Quadratic, Periodic, Matérn or ExponentiatedQuadratic) or a composite that combines two or more kernel functions using operators such as Addition, Multiplication, or ChangePoint. This compositional kernel structure serves two related purposes. First, it is simple enough that a user who is an expert about their data, but not necessarily about GPs, can construct a reasonable prior for their time series. Second, techniques like Sequential Monte Carlo can be used for discrete searches over small structures and can output interpretable results.
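As a rough illustration of what "defined compositionally" means, the sketch below builds a composite kernel from three base kernels using addition and multiplication, in plain standard-library Python. The function names and hyperparameter values are assumptions chosen for this example; they are not the API of AutoBNN or of any GP library.

```python
import math

def linear(x1, x2, bias=0.0, slope=1.0):
    """Linear kernel: encodes a straight-line trend."""
    return bias + slope * x1 * x2

def exponentiated_quadratic(x1, x2, length_scale=1.0):
    """Exponentiated quadratic (RBF) kernel: encodes smooth local variation."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

def periodic(x1, x2, period=1.0, length_scale=1.0):
    """Periodic kernel: encodes repetition with the given period."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return math.exp(-2.0 * s ** 2 / length_scale ** 2)

def add(*kernels):
    """Addition operator: the composite kernel is the sum of its parts."""
    return lambda x1, x2: sum(k(x1, x2) for k in kernels)

def multiply(*kernels):
    """Multiplication operator: the composite kernel is the product of its parts."""
    return lambda x1, x2: math.prod(k(x1, x2) for k in kernels)

# A trend plus a slowly changing seasonal pattern:
# Linear + Periodic * ExponentiatedQuadratic.
seasonal = lambda x1, x2: periodic(x1, x2, period=12.0)
slow = lambda x1, x2: exponentiated_quadratic(x1, x2, length_scale=50.0)
composite = add(linear, multiply(seasonal, slow))

# Gram matrix of the composite kernel on a few time points.
xs = [0.0, 6.0, 12.0, 24.0]
gram = [[composite(a, b) for b in xs] for a in xs]
```

Because each operator returns another kernel function, composites can be nested to any depth, which is what makes a search over structures possible.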

AutoBNN improves upon these ideas, replacing the GP with Bayesian neural networks (BNNs) while retaining the compositional kernel structure. A BNN is a neural network with a probability distribution over weights rather than a fixed set of weights. This induces a distribution over outputs, capturing uncertainty in the predictions. BNNs bring the following advantages over GPs: First, training large GPs is computationally expensive, and traditional training algorithms scale as the cube of the number of data points in the time series. In contrast, for a fixed width, training a BNN will often be approximately linear in the number of data points. Second, BNNs lend themselves better to GPU and TPU hardware acceleration than GP training operations. Third, compositional BNNs can be easily combined with traditional deep BNNs, which have the ability to do feature discovery. One could imagine "hybrid" architectures, in which users specify a top-level structure of Add(Linear, Periodic, Deep), and the deep BNN is left to learn the contributions from potentially high-dimensional covariate information.
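To make "a distribution over weights induces a distribution over outputs" concrete, here is a minimal sketch that repeatedly samples the weights of a tiny one-hidden-layer network and records its output at a single input. The standard normal prior, the tanh activation and the helper names are assumptions for illustration, not AutoBNN's actual parameterisation.

```python
import math
import random
import statistics

def sample_weights(rng, width):
    """Draw one set of weights for a 1-input, 1-output network from a N(0, 1) prior."""
    return {
        "w_in": [rng.gauss(0.0, 1.0) for _ in range(width)],
        "b_in": [rng.gauss(0.0, 1.0) for _ in range(width)],
        "w_out": [rng.gauss(0.0, 1.0) for _ in range(width)],
    }

def forward(weights, x):
    """Evaluate the network; the 1/sqrt(width) factor keeps the output scale stable."""
    width = len(weights["w_in"])
    hidden = [math.tanh(w * x + b) for w, b in zip(weights["w_in"], weights["b_in"])]
    return sum(v * h for v, h in zip(weights["w_out"], hidden)) / math.sqrt(width)

rng = random.Random(0)
x = 0.5
# Each weight draw defines a different function, so the output at x is itself random.
outputs = [forward(sample_weights(rng, width=20), x) for _ in range(2000)]
mean = statistics.fmean(outputs)
spread = statistics.stdev(outputs)
```

Here `spread` plays the role of the predictive uncertainty at `x` under the prior; conditioning on data would reshape the weight distribution and hence this spread.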

How might one translate a GP with compositional kernels into a BNN then? A single-layer neural network will typically converge to a GP as the number of neurons (or "width") goes to infinity. More recently, researchers have discovered a correspondence in the other direction: many popular GP kernels (such as Matern, ExponentiatedQuadratic, Polynomial or Periodic) can be obtained as infinite-width BNNs with appropriately chosen activation functions and weight distributions. Furthermore, these BNNs remain close to the corresponding GP even when the width is very much less than infinite. For example, the figures below show the difference in the covariance between pairs of observations, and regression results of the true GPs and their corresponding width-10 neural network versions.

Comparison of Gram matrices between true GP kernels (top row) and their width-10 neural network approximations (bottom row).

Comparison of regression results between true GP kernels (top row) and their width-10 neural network approximations (bottom row).
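The dependence on width can be sketched for one kernel using random Fourier features, a standard construction in which a hidden layer with cosine activations and Gaussian input weights has the ExponentiatedQuadratic kernel as its infinite-width covariance. This construction is chosen here purely for illustration and is an assumption; it is not necessarily the parameterisation AutoBNN uses, and the helper names are hypothetical.

```python
import math
import random

def eq_kernel(x1, x2, length_scale=1.0):
    """Target GP kernel: exponentiated quadratic."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

def random_feature_gram(xs, width, rng, length_scale=1.0):
    """Gram matrix implied by a cosine hidden layer of the given width.

    Hidden unit j computes sqrt(2 / width) * cos(w_j * x + b_j) with
    w_j ~ N(0, 1 / length_scale^2) and b_j ~ Uniform(0, 2*pi). With i.i.d.
    N(0, 1) output weights, the output covariance is the dot product of
    the hidden feature vectors.
    """
    ws = [rng.gauss(0.0, 1.0 / length_scale) for _ in range(width)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(width)]
    scale = math.sqrt(2.0 / width)
    feats = [[scale * math.cos(w * x + b) for w, b in zip(ws, bs)] for x in xs]
    return [[sum(p * q for p, q in zip(fa, fb)) for fb in feats] for fa in feats]

def max_abs_error(xs, width, rng):
    """Largest entrywise gap between the finite-width and exact Gram matrices."""
    approx = random_feature_gram(xs, width, rng)
    return max(
        abs(approx[i][j] - eq_kernel(a, b))
        for i, a in enumerate(xs)
        for j, b in enumerate(xs)
    )

rng = random.Random(0)
xs = [i * 0.5 for i in range(9)]  # inputs on [0, 4]
error_by_width = {width: max_abs_error(xs, width, rng) for width in (10, 100, 5000)}
```

Inspecting `error_by_width` shows how the finite-width Gram matrix approaches the exact kernel as the hidden layer grows.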

Finally, the translation is completed with BNN analogues of the Addition and Multiplication operators over GPs, and input warping to produce periodic kernels. BNN addition is straightforwardly given by adding the outputs of the component BNNs. BNN multiplication is achieved by multiplying the activations of the hidden layers of the BNNs and then applying a shared dense layer. We are therefore limited to multiplying only BNNs with the same hidden width.
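The two operators can be sketched on toy one-hidden-layer networks as follows. This is a simplified reading of the description above, with fixed random weights and hypothetical helper names; the real package is built from flax.linen modules and will differ in detail.

```python
import math
import random

def make_component(rng, width, activation):
    """A toy one-hidden-layer network with randomly drawn (fixed) weights."""
    return {
        "w_in": [rng.gauss(0.0, 1.0) for _ in range(width)],
        "b_in": [rng.gauss(0.0, 1.0) for _ in range(width)],
        "w_out": [rng.gauss(0.0, 1.0) / math.sqrt(width) for _ in range(width)],
        "activation": activation,
    }

def hidden(component, x):
    """Hidden-layer activations of one component at input x."""
    act = component["activation"]
    return [act(w * x + b) for w, b in zip(component["w_in"], component["b_in"])]

def output(component, x):
    """Output of one component: its own dense layer applied to its hidden layer."""
    return sum(v * h for v, h in zip(component["w_out"], hidden(component, x)))

def add_bnns(components, x):
    """Addition: sum the outputs of the component networks."""
    return sum(output(c, x) for c in components)

def multiply_bnns(components, shared_w_out, x):
    """Multiplication: multiply hidden activations elementwise, then apply one shared dense layer."""
    widths = {len(c["w_in"]) for c in components} | {len(shared_w_out)}
    if len(widths) != 1:
        raise ValueError("all components must share the same hidden width")
    product = [math.prod(hs) for hs in zip(*(hidden(c, x) for c in components))]
    return sum(v * p for v, p in zip(shared_w_out, product))

rng = random.Random(0)
a = make_component(rng, width=8, activation=math.tanh)
b = make_component(rng, width=8, activation=math.cos)
shared = [rng.gauss(0.0, 1.0) / math.sqrt(8) for _ in range(8)]

summed = add_bnns([a, b], x=0.3)
multiplied = multiply_bnns([a, b], shared, x=0.3)
```

The elementwise product is only defined when the hidden layers have equal length, which is where the same-width restriction comes from.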

Using AutoBNN

The AutoBNN package is available within TensorFlow Probability. It is implemented in JAX and uses the flax.linen neural network library. It implements all of the base kernels and operators discussed so far (Linear, Quadratic, Matern, ExponentiatedQuadratic, Periodic, Addition, Multiplication) plus one new kernel and three new operators:

a OneLayer kernel, a single hidden layer ReLU BNN,
a ChangePoint operator that allows smoothly switching between two kernels,
a LearnableChangePoint operator which is the same as ChangePoint except that position and slope are given prior distributions and can be learnt from the data, and
a WeightedSum operator.

WeightedSum combines two or more BNNs with learnable mixing weights, where the learnable weights follow a Dirichlet prior. By default, a flat Dirichlet distribution with concentration 1.0 is used.
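A draw from a flat Dirichlet prior can be sketched by normalising Gamma variates, which is a standard way to sample a Dirichlet distribution. The component outputs below are made-up numbers standing in for three BNNs evaluated at one input, and the helper names are hypothetical.

```python
import random

def sample_flat_dirichlet(rng, n, concentration=1.0):
    """Draw mixing weights from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def weighted_sum(component_outputs, weights):
    """Combine component outputs using the mixing weights."""
    return sum(w * y for w, y in zip(weights, component_outputs))

rng = random.Random(0)
component_outputs = [1.5, -0.2, 0.7]  # hypothetical outputs of three BNNs at one input
weights = sample_flat_dirichlet(rng, n=len(component_outputs))
mixed = weighted_sum(component_outputs, weights)
```

With concentration 1.0 every point on the simplex is equally likely a priori, so no component is favoured before the data are seen.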

WeightedSums allow a "soft" version of structure discovery, i.e., training a linear combination of many possible models at once. In contrast to structure discovery with discrete structures, such as in AutoGP, this allows us to use standard gradient methods to learn structures, rather than using expensive discrete optimization. Instead of evaluating potential combinatorial structures in series, WeightedSum allows us to evaluate them in parallel.
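The idea of learning structure by gradient descent can be sketched with a toy problem: three fixed candidate functions stand in for trained base models, and only the mixing weights are learned. The softmax parameterisation, the synthetic target, the step size and all names are assumptions for this example; AutoBNN instead places a Dirichlet prior on the weights and trains the component BNNs jointly.

```python
import math

def softmax(logits):
    """Map unconstrained logits to positive weights that sum to one."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three fixed candidate components standing in for base models (hypothetical).
candidates = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "periodic": lambda x: math.sin(2.0 * math.pi * x),
}
names = list(candidates)
xs = [i / 20.0 for i in range(40)]
# Synthetic target: mostly periodic with a weak linear trend.
ys = [0.3 * x + 0.7 * math.sin(2.0 * math.pi * x) for x in xs]
features = [[candidates[name](x) for name in names] for x in xs]

def loss_and_grad(logits):
    """Mean squared error of the weighted sum and its gradient w.r.t. the logits."""
    weights = softmax(logits)
    loss = 0.0
    grad_w = [0.0] * len(weights)
    for row, y in zip(features, ys):
        residual = sum(w * f for w, f in zip(weights, row)) - y
        loss += residual ** 2 / len(xs)
        for k, f in enumerate(row):
            grad_w[k] += 2.0 * residual * f / len(xs)
    # Chain rule through the softmax: dL/dz_k = w_k * (g_k - sum_j w_j * g_j).
    mean_grad = sum(w * g for w, g in zip(weights, grad_w))
    grad_z = [w * (g - mean_grad) for w, g in zip(weights, grad_w)]
    return loss, grad_z

logits = [0.0, 0.0, 0.0]  # start from equal weights
initial_loss, _ = loss_and_grad(logits)
for _ in range(5000):
    _, grad = loss_and_grad(logits)
    logits = [z - 0.2 * g for z, g in zip(logits, grad)]
final_loss, _ = loss_and_grad(logits)
learned = dict(zip(names, softmax(logits)))
```

All three candidates are evaluated in every step, so the relative importance of each structure emerges from one continuous optimisation rather than from a sequence of separate model fits.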

To make exploration easy, AutoBNN defines a number of model structures that contain either top-level or internal WeightedSums. The names of these models can be used as the first parameter in any of the estimator constructors, and include things like sum_of_stumps (the WeightedSum over all the base kernels) and sum_of_shallow (which adds all possible combinations of base kernels with all operators).

Illustration of the sum_of_stumps model. The bars in the top row show the amount by which each base kernel contributes, and the bottom row shows the function represented by the base kernel. The resulting weighted sum is shown on the right.

The figure below demonstrates the technique of structure discovery on the N374 series (a time series of yearly financial data starting from 1949) from the M3 dataset. The six base structures were ExponentiatedQuadratic (which is the same as the Radial Basis Function kernel, or RBF for short), Matern, Linear, Quadratic, OneLayer and Periodic kernels. The figure shows the MAP estimates of their weights over an ensemble of 32 particles. All of the high-likelihood particles gave a large weight to the Periodic component, low weights to Linear, Quadratic and OneLayer, and a large weight to either RBF or Matern.

Parallel coordinates plot of the MAP estimates of the base kernel weights over 32 particles. The sum_of_stumps model was trained on the N374 series from the M3 dataset (inset in blue). Darker lines correspond to particles with higher likelihoods.

By using WeightedSums as the inputs to other operators, it is possible to express rich combinatorial structures, while keeping models compact and the number of learnable weights small. As an example, we include the sum_of_products model (illustrated in the figure below) which first creates a pairwise product of two WeightedSums, and then a sum of the two products. By setting some of the weights to zero, we can create many different discrete structures. The total number of possible structures in this model is 2¹⁶, since there are 16 base kernels that can be turned on or off. All these structures are explored implicitly by training just this one model.
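The count follows from treating each base kernel as an independent on/off switch. The short sketch below checks the arithmetic and shows how a weight vector with zeros maps to a discrete structure; the helper `active_structure` and the example weights are hypothetical.

```python
from itertools import product

n_switchable = 16  # number of on/off base kernels stated in the text
total_structures = 2 ** n_switchable

# A smaller case that can be enumerated directly: 3 switchable kernels.
small_masks = list(product((0, 1), repeat=3))

def active_structure(weights, names, tol=1e-6):
    """Read off the discrete structure implied by which weights are non-zero."""
    return [name for name, w in zip(names, weights) if abs(w) > tol]

example = active_structure([0.6, 0.0, 0.4], ["Periodic", "Linear", "Matern"])
```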

Illustration of the "sum_of_products" model. Each of the four WeightedSums has the same structure as the "sum_of_stumps" model.

We have found, however, that certain combinations of kernels (e.g., the product of Periodic and either the Matern or ExponentiatedQuadratic) lead to overfitting on many datasets. To prevent this, we have defined model classes like sum_of_safe_shallow that exclude such products when performing structure discovery with WeightedSums.

For training, AutoBNN provides AutoBnnMapEstimator and AutoBnnMCMCEstimator to perform MAP and MCMC inference, respectively. Either estimator can be combined with any of the six likelihood functions, including four based on normal distributions with different noise characteristics for continuous data and two based on the negative binomial distribution for count data.

Result from running AutoBNN on the Mauna Loa CO2 dataset in our example colab. The model captures the trend and seasonal component in the data. Extrapolating into the future, the mean prediction slightly underestimates the actual trend, while the 95% confidence interval gradually widens.

To fit a model like the one in the figure above, all it takes is roughly 10 lines of code, using the scikit-learn-inspired estimator interface:

import jax
import autobnn as ab

# Composite model: the sum of periodic, linear and Matern components.
model = ab.operators.Add(
    bnns=(ab.kernels.PeriodicBNN(width=50),
          ab.kernels.LinearBNN(width=50),
          ab.kernels.MaternBNN(width=50)))

# MAP inference with a normal likelihood and a period of 12 time steps.
estimator = ab.estimators.AutoBnnMapEstimator(
    model, 'normal_likelihood_logistic_noise', jax.random.PRNGKey(42),
    periods=[12])

estimator.fit(my_training_data_xs, my_training_data_ys)
low, mid, high = estimator.predict_quantiles(my_training_data_xs)
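One way to read the three values returned by predict_quantiles is as per-point quantiles of draws from the predictive distribution. The sketch below computes a lower, middle and upper summary from synthetic draws, using levels that would give a 95% band. The `quantile` helper, the chosen levels and the draws themselves are assumptions for illustration and do not reproduce AutoBNN's internals.

```python
import random

def quantile(sorted_values, q):
    """Quantile with linear interpolation between order statistics (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction

def summarize(draws_per_point, levels=(0.025, 0.5, 0.975)):
    """For each time point, reduce predictive draws to (low, mid, high)."""
    return [tuple(quantile(sorted(draws), q) for q in levels) for draws in draws_per_point]

rng = random.Random(0)
# Hypothetical predictive draws at three time points, with growing uncertainty.
draws_per_point = [
    [rng.gauss(mu, sigma) for _ in range(1000)]
    for mu, sigma in [(0.0, 0.5), (1.0, 1.0), (2.0, 2.0)]
]
bands = summarize(draws_per_point)
```

Each entry of `bands` is a (low, mid, high) triple; plotting low and high against time gives a shaded interval of the kind that widens as a forecast moves further from the training data.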

Conclusion

AutoBNN provides a powerful and flexible framework for building sophisticated time series prediction models. By combining the strengths of BNNs and GPs with compositional kernels, AutoBNN opens a world of possibilities for understanding and forecasting complex data. We invite the community to try the colab, and to leverage this library to innovate and solve real-world challenges.

Acknowledgements

AutoBNN was written by Colin Carroll, Thomas Colthurst, Urs Köster and Srinivas Vasudevan. We would like to thank Kevin Murphy, Brian Patton and Feras Saad for their advice and feedback.
