Bayesian Layers

We follow the paper "Weight Uncertainty in Neural Networks" to implement Bayesian layers. The paper proposes replacing point estimates of the weights with trainable distributions: instead of optimizing the weights directly, we optimize the parameters of the distributions from which weights are sampled on every forward pass. Predictive uncertainty is obtained by approximating the integral over the weight distribution with Monte Carlo sampling.
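
As an illustration, the Monte Carlo estimate averages the outputs of several stochastic forward passes, where each pass re-samples the weights from their learned distributions. The sketch below is ours, not part of the package API; `mc_predict`, `model`, and `x` are hypothetical names.

```julia
using Statistics

# Monte Carlo approximation of the predictive distribution:
# average the outputs of `samples` stochastic forward passes,
# each of which draws a fresh set of weights.
function mc_predict(model, x; samples=10)
    preds = [model(x) for _ in 1:samples]   # each call re-samples weights
    return mean(preds)                      # elementwise mean over the passes
end
```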

The first component of a Bayesian layer is a trainable distribution. We use DistributionsAD to backpropagate through distributions with Zygote, Flux's automatic differentiation framework. A trainable distribution should be a subtype of AbstractTrainableDist.
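
For intuition, DistributionsAD makes quantities such as logpdf differentiable by Zygote, so distribution parameters can be trained by gradient descent. The snippet below is a standalone illustration of that idea, not part of DeepUncertainty itself.

```julia
using Distributions, DistributionsAD, Zygote

# Differentiate the log-density of a Normal with respect to its
# parameters; DistributionsAD supplies the adjoints Zygote needs.
μ, σ, x = 0.0, 1.0, 0.5
grads = Zygote.gradient((μ, σ) -> logpdf(Normal(μ, σ), x), μ, σ)
```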

DeepUncertainty.TrainableMvNormal — Type

TrainableMvNormal(shape;
                  init=glorot_normal,
                  device=cpu) <: AbstractTrainableDist
TrainableMvNormal(mean, stddev, sample, shape)

A Multivariate Normal distribution with trainable mean and stddev.

Fields

  • mean: Trainable mean vector of the distribution
  • stddev: Trainable standard deviation vector of the distribution
  • sample: The latest sample from the distribution, used in calculating the log-likelihood loss
  • shape::Tuple: The shape of the sample to be returned

Arguments

  • shape::Tuple: The shape of the sample to be returned from the distribution
  • init: Initializer for the trainable mean and stddev parameters; defaults to glorot_normal
  • device: The device to move the sample to; defaults to cpu. Provided for convenience when working with both GPU and CPU
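
A minimal usage sketch based on the constructor above. The assumption that calling the distribution draws a fresh sample reshaped to shape is ours; check the package source for the exact call syntax.

```julia
using DeepUncertainty
using Flux: glorot_normal, cpu

# Trainable multivariate normal whose samples are reshaped to (3, 2).
dist = TrainableMvNormal((3, 2); init=glorot_normal, device=cpu)

# Assumption: calling the distribution draws a new sample from the
# trainable mean/stddev and reshapes it to `dist.shape`.
W = dist()
```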