Bayesian Layers
We follow the paper "Weight Uncertainty in Neural Networks" (Blundell et al.) to implement Bayesian layers. The paper proposes replacing point estimates of the weights with trainable distributions: instead of optimizing the weights directly, we optimize the parameters of the distributions from which weights are sampled at every forward pass. Predictive uncertainty is obtained by approximating the integral over the weight distribution with Monte Carlo sampling, i.e. aggregating the predictions from several stochastic forward passes.
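For intuition, here is a minimal sketch of that Monte Carlo step (not part of DeepUncertainty's API; `bayesian_model` is a hypothetical model that redraws its weights on every call):

```julia
using Statistics

# Monte Carlo predictive uncertainty: run the stochastic model several times,
# each forward pass sampling fresh weights, then aggregate the predictions.
# `bayesian_model` is a placeholder for any model built from Bayesian layers.
function predict_with_uncertainty(bayesian_model, x; samples=20)
    preds = [bayesian_model(x) for _ in 1:samples]
    μ = mean(preds)                              # elementwise predictive mean
    σ = sqrt.(mean(p -> (p .- μ) .^ 2, preds))   # elementwise predictive std
    return μ, σ
end
```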
The first component needed to create Bayesian layers is trainable distributions. We use DistributionsAD to backpropagate through distributions with Zygote, the autodiff framework used by Flux. A trainable distribution should be a subtype of AbstractTrainableDist.
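As a rough illustration of the idea (a simplified sketch, not the package's actual TrainableMvNormal implementation), a trainable distribution stores its parameters as Flux-trainable arrays and draws samples via the reparameterization trick, so Zygote can differentiate through the sampling step:

```julia
using Flux

# A simplified trainable normal distribution: `mean` and `logstd` are the
# trainable parameters; sampling uses the reparameterization trick
# (sample = mean + stddev .* eps, with eps ~ N(0, I)) so gradients flow
# back to the distribution parameters.
struct ToyTrainableNormal{T}
    mean::T
    logstd::T
end
Flux.@functor ToyTrainableNormal

ToyTrainableNormal(shape::Tuple; init=Flux.glorot_normal) =
    ToyTrainableNormal(init(shape...), init(shape...))

# Calling the distribution returns a fresh weight sample with the given shape.
(d::ToyTrainableNormal)() = d.mean .+ exp.(d.logstd) .* randn(Float32, size(d.mean))
```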
DeepUncertainty.TrainableMvNormal — Type

TrainableMvNormal(shape; init=glorot_normal, device=cpu) <: AbstractTrainableDist
TrainableMvNormal(mean, stddev, sample, shape)
A Multivariate Normal distribution with trainable mean and stddev.
Fields
mean: Trainable mean vector of the distribution
stddev: Trainable standard deviation vector of the distribution
sample: The latest sample from the distribution, used in calculating the log-likelihood loss
shape::Tuple: The shape of the sample to be returned
Arguments
shape::Tuple: The shape of the sample to be returned from the distribution
init: glorot_normal; used to initialize the trainable mean and stddev parameters
device: cpu; the device to move the sample to, for convenience when working with both GPU and CPU
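A usage sketch based on the constructor documented above; note that the way a sample is drawn here (calling the distribution object) is an assumption suggested by the `sample` field and may differ from the package's actual interface:

```julia
using DeepUncertainty, Flux

# Create a trainable multivariate normal for a 32x64 weight matrix.
weight_dist = TrainableMvNormal((32, 64); init=Flux.glorot_normal, device=cpu)

# Assumed interface: each call draws a fresh weight sample of size `shape`,
# which is also stored in the `sample` field for the log-likelihood loss.
W = weight_dist()
```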