Variational Inference layers
DeepUncertainty.VariationalDense — Type

    VariationalDense(in, out, σ=identity;
                     weight_init=TrainableDistribution,
                     bias_init=TrainableDistribution,
                     bias=true)
    VariationalDense(weight_sampler, bias_sampler, act)
Creates a variational dense layer that computes a variational Bayesian approximation to the distribution over the layer's parameters. The stochasticity is in the forward pass: instead of using point estimates for the weights and biases, we sample them from the learned distributions. The distributions' learnable parameters are trained with gradients obtained via the reparameterization trick.
Reference: https://arxiv.org/abs/1505.05424. We use DistributionsAD (https://github.com/TuringLang/DistributionsAD.jl) to backpropagate through the sampling step.
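For intuition, a minimal sketch of one reparameterized weight draw, assuming a mean-field Gaussian posterior (the names μ, ρ, and W are illustrative, not the package's actual fields):

    μ = randn(Float32, 4, 3)   # trainable mean of the weight posterior
    ρ = randn(Float32, 4, 3)   # trainable pre-softplus scale
    σw = log1p.(exp.(ρ))       # softplus keeps the standard deviation positive
    ϵ = randn(Float32, 4, 3)   # noise, independent of the trainable parameters
    W = μ .+ σw .* ϵ           # sampled weights; gradients flow to μ and ρ through W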
Fields
weight_sampler: A trainable distribution from which weights are sampled in every forward pass
bias_sampler: A trainable distribution from which biases are sampled in every forward pass
act: Activation function, applied to the logits after the layer transformation
Arguments

in::Integer: Input dimension size
out::Integer: Output dimension size
σ: Activation function, defaults to identity
init: Distribution parameter initialization, defaults to glorot_normal
weight_dist: Weight distribution, defaults to a trainable multivariate normal
bias_dist: Bias distribution, defaults to a trainable multivariate normal
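A hedged usage sketch based on the constructor shown above (exact keyword names may differ between package versions):

    using Flux, DeepUncertainty

    layer = VariationalDense(10, 5, relu)   # 10 inputs, 5 outputs

    x = randn(Float32, 10, 8)   # batch of 8 examples
    y1 = layer(x)               # one stochastic forward pass
    y2 = layer(x)               # a fresh weight sample, so y1 ≠ y2 in general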
DeepUncertainty.VariationalConv — Type

    VariationalConv(filter, in => out, σ = identity;
                    stride = 1, pad = 0, dilation = 1,
                    groups = 1,
                    weight_dist = TrainableMvNormal,
                    bias_dist = TrainableMvNormal,
                    [bias, weight, init])
    VariationalConv(σ, weight_sampler, bias_sampler,
                    stride, pad, dilation, groups)
Creates a variational conv layer that computes a variational Bayesian approximation to the distribution over the layer's parameters. The stochasticity is in the forward pass: instead of using point estimates for the weights and biases, we sample them from the learned distributions. The distributions' learnable parameters are trained with gradients obtained via the reparameterization trick.
Reference: https://arxiv.org/abs/1505.05424. We use DistributionsAD (https://github.com/TuringLang/DistributionsAD.jl) to backpropagate through the sampling step.
Fields
σ: Activation function, applied to the logits after the layer transformation
weight_sampler: A trainable distribution from which weights are sampled in every forward pass
bias_sampler: A trainable distribution from which biases are sampled in every forward pass
stride: Convolution stride
pad: Convolution padding
dilation: Kernel dilation factor
groups: Number of groups for a grouped convolution
Arguments
filter::NTuple{N,Integer}: Kernel dimensions, e.g. (5, 5)
ch::Pair{<:Integer,<:Integer}: Input channels => output channels
σ::F=identity: Activation of the conv layer, defaults to identity
weight_dist=TrainableMvNormal: Weight distribution, defaults to a trainable multivariate normal
bias_dist=TrainableMvNormal: Bias distribution, defaults to a trainable multivariate normal
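A hedged usage sketch following the signature above (keyword defaults are taken from the docstring and may vary by version):

    using Flux, DeepUncertainty

    layer = VariationalConv((3, 3), 3 => 16, relu; pad = 1)

    x = randn(Float32, 32, 32, 3, 4)   # WHCN layout: 32×32, 3 channels, batch of 4
    y = layer(x)                       # stochastic forward pass; weights are resampled on each call
    size(y)                            # (32, 32, 16, 4) with pad = 1 and stride = 1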