Variational Inference layers

DeepUncertainty.VariationalDenseType
VariationalDense(in, out, σ=identity;
                weight_init=TrainableDistribution, 
                bias_init=TrainableDistribution, 
                bias=true)
VariationalDense(weight_sampler, bias_sampler, act)

Creates a variational dense layer that computes a variational Bayesian approximation to the distribution over the parameters of the dense layer. The stochasticity enters during the forward pass: instead of using point estimates for the weights and biases, we sample them from trainable distributions. The distributions' learnable parameters are optimized using the reparameterization trick.

Reference: https://arxiv.org/abs/1505.05424. We use DistributionsAD (https://github.com/TuringLang/DistributionsAD.jl) to backpropagate through the sampling step.
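
To make the reparameterization trick concrete, here is a minimal sketch of a trainable Gaussian sampler. It is illustrative only, not the package's TrainableMvNormal implementation; the struct and field names are made up for this example.

using Flux

# Illustrative sampler: learnable mean and log-std, sampled via the
# reparameterization trick so gradients flow to μ and logσ.
struct GaussianSampler{T}
    μ::T
    logσ::T
end
Flux.@functor GaussianSampler

GaussianSampler(n::Integer) = GaussianSampler(randn(Float32, n), zeros(Float32, n))

# Sample as μ + σ ⊙ ε; the randomness lives in ε, which does not depend on
# the learnable parameters, so the sample stays differentiable in μ and logσ.
(s::GaussianSampler)() = s.μ .+ exp.(s.logσ) .* randn(Float32, length(s.μ))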

Fields

  • weight_sampler: A trainable distribution from which weights are sampled in every forward pass
  • bias_sampler: A trainable distribution from which biases are sampled in every forward pass
  • act: Activation function, applied to the logits after the layer transformation

Arguments

  • in::Integer: Input dimension size
  • out::Integer: Output dimension size
  • σ: Activation function, defaults to identity
  • init: Initialization for the distribution parameters, defaults to glorot_normal
  • weight_dist: Weight distribution, defaults to a trainable multivariate normal
  • bias_dist: Bias distribution, defaults to a trainable multivariate normal
source
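
A minimal usage sketch, assuming the constructor shown above is exported by DeepUncertainty and that Flux is available; the layer sizes and batch size are illustrative.

using Flux, DeepUncertainty

layer = VariationalDense(10, 5, relu)   # 10 inputs, 5 outputs, relu activation
x = randn(Float32, 10, 8)               # batch of 8 inputs

# Weights and biases are resampled from their trainable distributions on
# every call, so repeated forward passes give different outputs.
y1 = layer(x)
y2 = layer(x)                           # generally y1 ≠ y2
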
DeepUncertainty.VariationalConvType
VariationalConv(filter, in => out, σ = identity;
                stride = 1, pad = 0, dilation = 1, 
                groups = 1, 
                weight_dist = TrainableMvNormal, 
                bias_dist = TrainableMvNormal,
                [bias, weight, init])
VariationalConv(σ, weight_sampler, bias_sampler,
                stride, pad, dilation, groups)

Creates a variational conv layer that computes a variational Bayesian approximation to the distribution over the parameters of the conv layer. The stochasticity enters during the forward pass: instead of using point estimates for the weights and biases, we sample them from trainable distributions. The distributions' learnable parameters are optimized using the reparameterization trick.

Reference: https://arxiv.org/abs/1505.05424. We use DistributionsAD (https://github.com/TuringLang/DistributionsAD.jl) to backpropagate through the sampling step.

Fields

  • σ: Activation function, applied to the logits after the layer transformation
  • weight_sampler: A trainable distribution from which weights are sampled in every forward pass
  • bias_sampler: A trainable distribution from which biases are sampled in every forward pass
  • stride: Convolution stride
  • pad: Convolution padding
  • dilation: Kernel dilation
  • groups: Number of convolution groups

Arguments

  • filter::NTuple{N,Integer}: Kernel dimensions, e.g. (5, 5)
  • ch::Pair{<:Integer,<:Integer}: Input channels => output channels
  • σ::F=identity: Activation function, defaults to identity
  • weight_dist=TrainableMvNormal: Trainable distribution from which weights are sampled
  • bias_dist=TrainableMvNormal: Trainable distribution from which biases are sampled
source
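
As with the dense layer, a brief usage sketch under the same assumptions (exported constructor, Flux available); the input sizes and the Monte Carlo averaging are illustrative.

using Flux, Statistics, DeepUncertainty

layer = VariationalConv((3, 3), 3 => 16, relu; pad = 1)
x = randn(Float32, 32, 32, 3, 4)        # WHCN: 4 images of size 32×32×3

# Each call resamples the kernel and bias; averaging several stochastic
# forward passes gives a simple Monte Carlo estimate of the prediction.
ŷ = mean(layer(x) for _ in 1:10)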