BatchEnsemble layers

DeepUncertainty.DenseBE (Type)
DenseBE(in, out, rank, 
        ensemble_size, 
        σ=identity; 
        bias=true,
        init=glorot_normal, 
        alpha_init=glorot_normal, 
        gamma_init=glorot_normal)
DenseBE(layer, alpha, gamma, ensemble_bias, ensemble_act, rank)

Creates a dense BatchEnsemble layer. BatchEnsemble is a memory-efficient alternative to deep ensembles. A deep ensemble of size N trains N independent models, so its time and memory cost is O(N × the cost of one network). BatchEnsemble instead keeps a single shared weight matrix W and generates each member's weight matrix from a pair of rank-1 vectors R (alpha) and S (gamma): the outer product RS' is multiplied element-wise with W. Each additional member therefore costs only its fast weights rather than a full copy of W. R and S are also called fast weights.
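The sketch below, in plain Julia without Flux or DeepUncertainty, checks numerically that building member i's weight matrix explicitly gives the same output as the cheaper fast-weight route of scaling the input by alpha and the output by gamma. It assumes Flux's (out, in) layout for W, so the rank-1 matrix appears as `gamma_i * alpha_i'`. All variable names are illustrative, not the library's internals.

```julia
in_dim, out_dim, ensemble_size = 3, 2, 4
W     = randn(out_dim, in_dim)        # shared weight, (out, in) as in Flux's Dense (assumed layout)
alpha = randn(in_dim, ensemble_size)  # fast weight R: one column per member
gamma = randn(out_dim, ensemble_size) # fast weight S: one column per member
x     = randn(in_dim)                 # a single input sample

for i in 1:ensemble_size
    r, s = alpha[:, i], gamma[:, i]
    # Explicit member weight: W scaled element-wise by the rank-1 matrix s * r'
    W_i = W .* (s * r')
    y_explicit = W_i * x
    # Cheap form: perturb the input, apply the shared W, then perturb the output
    y_fast = s .* (W * (r .* x))
    @assert y_explicit ≈ y_fast
end
```

The fast form never materializes W_i, which is why only W plus one alpha and one gamma column per member need to be stored.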

Reference - https://arxiv.org/abs/2002.06715

During both training and testing, the samples are repeated N times along the batch dimension, where N is the ensemble_size. For example, if each mini-batch has 10 samples and the ensemble size is 4, the actual input to the layer has 40 samples. The output of the layer also has 40 samples, and each group of 10 samples can be treated as the output of one ensemble member.
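A minimal plain-Julia sketch of this batch tiling. It assumes the whole mini-batch is tiled with `repeat(x, 1, N)`, so each member's copy occupies a contiguous block of columns. `member_cols` is a hypothetical helper, and `tanh.` only stands in for a real layer's output.

```julia
batch_size, ensemble_size, in_dim = 10, 4, 3
x = randn(in_dim, batch_size)               # one mini-batch: features × samples

# Tile the whole mini-batch ensemble_size times along the batch (last) dimension
x_rep = repeat(x, 1, ensemble_size)         # 3 × 40
@assert size(x_rep) == (in_dim, batch_size * ensemble_size)

# Hypothetical helper: the columns belonging to ensemble member i
member_cols(i, b) = (i - 1) * b + 1 : i * b

# Every member receives an identical copy of the original mini-batch
@assert all(x_rep[:, member_cols(i, batch_size)] == x for i in 1:ensemble_size)

# A 40-column output can be viewed as (features, batch_size, ensemble_size),
# so that y[:, :, i] holds member i's outputs for the 10 samples
y_rep = tanh.(x_rep)                        # placeholder for a layer's output
y = reshape(y_rep, size(y_rep, 1), batch_size, ensemble_size)
@assert all(y[:, :, i] == y_rep[:, member_cols(i, batch_size)] for i in 1:ensemble_size)
```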

Fields

  • layer: The dense layer that maps the perturbed input to the output
  • alpha: The first fast weight, of size (in, ensemble_size)
  • gamma: The second fast weight, of size (out, ensemble_size)
  • ensemble_bias: Bias added to the ensemble output, separate from dense layer bias
  • ensemble_act: The activation function to be applied on ensemble output
  • rank: Rank of the fast weights (rank > 1 doesn't work on GPU for now)
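How these fields might combine in a forward pass, as a plain-Julia sketch. The ordering (inner layer applied to the alpha-perturbed input, then scaling by gamma, adding ensemble_bias, and applying ensemble_act) follows the field descriptions above and the BatchEnsemble paper, but it is an assumption. So are the (out, ensemble_size) shape of ensemble_bias and the hypothetical helper names `inner_layer` and `batchensemble_forward`. Only rank-1 fast weights are shown.

```julia
# Hypothetical sizes and parameters, named after the DenseBE fields
in_dim, out_dim, ensemble_size, batch_size = 3, 2, 4, 5
W             = randn(out_dim, in_dim)          # inner Dense weight
b             = randn(out_dim)                  # inner Dense bias
σ             = identity                        # inner Dense activation
alpha         = randn(in_dim, ensemble_size)    # fast weight R
gamma         = randn(out_dim, ensemble_size)   # fast weight S
ensemble_bias = randn(out_dim, ensemble_size)   # one bias vector per member (assumed shape)
ensemble_act  = tanh                            # activation on the ensemble output

inner_layer(z) = σ.(W * z .+ b)                 # stand-in for the wrapped Dense layer

function batchensemble_forward(x_rep)
    # Repeat each member's column so it covers that member's block of samples
    expand(p) = repeat(p; inner = (1, batch_size))
    h = inner_layer(x_rep .* expand(alpha))     # perturb the input, apply the shared layer
    return ensemble_act.(h .* expand(gamma) .+ expand(ensemble_bias))
end

x = randn(in_dim, batch_size)
y = batchensemble_forward(repeat(x, 1, ensemble_size))  # out_dim × (batch_size * ensemble_size)

# Member i's block equals a per-member computation on the original mini-batch
for i in 1:ensemble_size
    cols = (i - 1) * batch_size + 1 : i * batch_size
    y_i = ensemble_act.(inner_layer(x .* alpha[:, i]) .* gamma[:, i] .+ ensemble_bias[:, i])
    @assert y[:, cols] ≈ y_i
end
```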

Arguments

  • in::Integer: Input dimension of features
  • out::Integer: Output dimension of features
  • rank::Integer: Rank of the fast weights
  • ensemble_size::Integer: Number of models in the ensemble
  • σ::F=identity: Activation of the dense layer, defaults to identity
  • init=glorot_normal: Initialization function, defaults to glorot_normal
  • alpha_init=glorot_normal: Initialization function for the alpha fast weight, defaults to glorot_normal
  • gamma_init=glorot_normal: Initialization function for the gamma fast weight, defaults to glorot_normal
  • bias::Bool=true: Toggle the usage of bias in the dense layer
  • ensemble_bias::Bool=true: Toggle the usage of ensemble bias
  • ensemble_act::F=identity: Activation function for ensemble outputs
DeepUncertainty.ConvBE (Type)
ConvBE(filter, in => out, rank, 
        ensemble_size, σ = identity;
        stride = 1, pad = 0, dilation = 1, 
        groups = 1, [bias, weight, init])
ConvBE(layer, alpha, gamma, ensemble_bias, ensemble_act, rank)

Creates a convolutional BatchEnsemble layer. BatchEnsemble is a memory-efficient alternative to deep ensembles. A deep ensemble of size N trains N independent models, so its time and memory cost is O(N × the cost of one network). BatchEnsemble instead keeps a single shared weight W and generates each member's weights from a pair of rank-1 vectors R (alpha) and S (gamma): the outer product RS' is multiplied element-wise with W. Each additional member therefore costs only its fast weights rather than a full copy of W. R and S are also called fast weights.
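For convolutions, a common reading of BatchEnsemble is that alpha scales each input channel and gamma scales each output channel. The sketch below assumes that interpretation. It checks that the cheap form matches convolving with a member-specific kernel in which channel pair (c, o) is scaled by r[c] * s[o]. The check uses a naive 1-D cross-correlation written in plain Julia. `conv1d` is a hypothetical helper, not Flux's `Conv`, and its (width, in, out) kernel layout is an assumption.

```julia
# Naive 1-D cross-correlation, no padding, stride 1 (input layout: width × channels)
function conv1d(x::AbstractMatrix, W::AbstractArray{<:Real,3})
    k, cin, cout = size(W)                 # kernel width, input channels, output channels
    L = size(x, 1) - k + 1
    y = zeros(L, cout)
    for o in 1:cout, t in 1:L, c in 1:cin, j in 1:k
        y[t, o] += W[j, c, o] * x[t + j - 1, c]
    end
    return y
end

k, cin, cout, width = 3, 2, 4, 8
W = randn(k, cin, cout)   # shared convolution kernel
r = randn(cin)            # member i's alpha: one entry per input channel (assumed)
s = randn(cout)           # member i's gamma: one entry per output channel (assumed)
x = randn(width, cin)     # one sample: width × channels

# Explicit member kernel: scale each (input, output) channel pair by r[c] * s[o]
W_i = W .* reshape(r, 1, cin, 1) .* reshape(s, 1, 1, cout)
y_explicit = conv1d(x, W_i)

# Cheap form: scale input channels, run the shared kernel, scale output channels
y_fast = conv1d(x .* r', W) .* s'

@assert y_explicit ≈ y_fast
```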

Reference - https://arxiv.org/abs/2002.06715

During both training and testing, the samples are repeated N times along the batch dimension, where N is the ensemble_size. For example, if each mini-batch has 10 samples and the ensemble size is 4, the actual input to the layer has 40 samples. The output of the layer also has 40 samples, and each group of 10 samples can be treated as the output of one ensemble member.
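Flux stores convolutional inputs in WHCN order (width, height, channels, batch), so the tiling happens along the fourth dimension. The following is a minimal plain-Julia sketch. It assumes the same contiguous-block tiling that `repeat` produces, and the sizes are arbitrary.

```julia
width, height, channels, batch_size, ensemble_size = 8, 8, 3, 10, 4
x = randn(Float32, width, height, channels, batch_size)   # one WHCN mini-batch

# Tile along the last (batch) dimension only
x_rep = repeat(x, 1, 1, 1, ensemble_size)
@assert size(x_rep) == (width, height, channels, batch_size * ensemble_size)

# Member i's copy of the mini-batch occupies a contiguous block of the batch dimension
for i in 1:ensemble_size
    cols = (i - 1) * batch_size + 1 : i * batch_size
    @assert x_rep[:, :, :, cols] == x
end

# Splitting a tiled tensor: the batch dimension becomes (batch_size, ensemble_size)
y = reshape(x_rep, width, height, channels, batch_size, ensemble_size)
@assert y[:, :, :, :, 2] == x
```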

Fields

  • layer: The convolutional layer that maps the perturbed input to the output
  • alpha: The first fast weight, of size (in, ensemble_size), where in is the number of input channels
  • gamma: The second fast weight, of size (out, ensemble_size), where out is the number of output channels
  • ensemble_bias: Bias added to the ensemble output, separate from the convolutional layer's bias
  • ensemble_act: The activation function to be applied on ensemble output
  • rank: Rank of the fast weights (rank > 1 doesn't work on GPU for now)

Arguments

  • filter::NTuple{N,Integer}: Kernel dimensions, e.g., (5, 5)
  • ch::Pair{<:Integer,<:Integer}: Input channels => output channels
  • rank::Integer: Rank of the fast weights
  • ensemble_size::Integer: Number of models in the ensemble
  • σ::F=identity: Activation of the convolutional layer, defaults to identity
  • init=glorot_normal: Initialization function, defaults to glorot_normal
  • alpha_init=glorot_normal: Initialization function for the alpha fast weight, defaults to glorot_normal
  • gamma_init=glorot_normal: Initialization function for the gamma fast weight, defaults to glorot_normal
  • bias::Bool=true: Toggle the usage of bias in the convolutional layer
  • ensemble_bias::Bool=true: Toggle the usage of ensemble bias
  • ensemble_act::F=identity: Activation function for ensemble outputs