Bayesian BatchEnsemble layers
DeepUncertainty.VariationalDenseBE — Type

VariationalDenseBE(in, out, rank, ensemble_size, σ=identity;
                   bias=true, init=glorot_normal,
                   alpha_init=glorot_normal, gamma_init=glorot_normal)
VariationalDenseBE(layer, alpha_sampler, gamma_sampler,
                   ensemble_bias, ensemble_act, rank)
Creates a Bayesian dense BatchEnsemble layer. BatchEnsemble is a memory-efficient alternative to deep ensembles: in a deep ensemble of size N, N separate models are trained, so the time and memory complexity is O(N × complexity of one network). BatchEnsemble instead generates each member's weight matrix from a pair of rank-1 fast weights R (alpha) and S (gamma), multiplying the outer product RS' element-wise with a shared weight matrix W (see the sketch below). In the Bayesian version of BatchEnsemble, instead of keeping point estimates of the fast weights, we sample them from a trainable parameterized distribution.
Reference - https://arxiv.org/abs/2005.07186
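A minimal sketch of the weight construction for a single ensemble member, in plain Julia with illustrative names and sizes (this is not the package's internal code):

```julia
W = randn(Float32, 4, 3)    # shared "slow" weight matrix, (out, in)
r = randn(Float32, 3)       # alpha fast weight (one column of R), length in
s = randn(Float32, 4)       # gamma fast weight (one column of S), length out

W_member = W .* (s * r')    # element-wise product with the rank-1 matrix s * r'

# The same result can be computed without materializing W_member:
x = randn(Float32, 3)
y = s .* (W * (r .* x))     # perturb the input by r, apply the shared W, rescale by s
@assert y ≈ W_member * x
```

The second form is what makes BatchEnsemble cheap: the shared W is applied once, and each member only contributes the element-wise factors r and s.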
During both training and testing, the samples are repeated along the batch dimension N times, where N is the ensemble_size. For example, if each mini-batch has 10 samples and the ensemble size is 4, the actual input to the layer has 40 samples. The output also has 40 samples, and each consecutive block of 10 samples can be treated as the output of one ensemble member.
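A hypothetical end-to-end sketch of this repetition, assuming the constructor signature above (the numbers match the example in the text):

```julia
using DeepUncertainty

ensemble_size = 4
layer = VariationalDenseBE(8, 16, 1, ensemble_size)  # in=8, out=16, rank=1

x = rand(Float32, 8, 10)              # a mini-batch of 10 samples
x_rep = repeat(x, 1, ensemble_size)   # repeated along the batch dim: 40 samples
y = layer(x_rep)                      # (16, 40); columns 1:10 come from member 1, etc.
```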
Fields
layer
: The dense layer which transforms the perturbed input into the output
alpha_sampler
: Sampler for the first fast weight, of size (in_dim, ensemble_size)
gamma_sampler
: Sampler for the second fast weight, of size (out_dim, ensemble_size)
ensemble_bias
: Bias added to the ensemble output, separate from the dense layer's bias
ensemble_act
: The activation function applied to the ensemble output
rank
: Rank of the fast weights (rank > 1 does not work on GPU for now)
Arguments
in::Integer
: Input dimension of features
out::Integer
: Output dimension of features
rank::Integer
: Rank of the fast weights
ensemble_size::Integer
: Number of models in the ensemble
σ::F=identity
: Activation of the dense layer, defaults to identity
init=glorot_normal
: Initialization function, defaults to glorot_normal
bias::Bool=true
: Toggle the usage of bias in the dense layer
ensemble_bias::Bool=true
: Toggle the usage of the ensemble bias
ensemble_act::F=identity
: Activation function for the ensemble outputs
alpha_init=TrainableMvNormal
: Initialization function for the alpha fast weight, defaults to TrainableMvNormal
gamma_init=TrainableMvNormal
: Initialization function for the gamma fast weight, defaults to TrainableMvNormal
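The signature above lists only bias and the init keywords, while the argument list also documents ensemble_bias and ensemble_act; assuming the latter are likewise accepted as keyword arguments, a construction might look like:

```julia
using DeepUncertainty
using Flux: relu

# Hypothetical construction exercising the documented options:
layer = VariationalDenseBE(8, 16, 1, 4;
                           bias = true,           # bias of the wrapped dense layer
                           ensemble_bias = true,  # separate per-member bias
                           ensemble_act = relu)   # activation on the ensemble output
```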
DeepUncertainty.VariationalConvBE — Type

VariationalConvBE(filter, in => out, rank, ensemble_size, σ = identity;
                  stride = 1, pad = 0, dilation = 1,
                  groups = 1, [bias, weight, init])
VariationalConvBE(layer, alpha_sampler, gamma_sampler,
                  ensemble_bias, ensemble_act, rank)
Creates a Bayesian conv BatchEnsemble layer. BatchEnsemble is a memory-efficient alternative to deep ensembles: in a deep ensemble of size N, N separate models are trained, so the time and memory complexity is O(N × complexity of one network). BatchEnsemble instead generates each member's weight matrix from a pair of rank-1 fast weights R (alpha) and S (gamma), multiplying the outer product RS' element-wise with a shared weight matrix W. In the Bayesian version of BatchEnsemble, instead of keeping point estimates of the fast weights, we sample them from a trainable parameterized distribution.
Reference - https://arxiv.org/abs/2005.07186
During both training and testing, the samples are repeated along the batch dimension N times, where N is the ensemble_size. For example, if each mini-batch has 10 samples and the ensemble size is 4, the actual input to the layer has 40 samples. The output also has 40 samples, and each consecutive block of 10 samples can be treated as the output of one ensemble member.
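A hypothetical sketch for the conv case, with inputs in Flux's WHCN layout and assuming the constructor signature above:

```julia
using DeepUncertainty

ensemble_size = 4
layer = VariationalConvBE((3, 3), 3 => 16, 1, ensemble_size; pad = 1)

x = rand(Float32, 28, 28, 3, 10)            # 10 images, WHCN layout
x_rep = repeat(x, 1, 1, 1, ensemble_size)   # 40 samples along the batch dim
y = layer(x_rep)                            # (28, 28, 16, 40) given pad = 1
```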
Fields
layer
: The conv layer which transforms the perturbed input into the output
alpha_sampler
: Sampler for the first fast weight, of size (in_dim, ensemble_size)
gamma_sampler
: Sampler for the second fast weight, of size (out_dim, ensemble_size)
ensemble_bias
: Bias added to the ensemble output, separate from the conv layer's bias
ensemble_act
: The activation function applied to the ensemble output
rank
: Rank of the fast weights (rank > 1 does not work on GPU for now)
Arguments
filter::NTuple{N,Integer}
: Kernel dimensions, e.g., (5, 5)
ch::Pair{<:Integer,<:Integer}
: Input channels => output channels
rank::Integer
: Rank of the fast weights
ensemble_size::Integer
: Number of models in the ensemble
σ::F=identity
: Activation of the conv layer, defaults to identity
init=glorot_normal
: Initialization function, defaults to glorot_normal
bias::Bool=true
: Toggle the usage of bias in the conv layer
ensemble_bias::Bool=true
: Toggle the usage of the ensemble bias
ensemble_act::F=identity
: Activation function for the ensemble outputs
alpha_init=TrainableMvNormal
: Initialization function for the alpha fast weight, defaults to TrainableMvNormal
gamma_init=TrainableMvNormal
: Initialization function for the gamma fast weight, defaults to TrainableMvNormal
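As a closing sketch, the two layers should compose in an ordinary Flux Chain; this is hypothetical usage assuming the constructors behave as documented above:

```julia
using DeepUncertainty, Flux
using Statistics: mean

ensemble_size = 4
model = Chain(
    VariationalConvBE((3, 3), 1 => 8, 1, ensemble_size; pad = 1),
    MaxPool((2, 2)),
    Flux.flatten,
    VariationalDenseBE(14 * 14 * 8, 10, 1, ensemble_size),
)

x = rand(Float32, 28, 28, 1, 16)               # 16 MNIST-sized inputs
y = model(repeat(x, 1, 1, 1, ensemble_size))   # (10, 64): 16 samples × 4 members
# Average over the ensemble members to get one prediction per sample:
y_mean = mean(reshape(y, 10, 16, ensemble_size); dims = 3)
```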