Parameter Initialization: Xavier

Source: https://blog.csdn.net/weixin_35479108/article/details/90694800

A good initialization should keep the variance of each layer's activations, and of the gradients with respect to the pre-activation states, consistent across layers during propagation:

$$
\begin{aligned}
\forall(i, j),\ \operatorname{Var}\left(h^{i}\right) &= \operatorname{Var}\left(h^{j}\right) \\
\forall(i, j),\ \operatorname{Var}\left(\frac{\partial \text{cost}}{\partial z^{i}}\right) &= \operatorname{Var}\left(\frac{\partial \text{cost}}{\partial z^{j}}\right)
\end{aligned}
$$

However, this rests on several assumptions, e.g. that the input features all have the same variance, that the activation function is symmetric, and that its derivative at 0 is 1. These assumptions do not necessarily hold in practice.
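Under these assumptions, the two conditions above reduce to a constraint on the weight variance. A brief sketch of the standard Glorot & Bengio argument (my own summary, not quoted from the linked post): for a layer with $n_{in}$ inputs and $n_{out}$ outputs, keeping the forward activation variance constant requires $n_{in}\operatorname{Var}(W)=1$, while keeping the backward gradient variance constant requires $n_{out}\operatorname{Var}(W)=1$; Xavier initialization takes the compromise

$$
\operatorname{Var}(W)=\frac{2}{n_{in}+n_{out}}, \qquad W \sim \mathcal{U}(-a, a)\ \Rightarrow\ \operatorname{Var}(W)=\frac{a^{2}}{3}\ \Rightarrow\ a=\sqrt{\frac{6}{n_{in}+n_{out}}}.
$$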

In PyTorch, this is implemented in `torch/nn/init.py` as follows (`_calculate_fan_in_and_fan_out` and `_no_grad_uniform_` are private helpers from the same module):

```python
def xavier_uniform_(tensor, gain=1.):
    # type: (Tensor, float) -> Tensor
    r"""Fills the input `Tensor` with values according to the method
    described in `Understanding the difficulty of training deep feedforward
    neural networks` - Glorot, X. & Bengio, Y. (2010), using a uniform
    distribution. The resulting tensor will have values sampled from
    :math:`\mathcal{U}(-a, a)` where

    .. math::
        a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}

    Also known as Glorot initialization.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        gain: an optional scaling factor

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
    """
    fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
    std = gain * math.sqrt(2.0 / float(fan_in + fan_out))
    a = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation

    return _no_grad_uniform_(tensor, -a, a)
```
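For intuition about the `fan_in`/`fan_out` values used above, a small check (my own example; `_calculate_fan_in_and_fan_out` is the private helper from `torch.nn.init` that the excerpt calls):

```python
import torch
from torch.nn.init import _calculate_fan_in_and_fan_out

# Linear weights have shape (out_features, in_features).
w_linear = torch.empty(64, 128)
print(_calculate_fan_in_and_fan_out(w_linear))  # (128, 64)

# Conv2d weights have shape (out_channels, in_channels, kH, kW);
# both fans are multiplied by the receptive-field size kH * kW.
w_conv = torch.empty(16, 3, 5, 5)
print(_calculate_fan_in_and_fan_out(w_conv))    # (75, 400)
```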

So `xavier_uniform_` draws values uniformly from the range $\left(-\text{gain} \times \sqrt{3} \times \sqrt{\frac{2}{fan_{in}+fan_{out}}},\ \text{gain} \times \sqrt{3} \times \sqrt{\frac{2}{fan_{in}+fan_{out}}}\right)$, which is exactly $\pm\,\text{gain} \times \sqrt{\frac{6}{fan_{in}+fan_{out}}}$ from the docstring.
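As a usage sketch (my own example, not from the linked post), the initializer is typically applied to every `nn.Linear` layer of a model; the `tanh` gain used here is just an illustrative choice:

```python
import torch.nn as nn

def init_weights(m):
    # Xavier/Glorot uniform init for every Linear layer; the tanh gain (5/3)
    # is an illustrative choice, not something the original post prescribes.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('tanh'))
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(128, 64), nn.Tanh(), nn.Linear(64, 10))
model.apply(init_weights)

# Every weight lies within +/- gain * sqrt(6 / (fan_in + fan_out)).
w = model[0].weight                      # shape (64, 128): fan_in=128, fan_out=64
bound = nn.init.calculate_gain('tanh') * (6.0 / (128 + 64)) ** 0.5
print(bool(w.abs().max() <= bound))      # True
```

`calculate_gain` rescales the bound for the chosen nonlinearity, which is also how the docstring example handles ReLU.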