github: https://github.com/ICEORY/softmax_loss_gradient.git
Reproduction of Softmax Loss with Cross Entropy
softmax function
The softmax function is defined by
$$
y_i = \frac{e^{x_i}}{\sum_{k=1}^{C} e^{x_k}}, \quad \text{for } i = 1, \dots, C
$$
where $x$ is the input with $C$ channels and $y$ is the corresponding output.
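
As a quick illustration of the definition, here is a minimal pure-Python sketch. The function name `softmax` and the example input are illustrative choices, not taken from the repository. It also subtracts $\max(x)$ before exponentiating, which is an added assumption for numerical stability: the shift cancels between numerator and denominator, so the result matches the formula.

```python
import math


def softmax(x):
    """Compute y_i = exp(x_i) / sum_k exp(x_k) for a list of C floats."""
    # Shift by max(x) to avoid overflow in exp; the shift cancels in the ratio,
    # so the output is the same as the unshifted formula.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Example input with C = 3 channels (values chosen for illustration only).
y = softmax([1.0, 2.0, 3.0])
```

The outputs are positive and sum to 1, and adding the same constant to every $x_i$ leaves $y$ unchanged.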
The gradient of the softmax, $\frac{\partial y_i}{\partial x_j}$, is computed by: