backward pass to obtain gradients: in-place ops
linear input to the activation node
let epsilon = d(Loss)/d(out) be the gradient of the loss function with respect to the output, where out = h(in) is the output of the activation node; then d(Loss)/d(in) = d(Loss)/d(out) * d(out)/d(in) = epsilon * h'(in)
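A minimal sketch of the chain-rule step above, assuming tanh as the example h; the class and method names are illustrative, and epsilon plays the role of d(Loss)/d(out) from the note:

```java
public class ActivationBackward {
    // forward: out = h(in), here h = tanh as an example
    static double h(double in) { return Math.tanh(in); }

    // h'(in) = 1 - tanh^2(in)
    static double hPrime(double in) {
        double t = Math.tanh(in);
        return 1.0 - t * t;
    }

    // backward: d(Loss)/d(in) = epsilon * h'(in)
    static double backward(double in, double epsilon) {
        return epsilon * hPrime(in);
    }

    public static void main(String[] args) {
        double in = 0.3;       // linear input to the activation node
        double epsilon = 0.8;  // gradient arriving from the loss
        System.out.println(backward(in, epsilon));
    }
}
```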
RATanh (Rational Approximation of Tanh): h(x) = 1.7159 * tanh(2x/3)
remember to modify Activation.java
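A sketch of what the RATanh addition to Activation.java might look like; the class layout here is an assumption, since the actual Activation interface is not shown. The derivative follows from the chain rule: h'(x) = 1.7159 * (2/3) * (1 - tanh^2(2x/3)).

```java
public class RATanh {
    static final double A = 1.7159;
    static final double B = 2.0 / 3.0;

    // h(x) = 1.7159 * tanh(2x/3)
    static double apply(double x) {
        return A * Math.tanh(B * x);
    }

    // h'(x) = 1.7159 * (2/3) * (1 - tanh^2(2x/3))
    static double derivative(double x) {
        double t = Math.tanh(B * x);
        return A * B * (1.0 - t * t);
    }

    public static void main(String[] args) {
        // finite-difference check of the derivative at x = 0.5
        double x = 0.5, eps = 1e-6;
        double fd = (apply(x + eps) - apply(x - eps)) / (2 * eps);
        System.out.println(Math.abs(fd - derivative(x)) < 1e-6);
    }
}
```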
reference: Compact Convolutional Neural Network Cascade for Face Detection