The rectified linear unit (ReLU) activation function was popularized for deep learning by Nair and Hinton (2010), and has since become the most widely used activation function in deep learning, underpinning many state-of-the-art results [57].
Who introduced ReLU activation?
The first instance of ReLU appeared in Fukushima's original Cognitron paper, where it is defined in equation 2 (Fukushima, 1975).
Why was ReLU introduced?
ReLU is currently the default activation in convolutional neural networks and multilayer perceptrons. Unlike saturating activations such as the sigmoid, whose gradients vanish for large-magnitude inputs, ReLU passes gradients through unchanged for positive inputs, allowing models to learn faster and perform better, as the sketch below illustrates.
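As a minimal NumPy sketch (the function names and test values here are illustrative, not from any particular source), the contrast shows up directly in the gradients: the sigmoid's gradient shrinks toward zero for large inputs, while ReLU's gradient stays at 1 for every positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # vanishes as |x| grows

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for all positive inputs

x = np.array([-10.0, -1.0, 0.5, 10.0])
print("sigmoid grad:", sigmoid_grad(x))  # ~0 at the extremes
print("relu grad:   ", relu_grad(x))     # 0 or 1, never saturates for x > 0
```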
Why is ReLU famous?
ReLU is popular because it is simple and fast to compute. If the only problem you encounter with ReLU is slow optimization, training the network for longer is a reasonable remedy. That said, state-of-the-art papers more commonly use more complex activation functions.
Why is ReLU called ReLU?
ReLU has become the darling activation function of the neural network world. Short for Rectified Linear Unit, it is a piecewise linear function defined to be 0 for all negative values of x and equal to x otherwise, i.e., f(x) = max(0, x).
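As a minimal sketch in NumPy (the function name and test values are chosen here for illustration), the definition above reduces to a single element-wise max operation:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```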