Why ReLU is used instead of the sigmoid function for hidden layers in artificial neural networks.
ReLU (the rectified linear unit, f(x) = max(0, x)) is the most common activation function used in the hidden layers of artificial neural networks.
Firstly, while there are many potential activation functions, not all of them are suitable for hidden layers. A linear function, for example, is not: a stack of layers with linear activations collapses into a single linear transformation, which essentially reduces the neural network to a linear regression model.
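A quick sketch of that collapse, using NumPy with made-up matrix sizes (biases are omitted; including them would still leave a single affine map):

import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with a linear (identity) activation in the hidden layer.
W1 = rng.normal(size=(4, 3))   # hidden layer weights
W2 = rng.normal(size=(2, 4))   # output layer weights
x = rng.normal(size=(3,))

# Forward pass through both layers with no non-linearity in between.
h = W1 @ x
y = W2 @ h

# The same output comes from the single combined matrix W2 @ W1,
# so the two layers add no modeling capacity over one linear layer.
W_combined = W2 @ W1
print(np.allclose(y, W_combined @ x))  # True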
The sigmoid function, on the other hand, can be used, and it was in fact one of the first activation functions applied in artificial neural networks. However, it saturates: it has two flat regions, for large positive and large negative inputs, where its gradient is close to zero, and this can slow down the gradient descent process.
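To illustrate those flat regions, here is a minimal NumPy sketch that evaluates the sigmoid derivative at a few sample points (the inputs are chosen just for illustration):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative peaks at 0.25, at x = 0

# In the saturated regions the gradient is nearly zero,
# so the weight updates it drives become tiny.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  sigmoid'(x) = {sigmoid_grad(x):.6f}")
# x =   0.0  sigmoid'(x) = 0.250000
# x =   2.0  sigmoid'(x) = 0.104994
# x =   5.0  sigmoid'(x) = 0.006648
# x =  10.0  sigmoid'(x) = 0.000045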
With that background, there are several reasons why ReLU is preferred over the sigmoid function for hidden layers:
Non-linearity: ReLU is a non-linear activation function, so stacked hidden layers can model complex relationships between inputs and outputs rather than collapsing into a single linear map.
Computational Efficiency: ReLU is cheap to compute, a single comparison with zero, whereas activation functions like sigmoid and tanh require evaluating exponentials in both the forward and backward passes.
Sparse Activation: ReLU outputs exactly zero for negative inputs, so at any given time only a subset of neurons in a layer is active. This sparsity acts as a mild form of regularization and can help reduce overfitting in neural networks.
Gradient Vanishing: ReLU mitigates the vanishing gradient problem that arises in deep networks with saturating activations like sigmoid, where gradients become too small during backpropagation and the weights converge too slowly or not at all. For positive inputs the gradient of ReLU is exactly 1, so it does not shrink the signal as it flows backward through many layers (both the sparsity and the gradient behavior are illustrated in the sketch after this list).
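To make the sparsity and gradient points concrete, here is a minimal NumPy sketch comparing the two activations on random pre-activations (the layer size and input distribution are assumptions for illustration only):

import numpy as np

rng = np.random.default_rng(0)

# Random pre-activations for a hidden layer of 1000 units.
z = rng.normal(size=1000)

# ReLU: exactly zero for negative inputs, gradient 1 wherever the unit is active.
relu_out = np.maximum(0.0, z)
relu_grad = (z > 0).astype(float)

# Sigmoid: output is never exactly zero, and its gradient never exceeds 0.25.
sigmoid_out = 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = sigmoid_out * (1.0 - sigmoid_out)

print(f"ReLU:    {np.mean(relu_out == 0):.0%} of units exactly zero, "
      f"mean gradient {relu_grad.mean():.2f}")
print(f"sigmoid: {np.mean(sigmoid_out == 0):.0%} of units exactly zero, "
      f"mean gradient {sigmoid_grad.mean():.2f}")

Roughly half of the ReLU units are exactly zero (sparse activation), and the active ones pass gradients of 1, while every sigmoid unit is active with a gradient well below 0.25, which compounds into vanishing gradients across many layers.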
Overall, ReLU is a simple and effective activation function that can improve the performance of neural networks through its non-linearity, computational efficiency, and sparse activations, and by mitigating the vanishing gradient problem.