Why the exponential function is necessary in the softmax function


If you study the softmax function, you will come across a definition like this:

softmax(z)_i = e^(z_i) / sum(e^(z_j))

At first glance the exponential function looks unnecessary: dividing each raw score z_i by the sum of all scores would also seem to produce a probability. That simpler alternative is defined as:

p_i = z_i / sum(z)

As it turns out, this can be used as well, but only when every score is non-negative and the scores do not sum to zero.
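To make the comparison concrete, here is a minimal sketch in plain Python. The function names `plain_normalize` and `softmax` and the example scores are illustrative assumptions, not something from a particular library. It shows what each formula returns when one of the scores is negative.

```python
import math

def plain_normalize(scores):
    """p_i = z_i / sum(z): divide each raw score by the sum of all scores."""
    total = sum(scores)
    return [s / total for s in scores]

def softmax(scores):
    """softmax(z)_i = e^(z_i) / sum(e^(z_j)), written exactly as in the formula."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [-1.0, 2.0, 3.0]  # hypothetical scores, one of them negative

# Plain normalization yields a negative "probability" for the first class.
print(plain_normalize(scores))  # [-0.25, 0.5, 0.75]

# Softmax maps every score to a positive value, and the values sum to 1.
print(softmax(scores))
```

The plain version still sums to 1 here, but its first entry is negative, so it is not a valid probability distribution.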

However, introducing the exponential function brings several advantages.

  1. Non-linearity: The exponential function introduces a strong non-linearity into the output of the softmax function, which can be beneficial for learning complex decision boundaries. It also maps every real-valued score, including negative ones, to a positive number, so the output is always a valid probability distribution. Without the exponential function, each output would be just the raw score rescaled by the sum of all scores, which is undefined when the scores sum to zero and can be negative when some scores are negative.

  2. Amplifying differences: The exponential function amplifies differences between the input scores, which can make it easier for the model to distinguish between classes with similar scores. For example, if the input scores are [2, 3, 4], the corresponding probabilities without the exponential function would be [0.2222, 0.3333, 0.4444]. However, with the exponential function, the probabilities would be [0.0900, 0.2447, 0.6652], which separates the highest score from the others much more sharply.

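  3. Numerical stability: The softmax can be computed stably even when the input scores are very large or very small. The exponential function on its own overflows for large scores, but e^(z_i - c) / sum(e^(z_j - c)) equals e^(z_i) / sum(e^(z_j)) for any constant c. Subtracting the largest score from every score before exponentiating therefore leaves the result unchanged while keeping every exponent at or below zero, which prevents overflow. Plain normalization has no comparable safeguard: when the scores sum to a value close to zero, the division becomes unstable.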
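The numbers for the scores [2, 3, 4] and the overflow behavior can be checked with a short sketch in plain Python. The function names `plain_normalize`, `naive_softmax`, and `stable_softmax` are illustrative assumptions, as are the large scores [1000, 1001, 1002]. The stable version uses the common trick of subtracting the largest score before exponentiating.

```python
import math

def plain_normalize(scores):
    """Divide each raw score by the sum of all scores (no exponential)."""
    total = sum(scores)
    return [s / total for s in scores]

def naive_softmax(scores):
    """Softmax exactly as in the formula; math.exp overflows for large scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(scores):
    """Softmax with the largest score subtracted first (same result, no overflow)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # every exponent is <= 0
    total = sum(exps)                         # at least 1.0, since exp(0) == 1
    return [e / total for e in exps]

scores = [2.0, 3.0, 4.0]
print([round(p, 4) for p in plain_normalize(scores)])  # [0.2222, 0.3333, 0.4444]
print([round(p, 4) for p in stable_softmax(scores)])   # [0.09, 0.2447, 0.6652]

# Hypothetical large scores: the naive version raises OverflowError in Python.
large = [1000.0, 1001.0, 1002.0]
try:
    naive_softmax(large)
except OverflowError:
    print("naive softmax overflowed")

# The stable version returns the same probabilities as for [2, 3, 4],
# because both inputs differ from their maximum by [-2, -1, 0].
print([round(p, 4) for p in stable_softmax(large)])    # [0.09, 0.2447, 0.6652]
```

Because only the differences between scores matter to softmax, shifting all scores by the same constant changes nothing in the output, and that is what makes the subtraction safe.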