Difference between Batch Normalization and Input Normalization


Batch Normalization:

Batch Normalization is a technique used to increase the stability of a neural network. It was introduced by Sergey Ioffe and Christian Szegedy in their 2015 paper 'Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift'.

Batch Normalization works by normalizing the output of a previous activation layer: it subtracts the batch mean and divides by the batch standard deviation. Because this shift-and-scale operation can reduce the layer's expressive power, two learnable parameters, a scale and a shift, are added so that the network can restore the original distribution of the activations if needed.
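The following is a minimal NumPy sketch of that computation; the function and parameter names (`batch_norm`, `gamma` for the scale, `beta` for the shift) are illustrative, not taken from the paper.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Minimal batch-normalization forward pass for activations x
    of shape (batch_size, num_features). Illustrative sketch only."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)

x = np.random.randn(32, 4) * 3.0 + 5.0       # toy batch: 32 samples, 4 features
gamma = np.ones(4)                           # initial scale: identity on x_hat
beta = np.zeros(4)                           # initial shift: zero
out = batch_norm(x, gamma, beta)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1 per feature
```

With `gamma = 1` and `beta = 0` the layer simply standardizes each feature; during training the network adjusts these parameters, which is what lets it undo the normalization if that turns out to be useful.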

Input Normalization:

Input Normalization is a preprocessing step applied before feeding data into a machine learning algorithm. It adjusts the features of the dataset so that they are on a similar scale, typically a fixed range such as [0, 1] or [-1, 1], or zero mean and unit standard deviation.

The two most common forms of input normalization are listed below (a short code sketch follows the list):

  • Min-Max Normalization: It scales the data to a fixed range, typically 0 to 1.

  • Standardization (or Z-score normalization): It scales the data to have zero mean and unit variance.
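Both forms reduce to a one-line formula per feature. A minimal NumPy sketch on a made-up toy dataset might look like this:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])   # toy dataset: 3 samples, 2 features

# Min-Max normalization: rescale each feature to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): zero mean and unit variance per feature
X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)   # each column now spans [0, 1]
print(X_zscore)   # each column now has mean 0 and standard deviation 1
```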

Difference between Batch Normalization and Input Normalization:

  • Purpose: Input normalization is used to scale the input data to make it suitable for a machine learning algorithm, while batch normalization is used to normalize the activations of a layer in a neural network and speed up learning.

  • Where it's applied: Input normalization is applied to the input data before it's fed into the machine learning model, while batch normalization is applied to the outputs of a layer in a neural network and is part of the model itself.

  • Frequency of application: Input normalization is typically applied once, as a preprocessing step. Batch normalization is applied every time a batch of data passes through the network, at every training step.

  • Effect on model: Input normalization doesn't change the model architecture, while batch normalization adds trainable parameters (the scale and shift) to the model, as shown in the sketch below.
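To make the "where it's applied" and "effect on model" points concrete, here is a rough sketch using PyTorch; the layer sizes and random data are illustrative assumptions, not part of the article. Input normalization happens once on the data, while the batch-normalization layer lives inside the model and carries trainable parameters.

```python
import torch
import torch.nn as nn

# Input normalization: a one-off preprocessing step outside the model.
X = torch.randn(64, 10) * 5.0 + 2.0        # toy input batch
X = (X - X.mean(dim=0)) / X.std(dim=0)     # applied once, before training

# Batch normalization: a layer inside the model, applied at every forward pass.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),   # adds trainable scale (weight) and shift (bias) parameters
    nn.ReLU(),
    nn.Linear(32, 1),
)

out = model(X)
bn = model[1]
print(bn.weight.shape, bn.bias.shape)      # torch.Size([32]) torch.Size([32])
```

The `BatchNorm1d` layer contributes 32 scale and 32 shift parameters that are updated by the optimizer, whereas the standardization of `X` leaves the model itself untouched.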