Demystifying Weight Matrices in Neural Networks
Last week, I started learning about artificial neural networks, and the first thing I stumbled over was the weight matrix notation W: what does its shape actually mean?
It turns out the answer is quite simple if you have a basic understanding of linear algebra. For example, if the weight matrix W1 has shape (400, 25), then the first hidden layer of the network has 25 units, and each of those units has 400 weights. Each unit has 400 weights because every weight corresponds to one input: the number of weights per unit must equal the number of inputs feeding into the layer.
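To make the shape concrete, here is a small sketch (assuming NumPy and the (400, 25) layout described above, with one column per hidden unit; the variable names are mine, just for illustration):

```python
import numpy as np

# Hypothetical weight matrix for the first hidden layer:
# 400 rows (one per input feature) x 25 columns (one per hidden unit)
W1 = np.zeros((400, 25))

# The weights belonging to a single hidden unit j form one column of W1
j = 0
weights_of_unit_j = W1[:, j]                # shape (400,), one weight per input
print(W1.shape, weights_of_unit_j.shape)    # (400, 25) (400,)
```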
In the first hidden layer, each input is multiplied by its corresponding weight. These weighted inputs are summed, a bias term is added, and the result is passed through an activation function such as the sigmoid. To perform this operation efficiently, the weights and inputs are represented as matrices and vectors, so the whole layer can be computed with a single matrix-vector product.
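Here is a minimal sketch of that forward step (again assuming NumPy, the (400, 25) convention from above, and random placeholder values; this is an illustration, not a full network):

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(400)                        # one input example with 400 features
W1 = rng.standard_normal((400, 25)) * 0.01 # each column: the 400 weights of one unit
b1 = np.zeros(25)                          # one bias per hidden unit

# Weighted sum of the inputs plus bias, then the activation function
z1 = x @ W1 + b1      # shape (25,): one pre-activation value per hidden unit
a1 = sigmoid(z1)      # activations of the 25 hidden units
print(a1.shape)       # (25,)
```

The single `x @ W1` product computes all 25 weighted sums at once, which is exactly why the weights are stored as a matrix rather than as 25 separate lists.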