Optimising a Multiple Linear Regression Algorithm with NumPy's Dot Method

Implementing a multiple linear regression algorithm means multiplying each example's feature vector by the coefficient vector w and summing the products. When first starting to code, it's natural to use for-loops to achieve this. However, Python for-loops are slow, especially when dealing with large amounts of data.
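As a rough sketch of that loop-based starting point, the snippet below computes a single prediction, the sum of each feature times its coefficient plus an intercept b, one feature at a time. The helper name predict_loop and the values of x, w, and b are made up for illustration and are not taken from a real dataset.

```python
import numpy as np

def predict_loop(x, w, b):
    """Predict one target value by looping over the features (hypothetical helper)."""
    total = 0.0
    for j in range(x.shape[0]):
        total += w[j] * x[j]  # multiply each feature by its coefficient
    return total + b          # add the intercept term

# Made-up example values: three features, three coefficients, one intercept
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
b = 4.0
print(predict_loop(x, w, b))  # 0.5 - 2.0 + 6.0 + 4.0 = 8.5
```

The work grows with the number of features, and every iteration passes through the Python interpreter, which is where the slowdown comes from.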

Fortunately, NumPy provides a handy function to overcome this issue: np.dot(), also available as the array method .dot(). Its biggest advantage is that the multiply-and-sum runs in compiled, vectorized code rather than in the Python interpreter. For larger arrays, NumPy typically hands the work to an optimized BLAS library, which can use SIMD instructions and, depending on the build, multiple cores. For two 1-D arrays, the result is the sum of the element-wise products.

To illustrate this, consider the following example using a for-loop:

import numpy as np

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])
length = A.shape[0]
total = 0
for i in range(length):
    total = total + A[i] * B[i]  # accumulate the element-wise products
print(total)  # 32

Now, let's rewrite this using .dot():

import numpy as np

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])
print(np.dot(A, B))  # 32

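As you can see, the .dot() code is much neater. On arrays this small the speed difference is negligible, but on large arrays .dot() is typically much faster because the loop runs in compiled code. So if you're working with large datasets, it's definitely worth taking advantage of NumPy's .dot() to optimize your code.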
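To tie this back to multiple linear regression, here is a minimal sketch that predicts targets for a whole batch of examples with a single np.dot call and checks the result against an explicit double loop. The names predict_loop and predict_dot, the random data, and the array sizes are assumptions chosen for illustration. Timings depend on your machine and NumPy build, so none are quoted here.

```python
import timeit
import numpy as np

def predict_loop(X, w, b):
    """Predict one value per row of X using explicit Python loops (hypothetical helper)."""
    m, n = X.shape
    preds = np.zeros(m)
    for i in range(m):            # loop over examples
        total = 0.0
        for j in range(n):        # loop over features
            total += X[i, j] * w[j]
        preds[i] = total + b
    return preds

def predict_dot(X, w, b):
    """Same computation as predict_loop, using one matrix-vector product."""
    return np.dot(X, w) + b

# Made-up data: 1000 examples with 20 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
w = rng.normal(size=20)
b = 0.5

# Both versions should agree up to floating-point rounding
assert np.allclose(predict_loop(X, w, b), predict_dot(X, w, b))

# Time three runs of each version; the numbers vary from machine to machine
loop_time = timeit.timeit(lambda: predict_loop(X, w, b), number=3)
dot_time = timeit.timeit(lambda: predict_dot(X, w, b), number=3)
print(f"loop: {loop_time:.4f} s, dot: {dot_time:.4f} s")
```

Running this on your own data is the most reliable way to see how large the gap is for your array sizes.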