Gradient descent algorithm

I started implementing ML algorithms from scratch and comparing them with the scikit-learn library's implementations.

The coefficients w and b are updated step by step until they converge, and gradient descent is the algorithm that computes those updates.
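Concretely, each iteration moves both parameters a small step in the direction that decreases the cost J(w, b), scaled by the learning rate alpha:

$$w \leftarrow w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J(w,b)}{\partial b}$$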

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    # Start from the given initial parameters
    b = b_in
    w = w_in

    for i in range(num_iters):
        # Compute the gradient of the cost with respect to w and b
        dj_dw, dj_db = gradient_function(x, y, w, b)
        # Step both parameters downhill, scaled by the learning rate
        b = b - alpha * dj_db
        w = w - alpha * dj_dw
        # cost_function(x, y, w, b) could be called here to monitor convergence
    return w, b

- x: training input array (x_train)
- y: training target array (y_train)
- w_in: initial value of w
- b_in: initial value of b
- alpha: learning rate
- num_iters: number of iterations to run gradient descent
- cost_function: function that measures how far the model's predictions are from the data points (sketched below)
- gradient_function: function that produces the gradients dj_dw and dj_db (sketched below)
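The test code below calls compute_cost and compute_gradient, which I haven't shown yet. Here is a minimal sketch of both, assuming a univariate linear model f(x) = w * x + b with a mean squared error cost (the actual implementations may differ):

import numpy as np

def compute_cost(x, y, w, b):
    # Mean squared error cost J(w,b) over the m training examples
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

def compute_gradient(x, y, w, b):
    # Partial derivatives of J(w,b) with respect to w and b
    m = x.shape[0]
    err = w * x + b - y
    dj_dw = np.sum(err * x) / m
    dj_db = np.sum(err) / m
    return dj_dw, dj_db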

Here is the test code.

# Initialize both parameters to zero
w_init = 0
b_init = 0
# Run 10,000 iterations with a learning rate of 0.01
iterations = 10000
tmp_alpha = 1.0e-2
# x_train and y_train are the training arrays (defined elsewhere)
w_final, b_final = gradient_descent(x_train, y_train, w_init, b_init, tmp_alpha, iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")

Here is the output.

(w,b) found by gradient descent: (200.0000, 99.9999)

The parameters converge to (w, b) ≈ (200, 100), so the code appears to be working properly.
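For the comparison with scikit-learn mentioned at the top, a close counterpart is SGDRegressor, which also minimizes the squared error with gradient descent (stochastic rather than batch, so the result should be close but not identical). A minimal sketch, assuming the same x_train and y_train arrays:

import numpy as np
from sklearn.linear_model import SGDRegressor

# scikit-learn expects a 2D feature matrix, so reshape the 1D x_train
X = np.asarray(x_train).reshape(-1, 1)

# learning_rate='constant' with eta0 mirrors the fixed alpha used above;
# tol=None disables early stopping so all iterations run
sgdr = SGDRegressor(learning_rate='constant', eta0=1.0e-2, max_iter=10000, tol=None)
sgdr.fit(X, y_train)

print(f"(w,b) found by SGDRegressor: ({sgdr.coef_[0]:8.4f},{sgdr.intercept_[0]:8.4f})")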