What to try next to refine regression models


In regression models, it's crucial to find a balance between bias and variance to achieve optimal performance. Addressing high bias or high variance requires different strategies.

Here is the regularized cost function you see a billion times:

J(θ) = (1/2m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + (λ/2m) · Σⱼ θⱼ²

The first term measures the training error; the second penalizes large parameters, with λ controlling how strongly the model is regularized.
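As a concrete sketch, that regularized cost can be computed in a few lines of NumPy (the helper name and the assumption that `X` carries a leading column of ones are mine, not from the post):

```python
import numpy as np

def compute_cost(X, y, theta, lam):
    """Regularized linear-regression cost J(theta).

    Assumes X already includes a leading column of ones, so
    theta[0] is the intercept and is excluded from the penalty.
    """
    m = len(y)
    errors = X @ theta - y                               # h_theta(x) - y per example
    mse_term = (errors @ errors) / (2 * m)               # (1/2m) * sum of squared errors
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)   # L2 penalty, intercept skipped
    return mse_term + reg_term
```

With λ = 0 this reduces to the plain squared-error cost; a larger λ adds a growing penalty on the non-intercept parameters.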

To mitigate high bias:

  1. Increase the number of features: Adding more relevant features can help the model capture the underlying relationships in the data more effectively.

  2. Add polynomial features: Introducing polynomial features can increase the model's complexity and enable it to fit the training data more closely.

  3. Decrease the regularization parameter (lambda): Reducing lambda allows the model to be more flexible, potentially reducing bias.

These strategies focus on reducing the training set error.
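The effect of step 2 can be sketched with NumPy (the toy data and degrees below are invented for illustration): fitting a higher-degree polynomial drives training error down on data with a nonlinear shape.

```python
import numpy as np

# Toy data with a cubic shape; a straight line underfits it (high bias).
x = np.linspace(-1.0, 1.0, 30)
y = x ** 3

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    preds = np.polyval(coeffs, x)
    return np.mean((preds - y) ** 2)

mse_line = train_mse(1)   # underfit: a line cannot follow the curve
mse_cubic = train_mse(3)  # added polynomial features capture it closely
```

Lower training error after adding polynomial features is the signature of reduced bias; pushed too far, the same move tips the model into high variance.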

To address high variance:

  1. Increase the number of training examples: Providing more training data can help the model generalize better to unseen data.

  2. Reduce the number of features: Removing less relevant or redundant features can simplify the model and prevent overfitting.

  3. Decrease polynomial features: Reducing the complexity of the model can help prevent overfitting and improve generalization.

  4. Increase the regularization parameter (lambda): A higher lambda value adds constraints to the model, making it less prone to overfitting.

These approaches aim to minimize cross-validation set error.
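Step 4 can be sketched with the closed-form ridge solution, θ = (XᵀX + λI)⁻¹ Xᵀy (the data and λ values below are made up for illustration; the intercept is omitted to keep it short):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                                  # toy design matrix
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 4.0]) + rng.normal(scale=0.1, size=20)

def ridge_theta(lam):
    """Closed-form ridge solution: solve (X^T X + lam*I) theta = X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Raising lambda shrinks the parameter vector, constraining the model.
norms = [np.linalg.norm(ridge_theta(lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
```

The shrinking parameter norm is the mechanism behind the variance reduction: a more constrained model fits the training noise less and generalizes better.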

Note that several of these strategies are direct opposites, which reflects the trade-off between bias and variance. One exception: increasing the number of training examples is helpful in most situations and rarely hurts.