Bias-Variance Tradeoff and Under-Fitting/Over-Fitting in Machine Learning IN SHORT
Machine Learning Error = Bias error + Variance error + Irreducible error
Bias error reflects how strongly the model restricts itself to a designated set of functions or assumptions about the target.
Variance error is how much the estimated function changes when the algorithm is trained on different datasets.
Irreducible error is the error that remains after bias and variance are accounted for; it is the inherent noise in the problem.
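For squared-error loss, this decomposition has a standard formal statement (a general fact from theory, not specific to any one model); note that the bias term enters squared. Here f is the true function, f̂ the learned model, and σ² the variance of the noise:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```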
So what do we mean by Bias Error, Variance Error and their tradeoff?
Prediction error is commonly broken down into three components: Bias Error, Variance Error and Irreducible Error.
Irreducible Error
This error cannot be reduced regardless of which algorithm is used. It is the error caused by unknown variables that influence the output but are not among our input variables.
For example, suppose we have data on Fixed Deposit (FD) sales at a shop, along with features describing the shop. These features can explain much of the shop's sales, but they won't explain everything.
- Maybe the location of the shop affects the sale.
- Maybe there was a hike in FD rates.
- Maybe the government tweeted something about FD.
These factors that we don't observe show up to the model as inherent variability. Measurement error can be viewed the same way: it is a process we do not observe. If we knew what caused the error, could predict it, and could put it in our model, there would be nothing inherently variable about it.
Bias Error
Bias refers to the assumptions an ML model makes about the form of the target function it is trying to learn.
So Bias can be of Two Types: Low Bias and High Bias
High bias comes from making strong assumptions about the target function. For example, linear regression is a high-bias model because it imposes conditions (above all, linearity) that must roughly hold before it can fit the data well.
Models with high bias are linear regression and logistic regression.
Low bias comes from making few or no assumptions about the target function. For example, a decision tree is a low-bias model because it imposes almost no constraints on the shape of the function it can fit.
Models with low bias are decision trees and random forests.
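A minimal sketch of this contrast, assuming scikit-learn and a made-up sine-wave dataset (the data and settings here are purely illustrative):

```python
# Sketch: a high-bias model vs a low-bias model on a nonlinear target.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))             # one input feature
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # nonlinear target + noise

# High bias: linear regression assumes a straight-line relationship.
linear = LinearRegression().fit(X, y)
# Low bias: a decision tree assumes almost nothing about the shape.
tree = DecisionTreeRegressor().fit(X, y)

print("linear train MSE:", mean_squared_error(y, linear.predict(X)))  # stays high
print("tree train MSE:  ", mean_squared_error(y, tree.predict(X)))    # near zero
```

The linear model cannot bend to the sine curve no matter how much data it sees (that is its bias), while the tree fits the training set almost perfectly.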
While a model's bias tells us how rigidly it commits to a particular family of functions, the variance of a model is about its sensitivity to the dataset.
Variance Error
Suppose we train the same machine learning model on two different datasets. Model-wise, everything is the same: the algorithm is the same, the hyperparameter configuration is the same, and so on. The only thing that differs is the data (say, two different samples, or the train split versus the test split). How much the learned model and its performance vary across these datasets is the variance error.
So Variance can also be of Two Types: Low Variance and High Variance
If our model is a high-variance model, it is very sensitive to changes in the dataset and can show very different performance even when those changes are small. This shows up as a large gap between training-set and test-set performance. Algorithms that make fewer assumptions are more flexible and therefore have higher variance, because little constrains what they fit.
Models with high variance are decision trees and random forests.
If it's a low-variance model, it is not so sensitive, and the gap between training-set and test-set performance is small. Algorithms that make more assumptions have lower variance, because the assumptions constrain what they can fit.
Models with low variance are linear regression and logistic regression.
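To make this concrete, here is a small sketch (again assuming scikit-learn and a synthetic dataset) that refits each model on many bootstrap resamples and measures how much its predictions move around, which is exactly the variance error described above:

```python
# Sketch: estimating variance by retraining the same model on resampled data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)
X_grid = np.linspace(0, 6, 50).reshape(-1, 1)   # fixed points to compare on

for name, Model in [("tree   (low bias, high variance)", DecisionTreeRegressor),
                    ("linear (high bias, low variance) ", LinearRegression)]:
    preds = []
    for seed in range(20):                      # 20 bootstrap training sets
        Xb, yb = resample(X, y, random_state=seed)
        preds.append(Model().fit(Xb, yb).predict(X_grid))
    # Spread of predictions across retrainings approximates the variance error.
    print(name, "mean prediction std:", np.vstack(preds).std(axis=0).mean())
```

The tree's predictions jump around between resamples, while the linear model's barely move.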
In short, linear models (for example, linear regression) have High Bias and Low Variance, while non-linear models (for example, Random Forest) have Low Bias and High Variance.
The goal of any Machine Learning algorithm is to have Low Bias and Low Variance. But this is difficult to achieve because
- If we increase Bias (a more constrained model), Variance goes down
- If we decrease Bias (a more flexible model), Variance goes up
That is why we have a tradeoff between Bias and Variance.
Bias Variance Tradeoff
Methods to check the bias-variance tradeoff are cross-validation, the use of regularisation, and monitoring classification or regression metrics on held-out data.
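As a sketch of the cross-validation check (assuming scikit-learn; the synthetic data and the two models are placeholders): uniformly poor train and validation scores point to high bias, while a large train-validation gap points to high variance.

```python
# Sketch: cross-validation as a bias/variance diagnostic.
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

for name, model in [("linear", LinearRegression()), ("tree", DecisionTreeRegressor())]:
    cv = cross_validate(model, X, y, cv=5, scoring="r2", return_train_score=True)
    # Low train AND validation scores -> high bias;
    # high train score but much lower validation score -> high variance.
    print(name, "train R2:", cv["train_score"].mean().round(3),
                "valid R2:", cv["test_score"].mean().round(3))
```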
Picture throwing darts at a board, where the bullseye is the true value you are trying to predict:
- If your bias is low and your variance is high, the darts will land near the centre but scattered (ML: capable of fitting many patterns, but sensitive to changes in the data).
- If your bias is high and your variance is low, the darts will land near each other, but not near the centre (ML: not very sensitive to data changes, but too biased, so the predictions are collectively off).
- If your bias is high and your variance is high, the darts will be both scattered and away from the centre (ML: too sensitive and not capable of precise predictions).
- If your bias is low and your variance is low, the darts cluster on the bullseye without scattering. This is what you want.
Under-Fitting and Over-Fitting in Machine Learning Models
Our model is underfitting the training data when it performs poorly on the training data itself. This is because the model is unable to capture the relationship between the input examples (often called X) and the target values (often called Y).
Our model is overfitting the training data when it performs well on the training data but does not perform well on the evaluation data. This is because the model is memorising the data it has seen and is unable to generalise to unseen examples.
If our model is underfitting, the poor performance on the training data is usually because the model is too simple (or the input features are not expressive enough) to describe the target well. Performance can be improved by increasing model flexibility.
Methods to address underfitting are adding more (or more expressive) features and decreasing the amount of regularisation (too much regularisation tends to force coefficients towards zero).
If our model is overfitting the training data, it makes sense to take actions that reduce model flexibility.
Methods to address overfitting are reducing model flexibility, using feature selection, increasing the amount of regularisation, or adding more training examples.
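A sketch tying both failure modes together (assuming scikit-learn; the polynomial degrees and the Ridge alpha are illustrative choices, not recommendations): a degree-1 model underfits the sine-shaped target, a degree-15 polynomial is flexible enough to start overfitting, and regularisation pulls the flexible model back.

```python
# Sketch: underfitting vs overfitting via model flexibility, with regularisation as a fix.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [
    ("underfit    (degree 1)",
     make_pipeline(PolynomialFeatures(1), StandardScaler(), LinearRegression())),
    ("overfit     (degree 15)",
     make_pipeline(PolynomialFeatures(15), StandardScaler(), LinearRegression())),
    ("regularised (degree 15 + Ridge)",
     make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=1.0))),
]
for label, model in models:
    model.fit(X_tr, y_tr)
    # Underfitting: both scores low. Overfitting: big train-test gap.
    print(label, "| train R2:", round(r2_score(y_tr, model.predict(X_tr)), 3),
                 "| test R2:",  round(r2_score(y_te, model.predict(X_te)), 3))
```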
If this was helpful, please give it a thumbs up. Thank you, and please follow:
Medium: https://medium.com/@sandipanpaul
GitHub: https://github.com/sandipanpaul21