Don’t Throw Away Your Outliers!
First, look for the causes of errors in your dataset.
After detecting outliers or anomalies, you need to decide how to handle them. This post explains techniques for dealing with outliers.
The first step, investigation
Investigate your outliers. Why did they occur? Are they truly errors, or could the same values show up again in real life? They were in the data, so it’s important to find out where they came from. Outliers may stem from experimental or human error, or something may have gone wrong when loading or processing the data. Once you know the cause, you can decide what to do about it.
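To investigate outliers you first need them in front of you, together with their position in the data. Here is a minimal sketch that flags values instead of dropping them. It assumes a single numeric column and uses the common 1.5 × IQR rule, which is just one possible detection method; the function name `flag_outliers_iqr` and the sample readings are made up for illustration.

```python
import statistics

def flag_outliers_iqr(values, k=1.5):
    """Return (index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR].

    Nothing is removed: the result is a list of rows to look into.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical temperature readings; one value looks suspicious.
readings = [21.5, 22.1, 20.9, 21.8, 250.0, 22.4, 21.2, 20.7]

suspects = flag_outliers_iqr(readings)
for index, value in suspects:
    print(f"row {index}: value {value} needs investigation")
```

Keeping the row index makes it easier to trace a flagged value back to its source record and check whether it was a typo, a unit mix-up, or a real measurement.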
Is the problem solved? If you decide the outliers shouldn’t be in the data and you remove them, make sure you can explain why. Even better: document it. If you discover that new data could contain the same values as the outliers, don’t remove them; handle them with other techniques, like the ones described below.
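One way to document a removal is to log every dropped row together with the reason. The sketch below is only an illustration under assumed names: `remove_with_log`, the `age` field, and the rule that a negative age is a data-entry error are all hypothetical.

```python
def remove_with_log(rows, should_remove, reason, log):
    """Return the rows to keep; append each removed row and the reason to log."""
    kept = []
    for row in rows:
        if should_remove(row):
            log.append({"row": row, "reason": reason})
        else:
            kept.append(row)
    return kept

# Hypothetical records; a negative age cannot occur in real life.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -1},
    {"id": 3, "age": 51},
]

removal_log = []
cleaned = remove_with_log(
    records,
    should_remove=lambda row: row["age"] < 0,
    reason="negative age: data-entry error, cannot occur in new data",
    log=removal_log,
)

print(len(cleaned), "rows kept,", len(removal_log), "rows removed")
```

The log can be saved next to the cleaned dataset, so anyone who uses the data later can see what was removed and why.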
Taking care of outliers
If the outliers are just irregular compared to the other data points, but can happen in practice, keep them and deal with them! You can try several techniques to improve the results of a machine learning model without removing the outliers.