Photo by AbsolutVision on Unsplash

Don’t Throw Away Your Outliers!

First, look for the causes of errors in your dataset.

Hennie de Harder
Published in TDS Archive
4 min read · Aug 30, 2022

After detecting outliers or anomalies, you need to decide how to handle them. This post explains techniques for handling outliers.

The first step, investigation

Investigate your outliers. Why did they occur? Are they truly errors? Could they never happen in real life? They appeared in the data, so it’s important to find out where they came from. Outliers may result from experimental or human error, or something could have gone wrong while loading or processing the data. Once you know the cause, you can decide what to do about it.

Is the problem solved? If you decide the outliers don’t belong in the data and you remove them, make sure you can explain why. Even better: document it. If you discover that new data could contain the same values as the outliers, handle them with other techniques, like the ones described below.
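One way to make the "document it" advice concrete is to record every removal together with its reason. The snippet below is a minimal sketch, not a prescribed workflow: the helper `remove_with_reason`, the `removal_log` list, and the sensor readings with a `-999` error code are all hypothetical and only serve as illustration.

```python
import numpy as np


def remove_with_reason(values, mask, reason, log):
    """Drop the entries where `mask` is True and record why in `log`.

    Hypothetical helper for illustration; `log` is a plain list of dicts.
    """
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    log.append(
        {
            "reason": reason,
            "n_removed": int(mask.sum()),
            "removed_values": values[mask].tolist(),
        }
    )
    return values[~mask]


removal_log = []

# Hypothetical sensor readings; -999.0 is assumed to be a device error code.
temps = np.array([21.5, 22.0, -999.0, 20.8, 23.1])

cleaned = remove_with_reason(
    temps,
    temps == -999.0,
    "sensor error code -999, not a real temperature",
    removal_log,
)
```

Keeping the removed values in the log means the decision can be reviewed or reversed later if the investigation turns out to be wrong.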

Taking care of outliers

If the outliers are merely unusual compared with the other data points but can genuinely occur, keep them and handle them! You can try several techniques to improve the results of a machine learning model without removing the outliers.
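As a rough illustration of keeping outliers while limiting their influence, the sketch below shows two commonly used options: clipping values to chosen percentiles (often called winsorizing) and applying a log transform. These are assumptions chosen for illustration and are not necessarily the techniques this post goes on to describe. The function names and the income figures are hypothetical.

```python
import numpy as np


def clip_to_percentiles(values, lower=1.0, upper=99.0):
    """Cap values at the given lower and upper percentiles (winsorizing)."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [lower, upper])
    return np.clip(values, low, high)


def log_transform(values):
    """Compress large values with log(1 + x); only valid for non-negative data."""
    values = np.asarray(values, dtype=float)
    if (values < 0).any():
        raise ValueError("log_transform expects non-negative values")
    return np.log1p(values)


# Hypothetical yearly incomes with one extreme but plausible value.
incomes = np.array([32_000, 41_000, 38_500, 45_000, 2_500_000])

clipped = clip_to_percentiles(incomes, lower=5, upper=95)
logged = log_transform(incomes)
```

Both approaches keep every row in the dataset. Clipping changes only the extreme values, while the log transform rescales all values and preserves their order.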
