Filter out any unwanted outliers
Outliers can cause problems with certain types of models. For example, linear
regression models are not very robust to outliers. In general, if you have a
legitimate reason for removing an outlier, doing so will help your model's
performance. However, outliers are innocent until proven guilty. You must not
remove an outlier just because it is a big number; big numbers can be very
informative for some models. We cannot stress this enough: you need a good
reason for removing an outlier, such as a suspicious measurement that is
unlikely to be real data.
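As a rough illustration of "innocent until proven guilty", the sketch below first flags candidate outliers with the common 1.5 × IQR rule and then removes only the flagged values that also fail a domain check. The IQR rule, the function names, and the house-size data are assumptions made for this example; they are not prescribed by the text above.

```python
from statistics import quantiles

def flag_iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

def drop_suspicious(values, is_suspicious):
    """Remove a flagged value only if the domain check also rejects it."""
    flagged = set(flag_iqr_outliers(values))
    return [v for i, v in enumerate(values)
            if not (i in flagged and is_suspicious(v))]

# Hypothetical house sizes in square feet: -1 looks like a data-entry
# error, while 9500 is large but could be a real mansion.
sizes = [1200, 1350, 1400, 1500, 1600, 1650, 1800, 9500, -1]
cleaned = drop_suspicious(sizes, is_suspicious=lambda v: v <= 0)
print(cleaned)
```

Both 9500 and -1 are flagged as candidates, but only -1 is removed, because a non-positive size cannot be real data. The big number stays in the dataset.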
Handling missing data
Handling missing data can be a tricky affair in machine learning. To be clear from the start: you cannot simply ignore the missing values in a dataset. You have to handle them in some way, because most algorithms do not accept missing values. The two most commonly recommended ways to deal with missing data are:
1. Dropping the observations that have missing values.
2. Imputing the missing values based on the other observations.
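The two options above can be sketched in a few lines of plain Python. The rows, the column names, and the choice of mean imputation are assumptions made for this example; the mean is only one of many possible imputation strategies.

```python
from statistics import mean

# Hypothetical observations; None marks a missing value.
rows = [
    {"sqft": 1200, "age": 10},
    {"sqft": None, "age": 25},
    {"sqft": 1600, "age": None},
    {"sqft": 1400, "age": 4},
]

def drop_incomplete(rows):
    """Option 1: keep only observations with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows, column):
    """Option 2: replace missing values in a column with the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]}
            for r in rows]

dropped = drop_incomplete(rows)
imputed = impute_mean(impute_mean(rows, "sqft"), "age")
print(len(dropped), len(imputed))
```

Dropping keeps only two of the four rows, while imputing keeps all four but fills the gaps with values that were never actually observed.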
Dropping values is suboptimal because when you drop observations, you also
drop valuable information. The fact that a value is missing may be informative
in itself. Also, in the real world, you often need to make predictions on new
data even when some of the features are not available.
Imputing missing values is also suboptimal because the values were originally
missing, and you filled them in. That always leads to a loss of information,
no matter how sophisticated the imputation method is. Missing data is
informative in itself, as we discussed, and you must tell your algorithms if
a value is missing.
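One possible way to tell an algorithm that a value was missing is to add an indicator column before filling the gap. The sketch below assumes a numeric feature, a hypothetical "_missing" suffix for the indicator, and a fill value of 0; none of these choices come from the text above.

```python
def flag_and_fill(rows, column, fill_value=0):
    """Add a 0/1 '<column>_missing' indicator, then fill the missing values."""
    result = []
    for row in rows:
        missing = row[column] is None
        new_row = dict(row)  # copy so the original data is left untouched
        new_row[column + "_missing"] = 1 if missing else 0
        if missing:
            new_row[column] = fill_value
        result.append(new_row)
    return result

# Hypothetical observations; None marks a missing value.
houses = [{"sqft": 1200}, {"sqft": None}, {"sqft": 1600}]
flagged = flag_and_fill(houses, "sqft")
print(flagged)
```

With the indicator in place, the model can learn a separate effect for "value was missing" instead of treating the filled-in 0 as a real measurement.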