Outlier detection using a Median Absolute Deviation "Hampel" filter. This function applies a rolling Hampel filter to find points that lie very far out in the tails of the distribution of values within the window.
The thresholdMin level is similar to a sigma value for normally distributed data. The default threshold setting thresholdMin = 8 identifies points that are extremely unlikely to be part of a normal distribution and are therefore very likely to be outliers. Choosing a relatively large value for thresholdMin makes it less likely that we will generate false positives.
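The threshold test described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the function name `hampel_flags` is hypothetical, and the use of `stats::mad()` with its default 1.4826 scaling constant is an assumption made so that the score is comparable to a sigma value for normally distributed data. The algorithm in seismicRoll may differ in its details.

```r
# Illustrative rolling Hampel sketch in base R (hypothetical helper, not AirSensor code).
# A point is flagged when its distance from the window median, measured in
# units of the scaled MAD, exceeds thresholdMin.
hampel_flags <- function(x, windowSize = 15, thresholdMin = 8) {
  stopifnot(windowSize %% 2 == 1, thresholdMin > 0)
  half <- (windowSize - 1) / 2
  n <- length(x)
  flagged <- rep(FALSE, n)
  if (n < windowSize) return(flagged)
  # Points closer than `half` samples to either end are left unflagged.
  for (i in seq(half + 1, n - half)) {
    window <- x[(i - half):(i + half)]
    med <- median(window)
    # mad() multiplies by 1.4826 by default, which makes it comparable
    # to a standard deviation for normally distributed data.
    spread <- mad(window, center = med)
    if (spread > 0) {
      flagged[i] <- abs(x[i] - med) / spread > thresholdMin
    }
  }
  flagged
}

# Synthetic series with one injected spike at index 100.
set.seed(42)
x <- rnorm(200, mean = 10, sd = 1)
x[100] <- 60
which(hampel_flags(x))
```

Lowering `thresholdMin` in this sketch flags more points, which mirrors the trade-off between sensitivity and false positives described above.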
The default window size setting windowSize = 15 means that 15 samples from a single channel are used to determine the distribution of values for which a median is calculated. Each PurpleAir channel makes a measurement approximately every 120 seconds, so the temporal window is 15 * 120 sec, or approximately 30 minutes. This seems like a reasonable period of time over which to evaluate PM2.5 measurements.
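The window arithmetic can be checked directly. The variable names below are illustrative, and the 120-second interval is the approximate figure quoted above rather than a guaranteed sampling rate.

```r
# Approximate temporal span covered by one rolling window.
sampling_interval_sec <- 120   # approximate PurpleAir reporting interval (from the text)
windowSize <- 15               # default window size, in samples

window_minutes <- windowSize * sampling_interval_sec / 60
window_minutes   # 30
```

The same calculation shows how a different windowSize changes the span: 31 samples, for example, would cover roughly an hour.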
Specifying replace = TRUE allows you to perform smoothing by replacing outliers with the window median value. Using this technique, you can create a highly smoothed, artificial dataset by setting thresholdMin = 1 or lower (but always above zero).
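To illustrate how median replacement turns into smoothing at low thresholds, here is a self-contained base R sketch. The helper `hampel_replace` is hypothetical and is not the code that pat_outliers() runs. It assumes a centered window and the default scaling of `stats::mad()`.

```r
# Illustrative sketch: replace flagged points with the window median
# (hypothetical helper, not AirSensor code).
hampel_replace <- function(x, windowSize = 15, thresholdMin = 8) {
  stopifnot(windowSize %% 2 == 1, thresholdMin > 0)
  half <- (windowSize - 1) / 2
  n <- length(x)
  out <- x
  if (n < windowSize) return(out)
  for (i in seq(half + 1, n - half)) {
    window <- x[(i - half):(i + half)]   # always read from the original series
    med <- median(window)
    spread <- mad(window, center = med)
    if (spread > 0 && abs(x[i] - med) / spread > thresholdMin) {
      out[i] <- med
    }
  }
  out
}

# A repeating 1, 2, 3 pattern with a single spike.
x <- rep(c(1, 2, 3), 20)
x[30] <- 100

cleaned  <- hampel_replace(x)                       # high threshold: only the spike is replaced
smoothed <- hampel_replace(x, thresholdMin = 0.5)   # low threshold: most interior points are replaced
```

With the default threshold only the spike changes, while the low threshold replaces every interior point that differs from its window median, which is why the result is described above as an artificial, highly smoothed dataset.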
pat_outliers(
  pat = NULL,
  windowSize = 15,
  thresholdMin = 8,
  replace = FALSE,
  showPlot = TRUE,
  data_shape = 18,
  data_size = 1,
  data_color = "black",
  data_alpha = 0.5,
  outlier_shape = 8,
  outlier_size = 1,
  outlier_color = "red",
  outlier_alpha = 1
)
pat: PurpleAir Timeseries pat object.
windowSize: Integer window size for outlier detection.
thresholdMin: Threshold value for outlier detection.
replace: Logical specifying whether to replace outliers with the window median value.
showPlot: Logical specifying whether to generate outlier detection plots.
data_shape: Symbol to use for data points.
data_size: Size of data points.
data_color: Color of data points.
data_alpha: Opacity of data points.
outlier_shape: Symbol to use for outlier points.
outlier_size: Size of outlier points.
outlier_color: Color of outlier points.
outlier_alpha: Opacity of outlier points.
A pat object with outliers replaced by median values.
Additional documentation on the algorithm is available in seismicRoll::findOutliers().