Reduce the number of records (timesteps) in the data dataframe of the incoming mts through random sampling.
mts_sample(
mts = NULL,
sampleSize = 5000,
seed = NULL,
keepOutliers = FALSE,
width = 5,
thresholdMin = 3
)
mts: mts object.
sampleSize: Non-negative integer giving the number of rows to choose.
seed: Integer passed to set.seed for reproducible sampling.
keepOutliers: Logical specifying a graphics-focused sampling algorithm that retains outliers (see Details).
width: Integer width of the rolling window used for outlier detection.
thresholdMin: Numeric threshold for outlier detection.
A subset of the given mts object: an mts time series object with fewer timesteps. (A list with meta and data dataframes.)
When keepOutliers = FALSE, random sampling is used to provide a statistically relevant subsample of the data.
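The random-sampling case can be illustrated with a minimal base R sketch. It is not the package's implementation: the mts list built here is a hypothetical stand-in with only the meta and data dataframes, and sorting the chosen rows to keep time order is an assumption made for this example.

```r
# Hypothetical stand-in for an mts object: a list with 'meta' and 'data' dataframes.
# (A real mts object carries more metadata than shown here.)
mts <- list(
  meta = data.frame(deviceDeploymentID = "dev_01"),
  data = data.frame(
    datetime = as.POSIXct("2024-01-01", tz = "UTC") + 3600 * (0:9999),
    dev_01 = sin((0:9999) / 24)
  )
)

sampleSize <- 5000
seed <- 123

# Setting the seed makes the random row selection reproducible
set.seed(seed)

# Choose row indices at random, then sort them to preserve time order
# (assumption: the sampled records should remain chronological)
rows <- sort(sample.int(nrow(mts$data), size = sampleSize))

# Keep 'meta' untouched and subset only the 'data' dataframe
sampled <- mts
sampled$data <- mts$data[rows, , drop = FALSE]
```

Running the block again with the same seed selects the same rows; changing or omitting the seed gives a different subsample.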
When keepOutliers = TRUE, a customized sampling algorithm is used that attempts to create subsets whose plots are visually identical to plots of the full data. This is accomplished by preserving outliers and only sampling data in regions where overplotting is expected.
The process is as follows:
1. find outliers using MazamaRollUtils::findOutliers()
2. create a subset consisting of only outliers
3. sample the remaining data
4. merge the outliers and sampled data
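The four steps can be sketched in base R on a single numeric series. This is an illustration under stated assumptions, not the package's code: the outlier detector below (distance from a rolling median, scaled by the MAD of those distances) is a simple stand-in for MazamaRollUtils::findOutliers(), and counting the outliers against sampleSize is an assumption made for this example.

```r
set.seed(42)

# Hypothetical hourly series with three injected spikes
n <- 2000
x <- sin(seq_len(n) / 50) + rnorm(n, sd = 0.05)
spikes <- c(100, 750, 1500)
x[spikes] <- x[spikes] + 5

width <- 5
thresholdMin <- 3
sampleSize <- 500

# 1. Find outliers.
#    Stand-in detector, NOT the MazamaRollUtils algorithm: flag points whose
#    distance from a rolling median exceeds thresholdMin times the MAD.
residual <- x - stats::runmed(x, k = width)
outlierIndex <- which(abs(residual) > thresholdMin * stats::mad(residual))

# 2. The outliers form their own subset (outlierIndex).
# 3. Sample the remaining data.
#    Assumption: outliers count toward the sampleSize budget.
remaining <- setdiff(seq_len(n), outlierIndex)
nSample <- max(sampleSize - length(outlierIndex), 0)
sampledIndex <- remaining[sample.int(length(remaining), size = nSample)]

# 4. Merge the outliers and sampled data, restoring time order
keep <- sort(c(outlierIndex, sampledIndex))
thinned <- data.frame(index = keep, value = x[keep])
```

Because every flagged point is kept, the injected spikes survive the thinning even though only a fraction of the original records remain.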
This algorithm works best when the mts object has only one or two timeseries.
The width and thresholdMin parameters determine the number of outliers detected. For hourly data, a width of 5 and a thresholdMin of 3 or 4 seem to find many visually obvious outliers.
Users attempting to optimize plotting speed for lengthy time series are encouraged to experiment with these two parameters along with sampleSize and review the results visually.