Reduce the number of records (timesteps) in the data dataframe of the incoming mts through random sampling.

mts_sample(
  mts = NULL,
  sampleSize = 5000,
  seed = NULL,
  keepOutliers = FALSE,
  width = 5,
  thresholdMin = 3
)

Arguments

mts

mts object.

sampleSize

Non-negative integer giving the number of rows to choose.

seed

Integer passed to set.seed for reproducible sampling.

keepOutliers

Logical specifying whether to use a graphics-focused sampling algorithm that retains outliers (see Details).

width

Integer width of the rolling window used for outlier detection.

thresholdMin

Numeric minimum threshold used for outlier detection (see Details).

Value

An mts time series object containing a subset of the timesteps in the incoming mts (a list with meta and data dataframes).
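To make the returned structure concrete, the sketch below builds a small hypothetical object with the mts layout and drops timesteps from it. It assumes that data holds a datetime column plus one column per time series named after meta$deviceDeploymentID; the site names, values, and the every-fourth-row subset are invented for illustration, and keeping meta unchanged reflects this sketch rather than a documented guarantee.

```r
# Hypothetical toy object with the mts layout: 'meta' has one row per time
# series; 'data' has a 'datetime' column plus one column per time series,
# named to match meta$deviceDeploymentID (an assumption for this sketch).
n <- 48
meta <- data.frame(deviceDeploymentID = c("siteA", "siteB"))
data <- data.frame(
  datetime = seq(as.POSIXct("2023-01-01", tz = "UTC"), by = "hour", length.out = n),
  siteA = sin(seq_len(n) / 4),
  siteB = cos(seq_len(n) / 4)
)
mts <- list(meta = meta, data = data)

# Reducing timesteps: rows of 'data' are dropped, 'meta' is left as-is here.
keep <- seq(1, n, by = 4)
subset_mts <- list(meta = mts$meta, data = mts$data[keep, , drop = FALSE])
dim(subset_mts$data)
```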

Details

When keepOutliers = FALSE, random sampling is used to provide a statistically representative subsample of the data.
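A minimal base R sketch of this random-sampling case follows, assuming rows are drawn without replacement and then put back into time order. sample_rows() is a hypothetical helper, not part of the package, and the sorting step is a choice made for this sketch. It also shows why seed matters: the same seed yields the same subsample.

```r
# Hypothetical helper illustrating the keepOutliers = FALSE case:
# draw 'sampleSize' rows at random (without replacement) and keep time order.
sample_rows <- function(data, sampleSize = 5000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)         # reproducible draws when seed is given
  n <- nrow(data)
  if (sampleSize >= n) return(data)          # nothing to drop
  keep <- sort(sample.int(n, sampleSize))    # sorted so timesteps stay in order
  data[keep, , drop = FALSE]
}

data <- data.frame(
  datetime = seq(as.POSIXct("2023-01-01", tz = "UTC"), by = "hour", length.out = 1000),
  x = rnorm(1000)
)
s1 <- sample_rows(data, sampleSize = 100, seed = 42)
s2 <- sample_rows(data, sampleSize = 100, seed = 42)
identical(s1, s2)  # same seed, same subsample
```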

Outlier Detection

When keepOutliers = TRUE, a customized sampling algorithm attempts to create subsets that, when plotted, look visually identical to plots of the full dataset. It does this by preserving outliers and sampling data only in regions where overplotting is expected.

The process is as follows:

  1. find outliers using MazamaRollUtils::findOutliers()

  2. create a subset consisting of only outliers

  3. sample the remaining data

  4. merge the outliers and sampled data
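The four steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: find_outliers_sketch() is a hypothetical Hampel-style stand-in for MazamaRollUtils::findOutliers() (rolling median plus rolling MAD), sample_keep_outliers() is a hypothetical helper, and the rule that outliers count toward sampleSize is an assumption made here.

```r
# Hypothetical stand-in for MazamaRollUtils::findOutliers(): flag points whose
# distance from the rolling median exceeds thresholdMin * rolling MAD.
find_outliers_sketch <- function(x, width = 5, thresholdMin = 3) {
  n <- length(x)
  half <- width %/% 2
  idx <- integer(0)
  for (i in seq_len(n)) {
    win <- x[max(1, i - half):min(n, i + half)]   # window centered on i
    med <- stats::median(win, na.rm = TRUE)
    s <- stats::mad(win, na.rm = TRUE)
    if (!is.na(x[i]) && !is.na(s) && s > 0 && abs(x[i] - med) > thresholdMin * s) {
      idx <- c(idx, i)
    }
  }
  idx
}

# Hypothetical helper following steps 1-4; assumes outliers count toward sampleSize.
sample_keep_outliers <- function(data, sampleSize = 5000, width = 5,
                                 thresholdMin = 3, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  valueCols <- setdiff(names(data), "datetime")
  # 1. find outliers in every time series column
  outlierRows <- unique(unlist(lapply(valueCols, function(col) {
    find_outliers_sketch(data[[col]], width, thresholdMin)
  })))
  # 2. the outlier subset is kept whole (as row indices)
  # 3. sample the remaining rows to fill up to sampleSize
  remaining <- setdiff(seq_len(nrow(data)), outlierRows)
  nSample <- max(0, min(length(remaining), sampleSize - length(outlierRows)))
  sampled <- remaining[sample.int(length(remaining), nSample)]
  # 4. merge outliers and sampled rows, restoring time order
  keep <- sort(c(outlierRows, sampled))
  data[keep, , drop = FALSE]
}

set.seed(1)
n <- 2000
data <- data.frame(
  datetime = seq(as.POSIXct("2023-01-01", tz = "UTC"), by = "hour", length.out = n),
  pm25 = rnorm(n, mean = 10, sd = 1)
)
spikes <- c(100, 900, 1500)
data$pm25[spikes] <- 60   # inject visually obvious outliers

sub <- sample_keep_outliers(data, sampleSize = 200, seed = 7)
```

In this sketch every injected spike survives sampling, whereas plain random sampling of 200 out of 2000 rows would keep each one with only about a 10% chance.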

This algorithm works best when the mts object has only one or two time series.

The width and thresholdMin parameters determine the number of outliers detected. For hourly data, a width of 5 and a thresholdMin of 3 or 4 seem to find many visually obvious outliers.

Users attempting to optimize plotting speed for lengthy time series are encouraged to experiment with these two parameters along with sampleSize and review the results visually.
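One way to run such an experiment is a small parameter sweep that counts detected outliers for several width and thresholdMin combinations. The detector below is a hypothetical vectorized approximation built on stats::runmed() (rolling median of the series, then a rolling median of absolute deviations scaled by 1.4826), not MazamaRollUtils::findOutliers(); the synthetic series and parameter grid are invented for illustration, and only odd widths are used because runmed() expects an odd window.

```r
# Hypothetical vectorized Hampel-style counter (approximation, not the
# MazamaRollUtils implementation).
count_outliers <- function(x, width = 5, thresholdMin = 3) {
  med <- stats::runmed(x, k = width, endrule = "median")        # rolling median
  dev <- abs(x - med)                                           # distance from it
  spread <- 1.4826 * stats::runmed(dev, k = width, endrule = "median")
  sum(spread > 0 & dev > thresholdMin * spread)
}

set.seed(1)
x <- rnorm(5000, mean = 10, sd = 1)
x[c(500, 2500, 4000)] <- 60   # injected outliers

# Raising thresholdMin can only shrink the count for a fixed width.
grid <- expand.grid(width = c(5, 11, 21), thresholdMin = c(3, 4, 6))
grid$nOutliers <- mapply(count_outliers,
                         width = grid$width, thresholdMin = grid$thresholdMin,
                         MoreArgs = list(x = x))
grid
```

Counts that stay high at small thresholds usually reflect noise being flagged; the resulting subsets should still be checked visually, as recommended above.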

See MazamaRollUtils::findOutliers().