A sampling function that accepts PurpleAir timeseries dataframes and reduces them by randomly selecting a user-specified number (or fraction) of distinct rows.
If both sampleSize and sampleFraction are unspecified, sampleSize = 5000 will be used.
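The default rule above can be made concrete with a small helper. This is a hypothetical sketch, not code from the package: the name resolve_sample_size() is invented here, and it encodes two behaviors that the documentation does not state and that are assumptions only: an explicit sampleSize takes precedence when both arguments are given, and the result is capped at the number of available rows.

```r
# Hypothetical helper (not part of the package): turn the sampleSize /
# sampleFraction arguments into a row count for a data frame with nrows rows.
resolve_sample_size <- function(nrows, sampleSize = NULL, sampleFraction = NULL) {
  # Documented default: use 5000 rows when neither argument is given
  if (is.null(sampleSize) && is.null(sampleFraction)) {
    sampleSize <- 5000
  }
  # Assumption: an explicit sampleSize takes precedence over sampleFraction
  if (is.null(sampleSize)) {
    stopifnot(sampleFraction >= 0, sampleFraction <= 1)
    sampleSize <- round(nrows * sampleFraction)
  }
  stopifnot(sampleSize >= 0)
  # Assumption: never request more rows than exist, since rows are distinct
  min(as.integer(sampleSize), nrows)
}

resolve_sample_size(10000)                        # 5000 (documented default)
resolve_sample_size(10000, sampleFraction = 0.1)  # 1000
resolve_sample_size(2000)                         # 2000 (capped by assumption)
```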
pat_sample(
pat = NULL,
sampleSize = NULL,
sampleFraction = NULL,
setSeed = NULL,
keepOutliers = FALSE
)
pat: PurpleAir Timeseries pat object.
sampleSize: Non-negative integer giving the number of rows to choose.
sampleFraction: Fraction of rows to choose.
setSeed: Integer seed for random number generation; can be used to reproduce sampling.
keepOutliers: Logical specifying a graphics-focused sampling algorithm (see Details).
Returns a subset of the given pat object.
When keepOutliers = FALSE, simple random sampling is used to provide a statistically representative subsample of the data.
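The keepOutliers = FALSE case amounts to random sampling of rows without replacement. The following is a minimal base-R sketch, not the package's own code; it assumes the time series is a plain data frame with hypothetical datetime and pm25_A columns standing in for the data inside a pat object, and that rows should stay in time order.

```r
# Hypothetical sketch of plain random row sampling (not the package's code).
sample_rows <- function(df, n, setSeed = NULL) {
  if (!is.null(setSeed)) set.seed(setSeed)   # reproducible draws (changes global RNG state)
  n <- min(n, nrow(df))
  idx <- sort(sample.int(nrow(df), n))       # distinct rows (no replacement), kept in time order
  df[idx, , drop = FALSE]
}

# Synthetic data standing in for pat$data (assumed column names)
df <- data.frame(
  datetime = seq(as.POSIXct("2019-07-01", tz = "UTC"), by = "2 min", length.out = 10000),
  pm25_A = runif(10000, 0, 50)
)
sub <- sample_rows(df, 500, setSeed = 1)
nrow(sub)   # 500
```

Calling sample_rows() again with the same setSeed returns an identical subset, which is the reproducibility that setSeed is meant to provide.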
When keepOutliers = TRUE, a customized sampling algorithm is used to create subsets for plotting whose plots are visually identical to plots made with all of the data. This is accomplished by preserving outliers and sampling only in regions where overplotting is expected.
The process is as follows:
1. Find outliers using seismicRoll::findOutliers().
2. Create a subset consisting of only the outliers.
3. Sample the remaining data.
4. Merge the outliers and the sampled data.
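The four steps above can be sketched in base R. This is a hypothetical illustration, not the package's implementation: find_outliers_hampel() is a simple rolling Hampel filter written here as a stand-in for seismicRoll::findOutliers(), and its window width and threshold, the column names (datetime, pm25_A), and the synthetic data are all assumptions.

```r
# Hypothetical stand-in for seismicRoll::findOutliers(): a simple rolling
# Hampel filter that flags points far from their local median.
find_outliers_hampel <- function(x, n = 11, threshold = 6) {
  half <- n %/% 2
  len <- length(x)
  flagged <- logical(len)
  for (i in seq_len(len)) {
    win <- x[max(1, i - half):min(len, i + half)]
    med <- median(win, na.rm = TRUE)
    mad_i <- mad(win, na.rm = TRUE)   # scaled MAD (constant 1.4826)
    if (!is.na(x[i]) && !is.na(mad_i) && mad_i > 0 &&
        abs(x[i] - med) / mad_i > threshold) {
      flagged[i] <- TRUE
    }
  }
  which(flagged)
}

# Hypothetical sketch of the outlier-preserving sampler (steps 1-4 above).
sample_keep_outliers <- function(df, column, n, setSeed = NULL) {
  if (!is.null(setSeed)) set.seed(setSeed)
  outlierIdx <- find_outliers_hampel(df[[column]])             # 1. find outliers
  outliers <- df[outlierIdx, , drop = FALSE]                   # 2. outlier-only subset
  rest <- setdiff(seq_len(nrow(df)), outlierIdx)
  keep <- rest[sample.int(length(rest), min(n, length(rest)))] # 3. sample the remaining rows
  merged <- rbind(outliers, df[keep, , drop = FALSE])          # 4. merge
  merged[order(merged$datetime), , drop = FALSE]               # restore time order (assumed column)
}

# Synthetic data with three injected spikes
set.seed(42)
df <- data.frame(
  datetime = seq(as.POSIXct("2019-07-01", tz = "UTC"), by = "2 min", length.out = 2000),
  pm25_A = rnorm(2000, mean = 10, sd = 1)
)
df$pm25_A[c(100, 900, 1500)] <- 500
sub <- sample_keep_outliers(df, "pm25_A", n = 200, setSeed = 1)
```

Because outliers bypass the random draw in this sketch, the returned subset holds n sampled rows plus every flagged outlier, so the spikes always survive into the plot.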