A sampling function that accepts PurpleAir Timeseries (pat) objects and reduces their data to a user-specified number of randomly selected, distinct rows.

If both sampleSize and sampleFraction are unspecified, sampleSize = 5000 will be used.
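The rule above can be illustrated with a small helper that turns the two arguments into a row count. This is a minimal sketch, not AirSensor code: `resolve_sample_size()` is hypothetical, and two behaviors are assumptions. It uses sampleSize when both arguments are given, and it caps the result at the number of available rows.

```r
# Hypothetical helper (not part of AirSensor): decide how many rows to keep.
resolve_sample_size <- function(nRows, sampleSize = NULL, sampleFraction = NULL) {
  if (is.null(sampleSize) && is.null(sampleFraction)) {
    sampleSize <- 5000                           # documented default
  } else if (is.null(sampleSize)) {
    sampleSize <- round(sampleFraction * nRows)  # fraction of available rows
  }
  # Assumptions: sampleSize wins if both are given, and the result is
  # capped so we never request more rows than exist.
  min(as.integer(sampleSize), nRows)
}

resolve_sample_size(2424)                        # 2424 (default 5000, capped)
resolve_sample_size(2424, sampleSize = 1000)     # 1000
resolve_sample_size(2424, sampleFraction = 0.1)  # 242
```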

pat_sample(
  pat = NULL,
  sampleSize = NULL,
  sampleFraction = NULL,
  setSeed = NULL,
  keepOutliers = FALSE
)

Arguments

pat

A PurpleAir Timeseries (pat) object.

sampleSize

Non-negative integer giving the number of rows to choose.

sampleFraction

Fraction of rows to choose.

setSeed

Integer seed for random number generation. Supplying the same seed reproduces the same sample.

keepOutliers

Logical specifying whether to use a graphics-focused sampling algorithm (see Details).

Value

A subset of the given pat object.

Details

When keepOutliers = FALSE, simple random sampling is used to provide a statistically representative subsample of the data.
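Random sampling of distinct rows amounts to drawing row indices without replacement. The sketch below shows this on a plain data frame rather than a pat object. `sample_rows()` is a hypothetical helper, and sorting the kept indices to preserve time order is an assumption of this sketch rather than documented pat_sample behavior.

```r
# Hypothetical sketch: uniform random sampling of distinct rows.
sample_rows <- function(df, sampleSize, setSeed = NULL) {
  if (!is.null(setSeed)) set.seed(setSeed)   # make the draw reproducible
  n <- min(sampleSize, nrow(df))
  # sample.int() draws without replacement by default, so rows are distinct.
  # Sorting keeps the rows in their original (time) order (an assumption).
  keep <- sort(sample.int(nrow(df), n))
  df[keep, , drop = FALSE]
}

df <- data.frame(time = seq_len(2424), value = runif(2424))
s1 <- sample_rows(df, 1000, setSeed = 1)
s2 <- sample_rows(df, 1000, setSeed = 1)
nrow(s1)            # 1000
identical(s1, s2)   # TRUE: same seed, same rows
```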

When keepOutliers = TRUE, a customized sampling algorithm is used. It attempts to create subsets that, when plotted, are visually identical to plots of the full data. It does this by preserving all outliers and sampling only in regions where overplotting is expected.

The process is as follows:

  1. find outliers using seismicRoll::findOutliers()

  2. create a subset consisting of only outliers

  3. sample the remaining data

  4. merge the outliers and sampled data
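The four steps above can be sketched in base R for a single numeric column. This is an illustration under stated assumptions, not the package's implementation. A simple Hampel filter (rolling median plus or minus a multiple of the scaled MAD) stands in for seismicRoll::findOutliers(), whose parameters and defaults are not reproduced here. `hampel_outliers()` and `sample_keep_outliers()` are hypothetical names. This sketch also assumes that sampleSize rows are drawn from the non-outlier data, with every outlier kept in addition.

```r
# Stand-in for seismicRoll::findOutliers(): a simple Hampel filter.
# Flags x[i] when it deviates from its window median by more than
# threshold * (1.4826 * MAD of the window). Returns outlier indices.
hampel_outliers <- function(x, halfWidth = 3, threshold = 3) {
  n <- length(x)
  flags <- logical(n)
  for (i in seq_len(n)) {
    window <- x[max(1, i - halfWidth):min(n, i + halfWidth)]
    med <- median(window, na.rm = TRUE)
    scaledMad <- 1.4826 * median(abs(window - med), na.rm = TRUE)
    flags[i] <- !is.na(x[i]) && abs(x[i] - med) > threshold * scaledMad
  }
  which(flags)  # which() drops any NA flags
}

# Hypothetical sketch of the documented four-step process.
sample_keep_outliers <- function(df, column, sampleSize, setSeed = NULL) {
  if (!is.null(setSeed)) set.seed(setSeed)
  outlierIdx <- hampel_outliers(df[[column]])          # 1. find outliers
  # 2. the outliers form their own subset (tracked here by row index)
  rest <- setdiff(seq_len(nrow(df)), outlierIdx)
  n <- min(sampleSize, length(rest))
  sampledIdx <- rest[sample.int(length(rest), n)]      # 3. sample the rest
  keep <- sort(c(outlierIdx, sampledIdx))              # 4. merge, in time order
  df[keep, , drop = FALSE]
}

set.seed(42)
df <- data.frame(time = seq_len(500), pm25 = rnorm(500, mean = 10))
df$pm25[c(100, 300)] <- 200                            # inject two spikes
s <- sample_keep_outliers(df, "pm25", sampleSize = 50, setSeed = 1)
all(c(100, 300) %in% s$time)                           # TRUE: spikes survive
```

Because outliers are never subject to random removal, isolated spikes remain in the plot even when most other rows are dropped.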

Examples

library(AirSensor)

example_pat %>%
  pat_extractData() %>%
  dim()
#> [1] 2424   21

example_pat %>%
  pat_sample(sampleSize = 1000, setSeed = 1) %>%
  pat_extractData() %>%
  dim()
#> [1] 912  19