Introduction

This tutorial demonstrates how to create files containing ‘pas’ and ‘pat’ data for a particular community and how to save and access them in a local directory. Target audiences include grad students, researchers and any member of the public concerned about air quality and comfortable working with R and RStudio.

Goal

Our goal in this tutorial is to create ‘pas’ and ‘pat’ data for a single month for the Methow Valley, a community in north-central Washington state. Clean Air Methow operates as a project of the Methow Valley Citizens Council and began deploying PurpleAir sensors in 2018:

In the summer of 2018, Clean Air Methow launched the Clean Air Ambassador Program, an exciting citizen science project, and one of the largest, rural networks of low-cost sensors in the world!

This tutorial will demonstrate how to create ‘pas’ and ‘pat’ data for this collection of sensors for September 2020, when the Methow Valley experienced poor air quality due to wildfire smoke.

PAS and PAT Objects

pas objects in the AirSensor package are described in Purple Air Synoptic Data. They contain per-instrument metadata for all PurpleAir sensors operating at a particular moment in time. This includes both “spatial metadata” such as longitude, latitude and timezone, and per-instrument keys that allow us to obtain time series data from web services.

pat objects are described in Purple Air Timeseries Data. These contain temporally invariant spatial metadata for a single sensor as well as the actual time series measurements made by that sensor.

In order to work successfully with PurpleAir sensor data, we will need to create both a ‘pas’ object and a collection of ‘pat’ objects for the sensors we are interested in.

Set the Archive Directory

Before you start, consider where on your computer you are going to save your data and create your archive. By default, these tutorials will save data underneath your “home” directory.

Run path.expand("~") in the R console to see the location of your home directory. If you wish to save data somewhere else, you can specify an alternate location by modifying the code below. Otherwise you can jump to the next section.

# Check your current home directory 
path.expand("~")

# Create your data directory anywhere you want, changing the home directory 
# part ("~") and keeping "Data/MVCAA".
#
# Windows example: archiveDir <- "C:/MethowValley/Data/MVCAA" 
#    UNIX example: archiveDir <- "/MethowValley/Data/MVCAA"

# The default choice places data underneath your home directory
archiveDir <- "~/Data/MVCAA" 
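
As a quick sanity check, the sketch below shows one way to create a directory only when it is missing and then confirm where it lives. It uses only base R. The name `exampleDir` is introduced here purely for illustration, and a temporary directory stands in for the real archive location so the example can be run without touching your files.

```r
# Illustration only: a temporary location stands in for "~/Data/MVCAA"
exampleDir <- file.path(tempdir(), "Data", "MVCAA")

# Create the directory (and any missing parents) only if it does not exist yet
if ( !dir.exists(exampleDir) ) {
  dir.create(exampleDir, recursive = TRUE)
}

# Confirm the directory exists and show its full path
dir.exists(exampleDir)
normalizePath(exampleDir)
```

Replacing `exampleDir` with your own `archiveDir` gives the same check for the real archive location.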

Find Sensors

The first task is to identify the sensors we wish to use. This can be done by loading the default ‘pas’ object from the data archive maintained by Mazama Science and then zooming in on the Methow Valley.

# AirSensor package
library(AirSensor)

# Set the archiveBaseUrl so we can get a 'pas' object
setArchiveBaseUrl("http://data.mazamascience.com/PurpleAir/v1")

# Load the default 'pas' (today for the entire US)
pas <- pas_load()

# Subset by state
wa <- pas_filter(pas, stateCode == "WA")

# Look at it
pas_leaflet(wa)

The Methow Valley is located in north-central Washington in the Okanogan National Forest. Clicking on a few of the sensors in the Methow quickly reveals that many of them have a label containing “MV Clean Air Ambassador”. With this information, we can now write a script to create ‘pat’ objects for each of the Methow Valley Clean Air Ambassador (MVCAA) sensors.

NOTE: We could use the pas_filterArea() function to define a bounding box and get all sensors in the area. But for this tutorial we will limit ourselves to those labeled with “MV Clean Air Ambassador”.
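
To make the difference between the two approaches concrete, here is a minimal base R sketch on a made-up data frame. The column names `label`, `longitude` and `latitude` mirror fields of a ‘pas’ object, but the rows are invented for illustration and are not real sensor data. The bounding box uses the same corner values that appear as a comment in the script.

```r
# Invented example rows -- NOT real sensor data
toy <- data.frame(
  label = c(
    "MV Clean Air Ambassador @ Site A",
    "Backyard Sensor",
    "MV Clean Air Ambassador @ Site B",
    "Seattle Rooftop"
  ),
  longitude = c(-120.2, -120.3, -119.9, -122.3),
  latitude = c(48.4, 48.5, 48.3, 47.6),
  stringsAsFactors = FALSE
)

# Approach 1: keep rows whose label contains the program name
byLabel <- toy[grepl("MV Clean Air Ambassador", toy$label, fixed = TRUE), ]

# Approach 2: keep rows that fall inside a bounding box
w <- -120.5; e <- -120.0; s <- 48.0; n <- 49.0
inBox <- toy$longitude >= w & toy$longitude <= e &
  toy$latitude >= s & toy$latitude <= n
byArea <- toy[inBox, ]

# Compare the two selections
byLabel$label
byArea$label
```

In this toy example the two selections differ: the bounding box picks up an unrelated sensor inside the area and misses a program sensor just outside it, which is why the label is used here.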

R Script

The following R script will take several minutes to run and will create ‘pas’ and ‘pat’ data files on your computer. Once these files have been created, loading them will be very fast.

After you run the script, a final section demonstrates how to load and work with these local data files.

This R script can be used as a starting point for anyone interested in creating small collections of data for other communities and other dates.

# Methow Valley local data archive: Setup

# ----- Setup ------------------------------------------------------------------

# Use the default archiveDir unless it is already defined
if ( !exists("archiveDir") ) {
  archiveDir <- file.path("~/Data/MVCAA")
}

# Create the archive directory (no warning if it already exists)
dir.create(archiveDir, recursive = TRUE, showWarnings = FALSE)

# AirSensor package
library(AirSensor)

# Set the archiveBaseUrl so we can get a pre-generated 'pas' object
setArchiveBaseUrl("http://data.mazamascience.com/PurpleAir/v1")

# ----- Subset PAS object ------------------------------------------------------

# Create a 'pas' object limited to MVCAA sensors
#   - load most recent 'pas' for the entire country
#   - subset to include sensors labeled MVCAA
mvcaa <-
  pas_load() %>%
  pas_filter(stringr::str_detect(label, "MV Clean Air Ambassador"))
  # NOTE: Could have filtered by area with:
  #pas_filterArea(w = -120.5, e = -120.0, s = 48.0, n = 49.0)

# Look at it
pas_leaflet(mvcaa)

# Save it in our archive directory
save(mvcaa, file = file.path(archiveDir, "mvcaa.rda"))

# Examine archive directory:
list.files(file.path(archiveDir))

# ----- Create PAT objects -----------------------------------------------------

# Get all the deviceDeploymentIDs
mvcaa_ids <- pas_getDeviceDeploymentIDs(mvcaa)

# Specify time range
startdate <- "2020-09-01"
enddate <- "2020-10-01"
timezone <- "America/Los_Angeles"

# Create an empty List to store things
patList <- list()

# Initialize counters
idCount <- length(mvcaa_ids)
count <- 0 
successCount <- 0

# Loop over all ids and get data (This might take a while.)
for (id in mvcaa_ids[seq_len(idCount)]) {
  
  count <- count + 1
  print(sprintf("Working on %s (%d/%d) ...", id, count, idCount))
  
  # Use a try-block in case you get "no data" errors
  result <- try({
    
    # Here we show the full function signature so you can see all possible arguments
    patList[[id]] <- pat_createNew(
      id = id,
      label = NULL,        # not needed if you have the id
      pas = mvcaa,
      startdate = startdate,
      enddate = enddate,
      timezone = timezone,
      baseUrl = "https://api.thingspeak.com/channels/",
      verbose = FALSE
    )
    successCount <- successCount + 1
    
  }, silent = FALSE)
  
  if ( inherits(result, "try-error") ) {
    print(geterrmessage())
  }
  
}

# How many did we get?
print(sprintf("Successfully created %d/%d pat objects.", successCount, idCount))

# Save it in our archive directory
save(patList, file = file.path(archiveDir, "patList.rda"))

# ----- Evaluate patList -------------------------------------------------------

# We can use sapply() to apply a function to each element of the list
sapply(patList, function(x) { return(x$meta$label) })

# How big is patList in memory?
print(object.size(patList), units = "MB")

# How big is patList.rda on disk (as compressed binary)?
fileSize <- file.size(file.path(archiveDir, "patList.rda"))
sprintf("%.1f MB", fileSize/1e6)
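
The error-handling pattern used in the loop above can be tried out without downloading anything. In the sketch below, `fetchOne()` is a hypothetical stand-in for `pat_createNew()` that fails for one id; the ids and the `failedIds` vector are also invented for illustration. Everything else is base R.

```r
# Hypothetical stand-in for a download function; fails for the id "bad"
fetchOne <- function(id) {
  if ( id == "bad" ) stop("no data for this id")
  return(paste("data for", id))
}

ids <- c("a", "bad", "c")
resultList <- list()
failedIds <- character(0)

for ( id in ids ) {

  # An error inside try() is captured instead of stopping the loop
  result <- try({
    resultList[[id]] <- fetchOne(id)
  }, silent = TRUE)

  # Record failures so they can be reviewed or retried later
  if ( inherits(result, "try-error") ) {
    failedIds <- c(failedIds, id)
  }

}

# Which ids succeeded and which failed?
names(resultList)
failedIds
```

Keeping a list of failed ids is one possible refinement; it makes it easy to rerun only the sensors that returned no data.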

Loading Local Data

Now we have a local set of data files containing ‘pas’ and ‘pat’ data for all the sensors we are interested in.

# Empty current environment to ensure we're using our local archive.
rm(list = setdiff(ls(), c("archiveDir")))

# AirSensor package
library(AirSensor)

# Use the default archiveDir unless it is already defined
if ( !exists("archiveDir") ) {
  archiveDir <- file.path("~/Data/MVCAA")
}

# Examine archive directory:
list.files(file.path(archiveDir))
## [1] "airsensor"                "airsensorList.rda"       
## [3] "Liberty_data.csv"         "Liberty_School_clean.rda"
## [5] "mvcaa.rda"                "pat"                     
## [7] "patList.rda"

# Load files
mvcaa <- get(load(file.path(archiveDir, "mvcaa.rda"))) 
patList <- get(load(file.path(archiveDir, "patList.rda")))

# Interactive map
pas_leaflet(mvcaa)

# Print site names and associated ids
sapply(patList, function(x) { return(x$meta$label) })
##                               ab5dca99422f2c0d_13669 
##               "MV Clean Air Ambassador @ Balky Hill" 
##                               f6c44edd41c941c7_10182 
##             "MV Clean Air Ambassador @ Benson Creek" 
##                               49215ad49d1a87e3_10188 
##              "MV Clean Air Ambassador @ Bush School" 
##                               f736fd3fb21fc4da_13667 
##               "MV Clean Air Ambassador @ Gunn Ranch" 
##                               db5d6b3b79f5830e_39237 
## "MV Clean Air Ambassador @ Liberty Bell High School" 
##                               4f19d256e1787973_10166 
##          "MV Clean Air Ambassador @ Lower Studhorse" 
##                               f592adb5067ad9d3_13675 
##         "MV Clean Air Ambassador @ McFarland Creek " 
##                               4a47b9252e16e558_15077 
##           "MV Clean Air Ambassador @ Methow Estates" 
##                               0cbfeb2ce4c1553c_13661 
##             "MV Clean Air Ambassador @ Pine Forest " 
##                               2e3b5ceea86a885b_10168 
##       "MV Clean Air Ambassador @ Upper Beaver Creek" 
##                               f96deab8c29aa42b_10134 
##         "MV Clean Air Ambassador @ Willowbrook Farm" 
##                               96b108298883ca47_64441 
##              "MV Clean Air Ambassador-Little Cougar"

# Pull out Balky Hill as a separate 'pat' object
Balky_Hill <- patList[["ab5dca99422f2c0d_13669"]]

# Basic plot
pat_multiplot(Balky_Hill)
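
The `get(load(...))` idiom used above to load the files works because `load()` restores an object under the name it was saved with and returns that name as a character string. The base R sketch below demonstrates this with a small invented object and a temporary file; the names `originalName`, `rdaFile` and `myCopy` are chosen only for illustration.

```r
# Save an object under one name ...
originalName <- list(a = 1, b = "two")
rdaFile <- tempfile(fileext = ".rda")
save(originalName, file = rdaFile)
rm(originalName)

# ... load() restores it and returns its name as a character vector
loadedNames <- load(rdaFile)
loadedNames

# get() looks the object up by that name so it can be assigned to any variable
myCopy <- get(loadedNames)
str(myCopy)
```

This lets you choose the variable name at load time regardless of the name used when the file was saved. Base R's `saveRDS()` and `readRDS()` are an alternative that stores a single object without its name.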


Best of luck assessing air quality in your community!