Introduction

This tutorial demonstrates how to convert previously generated ‘pat’ timeseries data into hourly-aggregated ‘airsensor’ objects. In order to run the code in this tutorial you must have followed the instructions in Tutorial 1 and created a directory with ‘pas’ and ‘pat’ data files for the Methow Valley. Target audiences include grad students, researchers and any member of the public concerned about air quality and comfortable working with R and RStudio.

Tutorials in this series include:

Goal

The goal in this tutorial is to load previously generated ‘pat’ data and then create and save a multi-sensor ‘airsensor’ object. Aggregation of raw data to quality controlled, hourly averages allows sensor data to be compared with meteorological and air quality data from other sources. Hourly aggregation is the de facto standard for most air quality analysis.

Sensor Objects

‘airsensor’ objects (sometimes also called ‘sensor’ objects) contain timeseries data derived from ‘pat’ objects. Where ‘pat’ objects contain “raw” data, ‘airsensor’ objects contain highly processed data:

‘pat’ object

  • list containing meta and data dataframes
  • meta contains spatial metadata for a single sensor
  • data records have minute resolution raw data
  • data contains multiple parameters from a single sensor

‘airsensor’ object

  • list containing meta and data dataframes
  • meta contains spatial metadata for one or more sensors
  • data records have hourly resolution aggregated and QC’ed data
  • data contains a single parameter from one or more sensors

An ‘airsensor’ object is equivalent to and compatible with the ‘ws_monitor’ object defined in the PWFSLSmoke R package. This means that any functions from PWFSLSmoke or from AirMonitorPlots designed to work with ‘ws_monitor’ objects can also be used with ‘airsensor’ objects.

This opens up a wide suite of of functionality that can be used to analyze and plot sensor data after it has been converted into an ‘airsensor’ object.

R Script

The following R script will take a few moments to run and will create an ‘airsensor’ data file on your computer.

After running the script, a final section will demonstrate how to load and work with this local data file.

This R script can be used as a starting point for those wishing to create small collections of data for other communities and other dates.

# Methow Valley local data archive

# ----- Setup ------------------------------------------------------------------

library(AirSensor)

# Use the default archiveDir unless it is already defined
if ( !exists("archiveDir") ) {
  archiveDir <- file.path("~/Data/MVCAA")
}

# Load previously generated 'pas' and 'pat' data
mvcaa <- get(load(file.path(archiveDir, "mvcaa.rda")))
patList <- get(load(file.path(archiveDir, "patList.rda")))

# ----- Create 'airsensor' data ------------------------------------------------

# Create an empty List to store things
airsensorList <- list()

# Initialize counters
idCount <- length(patList)
count <- 0
successCount <- 0

# Loop over ids and create 'airsensor' objects (might take a while).
for ( id in names(patList) ) {

  count <- count + 1
  print(sprintf("Working on %s (%d/%d) ...", id, count, idCount))
  
  # Use a try-block in case you get "no data" errors
  result <- try({
    
    # It's nice to copy-paste the full function signature so you can see all possible arguments
    airsensorList[[id]] <- pat_createAirSensor(
      pat = patList[[id]],
      parameter <- "pm25",
      FUN = PurpleAirQC_hourly_AB_01
    )
    successCount <- successCount + 1
    
  }, silent = FALSE)
  
  if ( "try-error" %in% class(result) ) {
    print(geterrmessage())
  }
  
}

# How many did we get?
print(sprintf("Successfully created %d/%d pat objects.", successCount, idCount))

# Save it in our archive directory
save(airsensorList, file = file.path(archiveDir, "airsensorList.rda"))

Script Comments

The above script is straightforward and easily reused but we should take a moment to describe the arguments to pat_createAirSensor():

  • pat – This is just the ‘pat’ object we are using as input data.
  • parameter – The ‘airsensor’ object will contain hourly averaged data for only a single parameter. This will almost always be pm25 but it is also possible to create ‘airsensor’ objects for temperature or humidity or any other parameter available in pat$data.
  • FUN – This is the quality control (QC) function that will be used when aggregating pat data to an hourly axis. Several functions are available in the AirSensor package and it is also possible to create custom QC algorithms.

Loading Local Data

Now that we have an airsensorList object with quality controlled data we can load it from our archive and briefly explore it.

# AirSensor package
library(AirSensor)

# Use the default archiveDir unless it is already defined
if ( !exists("archiveDir") ) {
  archiveDir <- file.path("~/Data/MVCAA")
}

# Examine archive directory
list.files(archiveDir) # you should see: "airsensorList.rda" "mvcaa.rda" "patList.rda" 
## [1] "airsensor"                "airsensorList.rda"       
## [3] "Liberty_data.csv"         "Liberty_School_clean.rda"
## [5] "mvcaa.rda"                "pat"                     
## [7] "patList.rda"
# Load files
mvcaa <- get(load(file.path(archiveDir, "mvcaa.rda")))
patList <- get(load(file.path(archiveDir, "patList.rda")))
airsensorList <- get(load(file.path(archiveDir,"airsensorList.rda")))

# Examine a 'pat' object
pat1 <- patList[[1]]
class(pat1)
## [1] "pa_timeseries" "list"
dim(pat1$data)
## [1] 21413    19
names(pat1$data)       # single sensor, multiple parameters
##  [1] "datetime"    "pm25_A"      "pm25_B"      "temperature" "humidity"   
##  [6] "pressure"    "pm1_atm_A"   "pm25_atm_A"  "pm10_atm_A"  "pm1_atm_B"  
## [11] "pm25_atm_B"  "pm10_atm_B"  "uptime"      "rssi"        "memory"     
## [16] "adc0"        "bsec_iaq"    "datetime_A"  "datetime_B"
names(pat1$meta)
##  [1] "ID"                               "label"                           
##  [3] "sensorType"                       "DEVICE_LOCATIONTYPE"             
##  [5] "THINGSPEAK_PRIMARY_ID"            "THINGSPEAK_PRIMARY_ID_READ_KEY"  
##  [7] "THINGSPEAK_SECONDARY_ID"          "THINGSPEAK_SECONDARY_ID_READ_KEY"
##  [9] "longitude"                        "latitude"                        
## [11] "countryCode"                      "stateCode"                       
## [13] "timezone"                         "deviceID"                        
## [15] "locationID"                       "deviceDeploymentID"              
## [17] "pwfsl_closestDistance"            "pwfsl_closestMonitorID"          
## [19] "sensorManufacturer"               "targetPollutant"                 
## [21] "technologyType"                   "communityRegion"
# Examine an 'airsensor' object
airsensor1 <- airsensorList[[1]]
class(airsensor1)
## [1] "airsensor"  "ws_monitor"
dim(airsensor1$data)
## [1] 720   2
names(airsensor1$data) # single parameter, one or more sensors
## [1] "datetime"               "ab5dca99422f2c0d_13669"
names(airsensor1$meta)
##  [1] "ID"                               "label"                           
##  [3] "sensorType"                       "DEVICE_LOCATIONTYPE"             
##  [5] "THINGSPEAK_PRIMARY_ID"            "THINGSPEAK_PRIMARY_ID_READ_KEY"  
##  [7] "THINGSPEAK_SECONDARY_ID"          "THINGSPEAK_SECONDARY_ID_READ_KEY"
##  [9] "longitude"                        "latitude"                        
## [11] "countryCode"                      "stateCode"                       
## [13] "timezone"                         "deviceID"                        
## [15] "locationID"                       "deviceDeploymentID"              
## [17] "pwfsl_closestDistance"            "pwfsl_closestMonitorID"          
## [19] "sensorManufacturer"               "targetPollutant"                 
## [21] "technologyType"                   "communityRegion"                 
## [23] "monitorID"                        "elevation"                       
## [25] "siteName"                         "countyName"                      
## [27] "msaName"                          "monitorType"                     
## [29] "siteID"                           "instrumentID"                    
## [31] "aqsID"                            "pwfslID"                         
## [33] "pwfslDataIngestSource"            "telemetryAggregator"             
## [35] "telemetryUnitID"

Fancy Plot

The following plot shows what is possible for someone highly experienced with R and the Mazama Science packages.

# Combine single-sensor 'airsensor' objects into a multi-sensor 'monitor' object
my_monitors <- PWFSLSmoke::monitor_combine(airsensorList)
names(my_monitors$data) # datetime + deviceDeploymentIDs
##  [1] "datetime"               "ab5dca99422f2c0d_13669" "f6c44edd41c941c7_10182"
##  [4] "49215ad49d1a87e3_10188" "f736fd3fb21fc4da_13667" "db5d6b3b79f5830e_39237"
##  [7] "4f19d256e1787973_10166" "f592adb5067ad9d3_13675" "4a47b9252e16e558_15077"
## [10] "0cbfeb2ce4c1553c_13661" "2e3b5ceea86a885b_10168" "f96deab8c29aa42b_10134"
## [13] "96b108298883ca47_64441"
# Use MazamaCoreUtils::dateRange() to ensure we get POSIXct times in the local timezone
dateRange <- MazamaCoreUtils::dateRange(
  startdate = 20200903, 
  enddate = 20200921, 
  timezone = "America/Los_Angeles"
)

# Load AirMonitorPlots
library(AirMonitorPlots)

# Help ggplot out by assigning short labels to be used as the facet label
my_monitors$meta$shortName <- 
  my_monitors$meta$siteName %>%
  stringr::str_replace("MV Clean Air Ambassador @", "") %>%
  stringr::str_replace("MV Clean Air Ambassador-", "") %>%
  stringr::str_trim()

# Fancy plot with AQI bars
gg <- 
  ggplot_pm25Timeseries(
    my_monitors,
    dateRange[1],
    dateRange[2],
    timezone = "America/Los_Angeles"
  ) +
  ggtitle("Methow Valley PM2.5 values -- Sep 03 - 21, 2020") +
  stat_AQCategory(color = NA) + 
  facet_grid(rows = vars(shortName)) 

print(gg)

NOTE: It appears that our default QC algorithm has tossed out many of the values during the smokiest hours. It might be time to go back and try out some of the other QC algorithms or create a new one. This demonstrates why it is so important to work in a well documented, reproducible manner and how data visualization can alert you to pitfalls in your analysis.


Best of luck assessing air quality in your community!