Synoptic data provides a synopsis: a comprehensive view of something at a moment in time. This vignette demonstrates an example workflow for exploring air quality synoptic data using the AirSensor R package and data captured by PurpleAir air quality sensors.

Synoptic Data Basics

Creating Current Synoptic Data (slow)

PurpleAir sensor readings are uploaded to the cloud every 120 seconds. (Every 80 seconds prior to a May 31, 2019 firmware upgrade.) Data are processed by PurpleAir and a version of the data is displayed on the PurpleAir website.

You can generate a current PurpleAir Synoptic (PAS) object (hereafter called a pas) with the pas_createNew() function. A pas object is a large dataframe with 44 data columns and one record for each PurpleAir sensor channel (2 channels per sensor).

The pas_createNew() function performs the following tasks under the hood:

  1. Download a raw dataset of the entire PurpleAir network that includes both metadata and recent PM2.5 averages for each deployed sensor across the globe. See downloadParseSynopticData() for more info.

  2. Subset and enhance the raw dataset by replacing variables with more consistent, human-readable names and adding spatial metadata for each sensor, including the nearest official air quality monitor. For a more in-depth explanation, see enhanceSynopticData().
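
The second step can be pictured with a small base R sketch. This is not the code inside enhanceSynopticData(); it only illustrates the idea of renaming raw fields and attaching spatial metadata. The raw field names (Label, Lat, Lon, PM2_5Value), the toy values, and the hard-coded country and state codes are all assumptions made for illustration.

```r
# Hypothetical raw records. Field names and values are illustrative
# assumptions, not a guaranteed match for the PurpleAir JSON.
raw <- data.frame(
  Label = c("Sensor One", "Sensor One B"),
  Lat = c(47.61, 47.61),
  Lon = c(-122.33, -122.33),
  PM2_5Value = c("12.4", "11.9"),
  stringsAsFactors = FALSE
)

# Replace raw names with consistent, human-readable ones
name_map <- c(
  Label = "label",
  Lat = "latitude",
  Lon = "longitude",
  PM2_5Value = "pm25"
)
enhanced <- raw[, names(name_map)]
names(enhanced) <- unname(name_map)

# Values that arrive as text are converted to numeric
enhanced$pm25 <- as.numeric(enhanced$pm25)

# Add spatial metadata. Hard-coded here for the toy records; a real
# workflow would derive these from the coordinates.
enhanced$countryCode <- "US"
enhanced$stateCode <- "WA"

print(enhanced)
```

Keeping the rename rules in a single named vector makes it easy to see, and to extend, which raw field maps to which column.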

To create a new pas object you must first initialize the MazamaSpatialUtils package. The following example creates a brand new pas object with up-to-the-minute data.

NOTE: This can take up to a minute to process.

library(AirSensor)

# Initialize spatial data processing 
library(MazamaSpatialUtils)
initializeMazamaSpatialUtils()

# Create a 'pas' object with current data
pas <- pas_createNew(countryCodes = "US")

Loading Pre-generated Synoptic Data (fast)

It is also possible to load pre-generated pas objects from a data archive. These objects are updated regularly throughout each day and are typically used by other package functions, primarily for the location metadata they contain. Archived pas objects from previous days therefore contain data from near midnight of that date.

The archived pas objects can be loaded very quickly with the pas_load() function, which obtains pas objects from the archive specified with setArchiveBaseUrl(). When called without the datestamp argument, pas_load() obtains the most recently processed pas object, typically less than an hour old.

# Load packages
library(AirSensor)
library(dplyr)
library(ggplot2)

# Set location of pre-generated data files
setArchiveBaseUrl("https://airfire-data-exports.s3-us-west-2.amazonaws.com/PurpleAir/v1")

# Load the most recent archived 'pas' object
pas <- pas_load()

PAS Data Structure

The pas dataset contains 44 columns, and each row corresponds to one PurpleAir sensor channel. For the data analysis examples we will focus on the columns labeled stateCode, pm25_*, humidity, pressure, temperature, and pwfsl_closestDistance.

The complete list of columns is given below. Names in ALL_CAPS have been retained from the PurpleAir .json file. Other columns have been renamed for human readability.

##  [1] "ID"                               "label"                           
##  [3] "DEVICE_LOCATIONTYPE"              "THINGSPEAK_PRIMARY_ID"           
##  [5] "THINGSPEAK_PRIMARY_ID_READ_KEY"   "THINGSPEAK_SECONDARY_ID"         
##  [7] "THINGSPEAK_SECONDARY_ID_READ_KEY" "latitude"                        
##  [9] "longitude"                        "pm25"                            
## [11] "lastSeenDate"                     "sensorType"                      
## [13] "flag_hidden"                      "isOwner"                         
## [15] "humidity"                         "temperature"                     
## [17] "pressure"                         "age"                             
## [19] "parentID"                         "flag_highValue"                  
## [21] "flag_attenuation_hardware"        "Ozone1"                          
## [23] "pm25_current"                     "pm25_10min"                      
## [25] "pm25_30min"                       "pm25_1hr"                        
## [27] "pm25_6hr"                         "pm25_1day"                       
## [29] "pm25_1week"                       "statsLastModifiedDate"           
## [31] "statsLastModifiedInterval"        "countryCode"                     
## [33] "stateCode"                        "timezone"                        
## [35] "deviceID"                         "locationID"                      
## [37] "deviceDeploymentID"               "airDistrict"                     
## [39] "pwfsl_closestDistance"            "pwfsl_closestMonitorID"          
## [41] "sensorManufacturer"               "targetPollutant"                 
## [43] "technologyType"                   "communityRegion"
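
As a rough illustration of how the columns highlighted above can be combined, the sketch below summarizes hourly PM2.5 by state and keeps only the sensors close to an official monitor. It uses a small made-up data frame (toy_pas) rather than a real pas object, so it runs without network access. The values are invented, and treating pwfsl_closestDistance as meters is an assumption worth checking against the package documentation.

```r
# Toy stand-in for a 'pas' object; all values are invented for illustration
toy_pas <- data.frame(
  label = c("A1", "A2", "B1", "B2", "C1"),
  stateCode = c("CA", "CA", "WA", "WA", "OR"),
  pm25_1hr = c(42.5, 50.7, 8.1, NA, 15.3),
  pwfsl_closestDistance = c(1200, 8500, 300, 4000, 15000),
  stringsAsFactors = FALSE
)

# Median hourly PM2.5 per state. The formula interface of aggregate()
# drops rows with missing values before summarizing.
state_median <- aggregate(pm25_1hr ~ stateCode, data = toy_pas, FUN = median)
print(state_median)

# Sensors within 5 km of the nearest official monitor
# (assumes the distance column is expressed in meters)
near_monitor <- toy_pas[toy_pas$pwfsl_closestDistance <= 5000, ]
print(near_monitor$label)
```

The same two operations map directly onto dplyr verbs (group_by() with summarise(), and filter()) when working with a real pas object.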

Let’s take a quick peek at some of the PM2.5 data:

# Extract and round just the PM2.5 data
pm25_data <-
  pas %>% 
  select(starts_with("pm25_")) %>% 
  round(1)

# Combine sensor label and pm2.5 data 
bind_cols(label = pas$label, pm25_data) %>%
  head(10) %>% 
  knitr::kable(
    col.names = c("label", "current", "10 min", "30 min", "1 hr", "6 hr", "1 day", "1 wk"),
    caption = "PAS PM2.5 Values"
  )
PAS PM2.5 Values

label                           current  10 min  30 min    1 hr    6 hr   1 day    1 wk
Hazelwood canary                   40.0    39.2    40.0    42.5    60.3    75.4    45.5
Hazelwood canary B                 38.5    37.4    38.1    40.5    57.0    72.0    43.2
WC Hillside                        47.4    49.2    49.9    50.7    75.2   100.6    67.1
WC Hillside B                      43.2    45.3    45.7    46.3    66.9    90.0    60.9
#ValleyClimate                     92.9    93.5    95.3    98.0   140.2   140.3    71.5
#ValleyClimate B                  106.7   102.0   104.0   106.5   151.6   151.8    76.6
‘S’ St Between Inyo and Mono       77.7    73.2    71.1    71.5    98.4    96.9    56.2
‘S’ St Between Inyo and Mono B     91.2    85.9    83.8    84.6   116.2   113.3    62.6
(Indoors) Lansing St                0.9     0.8     0.9     1.2     3.9     5.7     4.1
(Indoors) Lansing St B              0.1     0.4     0.4     0.7     2.9     4.3     3.2
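
Because each sensor reports two channels, the rows in the table above come in pairs: a label and the same label followed by " B". One simple way to see how well the two channels agree is to compare their 1 hr averages. The sketch below copies those values from the table by hand. The percent-difference metric is an illustrative choice, not a quality-control method defined by the AirSensor package.

```r
# 1 hr PM2.5 averages for channels A and B, copied from the table above
ab <- data.frame(
  sensor = c(
    "Hazelwood canary",
    "WC Hillside",
    "#ValleyClimate",
    "‘S’ St Between Inyo and Mono",
    "(Indoors) Lansing St"
  ),
  pm25_1hr_A = c(42.5, 50.7, 98.0, 71.5, 1.2),
  pm25_1hr_B = c(40.5, 46.3, 106.5, 84.6, 0.7),
  stringsAsFactors = FALSE
)

# Absolute difference relative to the mean of the two channels, in percent
ab$mean_AB <- (ab$pm25_1hr_A + ab$pm25_1hr_B) / 2
ab$pct_diff <- round(100 * abs(ab$pm25_1hr_A - ab$pm25_1hr_B) / ab$mean_AB, 1)

print(ab[, c("sensor", "pct_diff")])
```

At very low concentrations, such as the indoor sensor here, a small absolute difference produces a large percent difference, so a relative metric like this is best read alongside the absolute values.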

Mapping pas PM2.5 Data

To visually explore a region, we can use our pas data with the pas_leaflet() function to plot an interactive leaflet map. By default, pas_leaflet() will map the coordinates of each PurpleAir sensor and the hourly PM2.5 data. Clicking on a sensor will show sensor metadata.

pas %>% 
  pas_leaflet()