This tutorial demonstrates how to create pas, pat and monitor objects for PurpleAir sensors in a particular community. Target audiences include graduate students, researchers, air quality professionals and any member of the public who is concerned about air quality and comfortable working with R and RStudio.
Our goal in this tutorial is to create pas, pat and monitor objects (data structures) for the Methow Valley, a community in north-central Washington state. Clean Air Methow operates as a project of the Methow Valley Citizens Council and began deploying PurpleAir sensors in 2018:
In the summer of 2018, Clean Air Methow launched the Clean Air Ambassador Program, an exciting citizen science project, and one of the largest, rural networks of low-cost sensors in the world!
This tutorial will demonstrate how to access and work with data from this collection of sensors.
pas objects in the AirSensor2 package contain per-instrument metadata for a collection of PurpleAir sensors. This includes “spatial metadata” such as longitude, latitude and timezone, as well as each instrument’s sensor_index, which allows us to request timeseries data from the PurpleAir API.
pat objects in the AirSensor2 package contain time-invariant spatial metadata for a single sensor as well as the time-dependent measurements made by that sensor.
monitor objects in the AirMonitor package contain time-invariant spatial metadata for one or more sensors as well as the time-dependent PM2.5 measurements made by those sensors.
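To make the monitor layout concrete, the toy sketch below builds a plain R list with the same two-part shape seen in the pat_toMonitor() output later in this tutorial: a meta table with one row per device, and a data table with a datetime column plus one column per deviceDeploymentID. The ID, location name, time zone and six PM2.5 values are copied from that output. The object itself is a hypothetical illustration, not a real AirMonitor object (real ones carry many more metadata columns), so it should not be passed to AirMonitor functions.

```r
# Toy illustration of the monitor layout (hypothetical, NOT a real AirMonitor object)
toy_monitor <- list(
  # One row of time-invariant metadata per device
  meta = data.frame(
    deviceDeploymentID = "c2dkknhd2_pa.95227",
    locationName = "Mazama Trailhead",
    timezone = "America/Los_Angeles",
    stringsAsFactors = FALSE
  ),
  # One 'datetime' column (stored here in UTC) plus one PM2.5 column per device
  data = data.frame(
    datetime = as.POSIXct("2021-07-15 07:00:00", tz = "UTC") + (0:5) * 3600,
    c2dkknhd2_pa.95227 = c(12.2, 10.2, 8.12, 6.79, 6.27, 5.89),
    check.names = FALSE
  )
)

# The data columns (minus 'datetime') line up with the rows of meta
stopifnot(identical(
  setdiff(names(toy_monitor$data), "datetime"),
  toy_monitor$meta$deviceDeploymentID
))
```

Keeping the device IDs as column names is what lets a single data table hold many devices side by side while the metadata stays in its own table.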
Before we start working with PurpleAir data we need to get a few things set up:
# AirSensor2 package
library(AirSensor2)
## Loading required package: MazamaCoreUtils
# Working from the git repo, your directory should end with "AirSensor2"
cat(getwd())
## /Users/jonathancallahan/Projects/MazamaScience/AirSensor2
# Set user's PurpleAir_API_READ_KEY
library(dotenv)
## Warning: package 'dotenv' was built under R version 4.4.1
dotenv::load_dot_env()
PurpleAir_API_READ_KEY <- Sys.getenv("PurpleAir_API_READ_KEY")
# Check that the key works
PurpleAir_checkAPIKey(PurpleAir_API_READ_KEY)
## $api_version
## [1] "V1.2.0-1.1.45"
##
## $time_stamp
## [1] "2026-05-06 17:47:12 UTC"
##
## $api_key_type
## [1] "READ"
# Set this key once for all pas_/pat_ functions
setAPIKey("PurpleAir-read", PurpleAir_API_READ_KEY)
# Initialize previously installed spatial datasets
initializeMazamaSpatialUtils()
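dotenv::load_dot_env() reads a .env file (by default in the working directory) containing lines such as `PurpleAir_API_READ_KEY=<your read key>`. If that file is missing, Sys.getenv() silently returns an empty string and the problem only surfaces later as an API error. A small guard like the hypothetical require_env_var() below, written in base R and not part of AirSensor2, stops early with a clearer message.

```r
# Hypothetical helper (not part of AirSensor2): fetch an environment variable or stop
require_env_var <- function(name) {
  value <- Sys.getenv(name)  # returns "" when the variable is not set
  if (!nzchar(value)) {
    stop(
      sprintf("Environment variable '%s' is not set; check your .env file.", name),
      call. = FALSE
    )
  }
  value
}

# Usage in this tutorial (after dotenv::load_dot_env()):
# PurpleAir_API_READ_KEY <- require_env_var("PurpleAir_API_READ_KEY")
```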
To find the sensors we wish to investigate, we must first create a pas object with metadata for all PurpleAir sensors in our target area. The Methow Valley is located entirely within Okanogan County, WA, so we can create an Okanogan-only pas object as our starting point.
# Create a new 'pas' object for Okanogan county
okanogan_pas <-
pas_createNew(
countryCodes = "US",
stateCodes = "WA",
counties = c("Okanogan"),
lookbackDays = 0, # all historical sensors
location_type = 0 # outdoor sensors only
)
# Interactive map
okanogan_pas %>% pas_leaflet()
Clicking on some of the sensors in the Methow Valley quickly reveals that many of them have labels associating them with the “Clean Air Ambassador” program. Unfortunately, the naming is not consistent.
A quick review of sort(okanogan_pas$locationName) reveals the variety of location names in use.
Clearly, some effort was made to systematize the naming, even if it wasn’t entirely successful. That effort is enough to let us filter for all location names beginning with “MV” or “Clean Air” and create an MVCAA-only pas object:
mvcaa_pas <-
okanogan_pas %>%
pas_filter(stringr::str_detect(locationName, "^MV|^Clean Air"))
# Interactive map
mvcaa_pas %>% pas_leaflet()
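The pattern "^MV|^Clean Air" matches names that start with either prefix: `^` anchors each alternative to the beginning of the string and `|` separates the alternatives. The base R sketch below uses grepl(), which for this simple pattern should behave like the stringr::str_detect() call above, on a few hypothetical location names invented for illustration (they are not taken from okanogan_pas). The match is case sensitive, so a name starting with lowercase "mv" would be missed unless ignore.case = TRUE is used.

```r
# Hypothetical location names, invented to illustrate the regular expression
location_names <- c(
  "MV Clean Air Ambassador 01",
  "Clean Air Methow - Example",
  "mv lowercase example",
  "Backyard sensor"
)

pattern <- "^MV|^Clean Air"

# Case-sensitive match, like the str_detect() filter above
grepl(pattern, location_names)
# [1]  TRUE  TRUE FALSE FALSE

# Case-insensitive variant also catches the lowercase "mv" prefix
grepl(pattern, location_names, ignore.case = TRUE)
# [1]  TRUE  TRUE  TRUE FALSE
```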
Along with where sensors are located, it’s important to know when they collected data. (Low-cost sensors don’t last forever.) The pas_lifespanPlot() function helps with this, as the following two examples show:
# Quick look at sensor lifespans
mvcaa_pas %>%
pas_lifespanPlot(showSensor = TRUE, moreSpace = 0.1)
# Human readable names
mvcaa_pas %>%
pas_lifespanPlot(
showSensor = TRUE,
moreSpace = 0.3,
sensorIdentifier = "locationName"
)
A pat object contains time series data for a specific sensor. The pat_create() function downloads all data records for a sensor over the requested time range: “raw data”, typically measured every 2 minutes. A similar function, pat_createHourly(), downloads hourly aggregated data as provided by the PurpleAir API.
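Conceptually, hourly aggregation groups the ~2-minute readings by clock hour and summarizes each group. The base R sketch below averages synthetic 2-minute values into hourly means using a hypothetical hourly_mean() helper. It only illustrates the idea; the aggregation behind pat_createHourly() is performed by the PurpleAir API and may differ in details such as how incomplete hours are handled.

```r
# Hypothetical helper: average raw readings into hourly means (UTC clock hours)
hourly_mean <- function(datetime, value) {
  # Label each reading with the start of its clock hour
  hour_key <- format(datetime, "%Y-%m-%d %H:00:00", tz = "UTC")
  means <- tapply(value, hour_key, mean, na.rm = TRUE)
  data.frame(
    datetime = as.POSIXct(names(means), tz = "UTC"),
    value = as.numeric(means)
  )
}

# Synthetic 2-minute readings spanning two hours (illustrative values only)
raw_time <- as.POSIXct("2025-09-01 00:00:00", tz = "UTC") + (0:59) * 120
raw_value <- c(rep(5, 30), rep(15, 30))

hourly <- hourly_mean(raw_time, raw_value)
hourly
# 60 raw records collapse into 2 hourly records with means of 5 and 15
```

The same arithmetic explains why hourly requests are cheaper: one week of 2-minute data is 7 × 24 × 30 = 5,040 records per sensor, versus 7 × 24 = 168 hourly records.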
The pat_create() function has a fields argument that lets you specify which data fields should be included in the result. By default, it uses all of the fields defined in PurpleAir_PAT_QC_FIELDS:
## [1] "rssi" "uptime" "pa_latency" "memory" "humidity"
## [6] "temperature" "pressure" "pm2.5_atm" "pm2.5_atm_a" "pm2.5_atm_b"
Clicking on the leaflet map above, we identify the sensor_index for the sensor at “Little Start School” as "95189". (Zoom in on Winthrop to see it.) From the lifespan plot, we can see that this sensor was first deployed at the end of 2020 and is still producing data.
The following chunk of code creates a raw pat object for this sensor:
# Create raw pat object
pat <-
pat_create(
api_key = PurpleAir_API_READ_KEY,
pas = mvcaa_pas,
sensor_index = "95189",
startdate = "2025-09-01",
enddate = "2025-09-08",
timezone = "UTC",
verbose = TRUE
)
# Pull out data
tbl <- pat$data
# Review parameters
names(tbl)
## [1] "datetime" "rssi" "uptime" "pa_latency" "memory"
## [6] "humidity" "temperature" "pressure" "pm2.5_atm" "pm2.5_atm_a"
## [11] "pm2.5_atm_b"
When given a data frame, base R’s plot() function draws a scatterplot matrix of every column against every other column, which makes it easy to review all parameters and look for interesting correlations among them.
In the plots below, we see that temperature and
humidity (aka “relative humidity”) are inversely
correlated, that pm2.5_atm_a and pm2.5_atm_b
are strongly correlated and that pm2.5_atm values varied
from ~20 to ~120.
# NOTE: Using "pch = 15" greatly improves the speed of drawing
# Sensor Electronics
plot(tbl[,c(1,2:5)], pch = 15, cex = 0.5, main = "Sensor Electronics")
# Atmospheric Variables
plot(tbl[,c(1,6:8)], pch = 15, cex = 0.5, main = "Atmospheric Variables")
# PM2.5 "atm"
plot(tbl[,c(1,9:11)], pch = 15, cex = 0.5, main = "PM2.5 'atm'")
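Each PurpleAir sensor contains two laser particle counters, reported as the A and B channels, so their agreement is a quick data-quality check. The scatterplots give a visual impression; the hypothetical channel_agreement() helper below (plain base R, not an AirSensor2 function) puts numbers on it: the Pearson correlation, the mean absolute A-B difference and the number of complete pairs. It is demonstrated on the six hourly pm2.5_atm_a and pm2.5_atm_b values shown later in this tutorial for the Mazama Trailhead sensor.

```r
# Hypothetical helper: summarize agreement between the A and B channels
channel_agreement <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)                 # keep only complete A/B pairs
  list(
    r = cor(a[ok], b[ok]),                    # Pearson correlation
    mean_abs_diff = mean(abs(a[ok] - b[ok])), # same units as the inputs
    n = sum(ok)
  )
}

# First six hours of pm2.5_atm_a and pm2.5_atm_b for the Mazama Trailhead sensor
a <- c(15.9, 12.6, 8.8, 6.7, 5.6, 5.2)
b <- c(16.2, 12.4, 8.8, 6.5, 5.6, 5.1)
agreement <- channel_agreement(a, b)
agreement$mean_abs_diff  # about 0.13

# Applied to the raw pat data above:
# channel_agreement(tbl$pm2.5_atm_a, tbl$pm2.5_atm_b)
# cor(tbl[, -1], use = "pairwise.complete.obs")  # full correlation matrix
```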
While high-resolution raw data is useful for engineering-level analysis, downloading it can consume a lot of PurpleAir API points. Creating hourly data is much more frugal because it takes advantage of the API’s ability to aggregate data.
For use cases involving comparison with regulatory monitors, calculating daily averages or informing the public, it is imperative to use hourly aggregated data that has had a correction equation applied. (PurpleAir sensors tend to report PM2.5 values that are higher than those reported by EPA regulatory monitors.)
The pat_createHourly() function downloads only those parameters typically used in QC and correction functions. A correction_FUN is applied to the hourly pat object and returns corrected PM2.5 values. By default, an EPA-vetted correction function is applied that brings PurpleAir values in line with EPA data, even under wildfire smoke conditions. (See the documentation for pat_applyCorrection() for details.)
Below, we create an hourly pat object for the second half of July 2021, when wildfire smoke severely impacted the Methow Valley.
# "Mazama Trailhead" during the Cedar Creek and Cub Creek fires of 2021
trailhead_hourly <-
pat_createHourly(
api_key = PurpleAir_API_READ_KEY,
pas = mvcaa_pas,
sensor_index = "95227", # Mazama Trailhead
startdate = "2021-07-15",
enddate = "2021-08-01",
timezone = "America/Los_Angeles", # timestamps interpreted in this time zone
verbose = TRUE
)
# This object has hourly data in the 'data' dataframe
head(trailhead_hourly$data)
## # A tibble: 6 × 6
## datetime humidity temperature pm2.5_atm pm2.5_atm_a pm2.5_atm_b
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2021-07-15 07:00:00 23 78 16 15.9 16.2
## 2 2021-07-15 08:00:00 24 77 12.5 12.6 12.4
## 3 2021-07-15 09:00:00 26 75 8.8 8.8 8.8
## 4 2021-07-15 10:00:00 28 74 6.6 6.7 6.5
## 5 2021-07-15 11:00:00 28 74 5.6 5.6 5.6
## 6 2021-07-15 12:00:00 30 73 5.2 5.2 5.1
The pat_toMonitor() function returns a monitor
object ready to use with the AirMonitor and AirMonitorPlots
packages.
trailhead_monitor <-
trailhead_hourly %>%
pat_toMonitor()
# Now we have corrected pm2.5 with a 'deviceDeploymentID' column header
head(trailhead_monitor$data)
## # A tibble: 6 × 2
## datetime c2dkknhd2_pa.95227
## <dttm> <dbl>
## 1 2021-07-15 07:00:00 12.2
## 2 2021-07-15 08:00:00 10.2
## 3 2021-07-15 09:00:00 8.12
## 4 2021-07-15 10:00:00 6.79
## 5 2021-07-15 11:00:00 6.27
## 6 2021-07-15 12:00:00 5.89
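At these low concentrations, the corrected values above agree with the linear U.S.-wide PurpleAir correction published by EPA researchers (Barkjohn et al., 2021): PM2.5 = 0.524 × PA - 0.0862 × RH + 5.75, where PA is the sensor’s PM2.5 reading and RH is relative humidity in percent. The sketch below applies only that linear formula to the first six rows of trailhead_hourly$data. It is an assumption, suggested by the matching numbers, that this is the branch pat_applyCorrection() uses at low concentrations. The published formula is defined on the pm2.5_cf_1 field, which is generally close to pm2.5_atm at low concentrations, and EPA’s extended correction for heavy smoke changes the coefficients at higher concentrations, so rely on pat_applyCorrection() rather than on this sketch.

```r
# Linear U.S.-wide correction (Barkjohn et al., 2021), low-concentration form only.
# Assumption: 'pa' is the PurpleAir PM2.5 reading, 'rh' is relative humidity (%).
epa_linear_correction <- function(pa, rh) {
  0.524 * pa - 0.0862 * rh + 5.75
}

# First six hourly rows of trailhead_hourly$data (pm2.5_atm and humidity)
pm25_atm <- c(16, 12.5, 8.8, 6.6, 5.6, 5.2)
humidity <- c(23, 24, 26, 28, 28, 30)

corrected <- epa_linear_correction(pm25_atm, humidity)
signif(corrected, 3)
# matches 12.2, 10.2, 8.12, 6.79, 6.27, 5.89 in the monitor output above
```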
We can now use functions from the AirMonitor and AirMonitorPlots packages to manipulate and visualize this data.
NOTE: Values are in units of µg/m³, not AQI.
# Create a basic timeseries plot
AirMonitor::monitor_timeseriesPlot(
trailhead_monitor,
shadedNight = TRUE,
addAQI = TRUE
)
AirMonitor::addAQILegend("topleft", cex = 0.8, bg = "white")
# Create a daily barplot
trailhead_monitor %>%
AirMonitor::monitor_dailyBarplot()
AirMonitor::addAQILegend("topleft", cex = 0.8, bg = "white")
# Extract regulatory daily averages using "Local Standard Time"
trailhead_monitor %>%
AirMonitor::monitor_dailyStatistic(
FUN = mean,
minHours = 18,
dayBoundary = "LST"
) %>%
AirMonitor::monitor_getData() %>%
dplyr::mutate(datetime = as.Date(datetime)) %>%
dplyr::rename_with(~ c("date", "pm25_mean")) %>%
print()
## # A tibble: 18 × 2
## date pm25_mean
## <date> <dbl>
## 1 2021-07-14 NA
## 2 2021-07-15 6.86
## 3 2021-07-16 12.8
## 4 2021-07-17 18.8
## 5 2021-07-18 55.7
## 6 2021-07-19 132.
## 7 2021-07-20 23.9
## 8 2021-07-21 5.22
## 9 2021-07-22 27.9
## 10 2021-07-23 26.7
## 11 2021-07-24 133.
## 12 2021-07-25 21.6
## 13 2021-07-26 87.3
## 14 2021-07-27 114.
## 15 2021-07-28 158.
## 16 2021-07-29 173.
## 17 2021-07-30 204.
## 18 2021-07-31 223.
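Two details of the monitor_dailyStatistic() call above can be sketched in base R. “Local Standard Time” uses a fixed offset from UTC all year (UTC-8 for Pacific Standard Time), ignoring daylight saving time, and minHours = 18 discards days with fewer than 18 valid hours. Together they appear to explain the NA on 2021-07-14: the first hourly record (07:00 UTC on July 15, which is midnight PDT) falls at 23:00 on July 14 in standard time, leaving that day with a single hour. The hypothetical daily_mean_lst() below illustrates the idea and is not AirMonitor’s implementation.

```r
# Hypothetical helper: daily means in Local Standard Time with a completeness rule
daily_mean_lst <- function(datetime_utc, value, utc_offset_hours = -8, min_hours = 18) {
  # Fixed standard-time offset: no daylight saving adjustment
  lst <- datetime_utc + utc_offset_hours * 3600
  day <- as.Date(lst, tz = "UTC")
  n_valid <- tapply(!is.na(value), day, sum)
  day_mean <- tapply(value, day, mean, na.rm = TRUE)
  day_mean[n_valid < min_hours] <- NA  # too few hours for a valid daily value
  data.frame(date = as.Date(names(day_mean)), pm25_mean = as.numeric(day_mean))
}

# 48 synthetic hourly values starting at 2021-07-15 07:00 UTC (illustrative only)
start <- as.POSIXct("2021-07-15 07:00:00", tz = "UTC")
daily <- daily_mean_lst(start + (0:47) * 3600, rep(10, 48))
daily
# 2021-07-14 has 1 hour (NA); 2021-07-15 has 24 hours; 2021-07-16 has 23 hours
```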
# Use AirMonitorPlots to create a "diurnal" plot
trailhead_monitor %>%
AirMonitorPlots::monitor_ggDailyByHour_archival(
title = "Mazama Trailhead"
)
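The diurnal plot summarizes concentrations by hour of the local day. The underlying idea can be sketched in base R: convert each UTC timestamp to local clock time (here America/Los_Angeles, which observes daylight saving time) and average all values that share the same hour. The hypothetical mean_by_local_hour() helper is only an illustration; monitor_ggDailyByHour_archival() may summarize and display the data differently.

```r
# Hypothetical helper: average hourly values by local hour of day ("00" to "23")
mean_by_local_hour <- function(datetime_utc, value, timezone = "America/Los_Angeles") {
  local_hour <- format(datetime_utc, "%H", tz = timezone)
  tapply(value, local_hour, mean, na.rm = TRUE)
}

# Two synthetic days of hourly data starting at midnight PDT (07:00 UTC)
start <- as.POSIXct("2021-07-15 07:00:00", tz = "UTC")
datetime_utc <- start + (0:47) * 3600
value <- rep(0:23, 2)  # illustrative values equal to the local hour

by_hour <- mean_by_local_hour(datetime_utc, value)
by_hour[c("00", "12", "23")]
```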
Best of luck assessing air quality in your community!