vignettes/articles/pas_introduction.Rmd
Synoptic data provide a synopsis, a comprehensive view of something at a moment in time. This vignette demonstrates an example workflow for exploring air quality synoptic data using the AirSensor R package and data captured by PurpleAir air quality sensors.
PurpleAir sensor readings are uploaded to the cloud every 120 seconds. (Every 80 seconds prior to a May 31, 2019 firmware upgrade.) Data are processed by PurpleAir and a version of the data is displayed on the PurpleAir website.
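As a quick worked example of what these upload intervals imply, the arithmetic below converts each interval into readings per hour and per day. It assumes a sensor reports continuously with no dropped uploads, which real sensors will not always achieve; the variable names are illustrative only.

```r
# Upload intervals in seconds, taken from the text above
interval_current <- 120  # after the May 31, 2019 firmware upgrade
interval_legacy  <- 80   # before the upgrade

# Readings per hour and per day, assuming no dropped uploads
per_hour <- 3600 / c(current = interval_current, legacy = interval_legacy)
per_day  <- per_hour * 24

print(per_hour)
print(per_day)
```

Under that assumption, a sensor on the current firmware contributes 30 readings per hour, compared with 45 per hour before the upgrade.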
You can generate a current PurpleAir Synoptic (PAS) object (hereafter called a pas) by using the pas_createNew() function. A pas object is just a large dataframe with 59 data columns and a record for each PurpleAir sensor channel (2 channels per sensor).
The pas_createNew() function performs the following tasks under the hood:
Download a raw dataset of the entire PurpleAir network that includes both metadata and recent PM2.5 averages for each deployed sensor across the globe. See downloadParseSynopticData() for more info.
Subset and enhance the raw dataset by replacing variables with more consistent, human-readable names and adding spatial metadata for each sensor, including the nearest official air quality monitor. For a more in-depth explanation, see enhanceSynopticData().
To create a new pas object you must first properly initialize the MazamaSpatialUtils package. The following example will create a brand new pas object with up-to-the-minute data:
NOTE: This can take up to a minute to process.
library(AirSensor)
library(dplyr)
library(ggplot2)
# Define PURPLE_AIR_API_READ_KEY in a .gitignore protected file
source("global_vars.R")
setAPIKey("PurpleAir-read", PURPLE_AIR_API_READ_KEY)
# Initialize spatial data processing
library(MazamaSpatialUtils)
initializeMazamaSpatialUtils()
# Create a 'pas' object with current data
pas <-
  pas_createNew(
    countryCodes = "US",
    stateCodes = "CA",
    counties = "Los Angeles"
  )
It is also possible to load pre-generated pas objects from a data archive. These objects are updated regularly throughout each day and are typically used by other package functions primarily for the location metadata they contain. An archived pas object from a previous day therefore holds data from near midnight of that date.
The archived pas objects can be loaded very quickly with the pas_load() function, which obtains pas objects from the archive specified with setArchiveBaseUrl(). When used without specifying the datestamp argument, pas_load() will obtain the most recently processed pas object, typically less than an hour old.
The pas dataset contains 59 columns, and each row corresponds to a different PurpleAir sensor. For the data analysis examples we will focus on the columns labeled stateCode, pm2.5_*, humidity, pressure, temperature, and pwfsl_closestDistance.
The complete list of columns is given below. Names in ALL_CAPS have been retained from the PurpleAir .json file. Other columns have been renamed for human readability.
## [1] "deviceDeploymentID" "deviceID" "locationID"
## [4] "locationName" "longitude" "latitude"
## [7] "elevation" "countryCode" "stateCode"
## [10] "countyName" "timezone" "houseNumber"
## [13] "street" "city" "zip"
## [16] "sensor_index" "last_modified" "date_created"
## [19] "last_seen" "privacy" "name"
## [22] "location_type" "model" "hardware"
## [25] "led_brightness" "firmware_version" "firmware_upgrade"
## [28] "rssi" "uptime" "pa_latency"
## [31] "memory" "position_rating" "altitude"
## [34] "channel_state" "channel_flags" "channel_flags_manual"
## [37] "channel_flags_auto" "confidence" "confidence_auto"
## [40] "confidence_manual" "humidity" "temperature"
## [43] "pressure" "pm2.5_10minute" "pm2.5_30minute"
## [46] "pm2.5_60minute" "pm2.5_6hour" "pm2.5_24hour"
## [49] "pm2.5_1week" "sensorManufacturer" "ID"
## [52] "label" "sensorType" "pm25"
## [55] "targetPollutant" "technologyType" "pwfsl_closestDistance"
## [58] "pwfsl_closestMonitorID" "communityRegion"
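With this many columns, it can help to confirm that the ones needed for an analysis are present and to pick out related columns by prefix. The sketch below uses base R on a small stand-in data frame (pas_like, with made-up values) that borrows a few column names from the list above; with a real pas object the same calls would be applied to names(pas).

```r
# Stand-in for a 'pas' object: column names from the list above, values made up
pas_like <- data.frame(
  locationName = c("sensor_A", "sensor_B"),
  stateCode = c("CA", "CA"),
  humidity = c(40, 55),
  temperature = c(70, 64),
  pressure = c(1012.1, 1009.8),
  pm2.5_60minute = c(4.2, 12.5),
  pm2.5_6hour = c(3.9, 16.0),
  pwfsl_closestDistance = c(2500, 800),
  stringsAsFactors = FALSE
)

# Non-PM2.5 columns that the analysis examples focus on
needed <- c("stateCode", "humidity", "pressure", "temperature",
            "pwfsl_closestDistance")

# Any needed column that is absent from the data frame
missing_cols <- setdiff(needed, names(pas_like))

# All PM2.5 averaging-period columns, matched by their shared prefix
pm25_cols <- grep("^pm2\\.5_", names(pas_like), value = TRUE)

print(missing_cols)
print(pm25_cols)
```

The escaped dot in the pattern matters: an unescaped "." would match any character, not just the literal period in the column names.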
Let’s take a quick peek at some of the PM2.5 data:
# Extract and round just the PM2.5 data
pm25_data <-
  pas %>%
  select(starts_with("pm2.5_")) %>%
  round(1)
# Combine sensor label and pm2.5 data
bind_cols(label = pas$locationName, pm25_data) %>%
  head(10) %>%
  knitr::kable(
    col.names = c("location name", "10 min", "30 min", "1 hr", "6 hr", "1 day", "1 wk"),
    caption = "PAS PM2.5 Values"
  )
location name | 10 min | 30 min | 1 hr | 6 hr | 1 day | 1 wk |
---|---|---|---|---|---|---|
SCUV_09 | 0.1 | 0.7 | 0.9 | 5.9 | 7.5 | 7.2 |
SCSB_02 | 0.2 | 0.2 | 0.2 | 0.8 | 1.3 | 3.4 |
SCSB_13 | 1.0 | 0.6 | 0.4 | 2.8 | 6.1 | 6.2 |
SCSB_11 | 0.9 | 0.5 | 0.4 | 2.7 | 5.9 | 5.6 |
SCSB_03 | 7.6 | 5.8 | 4.2 | 2.4 | 8.4 | 10.6 |
SCSB_07 | 0.6 | 0.3 | 0.2 | 2.1 | 4.6 | 4.6 |
SCSB_05 | 0.9 | 0.4 | 0.4 | 2.8 | 5.8 | 5.0 |
SCSB_31 | 1.2 | 0.6 | 0.4 | 3.1 | 6.8 | 6.6 |
SCSB_26 | 0.9 | 0.4 | 0.4 | 2.7 | 5.8 | 5.6 |
SCSB_17 | 1.7 | 1.2 | 1.0 | 3.8 | 7.3 | 7.0 |
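Once the averages are in a data frame, ordinary data frame operations apply. As a small illustration, the sketch below re-enters three rows from the table above by hand and ranks them by 24-hour average using base R. The short column names (avg_10min and so on) are chosen here for convenience and are not the package's names.

```r
# Three rows copied from the table above; column names are shorthand for this example
peek <- data.frame(
  location = c("SCUV_09", "SCSB_02", "SCSB_03"),
  avg_10min = c(0.1, 0.2, 7.6),
  avg_1day = c(7.5, 1.3, 8.4),
  avg_1week = c(7.2, 3.4, 10.6),
  stringsAsFactors = FALSE
)

# Rank locations from highest to lowest 24-hour average
ranked <- peek[order(peek$avg_1day, decreasing = TRUE), ]
print(ranked$location)

# Locations whose latest 10-minute value is below their 1-week average
below_weekly <- peek$location[peek$avg_10min < peek$avg_1week]
print(below_weekly)
```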
pas PM2.5 Data
To visually explore a region, we can use our pas data with the pas_leaflet() function to plot an interactive leaflet map. By default, pas_leaflet() will map the coordinates of each PurpleAir sensor and the hourly PM2.5 data. Clicking on a sensor will show sensor metadata.
pas %>%
  pas_leaflet(parameter = "pm2.5_60minute")
If we want to narrow our selection, for example to California, we can look at which locations have a moderate to unhealthy 6-hour average air quality rating with the following short script that uses the %>% “pipe” operator:
pas %>%
  pas_filter(pm2.5_6hour >= 15.0) %>%
  pas_leaflet(parameter = "pm2.5_6hour")
This code pipes our pas data into pas_filter(), where we can set our selection criteria. Because this pas object was created with stateCodes = "CA", every record is already in California; with a broader pas object, a condition on stateCode (the ISO 3166-2 state code) would tell pas_filter() to subset for only those stations in California. The pm2.5_6hour >= 15.0 filter selects those records where the 6-hour average is at or above 15.0. The final function in the pipeline plots the remaining sensors colored by pm2.5_6hour.
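To make the selection logic concrete without needing a live pas object, the sketch below applies the same two kinds of condition, a state code match and a 6-hour threshold, to a small stand-in data frame using base R. The data values and the pas_like name are made up for illustration; with the AirSensor package loaded, the equivalent conditions would be passed to pas_filter() instead.

```r
# Stand-in for a 'pas' object covering more than one state; values are made up
pas_like <- data.frame(
  locationName = c("sensor_A", "sensor_B", "sensor_C", "sensor_D"),
  stateCode = c("CA", "CA", "OR", "CA"),
  pm2.5_6hour = c(3.9, 15.0, 22.4, 31.7),
  stringsAsFactors = FALSE
)

# Threshold used in the pipeline above
threshold <- 15.0

# Keep California records whose 6-hour average is at or above the threshold
keep <- pas_like$stateCode == "CA" & pas_like$pm2.5_6hour >= threshold
selected <- pas_like[keep, ]

print(selected)
```

Because the comparison is >=, sensor_B at exactly 15.0 is kept; a strict > would drop it. sensor_C exceeds the threshold but is excluded by the state code condition.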
pas Auxiliary Data
We can also explore and utilize other PurpleAir sensor data. Check the pas_leaflet() documentation for all supported parameters.
Here is an example of humidity data captured from PurpleAir sensors across the state of California.
pas %>%
  pas_leaflet(parameter = "humidity")
Happy Exploring!
Mazama Science