
Air Quality Recipes

Package APIs for R Dabblers

Jonathan Callahan

June 21, 2022

1 / 31

Giving credit

What I'll show represents the work of many people and support from several different organizations:

  • US Forest Service AirFire Team
  • South Coast Air Quality Management District
  • Incorporated Research Institutions for Seismology
  • Mazama Science

 


2 / 31

My winding path

Spectroscopy in graduate school
  - computers and data visualization

Scientist/Engineer at NOAA's PMEL
  - weather data and open source software

CEO of Mazama Science
  - writing R packages
  - air quality data
  - mentoring young people

Now at the Desert Research Institute (DRI).

3 / 31

Air Quality

4 / 31

Air Quality is important!

World Economic Forum May 25, 2022

"Air pollution ... claims the lives of 7 million people every year."

World Bank Blog May 18, 2022

"... air pollution can reinforce socioeconomic inequalities."

"2.8 billion people face hazardous air pollution levels."

BBC April 04, 2022

"Twenty-one of the world's 30 cities with the worst levels of air pollution are in India."

5 / 31

Air Quality data consumers

Lots of people want to work with Air Quality data:

  • Air Quality Management Districts (AQMDs)
  • Public health agencies
  • Schools
  • Hospitals
  • Researchers
  • Graduate Students
  • Citizen Scientists

 

Air quality data is of interest to everyone!

6 / 31

Air Quality analyst skills

Many people in the air quality community still use Excel.

Some folks do data munging in R ...

7 / 31

Air Quality analyst skills

Many people in the air quality community still use Excel.

Some folks do data munging in R ...

      ... so they can continue working in Excel.

8 / 31

Air Quality analyst skills

Many people in the air quality community still use Excel.

Some folks do data munging in R ...

      ... so they can continue working in Excel.

But R and RStudio could be the perfect tool ...

  • scripted and reproducible
  • excellent graphics
  • interactive data visualization
  • packages focused on air quality?
9 / 31

Goals for Air Quality R packages

  • Meet the needs of air quality analysts.
  • Use systematic naming for objects and functions.
  • Allow chaining of results.
  • Use a compact data model.
  • Provide good graphics.
  • Provide lots of documentation and examples.

 

Make the hard easy and the easy invisible.

10 / 31

Air Quality Packages

11 / 31

The Mazama package suite

MazamaCoreUtils -- utilities for production code
MazamaSpatialUtils -- spatial searching
MazamaLocationUtils -- management of spatial metadata
MazamaTimeSeries -- environmental time series
AirMonitor -- processing air quality monitoring data
AirMonitorPlots -- plotting for AirMonitor

 

Modular lego bricks for air quality analysis.

12 / 31

MazamaCoreUtils

Utilities for writing operational code (a short sketch follows the list).

  • Python-style logging
  • Simple error messaging
  • Cache management
  • API key handling
  • Date-time parsing
  • Lat/lon validation and uniqueID creation
  • Source code linting
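
A minimal sketch of the logging and error-messaging utilities (the log file names are placeholders and the exact signatures are worth checking against the package docs):

library(MazamaCoreUtils)

# Python-style logging: set up log files, then write leveled messages
logger.setup(infoLog = "INFO.log", errorLog = "ERROR.log")  # file names are illustrative
logger.info("Starting data processing")

# Simple error messaging: stop with a clean message if the try() failed
result <- try(log(-1), silent = TRUE)  # warns but does not error
stopOnError(result, "Could not compute value")
logger.info("Finished without errors")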
13 / 31

Locations and Times (timezones are required)

MazamaCoreUtils::createLocationID(
  longitude = -127.27,
  latitude = 47.45
)
## [1] "7d192358bb09f94e"

MazamaCoreUtils::parseDatetime(
  datetime = "2021-06-20 05:33",
  timezone = "America/Los_Angeles"
)
## [1] "2021-06-20 05:33:00 PDT"

MazamaCoreUtils::dateSequence(
  startdate = 20210620,
  enddate = 20210622,
  timezone = "America/Los_Angeles"
)
## [1] "2021-06-20 PDT" "2021-06-21 PDT" "2021-06-22 PDT"
14 / 31

Notice the explicit naming of arguments: longitude, latitude, datetime, timezone, etc.

Using explicit, complete names is important. No more guessing what shorthand is used for longitude ("lon", "lng", "long", ...).

MazamaSpatialUtils

GIS point-in-polygon searches made simple.

Harmonized Datasets (plus simplified versions)

  • 2.1M  EEZCountries.RData
  • 15M   NaturalEarthAdm1.RData
  • 61M   OSMTimezones.RData
  • 3.6M  TMWorldBorders.RData
  • 48M   TerrestrialEcoregions.RData
  • 7.5M  USCensus115thCongress.RData
  • 17M   USCensusCounties.RData
  • 4.6M  USCensusStates.RData
15 / 31
lons <- seq(0, 20, 5)
lats <- seq(40, 60, 5)

MazamaSpatialUtils::getCountryCode(lons, lats)
## [1] "ES" "FR" "DE" "DK" "FI"

MazamaSpatialUtils::getCountryName(lons, lats)
## [1] "Spain" "France" "Germany" "Denmark" "Finland"

MazamaSpatialUtils::getTimezone(lons, lats)
## [1] "Europe/Madrid" "Europe/Paris"
## [3] "Europe/Berlin" "Europe/Copenhagen"
## [5] "Europe/Mariehamn"

MazamaSpatialUtils::getStateName(
  longitude = lons,
  latitude = lats,
  useBuffering = TRUE
)
## [1] "Castellón" "Drôme" "Bayern" "Hovedstaden"
## [5] "Lemland"
16 / 31

The useBuffering argument is important when your location is just outside a polygon, perhaps on a peninsula or an island. This is a case where we handle the literal and littoral "edge cases" and provide what people expect.

17 / 31

MazamaLocationUtils

Instrument deployments -- aka "known locations".

Problem

  • Many environmental time series are stationary.
  • Site location metadata is expensive to acquire.
  • GPS locations can have "jitter".
  • Instruments move and get replaced.

Solution

  • Keep a table of known locations.
  • Provide tools to find the nearest location (sketched below).
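
A hedged sketch of the known-locations workflow; the function and argument names here are assumptions to verify against the MazamaLocationUtils docs, and adding a location may require MazamaSpatialUtils setup for the spatial metadata:

library(MazamaLocationUtils)

# Start an empty table of known locations
locationTbl <- table_initialize()

# Add a new location; GPS "jitter" within the distance threshold (meters)
# resolves to this same record (argument names assumed)
locationTbl <- table_addLocation(
  locationTbl,
  longitude = -121.49,
  latitude = 38.57,
  distanceThreshold = 100
)

# Later readings look up the nearest known location instead of creating a new one
table_getLocationID(
  locationTbl,
  longitude = -121.4903,
  latitude = 38.5701,
  distanceThreshold = 100
)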
18 / 31

MazamaTimeSeries

Compact data model -- separate meta and data (illustrated below).

  • Time-independent metadata goes in the meta table.
  • Time-dependent measurements go in the data table.
  • A deviceDeploymentID connects meta and data.
  • Multiple Time Series ('mts') share a common time axis.
  • Reduces file sizes by a factor of 10-100.
  • Reduces memory footprint by a factor of 10-100.
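
A toy illustration of the idea using plain tibbles (not actual package output; the IDs and values are made up):

library(dplyr)

# 'meta': time-independent metadata, one row per device deployment
meta <- tibble(
  deviceDeploymentID = c("abc123_060670010", "def456_060450006"),
  longitude = c(-121.49, -123.80),
  latitude = c(38.57, 39.45),
  timezone = "America/Los_Angeles"
)

# 'data': a single shared time axis plus one column per deviceDeploymentID
data <- tibble(
  datetime = seq(
    ISOdatetime(2018, 11, 8, 0, 0, 0, tz = "UTC"),
    by = "hour",
    length.out = 3
  ),
  "abc123_060670010" = c(16.2, 22.9, 118),
  "def456_060450006" = c(5.1, 7.3, 12.4)
)

# An 'mts' object is essentially these two tables traveling together
mts <- list(meta = meta, data = data)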
19 / 31

The data model is very compact but not "tidy", so we need our own manipulation functions (a chaining sketch follows the list):

  • mts_collapse()
  • mts_combine()
  • mts_distinct()
  • mts_filterData()
  • mts_filterDatetime()
  • mts_filterMeta()
  • mts_getDistance()
  • mts_isEmpty()
  • mts_select()
  • mts_summarize()
  • mts_trimDate()
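
A hedged chaining sketch: 'my_mts' is a placeholder for any mts object, and the argument names mirror the AirMonitor examples later in this talk, so check them against the package docs.

library(MazamaTimeSeries)

# Subset by metadata, then by time, then collapse to a single average series
wa_july <-
  my_mts %>%
  mts_filterMeta(stateCode == "WA") %>%
  mts_filterDatetime(
    startdate = 20210701,
    enddate = 20210708,
    timezone = "America/Los_Angeles"
  ) %>%
  mts_collapse(deviceID = "wa_average")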
20 / 31

Function names begin with a prefix that identifies the type of object they work on and return -- in this case an "mts", aka MultipleTimeSeries, object.

AirMonitor

Work with pre-processed Air Quality data (a minimal loading sketch follows the list).

  • Maintained by the US Forest Service AirFire group.
  • PM2.5 data from regulatory and temporary monitors.
  • Updated every few minutes.
  • Archives go back a decade.
  • Used in operational sites: Monitoring v4 and Fire & Smoke Map.
  • Database is accessible to anyone.
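
A minimal loading sketch, assuming a monitor_loadLatest() convenience function; monitor_filter() and monitor_leaflet() appear in the recipes that follow:

library(AirMonitor)

# Load the most recent data from the AirFire database, subset, and map it
monitor_loadLatest() %>%
  monitor_filter(stateCode == "WA") %>%
  monitor_leaflet()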
21 / 31

Recipes

22 / 31

What is a recipe?

Cake <-
  flour %>%
  add_sugar %>%
  add_eggs %>%
  add_other_flavors %>%
  mix %>%
  pour_into_cake_pan %>%
  bake_in_oven_at_350 %>%
  remove_and_cool %>%
  add_icing %>%
  add_sprinkles

 

Yum!

23 / 31

Our first Air Quality recipe

library(AirMonitor)

Camp_Fire <-
  monitor_loadAnnual(2018) %>%
  monitor_filter(stateCode == 'CA') %>%
  monitor_filterDate(
    startdate = "2018-11-08",
    enddate = "2018-11-23",
    timezone = "America/Los_Angeles"
  ) %>%
  monitor_dropEmpty()
24 / 31

Easy to understand what is happening.

monitor_leaflet(Camp_Fire)
[Interactive leaflet map of monitor locations, colored by Max PM2.5 AQI Level: Good, Moderate, USG, Unhealthy, Very Unhealthy, Hazardous]
25 / 31

Clicking on a location pops up spatial and instrument metadata including the unique deviceDeploymentID.

Sacramento <-
  Camp_Fire %>%
  monitor_select("8ca91d2521b701d4_060670010")

Sacramento %>%
  monitor_timeseriesPlot(shadedNight = TRUE, addAQI = TRUE)

26 / 31

This timeseries plot is tailor-made for the target audience, with day/night shading and Air Quality Index colors. It is designed to be "publication ready".

Sacramento_area_daily_average <-
  Camp_Fire %>%
  monitor_filterByDistance(
    longitude = Sacramento$meta$longitude,
    latitude = Sacramento$meta$latitude,
    radius = 50000
  ) %>%
  monitor_collapse(
    deviceID = "Sacramento_area"
  ) %>%
  monitor_dailyStatistic(FUN = mean) %>%
  monitor_getData()

head(Sacramento_area_daily_average)
## # A tibble: 6 × 2
## datetime `0ad50de3895a9886_Sacramento_area`
## <dttm> <dbl>
## 1 2018-11-08 00:00:00 16.2
## 2 2018-11-09 00:00:00 22.9
## 3 2018-11-10 00:00:00 118.
## 4 2018-11-11 00:00:00 109.
## 5 2018-11-12 00:00:00 77.6
## 6 2018-11-13 00:00:00 69.3
27 / 31

A straightforward recipe that results in a two-column tibble: one column of day-start times in the local timezone and another of daily average values. Perfect for loading into a spreadsheet.
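
One possible last step for spreadsheet users (the file name is just an example):

# Write the daily averages to a CSV that opens directly in Excel
readr::write_csv(Sacramento_area_daily_average, "Sacramento_daily_pm25.csv")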

AirMonitorPlots

Publication ready plots

  • ggplot2 based
  • time series plot
  • daily barplots
  • daily/hourly barplot
  • diurnal cycle plot
  • build-your-own components
28 / 31
library(AirMonitorPlots)

Carmel_Valley %>%
  monitor_trimDate() %>%
  monitor_ggDailyByHour_archival(title = "Carmel Valley")

29 / 31

This diurnal plot shows when it would be better to stay inside and when the air quality improves enough to go outside -- in the evenings. This is an example of a plot that provides useful "actionable information" to the general public.

Take Home Message

30 / 31

Know your audience

  • Get feedback.
  • Meet them where they are.
  • Design small modular components.
  • Use explicit naming.
  • Sweat the details.
  • Be flexible.
  • Be consistent!!!

 

Have fun playing with your new lego set!

31 / 31
