class: center, middle, inverse, title-slide .title[ # Air Quality Recipes ] .subtitle[ ## Package APIs for R Dabblers ] .author[ ### Jonathan Callahan ] .date[ ### June 21, 2022 ] --- # Giving credit What I'll show represents the work of many people and support from several different organizations: * US Forest Service AirFire Team * South Coast Air Quality Management District * Incorporated Research Institutions for Seismology * Mazama Science .center[ ![USFS](images/USFS_logo.png) ![SCAQMD](images/SCAQMD_logo.jpg) ![IRIS](images/IRIS_logo.png) ![MazamaScience](images/MazamaScience_logo.png) ] --- # My winding path **Spectroscopy in graduate school**<br> - computers and data visualization **Scientist/Engineer at NOAA's PMEL**<br> - weather data and open source software **CEO of Mazama Science**<br> - writing R packages<br> - air quality data<br> - mentoring young people .right[ _Now at the Desert Research Institute._<br> ![DRI](images/DRI_logo.png) ] --- class: chapter-slide # Air Quality --- ## Air Quality is important! [World Economic Forum](https://www.weforum.org/agenda/2022/05/air-pollution-killing-millions-how-can-we-tackle/) May 25, 2022 <p style="font-size: smaller; font-style: italic;"> "Air pollution ... claims the lives of 7 million people every year." </p> [World Bank Blog](https://blogs.worldbank.org/developmenttalk/air-pollution-kills-evidence-global-analysis-exposure-and-poverty) May 18, 2022 <p style="font-size: smaller; font-style: italic;"> "... air pollution can reinforce socioeconomic inequalities." </p> <p style="font-size: smaller; font-style: italic;"> "2.8 billion people face hazardous air pollution levels." </p> [BBC](https://www.bbc.com/future/article/20220405-the-fungi-cleaning-new-delhis-air#:~:text=Twenty%2Done%20of%20the%20world's,toxic%20air%20in%20the%20country.) April 04, 2022 <p style="font-size: smaller; font-style: italic;"> "Twenty-one of the world's 30 cities with the worst levels of air pollution are in India." </p> --- ## Air Quality data consumers Lots of people want to work with Air Quality data: * Air Qualty Management Districts (AQMDs) * Public health agencies * Schools * Hospitals * Researchers * Graduate Students * Citizen Scientists .right[_Air quality data is of interest to everyone!_] --- ## Air Quality analyst skills Many people in the air quality community still use Excel. Some folks do data munging in R ... -- ... so they can continue working in Excel. -- But R and RStudio could be the _**perfect tool**_ ... * scripted and reproducible * excellent graphics * interactive data visualization * _packages focused on air quality?_ --- ## Goals for Air Quality R packages * Meet the needs of air quality analysts. * Use systematic naming for objects and functions. * Allow chaining of results. * Use a compact data model. * Provide good graphics. * Provide lots of documentation and examples. .right[_Make the hard easy and the easy invisible._] --- class: chapter-slide # Air Quality Packages --- ## The _Mazama_ package suite [MazamaCoreUtils](https://mazamascience.github.io/MazamaCoreUtils/) -- <span style="font-size: smaller;">utilities for production code</span> <br> [MazamaSpatialUtils](http://mazamascience.github.io/MazamaSpatialUtils/) -- <span style="font-size: smaller;">spatial searching</span> <br> [MazamaLocationUtils](http://mazamascience.github.io/MazamaLocationUtils/) -- <span style="font-size: smaller;">management of spatial metadata</span> <br> [MazamaTimeSeries](http://mazamascience.github.io/MazamaTimeSeries/) -- <span style="font-size: smaller;">environmental time series</span> <br> [AirMonitor](http://mazamascience.github.io/AirMonitor/) -- <span style="font-size: smaller;">processing air quality monitoring data</span> <br> [AirMonitorPlots](http://mazamascience.github.io/AirMonitorPlots/) -- <span style="font-size: smaller;">plotting for <b>AirMonitor</b></span> .right[_Modular lego bricks for air quality analysis._] --- ## MazamaCoreUtils Utilities for writing _operational_ code. * Python style logging * Simple error messaging * Cache management * API key handling * Date-time parsing * Lat/lon validation and uniqueID creation * Source code linting --- **Locations and Times (timezones are required)** ```r MazamaCoreUtils::createLocationID( longitude = -127.27, latitude = 47.45 ) ## [1] "7d192358bb09f94e" MazamaCoreUtils::parseDatetime( datetime = "2021-06-20 05:33", timezone = "America/Los_Angeles" ) ## [1] "2021-06-20 05:33:00 PDT" MazamaCoreUtils::dateSequence( startdate = 20210620, enddate = 20210622, timezone = "America/Los_Angeles" ) ## [1] "2021-06-20 PDT" "2021-06-21 PDT" "2021-06-22 PDT" ``` ??? Notice the explicit naming of arguments: longitude, latitude, datetime, timezone, _etc._ Using explicit, complete names is important. No more guessing what shorthand is used for longitude ("lon", "lng", "long", ...). --- ## MazamaSpatialUtils GIS point-in-polygon searches made simple. **Harmonized Datasets (plus simplified versions)** ``` * 2.1M EEZCountries.RData * 15M NaturalEarthAdm1.RData * 61M OSMTimezones.RData * 3.6M TMWorldBorders.RData * 48MTerrestrialEcoregions.RData * 7.5M USCensus115thCongress.RData * 17M USCensusCounties.RData * 4.6M USCensusStates.RData ``` --- ```r lons <- seq(0,20,5) lats <- seq(40,60,5) MazamaSpatialUtils::getCountryCode(lons, lats) ## [1] "ES" "FR" "DE" "DK" "FI" MazamaSpatialUtils::getCountryName(lons, lats) ## [1] "Spain" "France" "Germany" "Denmark" "Finland" MazamaSpatialUtils::getTimezone(lons, lats) ## [1] "Europe/Madrid" "Europe/Paris" ## [3] "Europe/Berlin" "Europe/Copenhagen" ## [5] "Europe/Mariehamn" MazamaSpatialUtils::getStateName( longitude = lons, latitude = lats, useBuffering = TRUE ) ## [1] "Castellón" "Drôme" "Bayern" "Hovedstaden" ## [5] "Lemland" ``` ??? The `useBuffering` argument is important when your location is just outside a polygon, perhaps on a peninsula or an island. This is a case where we handle the literal and littoral "edge cases" and provide what people expect. --- <img src="Air_Quality_Recipes_files/figure-html/spatial_plot-1.png" style="display: block; margin: auto;" /> --- ## MazamaLocationUtils Instrument deployments -- aka "known locations". **Problem** * Many enviornmental time series are stationary. * Site locations metadata is expensive to acquire. * GPS locations can have "jitter". * Instruments move and get replaced. **Solution** * Keep a table of known locations. * Provide tools to find nearest location. --- ## MazamaTimeSeries Compact data model -- separate `meta` and `data`. * Time-independent metadata goes in the `meta` table. * Time-dependent measurements go in the `data` table. * A `deviceDeploymentID` connects `meta` and `data`. * Multiple Time Series (_'mts'_) share a common time axis. * Reduces file sizes by factor of 10-100. * Reduces memory footprint by factor of 10-100. --- Data model is very compact but not "tidy". So we need our own manipulation functions: * `mts_collapse()` * `mts_combine()` * `mts_distinct()` * `mts_filterData()` * `mts_filterDatetime()` * `mts_filterMeta()` * `mts_getDistance()` * `mts_isEmpty()` * `mts_select()` * `mts_summarize()` * `mts_trimDate()` ??? Function names begin with a prefix that identifies the type of object they work on and return. In the case an "mts" aka "MultipleTimeSeries" object. --- ## AirMonitor Work with pre-processed Air Quality data. * Maintained by the US Forest Service AirFire group. * PM2.5 data from regulatory and temporary monitors. * Updated every few minutes. * Archives go back a decade. * Used in operational sites: [Monitoring v4](https://tools.airfire.org/monitoring/v4) and [Fire & Smoke Map](https://fire.airnow.gov). * Database is accessible to anyone. --- class: chapter-slide # Recipes --- ## What is a recipe? ``` Cake <- flour %>% add_sugar %>% add_eggs %>% add_other_flavors %>% mix %>% pour_into_cake_pan %>% bake_in_oven_at_350 %>% remove_and_cool %>% add_icing %>% add_sprinkles ``` .right[_Yum!_] --- ## Our first Air Quality recipe ```r library(AirMonitor) Camp_Fire <- monitor_loadAnnual(2018) %>% monitor_filter(stateCode == 'CA') %>% monitor_filterDate( startdate = "2018-11-08", enddate = "2018-11-23", timezone = "America/Los_Angeles" ) %>% monitor_dropEmpty() ``` ??? Easy to understand what is happening. --- ```r monitor_leaflet(Camp_Fire) ```
??? Clicking on a location pops up spatial and instrument metadata including the unique `deviceDeploymentID`. --- ```r Sacramento <- Camp_Fire %>% monitor_select("8ca91d2521b701d4_060670010") Sacramento %>% monitor_timeseriesPlot(shadedNight = TRUE, addAQI = TRUE) ``` <img src="Air_Quality_Recipes_files/figure-html/sacramenmto-1.png" style="display: block; margin: auto;" /> ??? This timeseries plot is tailor made for the target audience with day/night shading and Air Quality Index colors. It is designed to be "publication ready". --- ```r Sacramento_area_daily_average <- Camp_Fire %>% monitor_filterByDistance( longitude = Sacramento$meta$longitude, latitude = Sacramento$meta$latitude, radius = 50000 ) %>% monitor_collapse( deviceID = "Sacramento_area" ) %>% monitor_dailyStatistic(FUN = mean) %>% monitor_getData() head(Sacramento_area_daily_average) ``` ``` ## # A tibble: 6 × 2 ## datetime `0ad50de3895a9886_Sacramento_area` ## <dttm> <dbl> ## 1 2018-11-08 00:00:00 16.2 ## 2 2018-11-09 00:00:00 22.9 ## 3 2018-11-10 00:00:00 118. ## 4 2018-11-11 00:00:00 109. ## 5 2018-11-12 00:00:00 77.6 ## 6 2018-11-13 00:00:00 69.3 ``` ??? A straightfoward recipe which results in two-column tibble with one column of day-start times in the local timezone and another column of daily average values. Perfect for loading into a spreadsheet. --- ## AirMonitorPlots Publication ready plots * **ggplot2** based * time series plot * daily barplots * daily/hourly barplot * diurnal cycle plot * build-your-own components --- ```r library(AirMonitorPlots) Carmel_Valley %>% monitor_trimDate() %>% monitor_ggDailyByHour_archival(title = "Carmel Valley") ``` <img src="Air_Quality_Recipes_files/figure-html/time_of_day_plot-1.png" style="display: block; margin: auto;" /> ??? This diurnal plot shows when it would be better to stay inside and when the air quality improves enough to go outside -- in the evenings. This is an example of a plot that provides useful "actionable information" to the general public. --- class: chapter-slide # Take Home Message --- ## Know your audience * Get feedback. * Meet them where they are. * Design small modular components. * Use explicit naming. * Sweat the details. * Be flexible. * Be consistent!!! .right[_Have fun playing with your new lego set!_]