What I'll show represents the work of many people and support from several different organizations:
Spectroscopy in graduate school
- computers and data visualization
Scientist/Engineer at NOAA's PMEL
- weather data and open source software
CEO of Mazama Science
- writing R packages
- air quality data
- mentoring young people
Now at the Desert Research Institute.
World Economic Forum May 25, 2022
"Air pollution ... claims the lives of 7 million people every year."
World Bank Blog May 18, 2022
"... air pollution can reinforce socioeconomic inequalities."
"2.8 billion people face hazardous air pollution levels."
BBC April 04, 2022
"Twenty-one of the world's 30 cities with the worst levels of air pollution are in India."
Lots of people want to work with Air Quality data:
Air quality data is of interest to everyone!
Many people in the air quality community still use Excel.
Some folks do data munging in R ...
But R and RStudio could be the perfect tool ...
Make the hard easy and the easy invisible.
-- utilities for production code
-- spatial searching
-- management of spatial metadata
-- environmental time series
-- processing air quality monitoring data
-- plotting for AirMonitor
Modular lego bricks for air quality analysis.
Utilities for writing operational code.
Locations and Times (timezones are required)
MazamaCoreUtils::createLocationID(
  longitude = -127.27,
  latitude = 47.45
)
## [1] "7d192358bb09f94e"

MazamaCoreUtils::parseDatetime(
  datetime = "2021-06-20 05:33",
  timezone = "America/Los_Angeles"
)
## [1] "2021-06-20 05:33:00 PDT"

MazamaCoreUtils::dateSequence(
  startdate = 20210620,
  enddate = 20210622,
  timezone = "America/Los_Angeles"
)
## [1] "2021-06-20 PDT" "2021-06-21 PDT" "2021-06-22 PDT"
Notice the explicit naming of arguments: longitude, latitude, datetime, timezone, etc.
Using explicit, complete names is important. No more guessing what shorthand is used for longitude ("lon", "lng", "long", ...).
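To illustrate why timezones are required, here is a minimal base R sketch (not taken from MazamaCoreUtils) showing that the same clock-time string refers to two different instants depending on the timezone used to interpret it. The variable names are illustrative only.

```r
# The same clock-time string, interpreted in two different timezones
utc_time <- as.POSIXct("2021-06-20 05:33", tz = "UTC")
la_time <- as.POSIXct("2021-06-20 05:33", tz = "America/Los_Angeles")

# The two values are different instants: Los Angeles is on PDT (UTC-7) in June
offset_hours <- as.numeric(difftime(la_time, utc_time, units = "hours"))
print(offset_hours)
```

Leaving the timezone implicit means the result depends on the settings of whichever machine runs the code, which is a poor fit for operational use.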
GIS point-in-polygon searches made simple.
Harmonized Datasets (plus simplified versions)
* 2.1M EEZCountries.RData
* 15M NaturalEarthAdm1.RData
* 61M OSMTimezones.RData
* 3.6M TMWorldBorders.RData
* 48M TerrestrialEcoregions.RData
* 7.5M USCensus115thCongress.RData
* 17M USCensusCounties.RData
* 4.6M USCensusStates.RData
lons <- seq(0, 20, 5)
lats <- seq(40, 60, 5)

MazamaSpatialUtils::getCountryCode(lons, lats)
## [1] "ES" "FR" "DE" "DK" "FI"

MazamaSpatialUtils::getCountryName(lons, lats)
## [1] "Spain" "France" "Germany" "Denmark" "Finland"

MazamaSpatialUtils::getTimezone(lons, lats)
## [1] "Europe/Madrid" "Europe/Paris"
## [3] "Europe/Berlin" "Europe/Copenhagen"
## [5] "Europe/Mariehamn"

MazamaSpatialUtils::getStateName(
  longitude = lons,
  latitude = lats,
  useBuffering = TRUE
)
## [1] "Castellón" "Drôme" "Bayern" "Hovedstaden"
## [5] "Lemland"
The useBuffering argument is important when your location is just outside a polygon, perhaps on a peninsula or an island. This is a case where we handle the literal and littoral "edge cases" and provide what people expect.
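The idea behind buffering can be sketched with the sf package. This is an illustration of the general technique, not a description of how MazamaSpatialUtils implements useBuffering. The unit-square polygon, the test point and the 0.1 tolerance are all made up for the example, and the geometries have no coordinate reference system, so distances are plain Cartesian units.

```r
library(sf)

# A made-up "coastline": the unit square
polygon <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)
))))

# A point just outside the right-hand edge, like a monitor on a pier
point <- st_sfc(st_point(c(1.05, 0.5)))

# Strict point-in-polygon test: the point is not inside
inside <- st_intersects(point, polygon, sparse = FALSE)

# Buffered test: the point is within 0.1 units of the polygon
nearby <- st_is_within_distance(point, polygon, dist = 0.1, sparse = FALSE)

print(inside)
print(nearby)
```

A strict search would return no match for this location, while the buffered search assigns it to the nearest polygon, which is usually what a user expects for a coastal site.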
Instrument deployments -- aka "known locations".
Compact data model -- separate meta and data.
deviceDeploymentID connects meta and data.
The data model is very compact but not "tidy". So we need our own manipulation functions:
Function names begin with a prefix that identifies the type of object they work on and return. In this case, an "mts" aka "MultipleTimeSeries" object.
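A toy version of this data model in base R may help show why dedicated manipulation functions are needed. It is a simplified sketch, not the real object definition: the identifiers, coordinates and values are invented, the real objects carry more metadata columns, and toy_filterMeta() is a hypothetical helper written for this example rather than a function from any of the packages.

```r
# 'meta' has one row per deployment; 'data' has one column per deployment
meta <- data.frame(
  deviceDeploymentID = c("aaa_01", "bbb_02"),  # invented identifiers
  longitude = c(-121.5, -122.3),
  latitude = c(38.6, 47.6),
  stateCode = c("CA", "WA"),
  stringsAsFactors = FALSE
)

data <- data.frame(
  datetime = as.POSIXct("2018-11-08 00:00", tz = "UTC") + 3600 * 0:2,
  aaa_01 = c(10, 20, 30),                      # invented measurements
  bbb_02 = c(5, NA, 7)
)

toy_mts <- list(meta = meta, data = data)

# Hypothetical helper: subset rows of 'meta' and the matching columns of 'data'
toy_filterMeta <- function(mts, keep) {
  ids <- mts$meta$deviceDeploymentID[keep]
  list(
    meta = mts$meta[keep, , drop = FALSE],
    data = mts$data[, c("datetime", ids), drop = FALSE]
  )
}

ca_only <- toy_filterMeta(toy_mts, toy_mts$meta$stateCode == "CA")
print(names(ca_only$data))
```

Because the identifier appears as a row value in meta and as a column name in data, every operation has to update both data frames together. Generic "tidy" verbs do not know about that link.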
Work with pre-processed Air Quality data.
Cake <-
  flour %>%
  add_sugar %>%
  add_eggs %>%
  add_other_flavors %>%
  mix %>%
  pour_into_cake_pan %>%
  bake_in_oven_at_350 %>%
  remove_and_cool %>%
  add_icing %>%
  add_sprinkles
library(AirMonitor)

Camp_Fire <-
  monitor_loadAnnual(2018) %>%
  monitor_filter(stateCode == 'CA') %>%
  monitor_filterDate(
    startdate = "2018-11-08",
    enddate = "2018-11-23",
    timezone = "America/Los_Angeles"
  ) %>%
  monitor_dropEmpty()
Easy to understand what is happening.
Clicking on a location pops up spatial and instrument metadata including the unique deviceDeploymentID.
Sacramento <-
  Camp_Fire %>%
  monitor_select("8ca91d2521b701d4_060670010")

Sacramento %>%
  monitor_timeseriesPlot(shadedNight = TRUE, addAQI = TRUE)
This timeseries plot is tailor-made for the target audience, with day/night shading and Air Quality Index colors. It is designed to be "publication ready".
Sacramento_area_daily_average <-
  Camp_Fire %>%
  monitor_filterByDistance(
    longitude = Sacramento$meta$longitude,
    latitude = Sacramento$meta$latitude,
    radius = 50000
  ) %>%
  monitor_collapse(
    deviceID = "Sacramento_area"
  ) %>%
  monitor_dailyStatistic(FUN = mean) %>%
  monitor_getData()

head(Sacramento_area_daily_average)
## # A tibble: 6 × 2
##   datetime            `0ad50de3895a9886_Sacramento_area`
##   <dttm>                                           <dbl>
## 1 2018-11-08 00:00:00                               16.2
## 2 2018-11-09 00:00:00                               22.9
## 3 2018-11-10 00:00:00                              118.
## 4 2018-11-11 00:00:00                              109.
## 5 2018-11-12 00:00:00                               77.6
## 6 2018-11-13 00:00:00                               69.3
A straightforward recipe that results in a two-column tibble with one column of day-start times in the local timezone and another column of daily average values. Perfect for loading into a spreadsheet.
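What "daily average in the local timezone" means can be shown with a small base R sketch. It only illustrates the concept and is not the implementation behind monitor_dailyStatistic(). The hourly values are invented: 24 hours at 10 followed by 24 hours at 30.

```r
tz <- "America/Los_Angeles"

# Two days of invented hourly measurements starting at local midnight
datetime <- seq(
  as.POSIXct("2018-11-08 00:00", tz = tz),
  by = "hour",
  length.out = 48
)
pm25 <- c(rep(10, 24), rep(30, 24))

# Assign each hour to its local calendar day, then average within each day
local_day <- format(datetime, "%Y-%m-%d", tz = tz)
daily_mean <- tapply(pm25, local_day, mean)

print(daily_mean)
```

Grouping by UTC days instead would mix hours from two local days into each average, which is why the timezone is part of the calculation and not just a display setting.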
Publication ready plots
library(AirMonitorPlots)

Carmel_Valley %>%
  monitor_trimDate() %>%
  monitor_ggDailyByHour_archival(title = "Carmel Valley")
This diurnal plot shows when it would be better to stay inside and when the air quality improves enough to go outside -- in the evenings. This is an example of a plot that provides useful "actionable information" to the general public.
Have fun playing with your new lego set!
