What I'll show represents the work of many people and support from several different organizations:
Spectroscopy in graduate school
- computers and data visualization
Scientist/Engineer at NOAA's PMEL
- weather data and open source software
CEO of Mazama Science
- writing R packages
- air quality data
- mentoring young people
Now at the Desert Research Institute.
World Economic Forum May 25, 2022
"Air pollution ... claims the lives of 7 million people every year."
World Bank Blog May 18, 2022
"... air pollution can reinforce socioeconomic inequalities."
"2.8 billion people face hazardous air pollution levels."
BBC April 04, 2022
"Twenty-one of the world's 30 cities with the worst levels of air pollution are in India."
Lots of people want to work with Air Quality data:
Air quality data is of interest to everyone!
Many people in the air quality community still use Excel.
Some folks do data munging in R ...
... so they can continue working in Excel.
But R and RStudio could be the perfect tool ...
Make the hard easy and the easy invisible.
MazamaCoreUtils
-- utilities for production code
MazamaSpatialUtils
-- spatial searching
MazamaLocationUtils
-- management of spatial metadata
MazamaTimeSeries
-- environmental time series
AirMonitor
-- processing air quality monitoring data
AirMonitorPlots
-- plotting for AirMonitor
Modular lego bricks for air quality analysis.
Utilities for writing operational code.
Locations and Times (timezones are required)
MazamaCoreUtils::createLocationID(
  longitude = -127.27,
  latitude = 47.45
)
## [1] "7d192358bb09f94e"

MazamaCoreUtils::parseDatetime(
  datetime = "2021-06-20 05:33",
  timezone = "America/Los_Angeles"
)
## [1] "2021-06-20 05:33:00 PDT"

MazamaCoreUtils::dateSequence(
  startdate = 20210620,
  enddate = 20210622,
  timezone = "America/Los_Angeles"
)
## [1] "2021-06-20 PDT" "2021-06-21 PDT" "2021-06-22 PDT"
Notice the explicit naming of arguments: longitude, latitude, datetime, timezone, etc.
Using explicit, complete names is important. No more guessing what shorthand is used for longitude ("lon", "lng", "long", ...).
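Why timezones are required can be sketched with base R alone (no Mazama packages assumed): the same clock-time string names two different instants depending on the timezone it is read in.

```r
# Base R only: one clock-time string, two different instants.
local_string <- "2021-06-20 05:33"

pacific <- as.POSIXct(local_string, tz = "America/Los_Angeles")
utc     <- as.POSIXct(local_string, tz = "UTC")

# In June, Los Angeles is on PDT (UTC-7), so the Pacific reading
# is seven hours later in absolute time than the UTC reading.
offset_hours <- as.numeric(difftime(pacific, utc, units = "hours"))
print(offset_hours)
```

Leaving the timezone implicit means the result depends on whatever machine the code runs on, which is exactly what operational code needs to avoid.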
GIS point-in-polygon searches made simple.
Harmonized Datasets (plus simplified versions)
* 2.1M EEZCountries.RData
* 15M NaturalEarthAdm1.RData
* 61M OSMTimezones.RData
* 3.6M TMWorldBorders.RData
* 48M TerrestrialEcoregions.RData
* 7.5M USCensus115thCongress.RData
* 17M USCensusCounties.RData
* 4.6M USCensusStates.RData
lons <- seq(0, 20, 5)
lats <- seq(40, 60, 5)

MazamaSpatialUtils::getCountryCode(lons, lats)
## [1] "ES" "FR" "DE" "DK" "FI"

MazamaSpatialUtils::getCountryName(lons, lats)
## [1] "Spain" "France" "Germany" "Denmark" "Finland"

MazamaSpatialUtils::getTimezone(lons, lats)
## [1] "Europe/Madrid" "Europe/Paris"
## [3] "Europe/Berlin" "Europe/Copenhagen"
## [5] "Europe/Mariehamn"

MazamaSpatialUtils::getStateName(
  longitude = lons,
  latitude = lats,
  useBuffering = TRUE
)
## [1] "Castellón" "Drôme" "Bayern" "Hovedstaden"
## [5] "Lemland"
The useBuffering argument is important when your location is just outside a
polygon, perhaps on a peninsula or an island. This is a case where we handle
the literal and littoral "edge cases" and provide what people expect.
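The idea behind buffering can be illustrated with a toy in base R. This is only a sketch of the concept, not how MazamaSpatialUtils implements it: the rectangular "polygons", the buffer distances and the helper find_region_sketch() are all hypothetical.

```r
# Hypothetical regions described by bounding boxes in degrees.
regions <- data.frame(
  name = c("Mainland", "Island"),
  xmin = c(0, 9),   xmax = c(5, 10),
  ymin = c(40, 49), ymax = c(45, 51),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try an exact match first (buffer = 0), then retry
# with progressively larger buffers until some region contains the point.
find_region_sketch <- function(lon, lat, regions, buffers = c(0, 0.05, 0.1)) {
  for (b in buffers) {
    hit <- lon >= regions$xmin - b & lon <= regions$xmax + b &
           lat >= regions$ymin - b & lat <= regions$ymax + b
    if (any(hit)) return(regions$name[which(hit)[1]])
  }
  NA_character_
}

# A monitor sitting just offshore of the "Island" box.
exact    <- find_region_sketch(10.02, 50, regions, buffers = 0)
buffered <- find_region_sketch(10.02, 50, regions)
print(c(exact = exact, buffered = buffered))
```

Without a buffer the offshore point matches nothing; with one, it is assigned to the region a person would expect.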
Instrument deployments -- aka "known locations".
Problem
Solution
Compact data model -- separate meta and data.
- meta table.
- data table.
- deviceDeploymentID connects meta and data.
Data model is very compact but not "tidy". So we need our own manipulation functions:
mts_collapse()
mts_combine()
mts_distinct()
mts_filterData()
mts_filterDatetime()
mts_filterMeta()
mts_getDistance()
mts_isEmpty()
mts_select()
mts_summarize()
mts_trimDate()
Function names begin with a prefix that identifies the type of object they work on and return: in this case, an "mts" (aka "MultipleTimeSeries") object.
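The reason a non-tidy model needs its own functions can be sketched in base R. The tables below are toy stand-ins, and filter_meta_sketch() is a hypothetical helper, not a function from MazamaTimeSeries: filtering rows of meta must also drop the matching columns of data, or the two tables fall out of sync.

```r
# Toy "meta" table: one row per device deployment.
meta <- data.frame(
  deviceDeploymentID = c("aaa_01", "bbb_02", "ccc_03"),
  stateCode = c("CA", "OR", "CA"),
  stringsAsFactors = FALSE
)

# Toy "data" table: a datetime column plus one column per deviceDeploymentID.
data <- data.frame(
  datetime = as.POSIXct("2018-11-08 00:00", tz = "UTC") + 3600 * 0:2,
  aaa_01 = c(10, 12, 15),
  bbb_02 = c(5, 6, 7),
  ccc_03 = c(80, 95, 110)
)

# Hypothetical helper: subset meta rows, then keep only the data columns
# whose names match the surviving deviceDeploymentIDs.
filter_meta_sketch <- function(meta, data, keep) {
  meta <- meta[keep, , drop = FALSE]
  data <- data[, c("datetime", meta$deviceDeploymentID), drop = FALSE]
  list(meta = meta, data = data)
}

ca_only <- filter_meta_sketch(meta, data, meta$stateCode == "CA")
print(names(ca_only$data))
```

Keeping both tables consistent in a single call is the job the mts_~() functions take on for real objects.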
Work with pre-processed Air Quality data.
Cake <- flour %>%
  add_sugar %>%
  add_eggs %>%
  add_other_flavors %>%
  mix %>%
  pour_into_cake_pan %>%
  bake_in_oven_at_350 %>%
  remove_and_cool %>%
  add_icing %>%
  add_sprinkles
Yum!
library(AirMonitor)

Camp_Fire <-
  monitor_loadAnnual(2018) %>%
  monitor_filter(stateCode == 'CA') %>%
  monitor_filterDate(
    startdate = "2018-11-08",
    enddate = "2018-11-23",
    timezone = "America/Los_Angeles"
  ) %>%
  monitor_dropEmpty()
Easy to understand what is happening.
monitor_leaflet(Camp_Fire)
Clicking on a location pops up spatial and instrument metadata including the unique deviceDeploymentID.
Sacramento <-
  Camp_Fire %>%
  monitor_select("8ca91d2521b701d4_060670010")

Sacramento %>%
  monitor_timeseriesPlot(shadedNight = TRUE, addAQI = TRUE)
This timeseries plot is tailor-made for the target audience, with day/night shading and Air Quality Index colors. It is designed to be "publication ready".
Sacramento_area_daily_average <-
  Camp_Fire %>%
  monitor_filterByDistance(
    longitude = Sacramento$meta$longitude,
    latitude = Sacramento$meta$latitude,
    radius = 50000
  ) %>%
  monitor_collapse(
    deviceID = "Sacramento_area"
  ) %>%
  monitor_dailyStatistic(FUN = mean) %>%
  monitor_getData()

head(Sacramento_area_daily_average)
## # A tibble: 6 × 2
##   datetime            `0ad50de3895a9886_Sacramento_area`
##   <dttm>                                           <dbl>
## 1 2018-11-08 00:00:00                               16.2
## 2 2018-11-09 00:00:00                               22.9
## 3 2018-11-10 00:00:00                              118.
## 4 2018-11-11 00:00:00                              109.
## 5 2018-11-12 00:00:00                               77.6
## 6 2018-11-13 00:00:00                               69.3
A straightforward recipe that results in a two-column tibble with one column of day-start times in the local timezone and another column of daily average values. Perfect for loading into a spreadsheet.
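What "daily average in the local timezone" means can be sketched in base R with made-up numbers. This is an illustration of the idea only, not the internals of monitor_dailyStatistic(); the hourly values and the pm25 column name are assumptions.

```r
# Hypothetical hourly values: 48 hours starting at local midnight.
timezone <- "America/Los_Angeles"
start <- as.POSIXct("2018-11-08 00:00", tz = timezone)

hourly <- data.frame(
  datetime = seq(start, by = "hour", length.out = 48),
  pm25 = c(rep(10, 24), rep(30, 24))
)

# Group by the local calendar date, not the UTC date, so each day
# runs from local midnight to local midnight.
hourly$local_day <- format(hourly$datetime, "%Y-%m-%d", tz = timezone)

daily <- aggregate(pm25 ~ local_day, data = hourly, FUN = mean)
print(daily)
```

Grouping on the UTC date instead would split each local day across two rows, which is the kind of mistake that is easy to make in a spreadsheet.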
Publication ready plots
library(AirMonitorPlots)

Carmel_Valley %>%
  monitor_trimDate() %>%
  monitor_ggDailyByHour_archival(title = "Carmel Valley")
This diurnal plot shows when it would be better to stay inside and when the air quality improves enough to go outside -- in the evenings. This is an example of a plot that provides useful "actionable information" to the general public.
Have fun playing with your new lego set!