Environmental Time Series Data

Dates, times and timezones can be frustrating, especially when working with environmental time series such as those collected by air and water quality sensors.

Environmental time series data often have a strong diurnal signal and are typically plotted with a time axis displaying local time. However, when data are aggregated into larger collections, it is typical to store data with a universal time axis – UTC.

Problems can arise when parsing and formatting dates and times because R defaults to the system timezone available with Sys.timezone(). Imagine an agency scientist based in Washington, DC, using their laptop to display recent air quality data from Los Angeles while at a conference in Tasmania. The data center processing the data might be in Boulder but the data processing machine might be set to use UTC. Potential timezones (available with OlsonNames()) relevant to this scenario include:

  • America/New_York
  • America/Los_Angeles
  • Australia/Tasmania
  • America/Denver
  • UTC

Which timezone should be used to convert a request for data from “2019-08-08”” to “2018-08-15”” into POSIXct datetimes?

To enforce specification of timezones and to help with the common user interface need to specify a range of dates or times, the MazamaCoreUtils package provides the following functions:

  • dateRange() – parses and returns POSIXct start and end dates representing full days in the specified timezone
  • timeRange() – parses and returns POSixct start and end times in the specified timezone
  • parseDatetime() – parses and returns a vector of POSIXct values in the specified timezone

The parseDatetime() function is intended as a timezone-requiring replacement for lubridate::parse_date_time().

Linting for timezones

Enforcing the specification of timezones throughout a body of code is the most robust way to remove timezone-related errors from your software. To help with this this type of code review, the package also includes functions for testing whether specific named arguments are used with certain function calls:

To use these functions you must define a set of function:argument rules to be applied such as:

timezoneLintRules <- list(
  "parse_date_time" = "tz",
  "with_tz" = "tzone",
  "now" = "tzone",
  "strftime" = "tz"
)

This is interpreted as:

  • Every use of the parse_date_time() function must use the tz argument explicitly.
  • Every use of the with_tz() function must use the tzone argument explicitly

While these functions could be used to test for explicit use in any function:argument pair, our concern here is primarily with specification of timezones. The packages includes a detailed list of timezoneLintRules to help with this. As an example, here is the result of linting the dateRange.R function in this package:

> lintFunctionArgs_file("R/dateRange.R", timezoneLintRules)
# A tibble: 7 x 6
  file        line_number column_number function_name   named_args includes_required
  <chr>             <int>         <int> <chr>           <list>     <lgl>            
1 dateRange.R         125            29 with_tz         <chr [1]>  TRUE             
2 dateRange.R         128            27 with_tz         <chr [1]>  TRUE             
3 dateRange.R         141            18 parse_date_time <chr [2]>  TRUE             
4 dateRange.R         142            18 parse_date_time <chr [2]>  TRUE             
5 dateRange.R         159            18 parse_date_time <chr [2]>  TRUE             
6 dateRange.R         176            18 parse_date_time <chr [2]>  TRUE             
7 dateRange.R         188            18 now             <chr [1]>  TRUE             

The result shows that the dateRange.R source code is consistent in always explicitly specifying a timezone.

Hopefully, this attention to timezones will help your code avoid misunderstandings when it comes to date and time requests.