vignettes/MazamaTimeSeries.Rmd
This package supports data management activities associated with environmental time series collected at fixed locations in space. The motivating fields include both air and water quality monitoring where fixed sensors report at regular time intervals.
The most compact format for time series data collected at fixed locations is a list including two tables.
MazamaTimeSeries stores time series measurements in a data table where each row is a synoptic record containing all measurements associated with a particular UTC timestamp and each column contains data measured by a single sensor (aka “device”). Any time-invariant metadata associated with a sensor at a known location (aka a “device-deployment”) is stored in a separate meta table. A unique deviceDeploymentID connects the two tables.
In the language of relational databases, this “normalizes” the database and can greatly reduce the disk space and memory needed to store and work with the data.
Time series data from a single environmental sensor typically consists of multiple parameters measured at successive times. This data is stored in an R list containing two dataframes. The package refers to this structure as an sts object for SingleTimeSeries:
sts$meta – 1 row = unique device-deployment; cols = device/location metadata
sts$data – rows = UTC times; cols = measured parameters (plus an additional datetime column)
sts objects can support the following types of time series data:
Raw, “engineering data” containing uncalibrated measurements, instrument voltages and QC flags may be stored in this format. This format is also appropriate for processed and QC’ed data whenever multiple parameters are measured by a single device.
Note: The sts object time axis specified in data$datetime reflects device measurement times and is not required to have uniform spacing. (It may be regular but it need not be.) It is guaranteed to be monotonically increasing.
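The sts structure described above can be sketched in base R. This is only an illustration under stated assumptions: the values are invented, and every column name other than datetime and deviceDeploymentID (as well as the ID value itself) is hypothetical rather than the package's required schema.

```r
# Hypothetical single-sensor example built with base R only.
# All values and most column names are invented for illustration.
meta <- data.frame(
  deviceDeploymentID = "loc1_dev1",   # hypothetical ID
  locationName = "Example Site",      # hypothetical metadata column
  longitude = -122.3,
  latitude = 47.6,
  stringsAsFactors = FALSE
)

data <- data.frame(
  datetime = as.POSIXct(
    c("2024-01-01 00:00:00", "2024-01-01 00:07:00", "2024-01-01 00:31:00"),
    tz = "UTC"
  ),
  temperature = c(4.1, 4.0, 3.7),     # hypothetical parameter
  humidity = c(81, 82, 85)            # hypothetical parameter
)

# An sts-like list: one metadata row plus a multi-parameter data table
sts <- list(meta = meta, data = data)

# The time axis is irregular here, but it is strictly increasing
steps <- diff(as.numeric(sts$data$datetime))
is_monotonic <- all(steps > 0)
```

Objects created by the package itself may carry additional columns or attributes; the sketch only mirrors the two-table layout.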
Working with time series data from multiple sensors at once is often challenging because of the amount of memory required to store all the data from each sensor. However, a common situation is to have time series that share a common time axis – e.g. hourly measurements. In this case, it is possible to create single-parameter data dataframes that contain all data for all sensors for a single parameter of interest. In air quality applications, common parameters of interest include PM2.5 and ozone.
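One way to picture the construction of such a single-parameter table is a full outer join of per-sensor records on the shared time axis. The sketch below uses base R only; the sensor IDs and PM2.5 values are invented, and it is not a description of how the package builds its objects.

```r
# Hypothetical hourly PM2.5 records from two sensors (values invented)
times <- seq(
  as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 3
)
sensor_a <- data.frame(datetime = times, pm25 = c(5.2, 6.1, 7.3))
sensor_b <- data.frame(datetime = times[1:2], pm25 = c(12.0, 11.4))  # last hour missing

# Rename each value column to its (hypothetical) deviceDeploymentID
names(sensor_a)[2] <- "loc1_dev1"
names(sensor_b)[2] <- "loc2_dev2"

# Full outer join on the shared time axis; missing hours become NA
wide <- merge(sensor_a, sensor_b, by = "datetime", all = TRUE)
```

Each sensor contributes a single column, so the table stays compact as long as all sensors report on the same time axis.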
Multi-sensor, single-parameter time series data is stored in an R list with two dataframes. The package refers to this structure as an mts object for MultipleTimeSeries:
mts$meta – N rows = unique device-deployments; cols = device/location metadata
mts$data – rows = UTC times; N cols = device-deployments (plus an additional datetime column)
A key feature of mts objects is the use of the deviceDeploymentID as a “foreign key” that allows sensor data columns to be mapped onto the associated spatial and sensor metadata in a meta row. The following will always be true:
identical(names(mts$data), c('datetime', mts$meta$deviceDeploymentID))
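The foreign-key relationship can be tried out on a small mts-like list built with base R. The IDs, coordinates and measurements below are invented for illustration, and longitude and latitude are assumed metadata column names.

```r
# Hypothetical multi-sensor example built with base R only.
meta <- data.frame(
  deviceDeploymentID = c("loc1_dev1", "loc2_dev2", "loc3_dev3"),
  longitude = c(-122.3, -121.5, -120.9),   # assumed column name
  latitude = c(47.6, 47.2, 46.8),          # assumed column name
  stringsAsFactors = FALSE
)

data <- data.frame(
  datetime = seq(
    as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
    by = "hour",
    length.out = 4
  ),
  loc1_dev1 = c(5.2, 6.1, NA, 7.3),
  loc2_dev2 = c(12.0, 11.4, 10.9, 10.1),
  loc3_dev3 = c(3.3, 3.1, 3.0, 2.8)
)

mts <- list(meta = meta, data = data)

# Data columns match meta rows, in the same order
key_ok <- identical(names(mts$data), c("datetime", mts$meta$deviceDeploymentID))

# Look up the metadata row that belongs to one data column
id <- "loc2_dev2"
id_meta <- mts$meta[mts$meta$deviceDeploymentID == id, ]
```

Because the ordering is part of the contract, any operation that drops or reorders data columns needs to apply the same change to the meta rows.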
mts objects can support the following types of time series data:
Each column of mts$data represents a time series associated with a particular device-deployment while each row represents a synoptic snapshot of all measurements made at a particular time.
In this manner, software can create both time series plots and maps from a single mts object in memory.
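As a rough sketch of this column-versus-row view, the base R code below pulls one time series and one synoptic snapshot out of a small, invented mts-like list and joins the snapshot to the metadata so that each value has coordinates. The IDs, values and the longitude and latitude column names are assumptions made for illustration.

```r
# Hypothetical mts-like list (base R only; values invented for illustration)
mts <- list(
  meta = data.frame(
    deviceDeploymentID = c("loc1_dev1", "loc2_dev2"),
    longitude = c(-122.3, -121.5),
    latitude = c(47.6, 47.2),
    stringsAsFactors = FALSE
  ),
  data = data.frame(
    datetime = seq(
      as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
      by = "hour",
      length.out = 3
    ),
    loc1_dev1 = c(5.2, 6.1, 7.3),
    loc2_dev2 = c(12.0, 11.4, 10.9)
  )
)

# A column is one device-deployment's time series
timeseries <- mts$data[, c("datetime", "loc1_dev1")]

# A row is a synoptic snapshot; drop the datetime column to keep only values
snapshot_row <- mts$data[2, -1]
snapshot <- data.frame(
  deviceDeploymentID = names(snapshot_row),
  value = unlist(snapshot_row, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Joining on the foreign key attaches coordinates to each value
map_points <- merge(mts$meta, snapshot, by = "deviceDeploymentID")

# plot(timeseries$datetime, timeseries$loc1_dev1, type = "b")  # time series plot
# plot(map_points$longitude, map_points$latitude)              # crude "map"
```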
Note: The mts object time axis specified in data$datetime is guaranteed to be a regularly spaced, monotonic axis with no gaps.
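A simple way to test these properties on any vector of times is to look at the differences between successive values. The sketch below uses base R and an invented hourly axis; it is an independent check, not a function provided by the package.

```r
# Hypothetical hourly time axis (invented for illustration)
datetime <- seq(
  as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 6
)

# Differences between successive times, in seconds
steps <- diff(as.numeric(datetime))

is_monotonic <- all(steps > 0)
is_regular <- length(unique(steps)) == 1   # a single step size means no gaps

# Removing a record leaves the axis monotonic but no longer regular
gappy_steps <- diff(as.numeric(datetime[-3]))
gappy_is_monotonic <- all(gappy_steps > 0)
gappy_is_regular <- length(unique(gappy_steps)) == 1
```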
See usage examples in the function documentation.
Best wishes for efficient and productive analysis of time series data!
This R package was created with funding from the USFS AirFire Research Team.