Logging in Java and Python

Operational systems, by definition, need to work without human input. Systems are considered “operational” after they have been thoroughly tested and shown to work properly with a variety of input.

However, no software is perfect and no real-world system operates with 100% availability or 100% consistent input. Things occasionally go wrong – perhaps intermittently. In a situation with occasional failures it is vitally important to have good logging information for forensic analysis.

Other languages used in operational settings support logging at different levels of information density. The code to write out logs at these different levels looks very similar in java and python:

java

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.commons.logging.impl.Jdk14Logger;

… ugly setup of log files …

log.error("ERROR level message");
log.warn("WARN level message");
log.info("INFO level message");
log.debug("DEBUG level message");
log.trace("TRACE level message");

python

import logging
 
logging.basicConfig(filename="my_INFO.log",
                    level=logging.INFO)

logging.error("ERROR level message")
logging.warning("WARNING level message")
logging.info("INFO level message")
logging.debug("DEBUG level message")
logging.trace("TRACE level message")

Logging in R

The MazamaCoreUtils package provides functions so that R logging can look very similar:

library(MazamaCoreUtils)
 
logger.setup(infoLog="my_INFO.log")

logger.error("ERROR level message")
logger.warn("WARN level message")
logger.info("INFO level message")
logger.debug("DEBUG level message")
logger.trace("TRACE level message")

This functionality is built on top of the excellent futile.logger package.

Log Levels

When used properly, logging at different levels lets you create log files that help you quickly navigate to the specific part of your code base that generated an error.

Different log levels also allow problems to be dealt with by staff with different skill levels. Core developers don’t need to be dragged in whenever anything goes wrong. If a real-time web site is non-responsive, non-technical staff can be assigned to look at logs to assess the situation.

  • If the ERROR or WARN logs report that no data was obtained from an upstream source, the problem is not with the R code.
  • If the INFO log reports that incoming data was properly obtained but that none of it passed QC thresholds, the problem is not with the R code.
  • If, however, these logs show everything running smoothly until some ugly R error message, then it is time to call in the core developers.

Here are our best practices for how to use the different log levels:

ERROR

ERROR log statements should only be generated immediately before a stop() is issued. The ERROR.log file should only have entries when R code failed to complete.

WARN

WARN log statements imply that something unexpected happened but that R code is able to complete, perhaps with unpredictable results. An appropriate use would be after testing the response of an internet request. R code may be able to trap and handle a failed request but should comment on the failure in the logs.

INFO

INFO log statements are useful for a general overview of how your R code is proceeding, reflecting top level code structure and perhaps ending with a final:

  logger.info("Successfully completed ...")
  logger.info("\n=============== THE END ===============\n")

DEBUG

DEBUG log statements are best used to map out the path taken through the code. We recommend including DEBUG level statements inside of top level loops and beginning each function with the following:

logger.debug("----- FUNCTION_NAME() -----")

TRACE

TRACE log statements are for the gory details of any part of your code where you expect or are experiencing errors. These should contain information about program state and are intended only for developers.

A Logging Example

The following minimal example demonstrates the use of logging statements and their output.

library(MazamaCoreUtils)

# Set up logging 
tmpDir <- tempdir()
logger.setup(
  errorLog = file.path(tmpDir,"ERROR.log"),
  infoLog = file.path(tmpDir,"INFO.log"),
  traceLog = file.path(tmpDir,"TRACE.log")
)

# Begin main program
logger.info("Begin loggingExample ...")

# Try something that might fail
outputSums <- list()
for ( column in names(iris) ) {
  
  logger.trace("Working on: %s", column)
  
  result <- try({
    outputSums[[column]] <- 
      paste0("Sum of ", column, " = ", sum(iris[[column]]))
  }, silent = FALSE)
  
  if ( "try-error" %in% class(result) ) {
    logger.warn("Oops: %s", geterrmessage())
  }
  
}
#> Error in Summary.factor(structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
#>   'sum' not meaningful for factors

# Successful completion
logger.error("NO ERRORS")
logger.info("Successfully completed loggingExample")
logger.info("\n=============== THE END ===============\n")

# Check the output
str(outputSums)
#> List of 4
#>  $ Sepal.Length: chr "Sum of Sepal.Length = 876.5"
#>  $ Sepal.Width : chr "Sum of Sepal.Width = 458.6"
#>  $ Petal.Length: chr "Sum of Petal.Length = 563.7"
#>  $ Petal.Width : chr "Sum of Petal.Width = 179.9"

Here is what is generated in the log files:

ERROR.log

cat(paste(readLines(file.path(tmpDir,"ERROR.log")),collapse='\n'))
#> ERROR [2024-12-02 20:43:29] NO ERRORS

INFO.log

cat(paste(readLines(file.path(tmpDir,"INFO.log")),collapse='\n'))
#> INFO [2024-12-02 20:43:29] Begin loggingExample ...
#> WARN [2024-12-02 20:43:29] Oops: Error in Summary.factor(structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
#>   'sum' not meaningful for factors
#> 
#> ERROR [2024-12-02 20:43:29] NO ERRORS
#> INFO [2024-12-02 20:43:29] Successfully completed loggingExample
#> INFO [2024-12-02 20:43:29] 
#> =============== THE END ===============

TRACE.log

cat(paste(readLines(file.path(tmpDir,"TRACE.log")),collapse='\n'))
#> INFO [2024-12-02 20:43:29] Begin loggingExample ...
#> TRACE [2024-12-02 20:43:29] Working on: Sepal.Length
#> TRACE [2024-12-02 20:43:29] Working on: Sepal.Width
#> TRACE [2024-12-02 20:43:29] Working on: Petal.Length
#> TRACE [2024-12-02 20:43:29] Working on: Petal.Width
#> TRACE [2024-12-02 20:43:29] Working on: Species
#> WARN [2024-12-02 20:43:29] Oops: Error in Summary.factor(structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
#>   'sum' not meaningful for factors
#> 
#> ERROR [2024-12-02 20:43:29] NO ERRORS
#> INFO [2024-12-02 20:43:29] Successfully completed loggingExample
#> INFO [2024-12-02 20:43:29] 
#> =============== THE END ===============

Extra Functionality

Two more utility functions need to be mentioned

logger.setLevel()

This function sets the log level used in the console. By default this level is set to FATAL which basically means “only when R crashes”. If you are debugging code interactively you may wish to see log messages as the are generated by entering the following command:

logger.setLevel(TRACE)

initializeLogging()

This function is a wrapper for our typical usage. It accepts a logDir parameter and looks for TRACE, DEBUG INFO and ERROR logs in that directory. If they are found, they are renamed as paste0(logLevel,".log.",timestamp). After renaming, new log files are setup.

This is particularly useful for operational code run inside docker containers which may be automatically rebooted.