Master essential techniques for identifying, understanding, and fixing errors in R code
Author
David Munoz Tord
Published
May 17, 2025
Course Overview
Welcome to the Troubleshooting and Debugging in R course! Being able to effectively diagnose and fix errors is an essential skill for any R programmer. This course will teach you systematic approaches to identify common errors, use R’s built-in debugging tools, and develop robust strategies for solving problems in your code.
By the end of this course, you will:
* Understand common error message patterns and their meanings
* Master debugging tools available in R
* Apply systematic troubleshooting approaches
* Gain confidence in solving coding problems independently
Lesson 1: Understanding Error Messages
When R code fails, the interpreter provides error messages that can help identify the issue. These messages often seem cryptic at first, but learning to interpret them is a valuable skill.
Types of Errors in R
R errors generally fall into three main categories:
Syntax Errors: Occur when code doesn’t follow proper R syntax rules
Runtime Errors: Occur during execution when R encounters an invalid operation
Logical Errors: Code runs without errors but produces incorrect results
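Logical errors are the most insidious because R raises no error or warning at all. As a small illustrative sketch (not taken from the course exercises), the following code runs cleanly yet gives the wrong answer because the weights are swapped:

# Intended: weight test1 at 70% and test2 at 30%
test1 <- c(80, 90, 70)
test2 <- c(60, 85, 95)
weighted_avg <- test1 * 0.3 + test2 * 0.7 # Bug: the weights are reversed
weighted_avg # No error message, just incorrect results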
Reading Error Messages
Error messages in R typically contain:
* The specific error encountered
* The function where the error occurred
* Sometimes, line numbers or context for the error
Example: Basic Error Messages
# Syntax error (missing closing parenthesis)
sum(1, 2, 3

# Reference error (object not found)
print(nonexistent_variable)

# Type error (incompatible data types)
"string" + 5
Error in parse(text = input): <text>:5:1: unexpected symbol
4: # Reference error (object not found)
5: print
^
When facing errors, focus on these key parts:
* The error message text itself
* The function name mentioned
* The context or line number
* Any traceback information showing the call stack
Common Error Messages and Their Meanings
1. “object not found” This indicates trying to reference a variable or function that doesn't exist in the current environment.
# Common causes:
# 1. Typos in variable names
mean_value <- 10
print(mean_vlue) # Typo in variable name
Error: object 'mean_vlue' not found
# 2. Forgetting to load a package
# head(starwars) # Without loading dplyr

# 3. Variable exists in a different environment (scope)
example_function <- function() {
  internal_variable <- 10
}
# After the function executes, trying to access:
# print(internal_variable) # Error: object not found
2. “cannot open the connection” This error usually relates to file operations.
# Trying to read a non-existent file
read.csv("file_that_does_not_exist.csv")
Warning in file(file, "rt"): cannot open file 'file_that_does_not_exist.csv':
No such file or directory
Error in file(file, "rt"): cannot open the connection
3. “argument is missing, with no default” Functions require certain arguments, and this error occurs when you miss a required one.
# mean() requires an 'x' argument, which has no default
mean()

Error in mean.default(): argument "x" is missing, with no default
4. “non-numeric argument to binary operator” This occurs when trying to perform mathematical operations on non-numeric values.
"hello"*3
Error in "hello" * 3: non-numeric argument to binary operator
Exercise 1.1: Identifying Error Types
For each code snippet below:
1. Run the code to see the error
2. Identify the type of error (syntax, runtime, or logical)
3. Explain what caused the error
4. Fix the code to make it work
Solution:
# Code Snippet 1
# Type of error: Runtime error
# Explanation: The mean() function requires numeric input, but 'dat' contains a character value ("four").
# Fixed code:
dat <- c(1, 2, 3, 4, 5) # Replace "four" with 4
mean(dat)

# Code Snippet 2
# Type of error: Runtime error (warning)
# Explanation: The loop tries to access values[5], but the vector only has 4 elements.
# Fixed code:
values <- c(10, 20, 30, 40)
for (i in 1:length(values)) { # Use length(values) instead of hardcoding 5
  print(values[i])
}

# Code Snippet 3
# Type of error: Runtime error
# Explanation: The function requires two arguments (length and width), but only one is provided.
# Fixed code:
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}
calculate_area(5, 3) # Provide both arguments

# Code Snippet 4
# Type of error: Syntax error
# Explanation: Missing closing parenthesis in the vector definition.
# Fixed code:
x <- c(1, 2, 3) # Add closing parenthesis
y <- x + 1
print(y)
Exercise 1.2: Understanding Error Messages
Given the following error messages, determine the possible cause and how you would fix it:
Solution:
# 1. Error in data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30), :
#    arguments imply differing number of rows: 3, 2
# Explanation:
# The data.frame() function is being called with vectors of different lengths.
# The 'Name' vector has 3 elements, but the 'Age' vector only has 2.
# Fix: Make sure all vectors have the same length.
data_fixed <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35) # Added a third age value
)
print(data_fixed)

# 2. Error: unexpected symbol in "my result"
# Explanation:
# Variable names in R cannot contain spaces.
# Fix: Use an underscore or period instead of a space.
my_result <- 42 # Used underscore instead of space
Lesson 2: Debugging Tools
R provides several built-in tools to help with debugging. Understanding these tools can save enormous time when troubleshooting complex code.
Debugging in RStudio
In RStudio, when you’re debugging R code, the IDE provides options under Tools > Global Options > Code > Debugging to control what happens when an error occurs during code execution. The three main options are:
1. “On Error - Message Only”
Behavior: When an error occurs, R simply prints the error message to the console.
Use case: Ideal if you just want to be informed that something went wrong but don’t need to debug interactively.
Example:
x <- log("a")
# Error in log("a") : non-numeric argument to mathematical function
2. “On Error - Error Inspector”
Behavior: When an error occurs, RStudio opens the Error Inspector, a GUI tool that helps you investigate the error.
You can:
See the call stack (which function calls led to the error)
Examine variable values at each stack frame
Use case: Great for users who want to interactively debug without dropping into the raw debugger.
3. “On Error - Break in Code”
Behavior: When an error occurs, RStudio enters browser()/debug mode (covered later in this lesson) at the point in the code where the error happened.
Use case: Useful for developers who want fine-grained control during debugging and want to manually step through the code to see exactly what went wrong.
Basic Debugging Approaches
Before diving into specialized functions, consider these basic strategies:
1. Print Statements The simplest debugging technique is to add print() or cat() statements at strategic points in your code.
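The chunk that produced the output below is not shown in this rendering; a minimal sketch that reproduces it, reconstructed from the printed structure, might look like this:

# Build a small example object, then print its structure and classes
# at strategic points to verify your assumptions about it
sample_data <- list(
  numbers = 1:5,
  text = "hello world",
  matrix = matrix(1:9, nrow = 3)
)
str(sample_data)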
List of 3
$ numbers: int [1:5] 1 2 3 4 5
$ text : chr "hello world"
$ matrix : int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
class(sample_data$numbers)
[1] "integer"
class(sample_data$text)
[1] "character"
Specialized Debugging Functions
R provides several specialized debugging functions:
1. traceback() After an error occurs, traceback() shows the sequence of function calls that led to the error.
# Example of using traceback()
f <- function() g()
g <- function() h()
h <- function() stop("Error in h")

# This would cause an error
# f()

# Then you can run:
# traceback()

# Which would show the calling sequence (most recent call first):
# 4: stop("Error in h") at #1
# 3: h() at #1
# 2: g() at #1
# 1: f()
2. browser() Inserts an interactive debugging environment at any point in your function.
browser() tips
In the browser, you can:
- Type ‘n’ to execute the next statement
- Type ‘c’ to continue to the end of the function
- Type ‘Q’ to quit debugging
- Type variable names to see their values
- Execute any R code to inspect the environment
debug_calculation <- function(x, y) {
  # This launches the browser when the function runs
  browser()
  step1 <- x * 2
  step2 <- y + 10
  result <- step1 * step2
  return(result)
}

# When calling this function, execution will pause at browser()
# debug_calculation(5, 3)
3. debug() and debugonce() Marks a function for debugging, activating browser mode when the function runs.
# debug() flags a function for debugging
# debugonce() does it for just the next call
simple_function <- function(x) {
  result <- x^2
  return(result)
}

# debug(simple_function)
# simple_function(5)
# undebug(simple_function)

# Or for a single use:
# debugonce(simple_function)
# simple_function(5)
4. The Error Inspector in RStudio When an error occurs in RStudio, you can often click on “Show Traceback” to see the call stack, and “Rerun with Debug” to enter debugging mode.
Using try() and tryCatch()
For more controlled error handling, R provides try() and tryCatch() functions.
1. try() - Continue execution despite errors
# Without try(), this would stop execution
result1 <- try(1 + "a", silent = TRUE)
print("This code runs even after the error")
[1] "This code runs even after the error"
# Check if an error occurred
if (inherits(result1, "try-error")) {
  print("An error occurred in the first operation")
}
[1] "An error occurred in the first operation"
# Continue with other operations
result2 <- try(1 + 2)
print(paste("Second result:", result2))
[1] "Second result: 3"
2. tryCatch() - More sophisticated error handling
result <- NULL # Initialize result

result <- tryCatch(
  {
    # The main code to try (note: 1/0 would NOT error -- it returns Inf,
    # so we use log("a") to trigger a genuine error)
    x <- log("a")
    x # The value of the last expression is returned; avoid return() here
  },
  error = function(e) {
    # Code to run if an error occurs
    message("An error occurred: ", e$message)
    NA # Return a default value
  },
  warning = function(w) {
    # Code to run if a warning occurs
    message("A warning occurred: ", w$message)
    NULL
  },
  finally = {
    # Code that always runs, regardless of error/warning
    message("This cleanup code always runs")
  }
)

An error occurred: non-numeric argument to mathematical function
This cleanup code always runs

print(result)

[1] NA
Exercise 2.1: Using Debugging Tools
Practice using different debugging techniques:
Solution:
if (inherits(result, "try-error")) {
  print("Error detected! Let's debug...")

  # 1. Identify the issue
  print("Issue: The function fails because it tries to calculate statistics on a non-numeric column.")

  # 2. Fix the function
  calculate_statistics_fixed <- function(data) {
    if (!is.data.frame(data)) {
      stop("Input must be a data frame")
    }

    # Only select numeric columns
    numeric_data <- data %>% select_if(is.numeric)

    if (ncol(numeric_data) == 0) {
      stop("No numeric columns found in the data")
    }

    means <- colMeans(numeric_data)
    sds <- sapply(numeric_data, sd)
    ranges <- sapply(numeric_data, function(x) max(x) - min(x))

    return(list(means = means, sds = sds, ranges = ranges))
  }

  # 3. Test the fixed function
  result_fixed <- calculate_statistics_fixed(test_data)
  print("Fixed function results:")
  print(result_fixed)
}
Exercise 2.2: Building Robust Functions
Create a function with built-in error handling that calculates the square root of each element in a vector, but handles negative numbers gracefully.
Solution:
safe_sqrt <- function(x, handle_negative = TRUE) {
  if (!is.numeric(x)) {
    stop("Input must be numeric")
  }

  # Process each element with tryCatch
  result <- sapply(x, function(val) {
    tryCatch(
      {
        if (val < 0 && handle_negative) {
          warning(paste("Taking square root of negative number:", val))
          return(complex(real = 0, imaginary = sqrt(abs(val))))
        } else {
          return(sqrt(val))
        }
      },
      error = function(e) {
        warning(paste("Error with value", val, ":", e$message))
        return(NA)
      }
    )
  })

  return(result)
}
Lesson 3: Systematic Troubleshooting
Beyond specific tools, developing a systematic approach to debugging will make you more efficient at solving problems. This section covers structured approaches to troubleshooting.
The Debugging Mindset
Effective debugging requires:
* Patience and persistence
* Systematic, methodical thinking
* Breaking complex problems into smaller parts
* Formulating and testing hypotheses
Isolating the Problem
When facing a complex error, try to:
Create a minimal reproducible example (reprex)
Strip down your code to the smallest version that still produces the error
Remove unnecessary data or code that isn’t relevant to the problem
Binary search debugging
Comment out half of your code to see if the error still occurs
If it does, the problem is in the remaining half; if not, it’s in the half you commented out
Repeat this process, narrowing down the location of the error
# Example of code you might need to debug
complex_function <- function() {
  # Step 1 - Data loading
  # ...
  # Step 2 - Data preprocessing
  # ...
  # Step 3 - Analysis
  # ...
  # Step 4 - Visualization
  # ...
}

# Binary search approach:
# 1. Comment out steps 3 & 4 to see if error occurs in steps 1 & 2
# 2. If error still occurs, comment out step 2 to check if it's in step 1
# 3. Continue narrowing down until you find the exact line
Verifying Assumptions
Many bugs stem from incorrect assumptions about:
* What functions do
* What data contains
* The state of objects at different points in code execution
Always verify these assumptions:
# Check function behavior with simple inputs
head(mtcars, 2) # Make sure you understand what head() does
# Check data types and structures
typeof(mtcars$mpg)
[1] "double"
is.data.frame(mtcars)
[1] TRUE
dim(mtcars)
[1] 32 11
# Check for missing values
sum(is.na(mtcars))
[1] 0
# Check for unexpected values
summary(mtcars$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
range(mtcars$mpg)
[1] 10.4 33.9
Using Version Control for Debugging
Version control systems like Git can be valuable debugging tools:
Bisect through commits:
If your code worked before, use git bisect to find which commit introduced the bug
Compare working and non-working versions:
Use git diff to see what changed between versions
Revert to a known good state:
When completely stuck, you can fall back to a version you know works: git checkout <known-good-SHA> (or git reset --hard <known-good-SHA> to discard later changes)
Rubber Duck Debugging
Sometimes, the mere act of explaining your code helps you spot the issue:
Explain your code line by line to an inanimate object (or a patient colleague)
Articulate what each line is supposed to do
Describe the expected vs. actual behavior
Often, you’ll spot the issue while explaining
Exercise 3.1: Isolating Bugs
The following function has multiple issues. Use a systematic approach to identify and fix them.
Solution:
analyze_data_fixed <- function(data, filter_col, min_value) {
  # Fix 1: Input validation
  if (!is.data.frame(data)) {
    stop("Input must be a data frame")
  }
  if (!filter_col %in% names(data)) {
    stop(paste("Column", filter_col, "not found in data"))
  }

  # Fix 2: Proper filtering syntax
  filtered_data <- data[data[[filter_col]] > min_value, ]

  # Fix 3: Handle empty result
  if (nrow(filtered_data) == 0) {
    warning("No rows match the filter criteria")
    return(NULL)
  }

  # Fix 4: Calculate means only for numeric columns
  numeric_cols <- sapply(filtered_data, is.numeric)
  if (sum(numeric_cols) == 0) {
    warning("No numeric columns found")
    return(NULL)
  }
  result <- colMeans(filtered_data[, numeric_cols, drop = FALSE])

  # Fix 5: Explicit return
  return(result)
}
Exercise 3.2: Creating a Minimal Reproducible Example
Take this complex and error-prone code and create a minimal reproducible example that isolates the core issue.
Solution:
# Create a minimal reproducible example that isolates the issue
minimal_reprex <- function() {
  # Simplified data with just the key issue
  simple_data <- data.frame(
    group = c("A", "A", "B"),
    value = c(1, 2, 3)
  )

  # Core issue: The filter(count >= 2) drops single-row groups
  problem_result <- simple_data %>%
    group_by(group) %>%
    summarize(
      count = n(),
      .groups = "drop"
    ) %>%
    filter(count >= 2) # This drops group B

  print("Minimal example showing the issue:")
  print(problem_result) # Only shows group A, drops B

  # Fix:
  result_fixed <- simple_data %>%
    group_by(group) %>%
    summarize(
      count = n(),
      .groups = "drop"
    ) %>%
    filter(count >= 1) # Keep all groups

  print("Fixed version:")
  print(result_fixed) # Shows both groups A and B

  return(result_fixed) # Return the fixed result
}

# Fix the full function
process_data_fixed <- function(data, group_var, measure_var, do_transform = TRUE, min_count = 1) {
  # Same function but with parameterized min_count and fixed filter
  if (!is.data.frame(data) || !group_var %in% names(data) || !measure_var %in% names(data)) {
    stop("Invalid inputs")
  }

  if (do_transform) {
    data <- data %>% mutate(across(where(is.numeric), ~ . + 100))
  }

  result <- data %>%
    group_by(across(all_of(group_var))) %>%
    summarize(
      mean_value = mean(!!sym(measure_var)),
      count = n(),
      .groups = "drop"
    ) %>%
    filter(count >= min_count) %>% # Now parameterized
    arrange(desc(mean_value))

  names(result) <- str_replace_all(names(result), "_", ".")

  return(result)
}
Part 2: Advanced Topics and Best Practices
Lesson 4: Advanced Debugging Techniques
While the tools in Lesson 2 cover most day-to-day debugging, R offers more advanced techniques for complex scenarios, performance issues, and deeper inspection of R’s object systems.
1. Post-mortem Debugging with recover()
The recover() function is invaluable when an error occurs and you want to inspect the environment of each function in the call stack at the time of the error. It’s like browser() but activated after an error.
To use it effectively, you can set it as the error handler globally: options(error = recover)
When an error occurs, R will present a menu listing the function calls in the stack. You can select a number to enter the environment of that function call and inspect variables, just like in browser().
# Set recover as the error handler
options(error = recover)

# Define a series of nested functions
func_a <- function(x) {
  func_b(x * 2)
}
func_b <- function(y) {
  func_c(y + 5)
}
func_c <- function(z) {
  if (z > 10) {
    stop("Value too high in func_c!")
  }
  return(z / 2)
}

# Trigger an error
# func_a(3) # This will result in z = (3*2)+5 = 11, triggering the stop()

# After the error, you'd see a menu like:
# Enter a frame number, or 0 to exit
#
# 1: func_a(3)
# 2: func_b(x * 2)
# 3: func_c(y + 5)
#
# Selection:
#
# Typing '3' would take you into func_c's environment where you can inspect 'z'.
# Typing 'ls()' would show objects in that frame.
# Typing 'z' would show its value.
# Typing '0' exits recover.

# Don't forget to reset the error option if you don't want recover globally
# options(error = NULL)
When to use recover():
* When an unexpected error halts a long computation.
* When you need to understand the state of multiple functions in the call stack leading to an error.
* When traceback() isn’t enough and you need interactive inspection.
2. Profiling Code with Rprof() and profvis
Sometimes the “bug” isn’t an error, but unexpectedly slow code. Profiling helps identify these performance bottlenecks.
a) Rprof() R’s built-in Rprof() function samples the call stack at regular intervals during code execution and writes the results to a file.
# Start profiling, output to "Rprof.out"
Rprof("Rprof.out")

# Your potentially slow code here
slow_function <- function() {
  total <- 0
  for (i in 1:1e5) { # 1:1e5, not 1e5, so the loop actually iterates
    total <- total + log(sqrt(i))
  }
  return(total)
}
result <- slow_function()

# Stop profiling
Rprof(NULL)

# Analyze the output
summaryRprof("Rprof.out")
# This gives a summary of time spent in each function.
The output of summaryRprof() can be a bit dense. It shows time spent in each function (“self” time) and time spent in that function plus functions it called (“total” time).
b) profvis Package The profvis package provides an interactive visualization of profiling data, making it much easier to understand.
# install.packages("profvis") # If not already installedlibrary(profvis)# Profile an expressionprofvis({# Your potentially slow code here slow_function <-function() { total <-0for (i in1e5) { total <- total +log(sqrt(i)) # sqrt() is a bit redundant here with log() }return(total) } result <-slow_function() another_operation <-function(n) {Sys.sleep(n) # Simulate some work }another_operation(0.5)})# This will open an interactive HTML widget in RStudio or your browser.# It shows a flame graph and data table, highlighting time-consuming calls.
profvis is highly recommended for a more intuitive understanding of where your code spends its time.
3. Debugging S3 and S4 Methods
R’s object-oriented systems (S3 and S4) use generic functions and methods. Debugging them can be tricky because the actual code executed depends on the class of the object.
a) Finding Which Method is Called
* sloop::s3_dispatch() or sloop::s4_dispatch(): These functions from the sloop package are excellent for seeing the method dispatch path.
* methods(): e.g., methods(print) shows all print methods. methods(class = "lm") shows all methods for lm objects.
* getS3method() / getMethod(): To retrieve the actual code of a specific method.
# install.packages("sloop") # If not already installedlibrary(sloop)# S3 Exampledata_frame <-data.frame(x =1:3, y = letters[1:3])s3_dispatch(print(data_frame))# This will show:# => print.data.frame# * print.default# Get the code for print.data.frame# getS3method("print", "data.frame")# S4 Example (requires a class and generic with methods)setClass("Person", slots =c(name ="character", age ="numeric"))setGeneric("display", function(object) standardGeneric("display"))setMethod("display", "Person", function(object) {cat("Name:", object@name, ", Age:", object@age, "\\n")})alice <-new("Person", name ="Alice", age =30)s4_dispatch(display(alice))# => display,Person,ANY# Get the code for the method# getMethod("display", "Person")
b) Debugging a Specific Method
Once you’ve identified the method, you can use debug() or browser() on it:
* debug(getS3method("print", "data.frame"))
* Or, if it’s your own S4 method, you can put browser() directly in its source code.
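For example, a minimal sketch of the first approach (flagging a registered S3 method, then removing the flag when done):

# Step into print.data.frame the next time print() dispatches to it
debug(getS3method("print", "data.frame"))
# print(head(mtcars)) # Run interactively: execution pauses inside print.data.frame
undebug(getS3method("print", "data.frame")) # Remove the flag when finished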
4. Global Error Handling Options: options(error = ...)
We saw options(error = recover). Another useful one is options(error = dump.frames).
* dump.frames saves the call stack and environments to an object (default last.dump) when an error occurs.
* You can then use debugger() to perform post-mortem debugging on this dump.
# Set dump.frames as the error handler
options(error = dump.frames)

# Define a function that will cause an error
error_function <- function(a, b) {
  a + b # This will error if b is a string
}

# Trigger an error
# error_function(5, "hello")

# Now, the dump is saved. You can inspect it:
# debugger(last.dump)
# This will start a browser-like session for the saved frames.

# To list saved dumps:
# ls(pattern = ".dump$")

# Reset error option
# options(error = NULL)
This is useful for non-interactive sessions (like batch scripts) where recover can’t be used directly. You can save the dump and analyze it later.
Exercise 4.1: Using recover()
1. Set options(error = recover).
2. Write a function calculate_ratio(a, b) that returns a / b.
3. Inside calculate_ratio, add a check: if b is zero, stop("Division by zero!").
4. Call calculate_ratio(10, 0).
5. When recover prompts you, enter the frame for calculate_ratio.
6. Inspect the values of a and b.
7. Exit recover and reset options(error = NULL).
Solution:
# Solution for Exercise 4.1 (conceptual, as recover is interactive)

# options(error = recover) # Step 1

calculate_ratio <- function(a, b) { # Step 2
  if (b == 0) { # Step 3
    stop("Division by zero!")
  }
  return(a / b)
}

# calculate_ratio(10, 0) # Step 4

# At this point, recover() would activate.
# User would select the frame for calculate_ratio.
# Then inspect 'a' (shows 10) and 'b' (shows 0).
# Then exit recover.

# options(error = NULL) # Step 7
Exercise 4.2: Profiling with profvis
Create a function generate_data(n_rows, n_cols) that creates a data frame with n_rows and n_cols. Each cell should be a random number runif(1).
Create another function process_data(df) that:
Calculates the mean of each column.
Calculates the sum of each row.
Does this in a loop 10 times (just to make it take a bit longer).
Use profvis to profile calling generate_data(1000, 100) and then passing its result to process_data().
Identify which parts of your functions take the most time.
Solution:
generate_data <- function(n_rows, n_cols) {
  Sys.sleep(0.1) # Simulate some work
  matrix_data <- matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
  df <- as.data.frame(matrix_data)
  return(df)
}

process_data <- function(df) {
  col_means_list <- list()
  row_sums_list <- list()

  for (i in 1:10) { # Loop to make it take longer
    Sys.sleep(0.05) # Simulate work
    col_means_list[[i]] <- colMeans(df, na.rm = TRUE)
    row_sums_list[[i]] <- rowSums(df, na.rm = TRUE)
  }

  return(list(col_means = col_means_list, row_sums = row_sums_list))
}

# Profiling would show that:
# 1. The Sys.sleep() calls take the most time (in a real scenario, these would be actual computations)
# 2. The matrix creation and data frame conversion in generate_data() would be significant for large dimensions
# 3. The repeated colMeans() and rowSums() calculations in the loop would be inefficient
Part 3: Specialized Debugging and Best Practices
Lesson 5: Debugging in Specific Contexts
Debugging challenges can vary significantly depending on the R environment or framework you’re working with. This lesson explores common issues and tailored strategies for Shiny applications, R Markdown/Quarto documents, and custom R packages.
1. Debugging Shiny Applications
Shiny apps introduce reactivity, which can make debugging less straightforward. Errors might stem from server logic, UI definitions, or the interaction between them.
Common Issues:
Reactivity Problems: Outputs not updating, observers firing unexpectedly, or infinite reactive loops.
Server/UI Disconnects: UI elements not correctly linked to server-side logic or vice-versa.
Slow Performance: Bottlenecks in reactive expressions or data processing.
Silent Errors: Errors within reactive expressions might not always stop the app but can lead to incorrect behavior.
Debugging Tools & Techniques:
browser() in Server Logic: Place browser() inside reactive() expressions, observeEvent(), or renderPlot() functions to inspect values and flow at specific points.
# server.R or app.R (server part)
server <- function(input, output) {
  data_reactive <- reactive({
    # browser() # Uncomment to debug this reactive expression
    req(input$my_slider)
    data_frame <- data.frame(x = 1:input$my_slider, y = rnorm(input$my_slider))
    # print(head(data_frame)) # Useful for quick checks
    return(data_frame)
  })

  output$my_plot <- renderPlot({
    df <- data_reactive()
    # browser() # Uncomment to inspect df before plotting
    plot(df$x, df$y)
  })
}
print() Statements: Use print() or cat() within reactive expressions to output values to the R console. Remember that these will execute every time the reactive expression re-evaluates.
isolate(): Use isolate() to read a reactive value without creating a dependency, which can help pinpoint unnecessary re-evaluations.
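As a small sketch of isolate() in a server function (the input IDs go and n are invented for illustration, assuming an actionButton and a sliderInput in the UI):

library(shiny)

server <- function(input, output) {
  output$result <- renderText({
    input$go                # depend only on the button...
    n <- isolate(input$n)   # ...read the slider without creating a dependency
    paste("Mean of", n, "random values:", round(mean(rnorm(n)), 3))
  })
}

Moving the slider alone no longer re-runs the expression, which makes it easier to see which inputs actually drive a re-evaluation.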
The reactive log (reactlog package): This tool provides a visual representation of the reactive graph and its execution, helping to understand dependencies and data flow. Enable it with options(shiny.reactlog = TRUE) or reactlog::reactlog_enable(), then open it with shiny::reactlogShow().
# At the top of your app.R or global.R
# options(shiny.reactlog = TRUE) # For the built-in reactive log

# Or using the reactlog package (recommended)
# install.packages("reactlog")
# library(reactlog)
# reactlog_enable()

# After running the app and interacting with it:
# shiny::reactlogShow() # Opens the visualization
Shiny’s Built-in Debugging Aids: Shiny often prints useful information to the R console, including error messages that might not be visible in the app’s UI.
RStudio’s Shiny Debugging Tools: RStudio provides features like direct inspection of reactive values when the app is running.
Tips for Debugging Shiny Apps:
* Isolate Components: Test reactive expressions or modules individually if possible.
* Simplify: If an app is complex, try to reproduce the issue in a minimal version.
* Check Inputs: Ensure input values are what you expect (e.g., correct type, not NULL when req() is needed).
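As a sketch of the last tip (the input ID uploaded_file is invented, assuming a fileInput in the UI), req() quietly halts a reactive until its inputs exist:

library(shiny)

server <- function(input, output) {
  output$preview <- renderTable({
    req(input$uploaded_file)                     # wait silently until a file is chosen
    df <- read.csv(input$uploaded_file$datapath)
    head(df)
  })
}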
2. Debugging R Markdown and Quarto Documents
When knitting R Markdown (.Rmd) or rendering Quarto (.qmd) documents, errors often occur within code chunks.
Common Issues:
* Code Chunk Errors: An R error in a specific chunk stops the rendering process.
* Environment Conflicts: Variables or functions defined in one chunk might not be available or might be overwritten in another, depending on chunk options.
* Rendering Problems: Issues with output formats (HTML, PDF, Word), especially with LaTeX for PDF output.
* Package/Dependency Issues: A required package might not be installed or loaded in the environment where the document is rendered.
Debugging Tools & Techniques:
* Run Chunks Individually: Execute code chunks one by one within RStudio (or your IDE) to pinpoint where an error occurs.
* Chunk Option error = TRUE: knitr::opts_chunk$set(error = TRUE) (globally) or error=TRUE in a specific chunk header will allow the document to render even if a chunk has an error, printing the error message in the output. This is useful for identifying multiple errors at once (see the sketch after this list).
* Chunk Option debug = TRUE: Setting debug=TRUE on a specific chunk can sometimes provide more detailed debugging information or invoke a debugger, though its behavior can vary.
* Inspect Intermediate Files: For complex issues, especially with PDF output, inspect the intermediate .md or .tex files generated by knitr or Quarto. This can reveal formatting or LaTeX-specific errors. Use keep_md: true or keep_tex: true in the YAML header.
* Simplify the Document: Comment out sections or chunks to isolate the problematic part.
* Check sessionInfo(): Ensure that the R session rendering the document has all necessary packages at the correct versions.
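For instance, a minimal sketch of a setup chunk that lets rendering continue past chunk errors:

# In a setup chunk near the top of the .Rmd/.qmd file:
knitr::opts_chunk$set(error = TRUE) # chunks that fail print their error instead of stopping the render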
Tips for Debugging .Rmd/.qmd Files:
* Chunk Labels: Always use unique and descriptive labels for your code chunks.
* Cache with Caution: While caching (cache = TRUE) speeds up rendering, it can sometimes hide issues or use stale results. Clear the cache if you suspect problems.
* Small, Focused Chunks: Break down long computations into smaller, more manageable chunks.
3. Debugging Custom R Packages
Developing R packages introduces its own set of debugging challenges, often related to namespaces, documentation, or the build/check process.
Common Issues:
* Namespace Errors: Functions not being exported or imported correctly (Error: object 'X' not found when it should be available).
* Documentation Mismatches: Discrepancies between function arguments in code and in .Rd files (often caught by R CMD check).
* Failing R CMD check: Various warnings or errors related to coding standards, examples, tests, or vignette building.
* Test Failures: Unit tests in tests/testthat/ not passing.
Debugging Tools & Techniques:
* devtools::load_all() (or pkgload::load_all()): This is the most crucial tool. It simulates package installation and loading, making all exported and internal functions available for interactive testing in the console. Use it frequently during development.
* devtools::check(): Runs R CMD check, which performs a comprehensive suite of checks on your package. Address all errors, warnings, and notes.
* devtools::test() (or testthat::test_local()): Runs your unit tests. Use browser() within your test files or the functions being tested to debug failures.
* RStudio’s Build Pane: Provides convenient buttons for Load All, Test Package, and Check Package.
* browser() in Package Functions: You can place browser() directly into your package’s R functions. After devtools::load_all(), calling the function will trigger the debugger.
* Debugging Exported vs. Internal Functions: Remember that internal functions (not exported) are accessed with packageName:::functionName (three colons), while exported ones are packageName::functionName or available directly after library(packageName).
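A typical interactive loop combining these tools might look like the sketch below (my_pkg_function and test_input are placeholders, assuming the working directory is the package root):

devtools::load_all()          # simulate installing and loading the package
debugonce(my_pkg_function)    # flag one of your package's functions for a single debug run
my_pkg_function(test_input)   # drops into the browser on this call
devtools::test()              # re-run the unit tests after the fix
devtools::check()             # run R CMD check before committing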
Tips for Debugging Packages:
* Iterative Development: Load All and test small changes frequently.
* Read R CMD check Output Carefully: The messages, even notes, often point to important issues or best practices.
* Use roxygen2: For generating documentation and managing the NAMESPACE file. This reduces manual errors.
* Write Comprehensive Unit Tests: Good tests are your first line of defense and a great debugging aid.
Exercise 5.1: Debugging a (Conceptual) Shiny App
Imagine a Shiny app with an input sliderInput("num", "Choose a number", 1, 10, 5) and an output that should display input$num * 2. The output is not updating.
Where would you first place browser() or print() statements to investigate?
What are possible reasons for the output not updating?
Solution:
# 1. Where to place browser() or print():
# - Inside the reactive expression that calculates `input$num * 2`.
# - Inside the render function for the output that displays the result.

# Example:
# server <- function(input, output) {
#   output$doubled_value <- renderText({
#     current_num <- input$num
#     print(paste("Slider value:", current_num)) # Print statement
#     # browser() # Or browser()
#     result <- current_num * 2
#     print(paste("Result:", result))
#     return(result)
#   })
# }

# 2. Possible reasons for the output not updating:
# - The output UI element is not correctly defined or named.
# - The `renderText` (or similar) function is not correctly assigned to `output$doubled_value`.
# - The reactive expression is not actually re-evaluating (e.g., `input$num` is not being read reactively).
# - An error is occurring silently within the reactive expression before the new value is returned.
# - The UI is not correctly displaying the output (e.g., wrong output function used in UI like `textOutput` vs `verbatimTextOutput`).
Exercise 5.2: Debugging an R Markdown Chunk
An R Markdown document has a chunk that loads data and then tries to plot it. The plot is not appearing, and knitting stops with an error “object ‘my_data’ not found” in the plotting chunk.
What is the most likely cause related to chunk execution order or environment?
How would you verify this and fix it?
Solution:
# 1. Most likely cause:
# - The chunk that loads `my_data` was not executed before the chunk that tries to plot `my_data`.
# - The chunk loading `my_data` might have an error itself, preventing `my_data` from being created.
# - `my_data` was created in a chunk, but then removed or modified before the plotting chunk (less likely for a simple "not found").
# - Chunk options like `eval=FALSE` on the data loading chunk, or a different `engine` that doesn't share the R environment.

# 2. How to verify and fix:
# - Verify:
#   - Run the data loading chunk manually in the R console or RStudio.
#   - Check if `my_data` exists in the environment after running the loading chunk (`ls()`, `exists("my_data")`).
#   - Check for any error messages when running the data loading chunk.
#   - Ensure the plotting chunk comes *after* the data loading chunk in the document.
# - Fix:
#   - Ensure the data loading chunk is correctly written and executes without errors before the plotting chunk.
#   - Make sure `eval=TRUE` (which is the default) for the data loading chunk.
#   - If using RStudio, try "Run All Chunks Above" for the plotting chunk.
Lesson 6: Preventative Measures and Writing Debuggable Code
While knowing how to debug is crucial, writing code that is less prone to bugs and easier to debug in the first place can save significant time and effort. This lesson focuses on proactive strategies.
1. Unit Testing with testthat
Unit tests are small pieces of code that verify that individual functions (units) of your code work as expected. The testthat package is the standard for unit testing in R.
Importance of Unit Tests:
* Catch Bugs Early: Detect regressions (when new changes break existing functionality) immediately.
* Facilitate Refactoring: Allow you to change code internals with confidence, as long as tests still pass.
* Serve as Documentation: Tests demonstrate how your functions are intended to be used and what their expected outputs are.
* Improve Code Design: Thinking about how to test a function often leads to better, more modular design.
Basic Structure of testthat Tests: Tests are typically organized in files within the tests/testthat/ directory of a package (e.g., tests/testthat/test-my_function.R).
test_that("description of what is being tested", { ... }): Defines a block of related tests.
Expectation Functions (expect_*): These functions make assertions about your code.
expect_equal(object, expected_value): Checks for equality, allowing a small numerical tolerance for numbers.
expect_identical(object, expected_value): Checks for exact, bit-for-bit equality.
expect_true(condition), expect_false(condition): Checks if a condition is TRUE or FALSE.
expect_error(expression_that_should_error, regexp_for_error_message): Checks if code throws an error.
# Example: tests/testthat/test-addition.R

# Source the function to be tested (if not in a package being loaded via devtools)
# source("../../R/addition.R") # Assuming your function is in R/addition.R

# A simple function to test
add_numbers <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("Inputs must be numeric")
  }
  return(x + y)
}

library(testthat)

test_that("add_numbers works with positive integers", {
  expect_equal(add_numbers(2, 3), 5)
  expect_equal(add_numbers(100, 200), 300)
})

test_that("add_numbers works with zero and negative numbers", {
  expect_equal(add_numbers(0, 5), 5)
  expect_equal(add_numbers(-5, 5), 0)
  expect_equal(add_numbers(-5, -5), -10)
})

test_that("add_numbers handles non-numeric input", {
  expect_error(add_numbers("a", 5), "Inputs must be numeric")
  expect_error(add_numbers(5, "b"), "Inputs must be numeric")
})

# To run tests for a package:
# devtools::test()

# To run a specific test file:
# testthat::test_file("tests/testthat/test-addition.R")
Running Tests:
* For packages: devtools::test() or RStudio’s Build pane.
* For individual files: testthat::test_file("path/to/test-file.R").
2. Defensive Programming
Defensive programming involves writing code that anticipates potential problems and handles them gracefully, rather than assuming inputs and states will always be perfect.
Techniques:
Assertions (stopifnot()): stopifnot() checks if its arguments are all TRUE. If not, it throws an error. It’s good for preconditions at the beginning of a function.
calculate_mean_positive <- function(vec) {
  stopifnot(is.numeric(vec), length(vec) > 0, all(vec > 0))
  mean(vec)
}

# calculate_mean_positive(c(1, 2, -3)) # This will error due to all(vec > 0) being false
# calculate_mean_positive(c()) # This will also error: c() is NULL, so the is.numeric() and length() checks fail
print(calculate_mean_positive(c(1, 2, 3))) # This works
Input Validation (Checking Arguments): Explicitly check function arguments for type, class, length, range, etc., and provide informative error messages if they are invalid.
create_greeting <- function(name, language = "en") {
  if (!is.character(name) || length(name) != 1) {
    stop("`name` must be a single string.", call. = FALSE)
  }
  if (!language %in% c("en", "es")) {
    stop("`language` must be 'en' or 'es'.", call. = FALSE)
  }
  if (language == "en") paste("Hello,", name) else paste("Hola,", name)
}

print(create_greeting("Alice"))
[1] "Hello, Alice"
# print(create_greeting("Bob", "fr")) # Errors
The assertthat package provides more expressive assertion functions (e.g., assert_that(is.number(x))).
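A brief sketch of assertthat in use (scale_value is an invented example function, assuming the assertthat package is installed):

library(assertthat)

scale_value <- function(x, factor) {
  assert_that(is.number(x), is.number(factor), factor != 0)
  x * factor
}

scale_value(10, 2)       # 20
# scale_value(10, "a")   # fails with an informative assertion message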
Graceful Failure: Instead of just erroring, sometimes functions can return a specific value (e.g., NA, NULL, an empty data frame) or issue a warning for non-critical issues.
Clear Error Messages: Make error messages informative. Tell the user what went wrong, why, and potentially how to fix it.
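Combining the last two points, a sketch of a function that fails gracefully and explains itself (safe_divide is an invented example):

safe_divide <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("`x` and `y` must both be numeric; got ", class(x)[1], " and ", class(y)[1], ".",
         call. = FALSE)
  }
  if (any(y == 0, na.rm = TRUE)) {
    warning("Division by zero: returning NA for those elements instead of stopping.")
  }
  ifelse(y == 0, NA_real_, x / y)
}

safe_divide(10, 2) # 5
safe_divide(10, 0) # NA, with a warning explaining why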
3. Code Style and Readability
Readable code is easier to understand, maintain, and debug (by yourself and others).
Key Principles:
Consistent Style: Follow a style guide (e.g., the tidyverse style guide). Use linters (like the lintr package) to check and enforce style.
Meaningful Names: Choose descriptive names for variables, functions, and arguments. Avoid overly short or cryptic names.
Good: calculate_average_income, customer_data
Less good: avginc, df1
Effective Commenting: Comments should explain the “why” (the intent or logic), not just the “what” (which should be clear from the code itself). Comment complex sections or non-obvious logic.
Modularity (Small Functions): Break down complex tasks into smaller, single-purpose functions. Each function should do one thing well. This makes them easier to test and debug.
Limit Side Effects: Functions are easiest to reason about when they don’t modify objects outside their own environment (i.e., they don’t have side effects). Prefer functions that take inputs and return outputs.
DRY (Don’t Repeat Yourself): If you find yourself copying and pasting code, consider writing a function instead (see the sketch after this list).
Whitespace: Use whitespace (blank lines, indentation) to structure code visually and improve readability.
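A minimal sketch of the DRY and side-effect points (the column names are invented):

# Copy-paste version (easy to get subtly wrong in one of the copies):
# data$height_z <- (data$height - mean(data$height)) / sd(data$height)
# data$weight_z <- (data$weight - mean(data$weight)) / sd(data$weight)

# DRY version: one small, side-effect-free function that is easy to test and debug
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

data <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 80))
data$height_z <- standardize(data$height)
data$weight_z <- standardize(data$weight)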
Exercise 6.1: Writing a testthat Test
Given the following function: trim_vector <- function(x, n = 1) { x[(n+1):(length(x)-n)] }
Write at least two test_that() blocks for this function.
Include tests for valid inputs and edge cases (e.g., what if n is too large? What if x is short?).
Solution:
test_that("trim_vector works with basic valid inputs", {expect_equal(trim_vector(1:10, n =1), 2:9)expect_equal(trim_vector(letters[1:5], n =2), letters[3])expect_equal(trim_vector(c(TRUE, FALSE, TRUE, FALSE), n =0), c(TRUE, FALSE, TRUE, FALSE))})test_that("trim_vector handles edge cases", {expect_equal(trim_vector(1:5, n =2), 3) # Trims to a single elementexpect_equal(length(trim_vector(1:5, n =3)), 0) # n is large, results in empty vectorexpect_equal(length(trim_vector(1:2, n =1)), 0) # Trims everythingexpect_equal(trim_vector(c(), n =1), c()) # Empty inputexpect_error(trim_vector(1:5, n =-1)) # Invalid n})
Exercise 6.2: Improving Code Readability
Consider this poorly written function: f <- function(d, c1, c2, v) { d_s <- d[d[[c1]] > v, ]; aggregate(d_s[[c2]] ~ d_s[[c1]], FUN=mean) }
Rewrite this function with more descriptive names and better formatting.
Add comments explaining its purpose and arguments.
Add basic input validation.
Solution:
calculate_mean_by_group_filtered <- function(data, group_col_name, value_col_name, filter_threshold) {
  # Calculates the mean of 'value_col_name' for each group in 'group_col_name',
  # after filtering 'data' where 'group_col_name' is greater than 'filter_threshold'.
  #
  # Args:
  #   data: A data frame.
  #   group_col_name: Character string, name of the column to group by and filter on.
  #   value_col_name: Character string, name of the column to calculate the mean from.
  #   filter_threshold: Numeric, the value to filter 'group_col_name' by (rows kept if > threshold).
  #
  # Returns:
  #   A data frame with group means, or NULL if inputs are invalid.

  # Input validation
  if (!is.data.frame(data)) {
    stop("Input 'data' must be a data frame.")
  }
  if (!is.character(group_col_name) || length(group_col_name) != 1 || !group_col_name %in% names(data)) {
    stop("'group_col_name' must be a valid column name in 'data'.")
  }
  if (!is.character(value_col_name) || length(value_col_name) != 1 || !value_col_name %in% names(data)) {
    stop("'value_col_name' must be a valid column name in 'data'.")
  }
  if (!is.numeric(data[[group_col_name]])) {
    stop(paste("Column '", group_col_name, "' must be numeric for filtering.", sep = ""))
  }
  if (!is.numeric(data[[value_col_name]])) {
    stop(paste("Column '", value_col_name, "' must be numeric for calculating mean.", sep = ""))
  }
  if (!is.numeric(filter_threshold) || length(filter_threshold) != 1) {
    stop("'filter_threshold' must be a single numeric value.")
  }

  # Filter the data: keep rows where the group_col_name value is greater than filter_threshold
  subset_data <- data[data[[group_col_name]] > filter_threshold, ]

  if (nrow(subset_data) == 0) {
    warning("No data remains after filtering. Returning empty result.")
    # Create an empty data frame with expected column names
    empty_df <- data.frame(matrix(ncol = 2, nrow = 0))
    names(empty_df) <- c(group_col_name, value_col_name) # Adjust if aggregate names differently
    return(empty_df)
  }

  # Construct the formula for aggregation dynamically
  # e.g., value_column ~ group_column
  formula_str <- paste(value_col_name, "~", group_col_name)
  agg_formula <- as.formula(formula_str)

  # Aggregate to find the mean of value_col_name for each group in group_col_name
  aggregated_results <- aggregate(agg_formula, data = subset_data, FUN = mean)

  return(aggregated_results)
}
Course Conclusion
Congratulations on completing the Troubleshooting and Debugging in R course!
Throughout these lessons, you’ve learned to:
* Interpret R’s error messages effectively.
* Utilize a range of built-in debugging tools like browser(), traceback(), tryCatch().
* Apply systematic approaches to isolate and resolve bugs.
* Leverage advanced techniques such as recover() for post-mortem debugging and profvis for performance profiling.
* Adapt your debugging strategies for specific contexts like Shiny, R Markdown/Quarto, and package development.
* Embrace preventative measures, including writing unit tests with testthat, practicing defensive programming, and maintaining a clean, readable code style.
Debugging is a skill honed through practice. The more you encounter and solve errors, the more proficient you’ll become. Remember to be patient, methodical, and don’t hesitate to use the tools and techniques covered here.
Happy coding, and may your bugs be few and easy to find!