3  Functional programming with purrr

3.1 Introduction to Functional Programming with purrr

The purrr package in R provides a consistent set of tools for applying functions to lists and vectors, which makes data processing code more readable and less repetitive. Think of it as a Swiss Army knife for iteration. This chapter walks you through the key concepts and functions of purrr, R's main toolkit for functional programming. It builds on the custom functions and loops from the previous chapters to create more efficient and maintainable code.

3.1.1 Key Concepts

  1. Mapping: Applying the same function to each element of a list or vector
  2. Filtering: Selecting elements based on specific conditions
  3. Reducing: Combining multiple elements into a single result
  4. Error Handling: Gracefully managing variations in data
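The four concepts above can be sketched in a few lines, assuming purrr is already installed. The `counts` list and the `safe_log` helper are hypothetical names invented for this illustration; `keep()` and `possibly()` are the purrr functions used here for filtering and error handling.

```r
library(purrr)

# Hypothetical parasite counts, invented for illustration
counts <- list(a = 120, b = 40, c = 310)

# 1. Mapping: apply the same function to every element
doubled <- map_dbl(counts, function(x) x * 2)

# 2. Filtering: keep only the elements that satisfy a condition
high <- keep(counts, function(x) x > 100)

# 3. Reducing: collapse all elements into a single value
total <- reduce(counts, `+`)

# 4. Error handling: fall back to NA instead of stopping on an error
safe_log <- possibly(log, otherwise = NA_real_)
logged <- map_dbl(list(10, "oops"), safe_log)
```

Each of these ideas is covered in more detail in the sections that follow.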

3.2 Getting Started with purrr

First, install and load the package:

# Install purrr if not already installed
if (!requireNamespace("purrr", quietly = TRUE)) {
    install.packages("purrr")
}
library(purrr)

3.2.1 Practical Example 1: “Mapping” Functions Across Village Datasets

# Simulated village malaria test results
village_tests <- list(
    northern = c(45, 67, 89, 32, 56),
    southern = c(62, 78, 91, 53, 70),
    eastern = c(55, 72, 86, 41, 63)
)

# Calculate average parasite count for each village
average_counts <- map_dbl(village_tests, mean)
print(average_counts)

# Flag villages where any test result exceeds the threshold of 70
high_parasite_villages <- map_lgl(village_tests, function(x) any(x > 70))
print(high_parasite_villages)
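The suffix of each map function states the type of result it promises. As a rough sketch reusing the same village data, the variants below return an integer vector, a character vector, and a list. The cut-off of 60 and the labels are assumptions made up for this illustration.

```r
library(purrr)

village_tests <- list(
    northern = c(45, 67, 89, 32, 56),
    southern = c(62, 78, 91, 53, 70),
    eastern = c(55, 72, 86, 41, 63)
)

# map_int() expects each result to be a single integer
n_above_70 <- map_int(village_tests, function(x) sum(x > 70))

# map_chr() expects each result to be a single string
status <- map_chr(village_tests, function(x) {
    if (mean(x) > 60) "above 60" else "60 or below"
})

# map() always returns a list, so each result may have any length
ranges <- map(village_tests, range)
```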

3.2.2 Practical Example 2: Complex Data Transformation

# Simulate patient treatment response data
patient_data <- list(
    patient1 = list(parasite_count = 500, age = 25, treatment_response = 0.8),
    patient2 = list(parasite_count = 250, age = 40, treatment_response = 0.7),
    patient3 = list(parasite_count = 750, age = 35, treatment_response = 0.6)
)

# Extract specific information across patients
parasite_counts <- map_dbl(patient_data, "parasite_count")
treatment_responses <- map_dbl(patient_data, "treatment_response")

print(parasite_counts)
print(treatment_responses)
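Passing a string to a map function is shorthand for extracting the element with that name. The sketch below shows the equivalent explicit function and one way to cope with a missing field using `pluck()` and its `.default` argument. The patient list is a trimmed, hypothetical variant of the one above in which `patient2` has no recorded age.

```r
library(purrr)

patient_data <- list(
    patient1 = list(parasite_count = 500, age = 25),
    patient2 = list(parasite_count = 250),           # hypothetical: age missing
    patient3 = list(parasite_count = 750, age = 35)
)

# The string shorthand is equivalent to an explicit extractor function
counts_short <- map_dbl(patient_data, "parasite_count")
counts_long <- map_dbl(patient_data, function(p) p$parasite_count)

# pluck() with .default supplies a value when a field is absent
ages <- map_dbl(patient_data, function(p) pluck(p, "age", .default = NA_real_))
```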

3.3 Advanced purrr Techniques

3.3.1 Safely Handling Variations in Data

The purrr package provides a set of tools for handling variations in data, which is important for catching errors and making code more robust.

# Simulate potentially problematic data
unreliable_data <- list(
    complete_record = list(parasite_count = 300),
    partial_record = list(),
    invalid_record = NULL
)

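# Safely extract parasite counts
# safely() wraps the function so each call returns list(result, error)
safe_extraction <- map(unreliable_data, safely(~.x$parasite_count))
# `$` on an empty list or NULL returns NULL rather than an error,
# so the missing records appear as result = NULL with error = NULL
print(safe_extraction)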
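To see an error actually being captured, here is a minimal sketch with hypothetical raw lab entries, one of which is text. Taking `log10()` of a string raises an error in base R, so `safely()` records it in the `error` slot, while `possibly()` is shown as a lighter alternative that substitutes a default value.

```r
library(purrr)

# Hypothetical raw lab entries; the third is text and cannot be logged
raw_counts <- list(300, 1200, "not recorded")

# safely() returns a wrapped function that never stops the iteration
safe_log10 <- safely(log10)
results <- map(raw_counts, safe_log10)

# Each element holds `result` and `error`; flag the ones that failed
failed <- map_lgl(results, function(r) !is.null(r$error))

# possibly() replaces a failure with a default value instead
log_counts <- map_dbl(raw_counts, possibly(log10, otherwise = NA_real_))
```

`safely()` is the better fit when you need the error messages for later inspection; `possibly()` is convenient when a placeholder value is enough.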

3.3.2 Reducing and Combining Data

The reduce() function in purrr combines the elements of a list into a single value by applying a two-argument function cumulatively, from left to right.

# Combine treatment efficacy across multiple interventions
intervention_results <- list(
    net_distribution = 0.6,
    medication_efficacy = 0.75,
    community_education = 0.5
)

# Calculate combined intervention impact
combined_impact <- reduce(intervention_results, `*`)
print(combined_impact)
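It can help to see the intermediate steps of the fold. The sketch below, using the same intervention values, compares `reduce()` with `accumulate()`, which keeps every intermediate value, and with an explicit loop that does the same multiplication by hand.

```r
library(purrr)

intervention_results <- list(
    net_distribution = 0.6,
    medication_efficacy = 0.75,
    community_education = 0.5
)

# reduce() folds left to right: (0.6 * 0.75) * 0.5
combined <- reduce(intervention_results, `*`)

# accumulate() keeps every intermediate value of the same fold
steps <- accumulate(intervention_results, `*`)

# The same fold written as an explicit loop, for comparison
running <- 1
for (value in intervention_results) {
    running <- running * value
}
```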

3.4 Advanced Example: Custom Function with Dataframe Analysis

We can also combine custom functions with the map functions to apply the same function to each row of a dataframe. This is similar to writing a loop but may be more concise, readable, and “safer” (i.e., more likely to fail loudly when something unexpected happens than a standard loop, which is a good thing!).

# Load required libraries
library(purrr)
library(dplyr)

# Create a sample dataframe of malaria patient records
malaria_patients <- data.frame(
    patient_id = 1:10,
    parasite_count = c(50, 120, 300, 750, 1500, 200, 450, 80, 600, 250),
    age = c(25, 35, 45, 12, 60, 28, 40, 15, 50, 32),
    region = c("Northern", "Southern", "Eastern", "Northern", "Southern", 
               "Eastern", "Northern", "Southern", "Eastern", "Northern")
)

# Create a custom function to assess patient risk
assess_malaria_risk <- function(parasite_count, age) {
    # Risk calculation based on parasite count and age
    base_risk <- case_when(
        parasite_count < 100 ~ 1,  # Low risk
        parasite_count < 500 ~ 2,  # Moderate risk
        TRUE ~ 3                   # High risk
    )
    
    # Adjust risk based on age
    age_factor <- case_when(
        age < 5 | age > 65 ~ 1.5,  # Higher risk for children and elderly
        age < 18 | age > 50 ~ 1.2, # Slightly increased risk
        TRUE ~ 1                   # Standard risk
    )
    
    # Calculate final risk score
    round(base_risk * age_factor, 2)
}

# Apply the custom function using purrr
malaria_patients <- malaria_patients %>%
    mutate(
        risk_score = map2_dbl(parasite_count, age, assess_malaria_risk),
        risk_category = case_when(
            risk_score < 1.5 ~ "Low Risk",
            risk_score < 2.5 ~ "Moderate Risk",
            TRUE ~ "High Risk"
        )
    )

# Analyze risk by region
risk_summary <- malaria_patients %>%
    group_by(region) %>%
    summarise(
        avg_risk_score = mean(risk_score),
        high_risk_count = sum(risk_category == "High Risk"),
        total_patients = n()
    )

print(malaria_patients)
print(risk_summary)

3.4.1 Breaking Down the Example

That was a complex example, so let’s break down what’s happening. This example demonstrates:

  • Creating a custom risk assessment function
  • Using map2_dbl() to apply a function with two inputs
  • Combining functional programming with data manipulation
  • Generating insights from patient data

Key purrr Functions Used:

  • map2_dbl(): Applies a function to two vectors in parallel, one pair of elements at a time
  • Because map2_dbl() returns a plain numeric vector, it can be used inside dplyr’s mutate() to create a new column

The map2_dbl() function allows us to apply a function with two inputs and requires each result to be a single “double” (numeric) value. This is important for numerical operations: if the function returns a non-numeric value, map2_dbl() raises an error rather than silently converting it, which is exactly what we want.
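The type check can be seen in a small sketch. The two vectors and both anonymous functions are invented for this illustration; the second call is wrapped in `tryCatch()` only so that the script keeps running after the error.

```r
library(purrr)

counts <- c(50, 750)
ages <- c(25, 12)

# Returns one number per pair, so map2_dbl() succeeds
ratio <- map2_dbl(counts, ages, function(count, age) count / age)

# Returns a string, so map2_dbl() raises an error instead of coercing
label_attempt <- tryCatch(
    map2_dbl(counts, ages, function(count, age) paste(count, age)),
    error = function(e) "error raised"
)
```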

3.5 Additional Powerful purrr Techniques

3.5.1 Handling Multiple Columns with pmap()

We are not limited to two inputs: pmap() takes a list of inputs and applies a function across any number of columns in parallel.

# Example of applying a function across multiple columns
complex_analysis <- function(parasite, age, region) {
    # More complex analysis combining multiple factors
    base_score <- case_when(
        region == "Northern" ~ parasite * 1.2,
        region == "Southern" ~ parasite * 1.1,
        TRUE ~ parasite
    ) * (1 + (age / 100))
    
    return(round(base_score, 2))
}

# Apply complex analysis using pmap
malaria_patients <- malaria_patients %>%
    mutate(
        complex_score = pmap_dbl(
            list(parasite_count, age, region), 
            complex_analysis
        )
    )
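When the list passed to pmap() is named, values are matched to the function's arguments by name rather than by position. The sketch below illustrates this with a data frame, which is itself a list of columns. The `score_patient()` function and the `weight_kg` column are hypothetical and exist only for this example.

```r
library(purrr)

# Hypothetical scoring function; argument names match the column names below
score_patient <- function(parasite_count, age, weight_kg) {
    parasite_count / weight_kg * (1 + age / 100)
}

patients <- data.frame(
    parasite_count = c(500, 250),
    age = c(25, 40),
    weight_kg = c(50, 62.5)
)

# A data frame is a list of columns, so pmap_dbl() walks through it row by row
scores <- pmap_dbl(patients, score_patient)

# Reordering the columns gives the same result, since matching is by name
reordered <- patients[c("age", "weight_kg", "parasite_count")]
scores_reordered <- pmap_dbl(reordered, score_patient)
```

Matching by name is less fragile than relying on position, but it requires every column in the list to correspond to an argument of the function.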

3.6 Common Pitfalls and Best Practices

  1. Understand Your Data: Know the structure before mapping
  2. Use Appropriate Map Functions:
    1. map() for list outputs
    2. map_dbl() for numeric outputs
    3. map_chr() for character outputs
  3. Handle Potential Errors
  4. Keep Functions Simple and Focused
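As one possible way to put the first and third points into practice, the sketch below inspects a list before summarising it and drops elements that would cause trouble. The `records` list is hypothetical, with one element stored as text and one empty.

```r
library(purrr)

# Hypothetical mixed records: one stored as text, one empty
records <- list(
    site_a = c(12, 30, 45),
    site_b = c("12", "30"),
    site_c = numeric(0)
)

# Inspect the structure before mapping
element_types <- map_chr(records, typeof)
element_sizes <- map_int(records, length)

# Keep only the elements that are numeric and non-empty, then summarise
usable <- keep(records, function(x) is.numeric(x) && length(x) > 0)
site_means <- map_dbl(usable, mean)
```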

3.7 Real-World Context: Why Functional Programming Matters

Functional programming in R allows you to:

  • Process complex health datasets more efficiently
  • Reduce code complexity
  • Create more maintainable research tools
  • Quickly adapt to changing research requirements

3.8 When to Use purrr vs. Traditional Loops

Ask yourself:

  • Am I performing the same operation across multiple elements?
  • Do I need a clean, readable way to transform data?
  • Can I break my task into simple, repeatable functions?

If yes, purrr might be your best approach.
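For comparison, here is the same calculation written both ways, as a minimal sketch using two of the villages from earlier in the chapter. Both versions produce the same named numeric vector.

```r
library(purrr)

village_tests <- list(
    northern = c(45, 67, 89, 32, 56),
    southern = c(62, 78, 91, 53, 70)
)

# Loop version: pre-allocate the output, then fill it element by element
loop_means <- numeric(length(village_tests))
names(loop_means) <- names(village_tests)
for (village in names(village_tests)) {
    loop_means[[village]] <- mean(village_tests[[village]])
}

# purrr version: one call, with names carried over automatically
map_means <- map_dbl(village_tests, mean)
```

The loop is not wrong; the purrr version simply removes the bookkeeping around allocating and naming the output.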

3.9 Conclusion

Functional programming changes how you approach data analysis. For malaria researchers, it is not just a programming technique; it is a way to express repeated analyses with less code and fewer opportunities for error.