3 Functional programming with purrr
3.1 Introduction to Functional Programming with purrr
The purrr package in R provides a set of tools to make your data processing more consistent, readable, and powerful. Think of it like a Swiss Army knife for data manipulation. This chapter walks you through the key concepts and functions of purrr, which supports a style of coding known as “functional programming” in R. It combines ideas from the previous chapters on writing custom functions and looping to create more efficient and maintainable code.
3.1.1 Key Concepts
- Mapping: Applying the same function to each element of a list or vector
- Filtering: Selecting elements based on specific conditions
- Reducing: Combining multiple elements into a single result
- Error Handling: Gracefully managing missing or malformed data so that one bad element does not stop the whole analysis
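Filtering is the one concept in this list that the chapter's examples do not show directly. purrr's keep() and discard() select list elements according to a predicate function: keep() retains the elements for which it returns TRUE, and discard() drops them. A minimal sketch, assuming purrr is installed (installation is covered in the next section) and using a small hypothetical list of village results:

```r
library(purrr)

# Hypothetical parasite counts from three villages (illustrative data only)
village_counts <- list(
  northern = c(45, 67, 89),
  southern = c(62, 78, 91),
  eastern  = c(55, 52, 41)
)

# keep() retains villages where the predicate is TRUE (any count above 80)
high_villages <- keep(village_counts, function(x) any(x > 80))

# discard() removes villages where the predicate is TRUE
low_villages <- discard(village_counts, function(x) any(x > 80))

print(names(high_villages))  # "northern" "southern"
print(names(low_villages))   # "eastern"
```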
3.2 Getting Started with purrr
First, install and load the package:
# Install purrr if not already installed
if (!requireNamespace("purrr", quietly = TRUE)) install.packages("purrr")
library(purrr)
3.2.1 Practical Example 1: “Mapping” Functions Across Village Datasets
# Simulated village malaria test results
village_tests <- list(
northern = c(45, 67, 89, 32, 56),
southern = c(62, 78, 91, 53, 70),
eastern = c(55, 72, 86, 41, 63)
)
# Calculate average parasite count for each village
average_counts <- map_dbl(village_tests, mean)
print(average_counts)
# Identify villages where any test result exceeds a threshold of 70
high_parasite_villages <- map_lgl(village_tests, function(x) any(x > 70))
print(high_parasite_villages)
3.2.2 Practical Example 2: Complex Data Transformation
# Simulate patient treatment response data
patient_data <- list(
patient1 = list(parasite_count = 500, age = 25, treatment_response = 0.8),
patient2 = list(parasite_count = 250, age = 40, treatment_response = 0.7),
patient3 = list(parasite_count = 750, age = 35, treatment_response = 0.6)
)
# Extract specific information across patients
# (passing a name instead of a function makes map_dbl() pull that element from each patient)
parasite_counts <- map_dbl(patient_data, "parasite_count")
treatment_responses <- map_dbl(patient_data, "treatment_response")
print(parasite_counts)
print(treatment_responses)
3.3 Advanced purrr Techniques
3.3.1 Safely Handling Variations in Data
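The purrr package provides tools for handling variations in data, which can be important for catching errors and improving code robustness. For example, safely() wraps a function so that, instead of stopping at an error, it returns a list with two elements: result and error.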
# Simulate potentially problematic data
unreliable_data <- list(
complete_record = list(parasite_count = 300),
partial_record = list(),
invalid_record = NULL
)
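# Safely extract parasite counts
# Note: `$` on an empty list or on NULL returns NULL rather than raising an error,
# so in this particular example every element's $error is NULL
safe_extraction <- map(unreliable_data, safely(~.x$parasite_count))
print(safe_extraction)
3.3.2 Reducing and Combining Data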
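The reduce() function in purrr combines the elements of a list into a single value by applying a two-argument function repeatedly: first to the first two elements, then to that result and the next element, and so on.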
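To see what reduce() does at each step, purrr's accumulate() returns every intermediate result instead of only the last one. The sketch below uses hypothetical numbers, and also shows the .init argument, which supplies a starting value for the first step:

```r
library(purrr)

values <- list(2, 3, 4)  # hypothetical inputs

# reduce() applies `*` pairwise: (2 * 3) * 4
total <- reduce(values, `*`)

# accumulate() keeps each intermediate value: 2, 2 * 3, (2 * 3) * 4
# unlist() guarantees a plain numeric vector across purrr versions
steps <- unlist(accumulate(values, `*`))

# .init supplies a starting value, here for a running sum: ((100 + 2) + 3) + 4
running_sum <- reduce(values, `+`, .init = 100)

print(total)        # 24
print(steps)        # 2 6 24
print(running_sum)  # 109
```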
# Combine treatment efficacy across multiple interventions
intervention_results <- list(
net_distribution = 0.6,
medication_efficacy = 0.75,
community_education = 0.5
)
# Calculate combined intervention impact
combined_impact <- reduce(intervention_results, `*`)
print(combined_impact)
3.4 Advanced Example: Custom Function with Dataframe Analysis
We can also combine custom functions with the map() family to apply the same function to each row of a dataframe. This is similar to writing a loop, but it is usually more concise and readable, and it can be “safer”: type-specific variants such as map2_dbl() stop with a clear error when a result is not the expected type, rather than silently continuing (failing early like this is a good thing!).
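Before the full example, here is a minimal sketch of how map2_dbl() pairs its two inputs element by element: the first call receives the first element of each vector, the second call the second elements, and so on. The values and the ratio function are hypothetical and serve only to make the pairing visible:

```r
library(purrr)

# Hypothetical values, one per patient
parasite <- c(50, 300, 1500)
age <- c(25, 12, 60)

# The function is called as f(50, 25), then f(300, 12), then f(1500, 60)
per_row <- map2_dbl(parasite, age, function(p, a) p / a)

print(per_row)  # 2 25 25
```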
# Load required libraries
library(purrr)
library(dplyr)
# Create a sample dataframe of malaria patient records
malaria_patients <- data.frame(
patient_id = 1:10,
parasite_count = c(50, 120, 300, 750, 1500, 200, 450, 80, 600, 250),
age = c(25, 35, 45, 12, 60, 28, 40, 15, 50, 32),
region = c("Northern", "Southern", "Eastern", "Northern", "Southern",
"Eastern", "Northern", "Southern", "Eastern", "Northern")
)
# Create a custom function to assess patient risk
assess_malaria_risk <- function(parasite_count, age) {
# Risk calculation based on parasite count and age
base_risk <- case_when(
parasite_count < 100 ~ 1, # Low risk
parasite_count < 500 ~ 2, # Moderate risk
TRUE ~ 3 # High risk
)
# Adjust risk based on age
age_factor <- case_when(
age < 5 | age > 65 ~ 1.5, # Higher risk for children and elderly
age < 18 | age > 50 ~ 1.2, # Slightly increased risk
TRUE ~ 1 # Standard risk
)
# Calculate final risk score
round(base_risk * age_factor, 2)
}
# Apply the custom function using purrr
malaria_patients <- malaria_patients %>%
mutate(
risk_score = map2_dbl(parasite_count, age, assess_malaria_risk),
risk_category = case_when(
risk_score < 1.5 ~ "Low Risk",
risk_score < 2.5 ~ "Moderate Risk",
TRUE ~ "High Risk"
)
)
# Analyze risk by region
risk_summary <- malaria_patients %>%
group_by(region) %>%
summarise(
avg_risk_score = mean(risk_score),
high_risk_count = sum(risk_category == "High Risk"),
total_patients = n()
)
print(malaria_patients)
print(risk_summary)
3.4.1 Breaking Down the Example
That was a complex example, so let’s break down what’s happening. This example demonstrates:
- Creating a custom risk assessment function
- Using map2_dbl() to apply a function with two inputs
- Combining functional programming with data manipulation
- Generating insights from patient data
Key purrr Functions Used:
- map2_dbl(): Applies a function to two vectors simultaneously, element by element
- Integrates seamlessly with dplyr for data manipulation
The map2_dbl() function allows us to apply a function with two inputs and returns a “double” (numeric) vector, requiring each call to produce a single numeric value. This is important for numerical operations: if a non-numeric value is encountered, we actually want an error rather than a silently wrong result.
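To see this type check in action, the sketch below compares map_dbl() with map_chr() on a function that returns text. The labelling function and counts are hypothetical; tryCatch() is base R and is used here only to capture the error message instead of stopping the script:

```r
library(purrr)

counts <- c(50, 750, 1500)

# Hypothetical helper that returns a character label, not a number
label_count <- function(x) if (x < 500) "moderate" else "high"

# map_chr() expects character output, so this works
labels <- map_chr(counts, label_count)
print(labels)  # "moderate" "high" "high"

# map_dbl() expects numeric output, so purrr stops with an error
result <- tryCatch(
  map_dbl(counts, label_count),
  error = function(e) conditionMessage(e)
)
print(result)  # the error message describing the type mismatch
```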
3.5 Additional Powerful purrr Techniques
3.5.1 Handling Multiple Columns with pmap()
We are not limited to two inputs: pmap() applies a function across any number of columns, supplied together as a single list.
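Because a data frame is itself a list of columns, it can also be passed to pmap() directly; each row then supplies one value per argument, matched by column name. A minimal sketch, where the data frame, its region_factor column, and the scoring function are all hypothetical:

```r
library(purrr)

# Hypothetical patient data; column names match the function's argument names
patients <- data.frame(
  parasite = c(100, 400),
  age = c(20, 50),
  region_factor = c(1.2, 1)
)

# Hypothetical scoring function with three inputs
weighted_score <- function(parasite, age, region_factor) {
  parasite * (1 + age / 100) * region_factor
}

# Row 1: 100 * 1.2 * 1.2; row 2: 400 * 1.5 * 1
scores <- pmap_dbl(patients, weighted_score)
print(scores)  # 144 600
```

Passing the data frame relies on the column names matching the argument names; the chapter's example instead passes an unnamed list, which matches arguments by position.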
# Example of applying a function across multiple columns
complex_analysis <- function(parasite, age, region) {
# More complex analysis combining multiple factors
base_score <- case_when(
region == "Northern" ~ parasite * 1.2,
region == "Southern" ~ parasite * 1.1,
TRUE ~ parasite
) * (1 + (age / 100))
return(round(base_score, 2))
}
# Apply complex analysis using pmap
malaria_patients <- malaria_patients %>%
mutate(
complex_score = pmap_dbl(
list(parasite_count, age, region),
complex_analysis
)
)
3.6 Common Pitfalls and Best Practices
- Understand Your Data: Know the structure before mapping
- Use Appropriate Map Functions:
  - map() when you want a list back
  - map_dbl() for numeric outputs
  - map_chr() for character outputs
- Handle Potential Errors (for example, with safely() or possibly())
- Keep Functions Simple and Focused
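For handling potential errors, possibly() and safely() both wrap a function so that one bad element does not stop the whole map: possibly() substitutes a default value, while safely() records the error alongside the result so you can inspect it afterwards. A minimal sketch; the records (one stored as text by mistake) and the log-transform helper are hypothetical:

```r
library(purrr)

# Hypothetical records: one count was stored as text by mistake
records <- list(a = 100, b = "missing", c = 1000)

# Hypothetical helper; log10() fails on a character value
log_count <- function(x) log10(x)

# possibly(): replace failures with NA and keep going
log_counts <- map_dbl(records, possibly(log_count, otherwise = NA_real_))
print(log_counts)  # a = 2, b = NA, c = 3

# safely(): keep both the result and the error for each element,
# then list the names of the records that failed
checked <- map(records, safely(log_count))
failed <- names(keep(checked, function(r) !is.null(r$error)))
print(failed)  # "b"
```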
3.7 Real-World Context: Why Functional Programming Matters
Functional programming in R allows you to:
- Process complex health datasets more efficiently
- Reduce code complexity
- Create more maintainable research tools
- Quickly adapt to changing research requirements
3.8 When to Use purrr vs. Traditional Loops
Ask yourself:
- Am I performing the same operation across multiple elements?
- Do I need a clean, readable way to transform data?
- Can I break my task into simple, repeatable functions?
If yes, purrr might be your best approach.
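As a concrete comparison, the sketch below computes the same per-village means with a traditional for loop and with map_dbl(). The data are hypothetical. Both versions give identical results; the map_dbl() version simply needs no pre-allocated output vector, index bookkeeping, or manual naming:

```r
library(purrr)

# Hypothetical parasite counts per village
village_tests <- list(
  northern = c(45, 67, 89),
  southern = c(62, 78, 91)
)

# Traditional loop: pre-allocate, index, and name the output yourself
loop_means <- numeric(length(village_tests))
names(loop_means) <- names(village_tests)
for (i in seq_along(village_tests)) {
  loop_means[i] <- mean(village_tests[[i]])
}

# purrr: one call, names carried over automatically
map_means <- map_dbl(village_tests, mean)

print(map_means)                         # northern 67, southern 77
print(identical(loop_means, map_means))  # TRUE
```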
3.9 Conclusion
Functional programming transforms how you approach data analysis. For malaria researchers, it is not just a programming technique; it is a way to uncover insights faster, more accurately, and with less code.
3.10 Recommended Next Steps
- Practice with your own datasets
- Explore more advanced purrr functions
- Combine with other data manipulation techniques
- Focus on writing clear, modular functions