3 Functional programming with purrr
3.1 Introduction to Functional Programming with purrr
The purrr package in R provides a set of tools that make your data processing more consistent, readable, and powerful. Think of it like a Swiss Army knife for data manipulation. This chapter walks you through the key concepts and functions of purrr, a package that supports a style of coding known as “functional programming” in R. It combines the custom functions and loops from the previous chapters to create more efficient and maintainable code.
3.1.1 Key Concepts
- Mapping: Applying the same function to each element of a list or vector
- Filtering: Selecting elements based on specific conditions
- Reducing: Combining multiple elements into a single result
- Error Handling: Gracefully managing variations in data
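As a quick orientation, the four concepts above map roughly onto the purrr functions map(), keep(), reduce(), and possibly(). The sketch below is illustrative only: the `counts` list and its values are made up for this example and are not data from the chapter.

```r
library(purrr)

# Hypothetical parasite counts for three sites (illustrative values only)
counts <- list(site_a = c(10, 20, 30), site_b = c(5, 15), site_c = c(100, 200))

# Mapping: apply the same function to each element
site_means <- map_dbl(counts, mean)

# Filtering: keep only the elements that satisfy a condition
high_sites <- keep(counts, function(x) mean(x) > 15)

# Reducing: combine all elements into a single result
all_counts <- reduce(counts, c)

# Error handling: return a fallback value instead of stopping on an error
safe_log <- possibly(log, otherwise = NA_real_)
logged <- map_dbl(list(10, "not a number"), safe_log)
```

Each of these functions is covered in more detail in the sections that follow, except keep(), which works like map_lgl() followed by subsetting.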
3.2 Getting Started with purrr
First, install and load the package:
# Install purrr if not already installed
install.packages("purrr")
library(purrr)
3.2.1 Practical Example 1: “Mapping” Functions Across Village Datasets
# Simulated village malaria test results
village_tests <- list(
  northern = c(45, 67, 89, 32, 56),
  southern = c(62, 78, 91, 53, 70),
  eastern = c(55, 72, 86, 41, 63)
)
# Calculate average parasite count for each village
average_counts <- map_dbl(village_tests, mean)
print(average_counts)
# Identify villages above a certain parasite threshold
high_parasite_villages <- map_lgl(village_tests, function(x) any(x > 70))
print(high_parasite_villages)
3.2.2 Practical Example 2: Complex Data Transformation
# Simulate patient treatment response data
patient_data <- list(
  patient1 = list(parasite_count = 500, age = 25, treatment_response = 0.8),
  patient2 = list(parasite_count = 250, age = 40, treatment_response = 0.7),
  patient3 = list(parasite_count = 750, age = 35, treatment_response = 0.6)
)
# Extract specific information across patients
parasite_counts <- map_dbl(patient_data, "parasite_count")
treatment_responses <- map_dbl(patient_data, "treatment_response")

print(parasite_counts)
print(treatment_responses)
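Extracting by name, as above, stops with an error if any record lacks the requested field. One way to handle this, sketched here with a hypothetical `patients_incomplete` list, is the `.default` argument that the map functions accept when the function is given as a name:

```r
library(purrr)

# Hypothetical records; patient2 is missing the parasite_count field
patients_incomplete <- list(
  patient1 = list(parasite_count = 500, age = 25),
  patient2 = list(age = 40)
)

# .default supplies a value when the named element is absent
counts <- map_dbl(patients_incomplete, "parasite_count", .default = NA_real_)
print(counts)
```

Using NA_real_ rather than 0 as the default keeps missing records distinguishable from genuine zero counts.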
3.3 Advanced purrr Techniques
3.3.1 Safely Handling Variations in Data
The purrr package provides tools for handling variations in data, which is important for catching errors and making code more robust.
# Simulate potentially problematic data
unreliable_data <- list(
  complete_record = list(parasite_count = 300),
  partial_record = list(),
  invalid_record = NULL
)
# Safely extract parasite counts
safe_extraction <- map(unreliable_data, safely(~.x$parasite_count))
print(safe_extraction)
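Note that `$` on an empty list or on NULL returns NULL rather than raising an error, so in the example above every `error` slot comes back as NULL. To see safely() actually capture a failure, here is a minimal sketch that applies log() to a hypothetical `raw_counts` list in which one value is text (the data and names are illustrative assumptions):

```r
library(purrr)

# Hypothetical raw inputs; the second value is text, so log() will fail on it
raw_counts <- list(a = 300, b = "missing", c = 150)

# safely() wraps log() so each call returns list(result = ..., error = ...)
safe_log <- safely(log)
attempts <- map(raw_counts, safe_log)

# Flag which elements failed
failed <- map_lgl(attempts, function(x) !is.null(x$error))
print(failed)

# possibly() is a lighter alternative that substitutes a default on error
log_counts <- map_dbl(raw_counts, possibly(log, otherwise = NA_real_))
print(log_counts)
```

safely() is useful when you want to inspect the error messages afterwards; possibly() is often enough when a placeholder value will do.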
3.3.2 Reducing and Combining Data
The reduce function in purrr allows you to combine elements of a list into a single value.
# Combine treatment efficacy across multiple interventions
intervention_results <- list(
  net_distribution = 0.6,
  medication_efficacy = 0.75,
  community_education = 0.5
)
# Calculate combined intervention impact
combined_impact <- reduce(intervention_results, `*`)
print(combined_impact)
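It can help to see what reduce() does step by step: it applies the two-argument function to the first two elements, then to that result and the third element, and so on. The sketch below reuses the same three values under a hypothetical name `efficacy` and uses accumulate() to expose the intermediate results.

```r
library(purrr)

# Same three values as in the example above, under an illustrative name
efficacy <- list(
  net_distribution = 0.6,
  medication_efficacy = 0.75,
  community_education = 0.5
)

# reduce() computes (0.6 * 0.75) * 0.5 and returns only the final value
combined <- reduce(efficacy, `*`)

# accumulate() returns every intermediate product along the way
running <- accumulate(efficacy, `*`)

print(combined)
print(running)
```

For a plain product of numbers, base R's prod() gives the same answer; reduce() becomes more useful when the combining step is something like merging data frames or intersecting sets.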
3.4 Advanced Example: Custom Function with Dataframe Analysis
We can also combine custom functions with the map family (here map2_dbl()) to apply the same function to each row of a dataframe. This is similar to writing a loop but may be more efficient, readable, and “safer” (e.g., more likely to fail gracefully than a standard loop, which is a good thing!).
# Load required libraries
library(purrr)
library(dplyr)
# Create a sample dataframe of malaria patient records
malaria_patients <- data.frame(
  patient_id = 1:10,
  parasite_count = c(50, 120, 300, 750, 1500, 200, 450, 80, 600, 250),
  age = c(25, 35, 45, 12, 60, 28, 40, 15, 50, 32),
  region = c("Northern", "Southern", "Eastern", "Northern", "Southern",
             "Eastern", "Northern", "Southern", "Eastern", "Northern")
)
# Create a custom function to assess patient risk
assess_malaria_risk <- function(parasite_count, age) {
  # Risk calculation based on parasite count and age
  base_risk <- case_when(
    parasite_count < 100 ~ 1, # Low risk
    parasite_count < 500 ~ 2, # Moderate risk
    TRUE ~ 3 # High risk
  )
  # Adjust risk based on age
  age_factor <- case_when(
    age < 5 | age > 65 ~ 1.5, # Higher risk for children and elderly
    age < 18 | age > 50 ~ 1.2, # Slightly increased risk
    TRUE ~ 1 # Standard risk
  )
  # Calculate final risk score
  round(base_risk * age_factor, 2)
}
# Apply the custom function using purrr
malaria_patients <- malaria_patients %>%
  mutate(
    risk_score = map2_dbl(parasite_count, age, assess_malaria_risk),
    risk_category = case_when(
      risk_score < 1.5 ~ "Low Risk",
      risk_score < 2.5 ~ "Moderate Risk",
      TRUE ~ "High Risk"
    )
  )
# Analyze risk by region
risk_summary <- malaria_patients %>%
  group_by(region) %>%
  summarise(
    avg_risk_score = mean(risk_score),
    high_risk_count = sum(risk_category == "High Risk"),
    total_patients = n()
  )
print(malaria_patients)
print(risk_summary)
3.4.1 Breaking Down the Example
That was a complex example, so let’s break down what’s happening. This example demonstrates:
- Creating a custom risk assessment function
- Using map2_dbl() to apply a function with two inputs
- Combining functional programming with data manipulation
- Generating insights from patient data
Key purrr Functions Used:
- map2_dbl(): Applies a function to two vectors simultaneously
- Integrates seamlessly with dplyr for data manipulation
map2_dbl() lets us apply a function with two inputs and requires each result to be a single “double” (numeric) value. This matters for numerical operations: if the function returns a non-numeric value, we actually want map2_dbl() to raise an error rather than continue silently.
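To make the type check concrete, here is a small sketch with a hypothetical `label_risk()` function that returns text instead of a number. The function and inputs are illustrative assumptions; the exact wording of the error message depends on the purrr version, so the sketch only checks that an error occurred.

```r
library(purrr)

# Hypothetical scoring function that returns text instead of a number
label_risk <- function(count, age) {
  if (count > 500) "high" else "low"
}

# map2_chr() accepts the character output
labels <- map2_chr(c(50, 750), c(25, 12), label_risk)
print(labels)

# map2_dbl() refuses it; tryCatch() captures the error so the script continues
result <- tryCatch(
  map2_dbl(c(50, 750), c(25, 12), label_risk),
  error = function(e) e
)
is_error <- inherits(result, "error")
print(is_error)
```

Failing early like this is usually preferable to discovering later that a supposedly numeric column contains text.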
3.5 Additional Powerful purrr Techniques
3.5.1 Handling Multiple Columns with pmap()
We are not limited to two inputs: pmap() applies a function across any number of columns.
# Example of applying a function across multiple columns
complex_analysis <- function(parasite, age, region) {
  # More complex analysis combining multiple factors
  base_score <- case_when(
    region == "Northern" ~ parasite * 1.2,
    region == "Southern" ~ parasite * 1.1,
    # As written, the age adjustment applies only to this fallback branch
    TRUE ~ parasite * (1 + (age / 100))
  )
  return(round(base_score, 2))
}
# Apply complex analysis using pmap
malaria_patients <- malaria_patients %>%
  mutate(
    complex_score = pmap_dbl(
      list(parasite_count, age, region),
      complex_analysis
    )
  )
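In the example above the list passed to pmap_dbl() is unnamed, so its elements are matched to the function's arguments by position. If the list is named, as a data frame is, pmap matches by name instead. The sketch below illustrates this with a hypothetical `score_patient()` function and a small made-up data frame whose columns are deliberately in a different order from the arguments.

```r
library(purrr)

# Hypothetical three-argument function (not the chapter's complex_analysis)
score_patient <- function(parasite, age, region) {
  multiplier <- if (region == "Northern") 1.2 else 1
  round(parasite * multiplier * (1 + age / 100), 2)
}

# Column names match the argument names, but the column order differs
patients <- data.frame(
  region = c("Northern", "Eastern"),
  age = c(25, 40),
  parasite = c(100, 200)
)

# A data frame is a named list, so columns are matched to arguments by name
scores <- pmap_dbl(patients, score_patient)
print(scores)
```

Matching by name is less fragile than matching by position, but it requires that every column in the list corresponds to an argument of the function.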
3.6 Common Pitfalls and Best Practices
- Understand Your Data: Know the structure before mapping
- Use Appropriate Map Functions:
  - map() for lists
  - map_dbl() for numeric outputs
  - map_chr() for character outputs
- Handle Potential Errors
- Keep Functions Simple and Focused
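The difference between the map variants is easiest to see side by side. The following sketch uses a hypothetical `tests` list; the variable names are illustrative.

```r
library(purrr)

# Hypothetical test results per village
tests <- list(north = c(45, 67, 89), south = c(62, 78))

# map() always returns a list, so each result can have any length or type
as_list <- map(tests, range)

# map_dbl() returns a named double vector; each result must be a single number
as_dbl <- map_dbl(tests, mean)

# map_chr() returns a named character vector
as_chr <- map_chr(tests, function(x) paste(length(x), "tests"))

# map_int() returns a named integer vector
as_int <- map_int(tests, length)

str(list(as_list = as_list, as_dbl = as_dbl, as_chr = as_chr, as_int = as_int))
```

Choosing the typed variant documents the expected output and turns a wrong type into an immediate error.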
3.7 Real-World Context: Why Functional Programming Matters
Functional programming in R allows you to:
- Process complex health datasets more efficiently
- Reduce code complexity
- Create more maintainable research tools
- Quickly adapt to changing research requirements
3.8 When to Use purrr vs. Traditional Loops
Ask yourself:
- Am I performing the same operation across multiple elements?
- Do I need a clean, readable way to transform data?
- Can I break my task into simple, repeatable functions?
If yes, purrr might be your best approach.
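For comparison, here is the same calculation written both ways, using a hypothetical `tests` list. This is a sketch to illustrate the trade-off, not a benchmark.

```r
library(purrr)

# Hypothetical test results per village
tests <- list(north = c(45, 67, 89), south = c(62, 78))

# Traditional loop: pre-allocate the output, then fill it element by element
loop_means <- numeric(length(tests))
names(loop_means) <- names(tests)
for (village in names(tests)) {
  loop_means[village] <- mean(tests[[village]])
}

# purrr: the same computation in one call, with names carried over automatically
map_means <- map_dbl(tests, mean)

# Both approaches give the same values
print(all.equal(loop_means, map_means))
```

The loop remains a reasonable choice when each iteration depends on the previous one or when you need to modify objects in place.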
3.9 Conclusion
Functional programming changes how you approach data analysis. For malaria researchers, it is more than a programming technique: it is a way to get from data to insight with less code and fewer opportunities for error.
3.10 Recommended Next Steps
- Practice with your own datasets
- Explore more advanced purrr functions
- Combine with other data manipulation techniques
- Focus on writing clear, modular functions