1 Writing custom functions in R
2 Understanding Custom Functions in R for Public Health Data Analysis
2.1 What are Custom Functions?
In R, a function is a block of code that performs a specific task or set of operations. It’s a way to encapsulate a set of instructions and make them reusable within your code. We have used multiple functions in R in the previous chapters, such as mean, sqrt, sum, and round, which are available by default (these are called the base R functions). We have also used functions that come as part of packages like dplyr and ggplot2, which we install in order to expand the available functions in R.
In this lesson, we will learn how to create custom functions to perform specific tasks.
2.2 Why Do We Need Custom Functions?
Imagine you’re working with a large dataset of public health records. Every day, you need to perform the same set of calculations or data transformations. Typing out the same code repeatedly would be time-consuming and prone to errors. This is where custom functions come to the rescue.
2.2.1 Real-World Scenarios
Let’s consider a practical example. You’re analyzing vaccination rates across different regions during a public health campaign. You might need to:
- Calculate coverage percentages
- Identify areas with low vaccination rates
- Standardize data across multiple datasets
Instead of copying and pasting code or manually changing values each time, you can create a function that does this work for you with just a few lines of code.
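As a rough sketch of the first two tasks, the helper below computes coverage and flags low-coverage regions in one call. The data frame, the function name flag_low_coverage, and the 70% cut-off are all assumptions made up for illustration, not real data or an established method.

```r
# Hypothetical example data: region names and counts are invented for illustration
regions <- data.frame(
  region = c("North", "South", "East"),
  vaccinated = c(4500, 7800, 3100),
  population = c(6000, 10000, 8000)
)

# Hypothetical helper: adds a coverage percentage and a low-coverage flag
flag_low_coverage <- function(data, cutoff = 70) {
  data$coverage_pct <- round(data$vaccinated / data$population * 100, 2)
  data$low_coverage <- data$coverage_pct < cutoff
  data
}

regions_flagged <- flag_low_coverage(regions)
print(regions_flagged)
```

With these made-up numbers, only the East region falls below the cut-off. Changing the cutoff argument re-runs the same logic with a different target, with no copy-and-paste needed.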
2.3 What is a Function?
Before we start, it might be helpful to develop a mental model of what a function actually does. Think of a function like a recipe. Just as a recipe takes ingredients, follows a specific set of steps, and produces a dish, a function in R:
- Takes inputs (ingredients)
- Performs a specific set of operations (cooking steps)
- Produces an output (the final meal)
2.3.1 A Simple Public Health Example
Here’s a basic function to calculate vaccination coverage:
calculate_vaccination_coverage <- function(vaccinated_count, total_population) {
  # Calculate the percentage of vaccinated individuals
  coverage_rate <- (vaccinated_count / total_population) * 100
  # Round to two decimal places
  rounded_rate <- round(coverage_rate, 2)
  return(rounded_rate)
}
# Example usage
county_vaccination <- calculate_vaccination_coverage(5000, 10000)
print(county_vaccination) # Outputs: 50
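Because arithmetic in R is vectorized, a function like this can also be applied to several areas at once. The sketch below repeats a condensed version of the function so that it runs on its own; the county counts are made up for illustration.

```r
# Condensed version of the coverage function shown above
calculate_vaccination_coverage <- function(vaccinated_count, total_population) {
  round((vaccinated_count / total_population) * 100, 2)
}

# Hypothetical counts for three counties
vaccinated <- c(5000, 1200, 880)
population <- c(10000, 4800, 1000)

# One call returns one coverage value per county
coverage <- calculate_vaccination_coverage(vaccinated, population)
print(coverage)
```

The two input vectors must line up element by element, so the first vaccinated count is divided by the first population, and so on.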
2.4 Building More Complex Functions
2.4.1 Handling Multiple Scenarios
Let’s create a function that provides more detailed vaccination insights:
analyze_vaccination_status <- function(vaccinated_count, total_population, threshold = 70) {
  # Calculate coverage rate
  coverage_rate <- (vaccinated_count / total_population) * 100
  # Determine status based on coverage
  if (coverage_rate >= threshold) {
    status <- "Target Achieved"
  } else if (coverage_rate >= 50) {
    status <- "Needs Improvement"
  } else {
    status <- "Critical"
  }
  # Return a list with details
  return(list(
    coverage_rate = round(coverage_rate, 2),
    population_total = total_population,
    vaccinated_count = vaccinated_count,
    status = status
  ))
}
# Using the function
result <- analyze_vaccination_status(6500, 10000)
print(result)
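Since the function returns a list, individual pieces can be pulled out by name. The sketch below uses a hand-built stand-in list with the values the call above would produce (65% coverage, which falls in the "Needs Improvement" band), so it runs without the function definition. The variable summary_text is just an illustrative name.

```r
# Stand-in for the list returned by analyze_vaccination_status(6500, 10000)
result <- list(
  coverage_rate = 65,
  population_total = 10000,
  vaccinated_count = 6500,
  status = "Needs Improvement"
)

# Extract single elements by name, using either $ or [[ ]]
result$status
result[["coverage_rate"]]

# Combine the pieces into a short summary message
summary_text <- paste0("Coverage: ", result$coverage_rate, "% (", result$status, ")")
print(summary_text)
```

Returning a named list is one convenient way to hand back several related values at once; the caller can then use only the elements it needs.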
2.5 Making Your Functions Flexible
2.5.1 Default Values and Optional Arguments
Notice in the previous example, we used a default threshold of 70%. You can easily change this:
# Using default threshold
analyze_vaccination_status(6500, 10000)
# Changing the threshold
analyze_vaccination_status(6500, 10000, threshold = 80)
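Arguments can also be passed by name, in which case their order no longer matters. The sketch below uses a condensed version of the function (it returns only the rate and the status, an abbreviation made here for brevity) to compare a default call, a call with a lower threshold, and a call with named arguments in reversed order.

```r
# Condensed version of analyze_vaccination_status (returns a shorter list)
analyze_vaccination_status <- function(vaccinated_count, total_population, threshold = 70) {
  coverage_rate <- (vaccinated_count / total_population) * 100
  status <- if (coverage_rate >= threshold) {
    "Target Achieved"
  } else if (coverage_rate >= 50) {
    "Needs Improvement"
  } else {
    "Critical"
  }
  list(coverage_rate = round(coverage_rate, 2), status = status)
}

# Default threshold (70): 65% coverage is below the target
default_result <- analyze_vaccination_status(6500, 10000)

# Lower threshold (60): the same coverage now meets the target
lower_target <- analyze_vaccination_status(6500, 10000, threshold = 60)

# Named arguments can be given in any order
named_result <- analyze_vaccination_status(total_population = 10000, vaccinated_count = 6500)

print(default_result$status)
print(lower_target$status)
```

Naming arguments is especially helpful when a function takes several numbers that are easy to mix up, such as a count and a population.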
2.6 Handling Potential Errors
Public health data can be messy. Your functions should be robust:
safe_vaccination_analysis <- function(vaccinated_count, total_population) {
  # Check for invalid inputs
  if (vaccinated_count < 0 || total_population <= 0) {
    stop("Invalid input: Counts must be non-negative and population must be positive")
  }
  if (vaccinated_count > total_population) {
    warning("Vaccinated count exceeds total population")
  }
  # Perform analysis
  coverage_rate <- (vaccinated_count / total_population) * 100
  return(round(coverage_rate, 2))
}
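When a function like this is run over many records, one bad record should not necessarily halt the whole analysis. One possible approach, sketched below, is to wrap the call in base R's tryCatch and return NA for invalid records. The wrapper name try_coverage is hypothetical, and the function definition is repeated in condensed form so the example runs on its own.

```r
# Condensed version of safe_vaccination_analysis
safe_vaccination_analysis <- function(vaccinated_count, total_population) {
  if (vaccinated_count < 0 || total_population <= 0) {
    stop("Invalid input: Counts must be non-negative and population must be positive")
  }
  if (vaccinated_count > total_population) {
    warning("Vaccinated count exceeds total population")
  }
  round((vaccinated_count / total_population) * 100, 2)
}

# Hypothetical wrapper: returns NA instead of halting when the input is invalid
try_coverage <- function(vaccinated_count, total_population) {
  tryCatch(
    safe_vaccination_analysis(vaccinated_count, total_population),
    error = function(e) {
      message("Skipping record: ", conditionMessage(e))
      NA_real_
    }
  )
}

valid_rate <- try_coverage(5000, 10000)    # valid input, computed normally
invalid_rate <- try_coverage(5000, 0)      # error is caught, NA is returned
# A warning does not stop the calculation; it is silenced here only for the demo
over_rate <- suppressWarnings(try_coverage(12000, 10000))
```

Note the difference between the two conditions: stop() aborts the calculation, while warning() lets it finish and only alerts the analyst that the result deserves a second look.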
2.7 Practical Tips for Public Health Data Analysts
- Keep Functions Simple: Each function should do one thing well
- Use Clear Names: calculate_vaccination_coverage is better than func1
- Add Comments: Explain what your function does
- Test Your Functions: Try different scenarios
- Reuse and Adapt: Create a library of useful functions for your work
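As one way to put the commenting and testing tips into practice, the sketch below documents a condensed version of the coverage function and then checks it against a few known answers using base R's stopifnot. The `#'` comment layout follows the roxygen2 convention; here it is used only as a commenting style and works as ordinary comments without that package.

```r
#' Calculate vaccination coverage as a percentage
#'
#' @param vaccinated_count Number of vaccinated individuals
#' @param total_population Size of the target population
#' @return Coverage percentage, rounded to two decimal places
calculate_vaccination_coverage <- function(vaccinated_count, total_population) {
  round((vaccinated_count / total_population) * 100, 2)
}

# Quick checks: stopifnot() raises an error if any condition is not TRUE
stopifnot(
  calculate_vaccination_coverage(5000, 10000) == 50,
  calculate_vaccination_coverage(0, 10000) == 0,
  # all.equal() compares decimals with a small tolerance
  isTRUE(all.equal(calculate_vaccination_coverage(1, 3), 33.33))
)
```

If all checks pass, the code runs silently. Keeping a few checks like these next to a function makes it easier to notice when a later change breaks it.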
2.8 When to Create a Custom Function
Ask yourself:
- Do I repeat this code multiple times?
- Would a function make my analysis more readable?
- Can I generalize this calculation?
If you answer “yes” to these, it’s time to write a function!
2.9 Conclusion
Custom functions are your friends in data analysis. They help you:
- Reduce repetitive code
- Minimize errors
- Make your analysis more organized
- Save time in the long run
Practice creating functions, and soon they’ll become second nature in your R programming toolkit.
2.10 Challenge Exercises: Creating Custom Functions for Malaria Research
2.10.1 Challenge 1: Parasite Density Calculation
Create a function that calculates parasite density from blood smear data. The function should:
- Take inputs for total parasites counted and volume of blood examined
- Calculate parasites per microliter
- Provide a classification of infection intensity:
  - Low: < 1,000 parasites/µL
  - Moderate: 1,000 - 10,000 parasites/µL
  - High: > 10,000 parasites/µL
# Example expected implementation
parasite_density_analysis <- function(total_parasites, blood_volume_uL) {
  # Your code here
}
# Test cases
# parasite_density_analysis(50, 0.1) # Should return appropriate result
2.10.2 Challenge 2: Artemisinin Treatment Efficacy
Develop a function to analyze treatment outcomes for artemisinin-based combination therapy (ACT). The function should:
- Accept parameters for:
  - Initial parasite count
  - Final parasite count
  - Treatment duration
- Calculate:
  - Parasite clearance rate
  - Treatment efficacy percentage
- Flag potential drug resistance if clearance is below 99%
# Example expected implementation
act_treatment_analysis <- function(initial_count, final_count, treatment_days) {
  # Your code here
}
# Test cases
# act_treatment_analysis(10000, 100, 3) # Should return comprehensive analysis
2.10.3 Challenge 3: Mosquito Net Coverage Calculator
Create a function to assess mosquito net coverage in a region. The function should:
- Calculate net coverage percentage
- Determine protection level based on:
  - Total population
  - Number of nets distributed
  - Average household size
- Provide recommendations for additional net distribution
# Example expected implementation
mosquito_net_coverage <- function(total_population, nets_distributed, avg_household_size) {
  # Your code here
}
# Test cases
# mosquito_net_coverage(50000, 15000, 5) # Should return detailed coverage analysis
2.10.4 Challenge 4: Age-Based Malaria Risk Stratification
Design a function that stratifies malaria risk by age group. The function should:
- Accept demographic data
- Categorize risk levels based on age
- Calculate potential intervention needs
- Provide summary statistics
# Example expected implementation
malaria_age_risk <- function(population_data) {
  # Your code here
  # population_data should be a data frame with age and other relevant columns
}
# Test cases
# sample_data <- data.frame(
# age = c(5, 15, 25, 35, 45, 55),
# location = c("rural", "urban", "rural", "urban", "rural", "urban")
# )
# malaria_age_risk(sample_data)
2.10.5 Bonus Challenge: Error Handling and Robust Design
For each of these functions, consider:
- What happens with negative numbers?
- How do you handle missing data?
- Can you add informative warning messages?
- What default values might be appropriate?
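As a starting point for these questions, the sketch below shows one possible input-checking helper. It is deliberately generic and is not a solution to any of the challenges; the name validate_count and its behavior (error on negative values, warning on missing values) are assumptions chosen for illustration.

```r
# Hypothetical helper: checks a numeric input before it is used in a calculation
validate_count <- function(x, name = "value") {
  if (!is.numeric(x)) {
    stop(name, " must be numeric")
  }
  if (anyNA(x)) {
    warning(name, " contains missing values")
  }
  if (any(x < 0, na.rm = TRUE)) {
    stop(name, " must not be negative")
  }
  invisible(x)
}

# Made-up counts with one missing value
counts <- c(120, NA, 45)

# The warning is silenced here only to keep the demo output clean
checked <- suppressWarnings(validate_count(counts, name = "counts"))

# na.rm = TRUE drops the missing value before averaging
mean_count <- mean(checked, na.rm = TRUE)
print(mean_count)
```

Whether missing values should trigger a warning, an error, or be dropped silently depends on the analysis, so treat the choices above as one option among several.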
2.10.6 Submission Guidelines
For each challenge:
1. Write a function that meets the described requirements
2. Include comments explaining your logic
3. Demonstrate the function with at least two different test cases
4. Implement appropriate error checking
Tip: There’s no single “correct” solution. Focus on clear, readable, and robust code!