1  Writing custom functions in R

2 Understanding Custom Functions in R for Public Health Data Analysis

2.1 What are Custom Functions?

In R, a function is a block of code that performs a specific task or set of operations. It’s a way to encapsulate a set of instructions and make them reusable within your code. We have used multiple functions in R in the previous chapters, such as mean, sqrt, sum, and round, which are available by default (these are called the base R functions). We have also used functions from packages like dplyr and ggplot2, which we install to expand the functions available in R.

In this lesson, we will learn how to create custom functions to perform specific tasks.

2.2 Why Do We Need Custom Functions?

Imagine you’re working with a large dataset of public health records. Every day, you need to perform the same set of calculations or data transformations. Typing out the same code repeatedly would be time-consuming and prone to errors. This is where custom functions come to the rescue.

2.2.1 Real-World Scenarios

Let’s consider a practical example. You’re analyzing vaccination rates across different regions during a public health campaign. You might need to:

  • Calculate coverage percentages
  • Identify areas with low vaccination rates
  • Standardize data across multiple datasets

Instead of copying and pasting code or manually changing values each time, you can create a function that does this work for you with just a few lines of code.

2.3 What is a Function?

Before we start, it might be helpful to develop a mental model of what a function actually does. Think of a function like a recipe. Just as a recipe takes ingredients, follows a specific set of steps, and produces a dish, a function in R:

  • Takes inputs (ingredients)
  • Performs a specific set of operations (cooking steps)
  • Produces an output (the final meal)
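
As a minimal sketch of the recipe analogy, the function below labels each of the three parts. The name to_percent and its argument are hypothetical, chosen only for illustration.

```r
# Hypothetical example: convert a proportion to a percentage
to_percent <- function(proportion) {  # input (the ingredient)
    percent <- proportion * 100       # operation (the cooking step)
    return(percent)                   # output (the final meal)
}

to_percent(0.25)  # returns 25
```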

2.3.1 A Simple Public Health Example

Here’s a basic function to calculate vaccination coverage:

calculate_vaccination_coverage <- function(vaccinated_count, total_population) {
    # Calculate the percentage of vaccinated individuals
    coverage_rate <- (vaccinated_count / total_population) * 100
    
    # Round to two decimal places
    rounded_rate <- round(coverage_rate, 2)
    
    return(rounded_rate)
}

# Example usage
county_vaccination <- calculate_vaccination_coverage(5000, 10000)
print(county_vaccination)  # Outputs: 50
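
Because R arithmetic works element by element, the same function can also be applied to whole columns at once. The sketch below assumes a small data frame of regional counts; the region names, the numbers, and the 50% cut-off are invented for illustration, and the function is repeated in compact form so the snippet runs on its own.

```r
# Compact copy of the function defined above
calculate_vaccination_coverage <- function(vaccinated_count, total_population) {
    round((vaccinated_count / total_population) * 100, 2)
}

# Hypothetical regional counts, invented for this example
regions <- data.frame(
    region = c("North", "South", "East"),
    vaccinated = c(5000, 7200, 1500),
    population = c(10000, 9000, 6000),
    stringsAsFactors = FALSE
)

# One call computes the coverage for every row
regions$coverage <- calculate_vaccination_coverage(regions$vaccinated, regions$population)

# Regions below a 50% cut-off (an arbitrary value chosen for this example)
low_coverage <- regions$region[regions$coverage < 50]

print(regions)
print(low_coverage)  # "East"
```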

2.4 Building More Complex Functions

2.4.1 Handling Multiple Scenarios

Let’s create a function that provides more detailed vaccination insights:

analyze_vaccination_status <- function(vaccinated_count, total_population, threshold = 70) {
    # Calculate coverage rate
    coverage_rate <- (vaccinated_count / total_population) * 100
    
    # Determine status based on coverage
    if (coverage_rate >= threshold) {
        status <- "Target Achieved"
    } else if (coverage_rate >= 50) {
        status <- "Needs Improvement"
    } else {
        status <- "Critical"
    }
    
    # Return a list with details
    return(list(
        coverage_rate = round(coverage_rate, 2),
        population_total = total_population,
        vaccinated_count = vaccinated_count,
        status = status
    ))
}

# Using the function
result <- analyze_vaccination_status(6500, 10000)
print(result)
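
Individual elements of the returned list can be pulled out by name. In the brief sketch below, the list is recreated by hand with the values that the call above produces, so that the snippet runs on its own; summary_line is a hypothetical variable name.

```r
# The list returned by analyze_vaccination_status(6500, 10000), recreated by hand
result <- list(
    coverage_rate = 65,
    population_total = 10000,
    vaccinated_count = 6500,
    status = "Needs Improvement"
)

# Two equivalent ways to extract an element by name
result$status              # "Needs Improvement"
result[["coverage_rate"]]  # 65

# Combine elements into a one-line summary
summary_line <- paste0(result$coverage_rate, "% coverage: ", result$status)
print(summary_line)  # "65% coverage: Needs Improvement"
```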

2.5 Making Your Functions Flexible

2.5.1 Default Values and Optional Arguments

Notice that in the previous example the threshold argument has a default value of 70. If you leave it out, the default is used; you can override it when you call the function:

# Using default threshold
analyze_vaccination_status(6500, 10000)

# Changing the threshold
analyze_vaccination_status(6500, 10000, threshold = 80)
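
To see the threshold actually change the result, the sketch below compares the default with a lower target of 60 (a value chosen only for this example). It also shows that named arguments can be given in any order. The function is copied from the earlier section so the snippet runs on its own; the variable names are hypothetical.

```r
# Copy of the function defined earlier
analyze_vaccination_status <- function(vaccinated_count, total_population, threshold = 70) {
    coverage_rate <- (vaccinated_count / total_population) * 100
    if (coverage_rate >= threshold) {
        status <- "Target Achieved"
    } else if (coverage_rate >= 50) {
        status <- "Needs Improvement"
    } else {
        status <- "Critical"
    }
    list(
        coverage_rate = round(coverage_rate, 2),
        population_total = total_population,
        vaccinated_count = vaccinated_count,
        status = status
    )
}

default_result <- analyze_vaccination_status(6500, 10000)
lower_target <- analyze_vaccination_status(6500, 10000, threshold = 60)

# Named arguments may appear in any order
named_args <- analyze_vaccination_status(
    total_population = 10000, vaccinated_count = 6500, threshold = 60
)

default_result$status  # "Needs Improvement" (65 is below the default of 70)
lower_target$status    # "Target Achieved" (65 is at least 60)
identical(lower_target, named_args)  # TRUE
```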

2.6 Handling Potential Errors

Public health data can be messy, so your functions should check their inputs and report problems clearly:

safe_vaccination_analysis <- function(vaccinated_count, total_population) {
    # Check for invalid inputs
    if (vaccinated_count < 0 || total_population <= 0) {
        stop("Invalid input: Counts must be non-negative and population must be positive")
    }
    
    if (vaccinated_count > total_population) {
        warning("Vaccinated count exceeds total population")
    }
    
    # Perform analysis
    coverage_rate <- (vaccinated_count / total_population) * 100
    return(round(coverage_rate, 2))
}
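
When many records are processed in a row, one bad record should not necessarily stop the whole analysis. One possible approach, sketched below, wraps the call in base R's tryCatch so that an error becomes a message and a missing value. The wrapper name try_coverage is hypothetical, and returning NA is only one of several reasonable choices.

```r
# Copy of the function defined above
safe_vaccination_analysis <- function(vaccinated_count, total_population) {
    if (vaccinated_count < 0 || total_population <= 0) {
        stop("Invalid input: Counts must be non-negative and population must be positive")
    }
    if (vaccinated_count > total_population) {
        warning("Vaccinated count exceeds total population")
    }
    coverage_rate <- (vaccinated_count / total_population) * 100
    return(round(coverage_rate, 2))
}

# Hypothetical wrapper: report the error and return NA instead of stopping
try_coverage <- function(vaccinated_count, total_population) {
    tryCatch(
        safe_vaccination_analysis(vaccinated_count, total_population),
        error = function(e) {
            message("Skipping record: ", conditionMessage(e))
            NA_real_
        }
    )
}

try_coverage(5000, 10000)   # valid input: returns 50
try_coverage(-5, 10000)     # invalid input: prints a message and returns NA
try_coverage(12000, 10000)  # returns 120 and still raises the warning
```

Only errors are intercepted here, so the warning about counts exceeding the population still reaches the analyst.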

2.7 Practical Tips for Public Health Data Analysts

  1. Keep Functions Simple: Each function should do one thing well
  2. Use Clear Names: calculate_vaccination_coverage is better than func1
  3. Add Comments: Explain what your function does
  4. Test Your Functions: Try different scenarios
  5. Reuse and Adapt: Create a library of useful functions for your work
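
For tip 4, a lightweight way to test a function is base R's stopifnot, which stops with an error if any check is not TRUE. The sketch below is one possible set of checks; the input values were picked for illustration, and the function is repeated in compact form so the snippet runs on its own.

```r
# Compact copy of the function defined earlier
calculate_vaccination_coverage <- function(vaccinated_count, total_population) {
    round((vaccinated_count / total_population) * 100, 2)
}

# Each condition must be TRUE, otherwise stopifnot() raises an error
stopifnot(
    calculate_vaccination_coverage(5000, 10000) == 50,   # typical case
    calculate_vaccination_coverage(0, 10000) == 0,       # nobody vaccinated
    calculate_vaccination_coverage(10000, 10000) == 100, # everyone vaccinated
    # all.equal() tolerates tiny floating-point differences
    isTRUE(all.equal(calculate_vaccination_coverage(1, 3), 33.33))
)

message("All checks passed")
```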

2.8 When to Create a Custom Function

Ask yourself:

  • Do I repeat this code multiple times?
  • Would a function make my analysis more readable?
  • Can I generalize this calculation?

If you answer “yes” to these, it’s time to write a function!

2.9 Conclusion

Custom functions are your friends in data analysis. They help you:

  • Reduce repetitive code
  • Minimize errors
  • Make your analysis more organized
  • Save time in the long run

Practice creating functions, and soon they’ll become second nature in your R programming toolkit.

2.10 Challenge Exercises: Creating Custom Functions for Malaria Research

2.10.1 Challenge 1: Parasite Density Calculation

Create a function that calculates parasite density from blood smear data. The function should:

  • Take inputs for total parasites counted and volume of blood examined
  • Calculate parasites per microliter
  • Provide a classification of infection intensity:
    • Low: < 1,000 parasites/µL
    • Moderate: 1,000 - 10,000 parasites/µL
    • High: > 10,000 parasites/µL

# Example expected implementation
parasite_density_analysis <- function(total_parasites, blood_volume_uL) {
    # Your code here
}

# Test cases
# parasite_density_analysis(50, 0.1)  # Should return appropriate result

2.10.2 Challenge 2: Artemisinin Treatment Efficacy

Develop a function to analyze treatment outcomes for artemisinin-based combination therapy (ACT). The function should:

  • Accept parameters for:
    • Initial parasite count
    • Final parasite count
    • Treatment duration
  • Calculate:
    • Parasite clearance rate
    • Treatment efficacy percentage
    • Flag potential drug resistance if clearance is below 99%

# Example expected implementation
act_treatment_analysis <- function(initial_count, final_count, treatment_days) {
    # Your code here
}

# Test cases
# act_treatment_analysis(10000, 100, 3)  # Should return comprehensive analysis

2.10.3 Challenge 3: Mosquito Net Coverage Calculator

Create a function to assess mosquito net coverage in a region. The function should:

  • Calculate net coverage percentage
  • Determine protection level based on:
    • Total population
    • Number of nets distributed
    • Average household size
  • Provide recommendations for additional net distribution

# Example expected implementation
mosquito_net_coverage <- function(total_population, nets_distributed, avg_household_size) {
    # Your code here
}

# Test cases
# mosquito_net_coverage(50000, 15000, 5)  # Should return detailed coverage analysis

2.10.4 Challenge 4: Age-Based Malaria Risk Stratification

Design a function that stratifies malaria risk by age group. The function should:

  • Accept demographic data
  • Categorize risk levels based on age
  • Calculate potential intervention needs
  • Provide summary statistics

# Example expected implementation
malaria_age_risk <- function(population_data) {
    # Your code here
    # population_data should be a data frame with age and other relevant columns
}

# Test cases
# sample_data <- data.frame(
#     age = c(5, 15, 25, 35, 45, 55),
#     location = c("rural", "urban", "rural", "urban", "rural", "urban")
# )
# malaria_age_risk(sample_data)

2.10.5 Bonus Challenge: Error Handling and Robust Design

For each of these functions, consider:

  • What happens with negative numbers?
  • How do you handle missing data?
  • Can you add informative warning messages?
  • What default values might be appropriate?
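
As a starting point for these questions, the sketch below shows one way to check a single count for missing or negative values before using it. The helper name check_count and its messages are hypothetical; it is not a solution to any of the challenges, only an illustration of the pattern.

```r
# Hypothetical helper: validate one numeric count before using it
check_count <- function(x, name) {
    if (length(x) != 1) {
        stop(name, " must be a single value")
    }
    if (is.na(x)) {
        stop(name, " is missing (NA)")
    }
    if (!is.numeric(x)) {
        stop(name, " must be numeric")
    }
    if (x < 0) {
        stop(name, " must not be negative")
    }
    invisible(TRUE)
}

check_count(120, "total_parasites")  # passes silently

# Capture the error message instead of stopping the script
msg <- tryCatch(
    check_count(NA, "total_parasites"),
    error = function(e) conditionMessage(e)
)
print(msg)  # "total_parasites is missing (NA)"
```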

2.10.6 Submission Guidelines

For each challenge:

  1. Write a function that meets the described requirements
  2. Include comments explaining your logic
  3. Demonstrate the function with at least two different test cases
  4. Implement appropriate error checking

Tip: There’s no single “correct” solution. Focus on clear, readable, and robust code!