2  Loops

3 Loops in R: A Public Health Perspective on Analyzing Malaria Data

3.1 Why Loops Matter in Public Health Research

Imagine you’re a researcher tracking malaria across multiple villages in sub-Saharan Africa. You have datasets from different regions, each with hundreds or thousands of records. Manually processing each record would be extremely time-consuming. This is where loops become your most powerful tool.

3.1.1 The Big Picture: Malaria Data Challenges

In malaria research, you often need to:

  • Process multiple patient records
  • Calculate statistics across different regions
  • Repeat similar calculations with varying inputs
  • Analyze large datasets efficiently

3.2 What Exactly is a Loop?

Think of a loop like a dedicated worker who:

  • Follows the same set of instructions
  • Repeats those instructions for each item in a collection
  • Saves you from doing repetitive tasks manually

3.2.1 Basic Loop Types in R

R provides several ways to create loops, but we’ll focus on two primary types:

  1. For Loops: Used when you know exactly how many times you want to repeat a task
  2. While Loops: Used when you want to continue an operation until a specific condition is met
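
The scenarios in this chapter all use for loops, so here is a minimal sketch of the second type. It assumes a made-up outbreak in which weekly cases fall by a fixed share each week; the variable names and numbers are hypothetical and only illustrate how a while loop keeps running until its condition becomes false.

```r
# Hypothetical numbers, for illustration only
weekly_cases <- 400      # cases reported in the starting week
target_cases <- 100      # stop once weekly cases fall below this
weekly_decline <- 0.25   # assumed 25% drop in cases per week
weeks_elapsed <- 0

# We do not know in advance how many weeks are needed,
# so a while loop fits better than a for loop here
while (weekly_cases >= target_cases) {
    weekly_cases <- weekly_cases * (1 - weekly_decline)
    weeks_elapsed <- weeks_elapsed + 1
}

print(paste("Weeks needed to fall below target:", weeks_elapsed))
```

A while loop never stops if its condition can never become false, so make sure something inside the loop body changes the value being tested.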

3.3 Practical Example: Analyzing Malaria Screening Data

Let’s walk through two realistic scenarios, using small made-up datasets, to understand loops.

3.3.1 Scenario 1: Processing Patient Screening Results

# Sample screening data
patient_temperatures <- c(37.2, 38.5, 39.1, 37.6, 38.9)
fever_threshold <- 38.0

# For loop to check fever status
for (temp in patient_temperatures) {
    if (temp > fever_threshold) {
        print(paste("Patient has fever:", temp, "°C"))
    } else {
        print(paste("Patient temperature normal:", temp, "°C"))
    }
}

3.3.2 Scenario 2: Calculating Parasite Density Across Multiple Samples

# Simulated parasite count data
blood_samples <- c(50, 120, 300, 750, 1500)

# Create a storage vector for results
parasite_categories <- c()

# Loop to categorize parasite density
for (count in blood_samples) {
    if (count < 100) {
        category <- "Low Density"
    } else if (count < 500) {
        category <- "Moderate Density"
    } else {
        category <- "High Density"
    }
    
    # Store results by appending to the vector
    # (fine for a handful of samples; for large data, pre-allocate instead)
    parasite_categories <- c(parasite_categories, category)
}

print(parasite_categories)
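
For comparison, the same categorization can be written without a loop. The sketch below uses base R's cut() with the same thresholds as the loop above; the lower break of 0 is an assumption that counts are never negative.

```r
blood_samples <- c(50, 120, 300, 750, 1500)

# cut() assigns every count to an interval in one vectorized call.
# right = FALSE makes the intervals closed on the left:
# [0, 100), [100, 500), [500, Inf)
parasite_categories <- cut(
    blood_samples,
    breaks = c(0, 100, 500, Inf),
    labels = c("Low Density", "Moderate Density", "High Density"),
    right = FALSE
)

print(as.character(parasite_categories))

# table() counts how many samples fall into each category
print(table(parasite_categories))
```

The loop version is easier to read while you are learning; the cut() version is shorter once the thresholds are settled.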

3.4 Looping Over Sequences: Indexing and Object Creation

3.4.1 Sequence-Based Loops

Sometimes you want to loop using an index or generate a sequence. Here’s how you might do that:

# Simple sequence loop
for (i in 1:5) {
    print(paste("Iteration number:", i))
}

# Creating a vector of malaria test results
test_results <- numeric(10)  # Pre-allocate a vector of 10 zeros

# Simulating malaria parasite count tests
set.seed(123)  # For reproducibility
for (i in 1:10) {
    # Simulate parasite count with some randomness
    base_count <- runif(1, min = 50, max = 500)
    noise <- rnorm(1, mean = 0, sd = 50)
    
    # Combine both parts; max() keeps the count from dropping below zero
    test_results[i] <- max(0, round(base_count + noise))
}

# Print the generated test results
print(test_results)

# Analyze the results
summary(test_results)

3.4.2 More Complex Sequence Example: Regional Analysis

# Simulating malaria prevalence across 5 regions
regions <- c("Northern", "Southern", "Eastern", "Western", "Central")
prevalence_data <- list()

for (i in seq_along(regions)) {  # seq_along() is safe even if regions is empty
    # Simulate prevalence with some regional variation
    base_prevalence <- runif(1, min = 5, max = 25)
    
    prevalence_data[[regions[i]]] <- list(
        region = regions[i],
        prevalence_rate = round(base_prevalence, 2),
        sample_size = sample(500:2000, 1)
    )
}

# Inspect the results
print(prevalence_data)

# Access specific region data
print(prevalence_data$Northern)
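
A nested list like this is convenient to build but awkward to summarize. One possible follow-up step, sketched here with two hypothetical fixed entries in place of the simulated values, is to stack the per-region lists into a data frame.

```r
# Hypothetical fixed values standing in for the simulated ones
prevalence_data <- list(
    Northern = list(region = "Northern", prevalence_rate = 12.40, sample_size = 850),
    Southern = list(region = "Southern", prevalence_rate = 21.75, sample_size = 1320)
)

# Turn each region's list into a one-row data frame, then stack the rows
prevalence_table <- do.call(
    rbind,
    lapply(prevalence_data, function(entry) {
        as.data.frame(entry, stringsAsFactors = FALSE)
    })
)

print(prevalence_table)

# Columns can now be analyzed directly
print(mean(prevalence_table$prevalence_rate))
```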

3.5 Expanding on Object Creation in Loops

Filling a pre-allocated object (a vector, list, or data frame) inside a loop is useful for:

  • Building comprehensive datasets
  • Generating summary statistics
  • Collecting results across multiple iterations

3.5.1 Advanced Example: Tracking Intervention Effectiveness

# Simulate net distribution effectiveness over multiple villages
villages <- 10

# Pre-allocate a data frame with one row per village
intervention_results <- data.frame(
    village = character(villages),
    nets_distributed = numeric(villages),
    cases_reduced = numeric(villages),
    reduction_percentage = numeric(villages),
    stringsAsFactors = FALSE
)

for (i in 1:villages) {
    # Simulate data for each village
    population <- sample(1000:5000, 1)
    nets <- round(population * runif(1, min = 0.3, max = 0.7))
    initial_cases <- sample(50:500, 1)

    # Simple reduction model: 20% to 60% of the initial cases remain
    cases_after_intervention <- round(initial_cases * runif(1, min = 0.2, max = 0.6))
    cases_reduced <- initial_cases - cases_after_intervention

    # Fill one column at a time so each column keeps its type.
    # Assigning a whole row with c("Village 1", nets, ...) would
    # coerce every value to character.
    intervention_results$village[i] <- paste("Village", i)
    intervention_results$nets_distributed[i] <- nets
    intervention_results$cases_reduced[i] <- cases_reduced
    intervention_results$reduction_percentage[i] <- round(cases_reduced / initial_cases * 100, 2)
}

# Print and analyze results
print(intervention_results)
summary(intervention_results)

3.6 Advanced Loop Techniques: Apply Family

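R's apply family offers a more concise alternative to explicit loops. These functions are not always faster than a well-written for loop, but they handle the iteration and the result storage for you: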

# Using sapply for quick calculations
region_populations <- c(5000, 7500, 10000, 15000)
net_distribution_rate <- 0.6

# Calculate estimated net coverage for each region
estimated_nets <- sapply(region_populations, function(population) {
    round(population * net_distribution_rate)
})

print(estimated_nets)
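
Two related options are worth knowing. The sketch below repeats the calculation with vapply(), which additionally checks the type of each result, and with plain vectorized arithmetic, which needs no apply call at all. The variable names nets_vapply and nets_vectorized are introduced here for illustration.

```r
region_populations <- c(5000, 7500, 10000, 15000)
net_distribution_rate <- 0.6

# vapply() works like sapply() but checks that every result
# is a single numeric value, which catches mistakes early
nets_vapply <- vapply(
    region_populations,
    function(population) round(population * net_distribution_rate),
    FUN.VALUE = numeric(1)
)

# Arithmetic in R is already vectorized, so this one line
# does the same work without a loop or an apply call
nets_vectorized <- round(region_populations * net_distribution_rate)

print(nets_vapply)
print(identical(nets_vapply, nets_vectorized))
```

When the operation is simple arithmetic, the vectorized form is usually the clearest choice; apply-style functions are more useful when each item needs several steps.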

3.7 Common Pitfalls and Best Practices

3.7.1 Performance Considerations

  • Avoid growing vectors inside loops (pre-allocate when possible)
  • For large datasets, consider vectorized operations
  • Use sapply(), lapply(), or apply() for more concise code; they are not necessarily faster than a loop that fills a pre-allocated vector
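
The first two points can be checked with a small timing experiment. This is a rough sketch using system.time() on made-up data; the vector size n is arbitrary and the timings will differ from machine to machine.

```r
n <- 20000
counts <- seq_len(n)  # hypothetical parasite counts 1, 2, ..., n

# 1. Growing a vector: c() builds a new, longer vector on every iteration
grow_time <- system.time({
    grown <- c()
    for (count in counts) {
        grown <- c(grown, count * 2)
    }
})

# 2. Pre-allocating: the vector is created once and filled slot by slot
prealloc_time <- system.time({
    preallocated <- numeric(n)
    for (i in seq_along(counts)) {
        preallocated[i] <- counts[i] * 2
    }
})

# 3. Vectorized: one call, no explicit loop
vector_time <- system.time({
    vectorized <- counts * 2
})

# All three approaches give the same result
print(all.equal(grown, preallocated))
print(all.equal(preallocated, vectorized))

# Elapsed times in seconds; exact values depend on your machine
print(c(grow = grow_time[["elapsed"]],
        preallocate = prealloc_time[["elapsed"]],
        vectorized = vector_time[["elapsed"]]))
```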

3.7.2 Error Handling

  • Always include error checks
  • Provide meaningful output or warnings
  • Consider what happens with unexpected data
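
As a minimal sketch of these points, the loop below checks each value before using it, warns about anything unexpected, and moves on. The data and the validity rule (no missing or negative counts) are assumptions made for the example.

```r
# Hypothetical raw parasite counts, including a missing and an impossible value
raw_counts <- c(120, NA, 750, -30, 1500)

# Pre-allocate a flag for each sample: TRUE means the value is usable
is_valid <- logical(length(raw_counts))

for (i in seq_along(raw_counts)) {
    count <- raw_counts[i]

    # Check for unexpected data before using the value
    if (is.na(count) || count < 0) {
        warning(paste("Skipping sample", i, "- invalid count:", count),
                call. = FALSE)
        next  # move on to the next sample
    }

    is_valid[i] <- TRUE
}

valid_counts <- raw_counts[is_valid]
skipped <- sum(!is_valid)

print(valid_counts)
print(paste("Samples skipped:", skipped))
```

Using warning() rather than stop() lets the loop finish and report every problem at once; use stop() instead when a bad value should halt the analysis.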

3.8 When to Use Loops vs. Other Methods

Ask yourself:

  • Do I need to perform the same operation on multiple items?
  • Am I working with a known set of data?
  • Do I need individual control over each iteration?

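If you answer yes to these questions, and especially to the last one, a loop is a reasonable choice. If every item can be handled independently by a single operation, a vectorized function or sapply() is usually shorter.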
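
Individual control over each iteration often means deciding, inside the loop, whether to continue. A small sketch with hypothetical temperatures and an assumed severity cut-off shows break ending a loop early:

```r
# Hypothetical temperatures in the order patients arrive
patient_temperatures <- c(37.2, 38.5, 37.6, 40.3, 38.9)
severe_threshold <- 40.0   # assumed cut-off, for illustration only
first_severe <- NA

for (i in seq_along(patient_temperatures)) {
    if (patient_temperatures[i] >= severe_threshold) {
        first_severe <- i
        break  # stop checking once the first severe case is found
    }
}

print(paste("First patient needing urgent review:", first_severe))
```

The remaining patients are never examined once break runs, which is the kind of step-by-step decision that is natural in a loop.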

3.9 Conclusion

Loops turn tedious, repetitive data processing into automated, repeatable analysis. For malaria researchers, they are more than a programming tool: they reduce manual errors and shorten the path from raw records to results.

3.11 Challenge Exercises: Mastering Loops in R for Malaria Data Analysis

3.11.1 Challenge 1: Parasite Density Tracking

Create a loop that:

  • Generates a vector of 20 simulated parasite counts, each between 50 and 5000
  • Categorizes each count into density levels:
    • Low: < 500 parasites
    • Moderate: 500 - 2000 parasites
    • High: > 2000 parasites
  • Prints out the category for each sample
  • Creates a summary of how many samples fall into each category

Hint: Use sample() to generate random parasite counts, and create a results vector or list to store your categorizations.

3.11.2 Challenge 2: Net Distribution Efficiency Analysis

Develop a loop that:

  • Simulates net distribution across 15 different villages
  • Gives each village a randomly generated population between 1000 and 10000
  • Calculates the number of nets distributed (assume 60% coverage on average, with some random variation between villages so that coverage is not identical everywhere)
  • Tracks the number of nets per 100 people
  • Identifies villages that need additional net distribution (fewer than 50 nets per 100 people, that is, less than 0.5 nets per person)

Requirements:

  • Create a data frame to store results
  • Include columns for:
    • Village name
    • Total population
    • Nets distributed
    • Nets per 100 people

3.11.3 Challenge 3: Fever Surveillance Loop

Create a function with a loop that:

  • Accepts a vector of patient temperatures
  • Has a configurable fever threshold (default 38.0°C)
  • Tracks:
    • Total number of patients
    • Number of patients with fever
    • Percentage of patients with fever
  • Returns a list with these summary statistics
  • Includes error handling for invalid temperature inputs

Example input: c(37.2, 38.5, 39.1, 37.6, 38.9)

3.11.4 Challenge 4: Multi-Year Malaria Prevalence Projection

Design a loop that:

  • Simulates malaria prevalence over 10 years
  • Starts with an initial prevalence of 15%
  • Applies a yearly reduction factor (to simulate intervention effectiveness)
  • Creates a vector tracking prevalence each year
  • Includes some random variation to make it more realistic

Additional tasks:

  • Calculate the total reduction over 10 years
  • Identify the year with the most significant reduction
  • Visualize the prevalence trend (optional)

3.11.5 Bonus Challenge: Optimize Your Loops

For each of the above challenges:

  1. Try to implement the same logic using vectorized operations
  2. Compare the performance of your loop vs. vectorized approach
  3. Discuss the pros and cons of each method

Remember: There’s no single “correct” solution. Focus on clarity, efficiency, and understanding the underlying concepts!

3.11.6 Learning Objectives

By completing these challenges, you will:

  • Practice creating loops in R
  • Understand different loop structures
  • Apply programming concepts to real-world public health scenarios
  • Develop skills in data manipulation and analysis
  • Learn to handle complex data processing tasks