Before getting started, please follow the installation instructions. This is a private R package, so the installation process differs from that of most other packages. The package also accesses remote data sources, so you will need internet access to use it. Once the package is installed, you can load it into your R session.
The primary goal of the PATHtoolsZambia R package is to provide access to Zambia-related datasets that are “clean” and up-to-date. The list_data() function returns a table of the available datasets, giving each dataset's reference name and a brief description.
list_data()
#> name
#> 33 any-travel
#> 30 catchment-annual-long
#> 29 catchment-province-output
#> 13 chw-cases-2013
#> 14 chw-cases-2014
#> 15 chw-cases-2015
#> 16 chw-cases-2016
#> 17 chw-cases-2017
#> 18 chw-cases-2018
#> 19 chw-cases-2019
#> 20 chw-cases-2020
#> 21 chw-cases-2021
#> 22 chw-cases-2022
#> 34 chw-cases-2023
#> 12 chw-masterlist
#> 32 chw-travel
#> 3 district-shp
#> 11 friction-walking
#> 7 grid3-pop-rescaled
#> 9 hf-catchment-pop
#> 2 hf-georef
#> 5 hf-master-wide
#> 31 hf-travel
#> 8 hfca-pop-table
#> 23 hfca-voronoi-2016
#> 24 hfca-voronoi-2017
#> 25 hfca-voronoi-2018
#> 26 hfca-voronoi-2019
#> 27 hfca-voronoi-2020
#> 28 hfca-voronoi-2021
#> 35 hfca-voronoi-2022
#> 1 monthly-cases
#> 10 monthly-inpatient
#> 6 monthly-opd
#> 4 province-shp
#> description
#> 33 Walking time (min) to nearest health facility or worker
#> 30 Long-form HFCA population and incidence rates from province-level gravity model.
#> 29 Output table from province-level gravity model.
#> 13 NMEC CHW cases data for 2013
#> 14 NMEC CHW cases data for 2014
#> 15 NMEC CHW cases data for 2015
#> 16 NMEC CHW cases data for 2016
#> 17 NMEC CHW cases data for 2017
#> 18 NMEC CHW cases data for 2018
#> 19 NMEC CHW cases data for 2019
#> 20 NMEC CHW cases data for 2020
#> 21 NMEC CHW cases data for 2021
#> 22 NMEC CHW cases data for 2022
#> 34 NMEC CHW cases data for 2023
#> 12 CHW name, location, and history information.
#> 32 Walking time (min) to nearest health worker
#> 3 District-level (Admin2) shapefile with population totals
#> 11 Walking friction surface for estimating travel time.
#> 7 GRID3 population raster, rescaled to 18.4 million
#> 9 Annual estimated catchment sizes via gravity model.
#> 2 Georeferenced facility masterlist (one set of coordinates per UID)
#> 5 Master facility list, organized by DHIS2 UID and retaining all source information.
#> 31 Walking time (min) to nearest health facility
#> 8 Health facility catchment populations include catchment model and HMIS 2020 headcount.
#> 23 Voronoi tesselations for 2016 HFCAs
#> 24 Voronoi tesselations for 2017 HFCAs
#> 25 Voronoi tesselations for 2018 HFCAs
#> 26 Voronoi tesselations for 2019 HFCAs
#> 27 Voronoi tesselations for 2020 HFCAs
#> 28 Voronoi tesselations for 2021 HFCAs
#> 35 Voronoi tesselations for 2022 HFCAs
#> 1 Monthly malaria cases data (HMIS and NMEC combined).
#> 10 Monthly HMIS inpatient data (incl. deaths).
#> 6 Monthly OPD first attendence.
#> 4 Province-level (Admin1) shapefile with GRID3 population totals
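Because the listing is an ordinary table with `name` and `description` columns, it can be narrowed with standard subsetting. The following is a minimal sketch, not part of the package: it assumes `list_data()` returns a data frame with character columns, and it uses a small hand-built stand-in (`datasets`, with a few rows copied from the output above) so that it runs without the package installed.

```r
# Stand-in for list_data(): a few rows copied from the output above.
# With the package installed, `datasets <- list_data()` would replace this.
datasets <- data.frame(
  name = c("chw-cases-2021", "chw-cases-2022", "chw-cases-2023",
           "chw-masterlist", "hf-georef", "monthly-cases"),
  description = c(
    "NMEC CHW cases data for 2021",
    "NMEC CHW cases data for 2022",
    "NMEC CHW cases data for 2023",
    "CHW name, location, and history information.",
    "Georeferenced facility masterlist (one set of coordinates per UID)",
    "Monthly malaria cases data (HMIS and NMEC combined)."
  ),
  stringsAsFactors = FALSE
)

# Keep only the yearly CHW case datasets, matched on the name prefix.
chw_case_names <- datasets$name[grepl("^chw-cases-", datasets$name)]

# Search the descriptions for a keyword, ignoring case.
malaria_rows <- datasets[grepl("malaria", datasets$description,
                               ignore.case = TRUE), ]

print(chw_case_names)
print(malaria_rows$name)
```

The matched reference names are the values that the data-loading function expects, so a filtered vector like `chw_case_names` could be looped over to load several years at once.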
The retrieve() function is used to load data, using the reference name from the list_data() table. For example, we can load a list of all of the health facilities in Zambia, sourced from the HMIS, NMEC, and Zambia Ministry of Health online record.
master_facility_list <- retrieve("hf-master-wide")
head(master_facility_list)
#> # A tibble: 6 × 17
#> org_unit_uid lon_HMIS lat_HMIS province_HMIS district_HMIS name_HMIS lon_NMEC
#> <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl>
#> 1 sy04jreTFc0 28.3 -15.4 Lusaka Lusaka NA 28.3
#> 2 VEwpwUzaSZ8 NA NA Muchinga Kanchibiya NA NA
#> 3 bwPt010YjCo 28.4 -12.7 Copperbelt Mufulira NA 28.2
#> 4 Me0ZPMA7wvc 28.5 -12.8 Copperbelt Ndola NA NA
#> 5 IAWEwxGrcHM 28.5 -12.8 Copperbelt Ndola NA NA
#> 6 jGqu6BUf5hW 28.4 -13.1 Copperbelt Luanshya NA 28.4
#> # ℹ 10 more variables: lat_NMEC <dbl>, province_NMEC <chr>,
#> # district_NMEC <chr>, name_NMEC <chr>, lon_ZMoH <dbl>, lat_ZMoH <dbl>,
#> # province_ZMoH <chr>, district_ZMoH <chr>, name_ZMoH <chr>, type <chr>
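The wide layout keeps one set of columns per source (`_HMIS`, `_NMEC`, `_ZMoH`), which makes it possible to compare sources row by row. The sketch below is an illustration only: it uses a stand-in data frame built from the six rows and the two longitude columns visible in the printed output (values are as rounded by the tibble display), and the `tolerance` threshold is a hypothetical choice, not a package setting.

```r
# Stand-in built from the rows printed above; with the package installed,
# `facilities <- retrieve("hf-master-wide")` would replace this.
facilities <- data.frame(
  org_unit_uid = c("sy04jreTFc0", "VEwpwUzaSZ8", "bwPt010YjCo",
                   "Me0ZPMA7wvc", "IAWEwxGrcHM", "jGqu6BUf5hW"),
  lon_HMIS = c(28.3, NA, 28.4, 28.5, 28.5, 28.4),
  lon_NMEC = c(28.3, NA, 28.2, NA, NA, 28.4),
  stringsAsFactors = FALSE
)

# Facilities with no longitude from either of the two sources shown here.
no_lon <- facilities$org_unit_uid[
  is.na(facilities$lon_HMIS) & is.na(facilities$lon_NMEC)
]

# Facilities where both sources report a longitude but the values differ
# by more than a hypothetical tolerance (in decimal degrees).
tolerance <- 0.1
both_present <- !is.na(facilities$lon_HMIS) & !is.na(facilities$lon_NMEC)
disagree <- facilities$org_unit_uid[
  both_present & abs(facilities$lon_HMIS - facilities$lon_NMEC) > tolerance
]

print(no_lon)
print(disagree)
```

The same pattern extends to the latitude columns and to the ZMoH source, which are present in the full table but not shown in the truncated printout.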
Another useful dataset is "hf-georef", which contains a list of all of the health facilities that have been georeferenced, meaning each row in the table is a unique facility that has a latitude and longitude. These data are useful for constructing maps.
hf_locations <- retrieve("hf-georef")
head(hf_locations)
#> # A tibble: 6 × 10
#> org_unit_uid lon lat province district name source type geo_province
#> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A87peYAyqsf 29.1 -8.84 Luapula Chienge Kany… NMEC Heal… Luapula
#> 2 ANtd2l36nZS 33.5 -10.4 Muchinga Mafinga Kaly… ZMoH Heal… Muchinga
#> 3 ARNhWzN9QfA 31.9 -14.4 Eastern Sinda Mng'… ZMoH Heal… Eastern
#> 4 ASnusR9MFtB 22.3 -15.0 Western Sikongo Siko… ZMoH Rura… Western
#> 5 AVmbFzKj1bY 28 -15 Lusaka Lusaka NA HMIS Other Central
#> 6 AaHjJI4XyW2 24.3 -13.6 Northweste… Kabompo Kama… NMEC Heal… Northwestern
#> # ℹ 1 more variable: geo_district <chr>
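The printed rows include both a `province` and a `geo_province` column, and they do not always agree (row 5 lists Lusaka and Central). A quick way to surface such rows is sketched below. It assumes that `geo_province` is the province implied by the coordinates, which the output suggests but does not state, and it uses a stand-in data frame built from the first five printed rows so that it runs without the package.

```r
# Stand-in built from the first five rows printed above; with the package
# installed, `hf <- retrieve("hf-georef")` would replace this.
hf <- data.frame(
  org_unit_uid = c("A87peYAyqsf", "ANtd2l36nZS", "ARNhWzN9QfA",
                   "ASnusR9MFtB", "AVmbFzKj1bY"),
  lon = c(29.1, 33.5, 31.9, 22.3, 28),
  lat = c(-8.84, -10.4, -14.4, -15.0, -15),
  province = c("Luapula", "Muchinga", "Eastern", "Western", "Lusaka"),
  geo_province = c("Luapula", "Muchinga", "Eastern", "Western", "Central"),
  stringsAsFactors = FALSE
)

# Rows where the reported province differs from the geo_province column.
mismatch <- hf[hf$province != hf$geo_province,
               c("org_unit_uid", "province", "geo_province")]

print(mismatch)
```

Rows flagged this way may be worth inspecting before mapping, since a mismatch could point to imprecise coordinates or to an outdated province label.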
Most of the datasets are relatively small and should download quickly, but larger datasets such as the monthly case records may take longer. Data are typically stored as tables, although some datasets use more complex file types such as shapefiles or rasters.
We have started to put together some quick data summaries for the datasets, using the sanity_check() function. These can be useful for checking for errors in the data (which are certainly possible!) and for providing some quick aggregations. If you have suggestions for more useful summaries, or for the package in general, please add your comments here.
Here is an example of the sanity_check() function for the monthly cases dataset.
case_check <- sanity_check("monthly-cases")
#> Filtering case records from 2018-01-01 to 2021-03-01.
#> Warning: There was 1 warning in `dplyr::mutate()`.
#> ℹ In argument: `Total = sum(dplyr::c_across(), na.rm = T)`.
#> ℹ In row 1.
#> Caused by warning:
#> ! Using `c_across()` without supplying `cols` was deprecated in dplyr 1.1.0.
#> ℹ Please supply `cols` instead.
#> ℹ The deprecated feature was likely used in the PATHtoolsZambia package.
#> Please report the issue at
#> <https://github.com/PATH-Global-Health/PATHtoolsZambia/issues>.
#> 3146 UIDs.
#> 39 unique periods.
#> Data types: Confirmed, Confirmed_Passive_CHW, Tested, Tested_Passive_CHW, Treated_Confirmed, Treated_Clinical, Clinical
#> Age groups: Between 1-4 y., Over 5 y., Under 5 y., NA, Under 1 y.
#> Average annual cases: 12.77 million
case_check$cases_by_province
#> # A tibble: 11 × 6
#> reported_province Total yr_2018 yr_2019 yr_2020 yr_2021
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Central 3888438 888848 1028010 1532583 438997
#> 2 Copperbelt 6547586 1448887 1906933 2546797 644969
#> 3 Eastern 6991698 1706367 1899670 2680579 705082
#> 4 Luapula 5358926 1337374 1560610 1966265 494677
#> 5 Lusaka 671614 139371 158776 283348 90119
#> 6 Muchinga 3612982 785174 1000778 1461448 365582
#> 7 Northern 4366792 984689 1269207 1699522 413374
#> 8 Northwestern 5466213 1214523 1474881 2231151 545658
#> 9 Southern 481756 128878 86256 184240 82382
#> 10 Western 4100488 1126414 669896 1561744 742434
#> 11 NA 15614 NA 1408 14206 NA
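A summary table like this can itself be checked for internal consistency. The sketch below recomputes the `Total` column from the yearly columns and compares the two. It is an illustration under stated assumptions: the stand-in data frame copies four rows from the output above, and the check assumes `Total` is the row sum of the `yr_` columns with missing values ignored, which matches the printed figures.

```r
# Stand-in with four rows copied from the output above; with the package
# installed, `by_province <- case_check$cases_by_province` would replace this.
by_province <- data.frame(
  reported_province = c("Central", "Lusaka", "Southern", NA),
  Total   = c(3888438, 671614, 481756, 15614),
  yr_2018 = c(888848, 139371, 128878, NA),
  yr_2019 = c(1028010, 158776, 86256, 1408),
  yr_2020 = c(1532583, 283348, 184240, 14206),
  yr_2021 = c(438997, 90119, 82382, NA),
  stringsAsFactors = FALSE
)

# Pick out the yearly columns by their shared prefix.
year_cols <- grep("^yr_", names(by_province), value = TRUE)

# Recompute each row total, ignoring missing years, and compare to Total.
recomputed <- rowSums(by_province[, year_cols], na.rm = TRUE)
totals_match <- recomputed == by_province$Total

print(all(totals_match))
```

When comparing years, note that the records were filtered from 2018-01-01 to 2021-03-01, so the `yr_2021` column covers only part of a year.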