The Effect of Tuberculosis on the Globe

Table of Contents

  • Introduction
  • Setting up my environment
  • Data cleaning
  • Data summary gaining insight
  • Exploratory Data Analysis
  • Inferences
  • Conclusion

Introduction

The data was obtained from the World Health Organization (WHO), http://www.who.int. The data was complex: four tables had to be joined to take advantage of the full data set. Several analyses were then carried out on tuberculosis cases, HIV-tuberculosis (TBHIV) cases, and funding of tuberculosis care around the world.
Abbreviations used in the variable names:

  • new - new cases
  • sp - smear positive
  • sn - smear negative
  • ep - extrapulmonary
  • f - female
  • m - male
  • u - unknown
  • whoregion - WHO region

Setting up my environment

Notes: Setting up my R environment by loading the needed packages: tidyverse (which attaches ggplot2, dplyr, tidyr, readr, stringr, tibble, purrr and forcats), magrittr, lubridate and knitr.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.2
## ✔ readr   2.1.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tidyr)
library(tibble)
library(stringr)
library(readr)
library(magrittr)
## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
library(forcats)
library(knitr)

Getting and loading the files into R

Notes: The working directory was checked with getwd() and the presence of each file was confirmed with file.exists(). The four files were then loaded into R and later joined into a single table for ease of use.

# print the working directory path
getwd()
## [1] "C:/Users/dell/Documents"
# confirm that each file exists at the expected path
file.exists("Excel/TB_notifications_2022-08-19.csv")
## [1] TRUE
file.exists("Excel/TB_outcomes_2022-08-20.csv")
## [1] TRUE
file.exists("Excel/TB_burden_countries_2022-08-19.csv")
## [1] TRUE
file.exists("Excel/TB_expenditure_utilisation_2022-08-19.csv")
## [1] TRUE
# read each CSV file with read_csv() from the readr package
tuberculosis <- read_csv("Excel/TB_notifications_2022-08-19.csv")
## Rows: 8707 Columns: 198
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): country, iso2, iso3, iso_numeric, g_whoregion
## dbl (193): year, new_sp, new_sn, new_su, new_ep, new_oth, ret_rel, ret_taf, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tbhiv <- read_csv("Excel/TB_outcomes_2022-08-20.csv")
## Rows: 5539 Columns: 84
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): country, iso2, iso3, iso_numeric, g_whoregion
## dbl (79): year, rep_meth, new_sp_coh, new_sp_cur, new_sp_cmplt, new_sp_died,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
funding <- read_csv("Excel/TB_expenditure_utilisation_2022-08-19.csv")
## Rows: 860 Columns: 46
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): country, iso2, iso3, iso_numeric, g_whoregion
## dbl (41): year, exp_cpp_dstb, exp_cpp_mdr, exp_cpp_xdr, exp_cpp_tpt, exp_lab...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
estimates <- read_csv("Excel/TB_burden_countries_2022-08-19.csv")
## Rows: 4487 Columns: 50
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): country, iso2, iso3, iso_numeric, g_whoregion
## dbl (45): year, e_pop_num, e_inc_100k, e_inc_100k_lo, e_inc_100k_hi, e_inc_n...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Joining the tables with full_join()

# combining the four datasets
tuberculosis_join <- tuberculosis %>% 
  full_join(tbhiv, by = c("country", "year")) %>%
  full_join(estimates, by = c('country', "year")) %>%
  full_join(funding, by = c('country', "year"))
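
All four files share the identifier columns iso2, iso3, iso_numeric and g_whoregion, so joining on country and year alone leaves several copies of them with .x and .y suffixes. The following is a minimal sketch of that behaviour, using two toy tibbles with made-up values (the names `notifications` and `outcomes` are illustrative, not objects from this report). It also shows one possible check that each table has a single row per country and year before joining.

```r
library(dplyr)

# Toy stand-ins for two of the WHO tables; all values are made up for illustration.
notifications <- tibble(
  country = c("A", "A", "B"),
  year = c(2019, 2020, 2019),
  g_whoregion = c("AFR", "AFR", "EUR"),
  c_newinc = c(10, 12, 5)
)
outcomes <- tibble(
  country = c("A", "B", "B"),
  year = c(2019, 2019, 2020),
  g_whoregion = c("AFR", "EUR", "EUR"),
  tbhiv_succ = c(3, 1, 2)
)

# Joining on country and year only: the shared g_whoregion column is duplicated
# as g_whoregion.x and g_whoregion.y.
joined_two_keys <- full_join(notifications, outcomes, by = c("country", "year"))
names(joined_two_keys)

# Joining on every shared identifier keeps a single g_whoregion column.
joined_all_keys <- full_join(notifications, outcomes,
                             by = c("country", "year", "g_whoregion"))
names(joined_all_keys)

# Rows returned here would be country-year pairs that appear more than once.
duplicate_keys <- count(notifications, country, year) %>% filter(n > 1)
duplicate_keys
```

Whether to join on the extra identifier columns or to drop them from three of the tables beforehand is a design choice; either way the joined table ends up with one copy of each.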

Data Cleaning

** Getting an overview of the data

# printing the structure of the data to get an overview of what the data looks like
str(tuberculosis)
## spec_tbl_df [8,707 × 198] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ country               : chr [1:8707] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ iso2                  : chr [1:8707] "AF" "AF" "AF" "AF" ...
##  $ iso3                  : chr [1:8707] "AFG" "AFG" "AFG" "AFG" ...
##  $ iso_numeric           : chr [1:8707] "004" "004" "004" "004" ...
##  $ g_whoregion           : chr [1:8707] "EMR" "EMR" "EMR" "EMR" ...
##  $ year                  : num [1:8707] 1980 1981 1982 1983 1984 ...
##  $ new_sp                : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn                : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_su                : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep                : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_oth               : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_rel               : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_taf               : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_tad               : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_oth               : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ newret_oth            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_labconf           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_clindx            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_rel_labconf       : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_rel_clindx        : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_rel_ep            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ ret_nrel              : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ notif_foreign         : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ c_newinc              : num [1:8707] 71685 71554 41752 52502 18784 ...
##  $ new_sp_m04            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m514           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m014           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m1524          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m2534          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m3544          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m4554          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m5564          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m65            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_mu             : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f04            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f514           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f014           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f1524          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f2534          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f3544          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f4554          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f5564          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f65            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_fu             : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m04            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m514           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m014           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m1524          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m2534          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m3544          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m4554          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m5564          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m65            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_m15plus        : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_mu             : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f04            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f514           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f014           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f1524          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f2534          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f3544          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f4554          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f5564          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f65            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_f15plus        : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_fu             : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_sexunk04       : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_sexunk514      : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_sexunk014      : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sn_sexunk15plus   : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m04            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m514           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m014           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m1524          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m2534          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m3544          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m4554          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m5564          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m65            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_m15plus        : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_mu             : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f04            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f514           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f014           : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f1524          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f2534          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f3544          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f4554          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f5564          : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f65            : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_f15plus        : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_fu             : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_sexunk04       : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_sexunk514      : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_sexunk014      : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_sexunk15plus   : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_ep_sexunkageunk   : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ rel_in_agesex_flg     : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##  $ agegroup_option       : num [1:8707] NA NA NA NA NA NA NA NA NA NA ...
##   [list output truncated]
##  - attr(*, "spec")=
##   .. cols(
##   ..   country = col_character(),
##   ..   iso2 = col_character(),
##   ..   iso3 = col_character(),
##   ..   iso_numeric = col_character(),
##   ..   g_whoregion = col_character(),
##   ..   year = col_double(),
##   ..   new_sp = col_double(),
##   ..   new_sn = col_double(),
##   ..   new_su = col_double(),
##   ..   new_ep = col_double(),
##   ..   new_oth = col_double(),
##   ..   ret_rel = col_double(),
##   ..   ret_taf = col_double(),
##   ..   ret_tad = col_double(),
##   ..   ret_oth = col_double(),
##   ..   newret_oth = col_double(),
##   ..   new_labconf = col_double(),
##   ..   new_clindx = col_double(),
##   ..   ret_rel_labconf = col_double(),
##   ..   ret_rel_clindx = col_double(),
##   ..   ret_rel_ep = col_double(),
##   ..   ret_nrel = col_double(),
##   ..   notif_foreign = col_double(),
##   ..   c_newinc = col_double(),
##   ..   new_sp_m04 = col_double(),
##   ..   new_sp_m514 = col_double(),
##   ..   new_sp_m014 = col_double(),
##   ..   new_sp_m1524 = col_double(),
##   ..   new_sp_m2534 = col_double(),
##   ..   new_sp_m3544 = col_double(),
##   ..   new_sp_m4554 = col_double(),
##   ..   new_sp_m5564 = col_double(),
##   ..   new_sp_m65 = col_double(),
##   ..   new_sp_mu = col_double(),
##   ..   new_sp_f04 = col_double(),
##   ..   new_sp_f514 = col_double(),
##   ..   new_sp_f014 = col_double(),
##   ..   new_sp_f1524 = col_double(),
##   ..   new_sp_f2534 = col_double(),
##   ..   new_sp_f3544 = col_double(),
##   ..   new_sp_f4554 = col_double(),
##   ..   new_sp_f5564 = col_double(),
##   ..   new_sp_f65 = col_double(),
##   ..   new_sp_fu = col_double(),
##   ..   new_sn_m04 = col_double(),
##   ..   new_sn_m514 = col_double(),
##   ..   new_sn_m014 = col_double(),
##   ..   new_sn_m1524 = col_double(),
##   ..   new_sn_m2534 = col_double(),
##   ..   new_sn_m3544 = col_double(),
##   ..   new_sn_m4554 = col_double(),
##   ..   new_sn_m5564 = col_double(),
##   ..   new_sn_m65 = col_double(),
##   ..   new_sn_m15plus = col_double(),
##   ..   new_sn_mu = col_double(),
##   ..   new_sn_f04 = col_double(),
##   ..   new_sn_f514 = col_double(),
##   ..   new_sn_f014 = col_double(),
##   ..   new_sn_f1524 = col_double(),
##   ..   new_sn_f2534 = col_double(),
##   ..   new_sn_f3544 = col_double(),
##   ..   new_sn_f4554 = col_double(),
##   ..   new_sn_f5564 = col_double(),
##   ..   new_sn_f65 = col_double(),
##   ..   new_sn_f15plus = col_double(),
##   ..   new_sn_fu = col_double(),
##   ..   new_sn_sexunk04 = col_double(),
##   ..   new_sn_sexunk514 = col_double(),
##   ..   new_sn_sexunk014 = col_double(),
##   ..   new_sn_sexunk15plus = col_double(),
##   ..   new_ep_m04 = col_double(),
##   ..   new_ep_m514 = col_double(),
##   ..   new_ep_m014 = col_double(),
##   ..   new_ep_m1524 = col_double(),
##   ..   new_ep_m2534 = col_double(),
##   ..   new_ep_m3544 = col_double(),
##   ..   new_ep_m4554 = col_double(),
##   ..   new_ep_m5564 = col_double(),
##   ..   new_ep_m65 = col_double(),
##   ..   new_ep_m15plus = col_double(),
##   ..   new_ep_mu = col_double(),
##   ..   new_ep_f04 = col_double(),
##   ..   new_ep_f514 = col_double(),
##   ..   new_ep_f014 = col_double(),
##   ..   new_ep_f1524 = col_double(),
##   ..   new_ep_f2534 = col_double(),
##   ..   new_ep_f3544 = col_double(),
##   ..   new_ep_f4554 = col_double(),
##   ..   new_ep_f5564 = col_double(),
##   ..   new_ep_f65 = col_double(),
##   ..   new_ep_f15plus = col_double(),
##   ..   new_ep_fu = col_double(),
##   ..   new_ep_sexunk04 = col_double(),
##   ..   new_ep_sexunk514 = col_double(),
##   ..   new_ep_sexunk014 = col_double(),
##   ..   new_ep_sexunk15plus = col_double(),
##   ..   new_ep_sexunkageunk = col_double(),
##   ..   rel_in_agesex_flg = col_double(),
##   ..   agegroup_option = col_double(),
##   ..   newrel_m04 = col_double(),
##   ..   newrel_m59 = col_double(),
##   ..   newrel_m1014 = col_double(),
##   ..   newrel_m514 = col_double(),
##   ..   newrel_m014 = col_double(),
##   ..   newrel_m1519 = col_double(),
##   ..   newrel_m2024 = col_double(),
##   ..   newrel_m1524 = col_double(),
##   ..   newrel_m2534 = col_double(),
##   ..   newrel_m3544 = col_double(),
##   ..   newrel_m4554 = col_double(),
##   ..   newrel_m5564 = col_double(),
##   ..   newrel_m65 = col_double(),
##   ..   newrel_m15plus = col_double(),
##   ..   newrel_mu = col_double(),
##   ..   newrel_f04 = col_double(),
##   ..   newrel_f59 = col_double(),
##   ..   newrel_f1014 = col_double(),
##   ..   newrel_f514 = col_double(),
##   ..   newrel_f014 = col_double(),
##   ..   newrel_f1519 = col_double(),
##   ..   newrel_f2024 = col_double(),
##   ..   newrel_f1524 = col_double(),
##   ..   newrel_f2534 = col_double(),
##   ..   newrel_f3544 = col_double(),
##   ..   newrel_f4554 = col_double(),
##   ..   newrel_f5564 = col_double(),
##   ..   newrel_f65 = col_double(),
##   ..   newrel_f15plus = col_double(),
##   ..   newrel_fu = col_double(),
##   ..   newrel_sexunk04 = col_double(),
##   ..   newrel_sexunk514 = col_double(),
##   ..   newrel_sexunk014 = col_double(),
##   ..   newrel_sexunk15plus = col_double(),
##   ..   newrel_sexunkageunk = col_double(),
##   ..   rdx_data_available = col_double(),
##   ..   newinc_rdx = col_double(),
##   ..   rdxsurvey_newinc = col_double(),
##   ..   rdxsurvey_newinc_rdx = col_double(),
##   ..   rdst_new = col_double(),
##   ..   rdst_ret = col_double(),
##   ..   rdst_unk = col_double(),
##   ..   conf_rrmdr = col_double(),
##   ..   conf_mdr = col_double(),
##   ..   rr_sldst = col_double(),
##   ..   all_conf_xdr = col_double(),
##   ..   conf_rr_nfqr = col_double(),
##   ..   conf_rr_fqr = col_double(),
##   ..   unconf_rrmdr_tx = col_double(),
##   ..   conf_rrmdr_tx = col_double(),
##   ..   rrmdr_014_tx = col_double(),
##   ..   unconf_mdr_tx = col_double(),
##   ..   conf_mdr_tx = col_double(),
##   ..   conf_xdr_tx = col_double(),
##   ..   unconf_rr_nfqr_tx = col_double(),
##   ..   conf_rr_nfqr_tx = col_double(),
##   ..   conf_rr_fqr_tx = col_double(),
##   ..   mdrxdr_bdq_used = col_double(),
##   ..   mdrxdr_bdq_tx = col_double(),
##   ..   mdrxdr_alloral_used = col_double(),
##   ..   mdrxdr_alloral_tx = col_double(),
##   ..   mdrxdr_dlm_used = col_double(),
##   ..   mdrxdr_dlm_tx = col_double(),
##   ..   mdr_shortreg_used = col_double(),
##   ..   mdr_shortreg_tx = col_double(),
##   ..   mdr_tx_adverse_events = col_double(),
##   ..   mdr_alloral_short_used = col_double(),
##   ..   mdr_alloral_short_tx = col_double(),
##   ..   mdr_tx_adsm = col_double(),
##   ..   newrel_tbhiv_flg = col_double(),
##   ..   newrel_hivtest = col_double(),
##   ..   newrel_hivpos = col_double(),
##   ..   newrel_art = col_double(),
##   ..   tbhiv_014_flg = col_double(),
##   ..   newrel_hivtest_014 = col_double(),
##   ..   newrel_hivpos_014 = col_double(),
##   ..   newrel_art_014 = col_double(),
##   ..   hivtest = col_double(),
##   ..   hivtest_pos = col_double(),
##   ..   hiv_cpt = col_double(),
##   ..   hiv_art = col_double(),
##   ..   hiv_tbscr = col_double(),
##   ..   hiv_reg = col_double(),
##   ..   hiv_ipt = col_double(),
##   ..   hiv_reg_new = col_double(),
##   ..   hiv_ipt_reg_all = col_double(),
##   ..   hiv_reg_all = col_double(),
##   ..   hiv_tbdetect = col_double(),
##   ..   hiv_reg_new2 = col_double(),
##   ..   hiv_elig_all_tpt = col_double(),
##   ..   hiv_elig_all = col_double(),
##   ..   hiv_elig_new_tpt = col_double(),
##   ..   hiv_elig_new = col_double(),
##   ..   hiv_all_tpt = col_double(),
##   ..   hiv_all = col_double(),
##   ..   hiv_new_tpt = col_double(),
##   ..   hiv_new = col_double(),
##   ..   hiv_all_tpt_completed = col_double(),
##   ..   hiv_all_tpt_started = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
  1. Deleting columns that are not required
  2. Handling zero values
  3. Checking and dropping null values
  4. Removing duplicate values
  5. Transforming the data from wide to long using tidyr
  6. Separating a column into several for simplicity
  7. Adjusting and classifying the age data (adding age ranges: 0-19, 20-34, 35-44, 45-64, 65+)
  8. Filtering for the years 2000 to 2020
  9. Changing values

tuber <- tuberculosis %>% 
  gather(key = type, value = cases, new_sp_m04:newrel_f65,-agegroup_option,
         -rel_in_agesex_flg, na.rm = TRUE) %>%
  select(country, g_whoregion, year, type, cases) %>%
  mutate(
    type = stringr::str_replace(type, "newrel", "new_rel")
  ) %>%
  separate(type, c("new", "var", "sexage")) %>%
  mutate(sexage = stringr::str_replace(sexage, "unk", "u")) %>%
  mutate(sexage = stringr::str_replace(sexage, "sexu", "u")) %>%
  separate(sexage, c("sex", "age"), sep = 1) %>%
  # keep only non-zero case counts from the year 2000 onwards
  filter(cases > 0, year >= 2000) %>%
  # drop the "15plus" rows: they aggregate every age band from 15 upwards,
  # so keeping them would count the same cases twice
  filter(age != "15plus")
tuber %>% head(10)
## # A tibble: 10 × 8
##    country     g_whoregion  year new   var   sex   age   cases
##    <chr>       <chr>       <dbl> <chr> <chr> <chr> <chr> <dbl>
##  1 Afghanistan EMR          2010 new   sp    m     04        4
##  2 Afghanistan EMR          2011 new   sp    m     04        2
##  3 Albania     EUR          2006 new   sp    m     04        1
##  4 Albania     EUR          2008 new   sp    m     04        1
##  5 Angola      AFR          2011 new   sp    m     04      108
##  6 Angola      AFR          2012 new   sp    m     04       58
##  7 Argentina   AMR          2006 new   sp    m     04       19
##  8 Argentina   AMR          2007 new   sp    m     04       14
##  9 Argentina   AMR          2008 new   sp    m     04       11
## 10 Argentina   AMR          2009 new   sp    m     04        8
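
The reshaping steps above can be sketched on a small example. This version uses pivot_longer(), which tidyr offers as the newer replacement for gather(); it is an alternative way to write the same idea, not the code that produced the output above. The tibble `wide` and its counts are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide table with made-up counts; column names follow the WHO pattern.
wide <- tibble(
  country = c("A", "B"),
  year = c(2010, 2010),
  new_sp_m04 = c(4, NA),
  new_sp_f1524 = c(7, 2),
  newrel_m65 = c(NA, 9)
)

long <- wide %>%
  pivot_longer(
    cols = -c(country, year),
    names_to = "type",
    values_to = "cases",
    values_drop_na = TRUE
  ) %>%
  # "newrel" lacks the underscore the other prefixes have, so add it first
  mutate(type = str_replace(type, "newrel", "new_rel")) %>%
  # split "new_sp_m04" into "new", "sp" and "m04"
  separate(type, into = c("new", "var", "sexage"), sep = "_") %>%
  # the first character is the sex code, the rest is the age band
  separate(sexage, into = c("sex", "age"), sep = 1)

long
```

One difference to keep in mind: pivot_longer() orders the result row by row of the input, whereas gather() stacks it column by column, so the row order differs even though the content is the same.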

Adjusting and classifying the age data (adding age ranges: 0-19, 20-34, 35-44, 45-64, 65+)

tuber$age[tuber$age == "04"] <- "0-19"
tuber$age[tuber$age == "514"] <- "0-19"
tuber$age[tuber$age == "014"] <- "0-19"
tuber$age[tuber$age == "59"] <- "0-19"
tuber$age[tuber$age == "1014"] <- "0-19"
tuber$age[tuber$age == "1519"] <- "0-19"
tuber$age[tuber$age == "1524"] <- "20-34"
tuber$age[tuber$age == "2024"] <- "20-34"
tuber$age[tuber$age == "2534"] <- "20-34"
tuber$age[tuber$age == "3544"] <- "35-44"
tuber$age[tuber$age == "4554"] <- "45-64"
tuber$age[tuber$age == "5564"] <- "45-64"
tuber$age[tuber$age == "65"] <- "65+"

# replacing the g_whoregion abbreviations with full region names
tuber$g_whoregion[tuber$g_whoregion == "AFR"] <- "Africa"
tuber$g_whoregion[tuber$g_whoregion == "AMR"] <- "America"
tuber$g_whoregion[tuber$g_whoregion == "EMR"] <- "Eastern Mediterrenian"
tuber$g_whoregion[tuber$g_whoregion == "EUR"] <- "Europe"
tuber$g_whoregion[tuber$g_whoregion == "SEA"] <- "South-East Asia"
tuber$g_whoregion[tuber$g_whoregion == "WPR"] <- "Western Pacific"
kable(head(tuber))
country      g_whoregion            year  new  var  sex  age   cases
Afghanistan  Eastern Mediterrenian  2010  new  sp   m    0-19      4
Afghanistan  Eastern Mediterrenian  2011  new  sp   m    0-19      2
Albania      Europe                 2006  new  sp   m    0-19      1
Albania      Europe                 2008  new  sp   m    0-19      1
Angola       Africa                 2011  new  sp   m    0-19    108
Angola       Africa                 2012  new  sp   m    0-19     58
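
The one-assignment-per-code recoding of the age bands can also be written as a single lookup. The sketch below is one possible alternative, assuming the same mapping as the assignments above; `age_lookup` and `recode_age()` are hypothetical names introduced here for illustration.

```r
# Named lookup vector: WHO age code -> age range used in this report
age_lookup <- c(
  "04" = "0-19", "514" = "0-19", "014" = "0-19", "59" = "0-19",
  "1014" = "0-19", "1519" = "0-19",
  "1524" = "20-34", "2024" = "20-34", "2534" = "20-34",
  "3544" = "35-44",
  "4554" = "45-64", "5564" = "45-64",
  "65" = "65+"
)

# Hypothetical helper: map each code through the lookup
recode_age <- function(age) {
  mapped <- unname(age_lookup[age])
  # codes missing from the lookup (for example "u") are kept unchanged
  ifelse(is.na(mapped), age, mapped)
}

recoded <- recode_age(c("04", "2534", "65", "u"))
recoded
```

With this helper the recoding would be a single line, `tuber$age <- recode_age(tuber$age)`, and the full mapping is visible in one place, which makes overlaps such as the 15-24 band being assigned to 20-34 easier to spot.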

Data summary gaining insight

Note: The data was briefly summarized according to whoregion to show common trends of year, age and sex across regions

# showing the trend of tuberculosis cases from 2000 - 2020 for each WHO region
 ggplot(data = tuber, mapping = aes(x = year, y = cases)) +
   geom_col() +
   facet_wrap(~g_whoregion)

# showing the distribution of tuberculosis cases by age category for each WHO region
 ggplot(data = tuber, mapping = aes(x = age, y = cases)) +
   geom_col() +
   facet_wrap(~g_whoregion)

# showing the distribution of tuberculosis cases between male and female for each WHO region
 ggplot(data = tuber, mapping = aes(x = sex, y = cases)) +
   geom_col() +
   facet_wrap(~g_whoregion)

tuber %>% 
   group_by(., g_whoregion) %>%
   summarise(., cases = sum(cases)) %>%
   arrange(., desc(cases)) %>% head()
## # A tibble: 6 × 2
##   g_whoregion              cases
##   <chr>                    <dbl>
## 1 South-East Asia       35196995
## 2 Western Pacific       20327430
## 3 Africa                18562209
## 4 Eastern Mediterrenian  6061551
## 5 Europe                 4418793
## 6 America                4004211

Exploratory Data Analysis

Question 1: What has been the trend of tuberculosis cases from 2000 to 2020?

Which year has the highest tuberculosis cases?

by_year <- tuber %>%
  group_by(.,year) %>%
  summarise(., total_cases = sum(cases)) 
kable(by_year)
year  total_cases
2000      1148524
2001      1237444
2002      1523369
2003      1860645
2004      2184958
2005      2364897
2006      3064892
2007      3933707
2008      3680748
2009      3932522
2010      4295276
2011      4390442
2012      4377419
2013      3422746
2014      5674335
2015      6041262
2016      6524581
2017      6536013
2018      7202478
2019      8237527
2020      6937404

plotting the trend of tuberculosis cases by year

by_year %>% 
  ggplot(aes (x = year)) +
  geom_line(aes(y = total_cases))


The year with the highest tuberculosis cases

#What year has the highest number of tuberculosis cases?
by_year %>%
  arrange(desc(total_cases)) %>%
  head(1)
## # A tibble: 1 × 2
##    year total_cases
##   <dbl>       <dbl>
## 1  2019     8237527
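
Beyond the single peak year, the year-on-year percentage change makes the shape of the trend easier to read. The sketch below is one way to compute it with lag(); it uses only the 2017 to 2020 totals copied from the by_year table above, and `by_year_tail` and `pct_change` are names introduced here for illustration.

```r
library(dplyr)

# Yearly totals for 2017-2020 copied from the by_year table
by_year_tail <- tibble(
  year = 2017:2020,
  total_cases = c(6536013, 7202478, 8237527, 6937404)
)

by_year_change <- by_year_tail %>%
  arrange(year) %>%
  # change relative to the previous year, in percent; NA for the first row
  mutate(pct_change = (total_cases - lag(total_cases)) / lag(total_cases) * 100)

by_year_change
```

On these figures the 2020 total is roughly 16% below the 2019 total. Applying the same mutate() to the full `by_year` table would give the change for every year.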

Question 2: What is the trend of tuberculosis cases in countries across the globe?

What are the 10 countries with the top tuberculosis cases? What are the 10 countries with the least tuberculosis cases?

# grouping cases by country
by_country <- tuber %>%
  group_by(.,country) %>%
  summarise(., total_cases = sum(cases))

What are the 10 countries with the top tuberculosis cases?

# selecting the 10 countries with the highest tuberculosis cases across all
# the years
top_10 <- by_country %>%
  arrange(desc(total_cases)) %>%
  head(10)
kable(top_10)  
country                           total_cases
India                                21337820
China                                13048274
Indonesia                             6311768
South Africa                          5179958
Pakistan                              3717044
Philippines                           3567369
Bangladesh                            3137372
Democratic Republic of the Congo      1782554
Myanmar                               1611582
Russian Federation                    1526742
# a plot of the 10 countries with the highest tuberculosis cases
ggplot(top_10, aes(total_cases, fct_reorder(country, total_cases))) +
  geom_point()


What are the 10 countries with the least tuberculosis cases?

# selecting the 10 countries with the fewest tuberculosis cases across all
# the years
least_10 <- by_country %>%
  arrange(total_cases) %>%
  head(10)
kable(least_10)
country                 total_cases
San Marino                        1
Tokelau                           2
Anguilla                          3
Montserrat                        4
Monaco                            5
Niue                              5
British Virgin Islands            6
Cook Islands                     19
Bermuda                          20
Curaçao                          28
# a plot of the 10 countries with the lowest tuberculosis cases
ggplot(least_10, aes(total_cases, fct_reorder(country, total_cases))) +
  geom_point()

Question 3: What is the pattern of tuberculosis cases across age groups and sex?

Checking for patterns across age groups and noting the age groups with the highest number of tuberculosis cases

# grouping data by age_group 
by_age_group <- tuber %>%
  group_by(.,age) %>%
  summarise(., total_cases = sum(cases))

kable(by_age_group)
age    total_cases
0-19       9931668
20-34     31620389
35-44     15195618
45-64     22407587
65+        8896305
u           519622
# a plot of total tuberculosis cases by age group
ggplot(by_age_group, aes(x = age, y = total_cases)) +
  geom_col()


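The 20-34 age group has the most tuberculosis cases (about 31.6 million), followed by 45-64 (about 22.4 million) and 35-44 (about 15.2 million). Together, ages 20-64 account for roughly 78% of all cases in the data.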

checking the sex with the highest cases of tuberculosis

by_sex <- tuber %>%
  group_by(.,sex) %>%
  summarise(., total_cases = sum(cases)) %>%
  arrange(., desc(total_cases))

kable(by_sex)
sex  total_cases
m       54847196
f       33455750
u         268243
# a plot of total tuberculosis cases by sex
ggplot(by_sex, aes(x = sex, y = total_cases)) +
  geom_col()
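
To put the totals by sex on a common scale, they can be turned into shares and a male-to-female ratio. The sketch below uses the three totals from the by_sex table above; `by_sex_totals`, `sex_share` and `mf_ratio` are names introduced here for illustration.

```r
# Totals copied from the by_sex table
by_sex_totals <- c(m = 54847196, f = 33455750, u = 268243)

# Share of all reported cases by sex, in percent
sex_share <- round(prop.table(by_sex_totals) * 100, 1)
sex_share

# Male-to-female ratio of reported cases
mf_ratio <- by_sex_totals[["m"]] / by_sex_totals[["f"]]
round(mf_ratio, 2)
```

On these figures males account for a little under 62% of the cases in the data, or about 1.6 male cases for every female case.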

Question 4: Tuberculosis cases with HIV across the globe

Selecting TBHIV cases from the general table ‘tuberculosis_join’

tbhiv1 <- select(tuberculosis_join, country, year,
                 success_tbhiv_treatment = tbhiv_succ, failed_tbhiv_treatment = tbhiv_fail,
                 tbhiv_death = tbhiv_died,
                 tbhiv_lost) %>% 
  drop_na() %>%
  mutate(tbhiv_total = success_tbhiv_treatment + failed_tbhiv_treatment + tbhiv_death + tbhiv_lost)

Countries with the most TBHIV cases worldwide in 2019

tbhiv1 %>%
  group_by(.,country, year) %>%
  summarise(., total_tbhiv_cases = sum(tbhiv_total)) %>%
  filter(., total_tbhiv_cases > 0, year == 2019)%>%
  arrange(desc(total_tbhiv_cases)) %>%
  head(10)
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
## # A tibble: 10 × 3
## # Groups:   country [10]
##    country                      year total_tbhiv_cases
##    <chr>                       <dbl>             <dbl>
##  1 South Africa                 2019             95969
##  2 India                        2019             35419
##  3 Mozambique                   2019             31753
##  4 Kenya                        2019             20980
##  5 United Republic of Tanzania  2019             18792
##  6 Zambia                       2019             16002
##  7 Uganda                       2019             15709
##  8 Zimbabwe                     2019             12257
##  9 Nigeria                      2019             12214
## 10 Russian Federation           2019             10635

Note: 8 of the 10 countries with the most TBHIV cases in 2019 are African, and many of them are in the southern part of the continent.

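Viewing countries with the highest TBHIV rate (2019)

# by_year_country is not defined earlier in this report; it is assumed here to be
# the total tuberculosis cases per country and year, computed from `tuber`
by_year_country <- tuber %>%
  group_by(country, year) %>%
  summarise(total_cases = sum(cases), .groups = "drop")

# TBHIV cases as a percentage of all tuberculosis cases in 2019, for countries
# with more than 100,000 tuberculosis cases
tbhiv1 %>% 
  full_join(by_year_country, by = c("country", "year")) %>%
  select(., country, year, total_cases, tbhiv_total) %>%
  drop_na()%>%
  mutate(., hivtb_rate = (tbhiv_total/total_cases)*100) %>%
  arrange(desc(hivtb_rate)) %>% 
  filter(., hivtb_rate < 100, total_cases > 100000, year == 2019)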
## # A tibble: 12 × 5
##    country       year total_cases tbhiv_total hivtb_rate
##    <chr>        <dbl>       <dbl>       <dbl>      <dbl>
##  1 South Africa  2019      226005       95969    42.5   
##  2 Kenya         2019      111049       20980    18.9   
##  3 Nigeria       2019      126611       12214     9.65  
##  4 Brazil        2019      111332        6462     5.80  
##  5 Myanmar       2019      184951        9509     5.14  
##  6 Viet Nam      2019      104124        2643     2.54  
##  7 Indonesia     2019      629171       10370     1.65  
##  8 India         2019     2883264       35419     1.23  
##  9 China         2019      837860        6143     0.733 
## 10 Philippines   2019      530626        1516     0.286 
## 11 Pakistan      2019      373759         662     0.177 
## 12 Bangladesh    2019      303925         117     0.0385

Among countries with more than 100,000 tuberculosis cases in 2019, South Africa had the highest TBHIV rate, with TBHIV cases making up 42.5% of its total tuberculosis cases. Kenya was second with 18.9% and Nigeria third with 9.65%.
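
The rate is simply TBHIV cases divided by total tuberculosis cases, times 100. As a worked check, the sketch below recomputes the three highest rates from the figures in the table above; the function name `hivtb_rate()` is a hypothetical helper introduced here and is not part of the original analysis.

```r
# Hypothetical helper: TBHIV cases as a percentage of all tuberculosis cases
hivtb_rate <- function(tbhiv_total, total_cases) {
  tbhiv_total / total_cases * 100
}

# 2019 figures for South Africa, Kenya and Nigeria, copied from the table
top_three <- hivtb_rate(
  tbhiv_total = c(95969, 20980, 12214),
  total_cases = c(226005, 111049, 126611)
)
round(top_three, 2)
```

For South Africa this is 95969 / 226005, which is about 42.5%, matching the reported value.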

Checking the trend of TBHIV cases from 2012 to 2019

tbhiv_by_year <- tbhiv1 %>%
  group_by(.,year) %>%
  summarise(., tbhiv_total = sum(tbhiv_total))
kable(tbhiv_by_year)
| year | tbhiv_total |
|-----:|------------:|
| 2012 |      363207 |
| 2013 |      360953 |
| 2014 |      392464 |
| 2015 |      446016 |
| 2016 |      403686 |
| 2017 |      416406 |
| 2018 |      380606 |
| 2019 |      390368 |

# plotting total tuberculosis and HIV cases against year
tbhiv_by_year %>% ggplot(aes(x = year))+
  geom_point(aes(y = tbhiv_total))+
  geom_line(aes(y = tbhiv_total))

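Note: The year with the highest number of TBHIV cases is 2015 (446016). The totals have been lower in every year since, although the decline is not steady: cases rose again in 2017 and in 2019.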
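To see how the totals moved from one year to the next, the sketch below finds the peak year and the year-over-year change in base R. The vector name `tbhiv_totals` is only illustrative, and its values are copied by hand from the table above rather than read from tbhiv_by_year.

```r
# Yearly TBHIV totals copied from the table above (illustrative vector name)
tbhiv_totals <- c(`2012` = 363207, `2013` = 360953, `2014` = 392464,
                  `2015` = 446016, `2016` = 403686, `2017` = 416406,
                  `2018` = 380606, `2019` = 390368)

# Year with the largest total
peak_year <- names(tbhiv_totals)[which.max(tbhiv_totals)]

# Change from the previous year; the first year has no predecessor,
# so the result has one element fewer than the input
yearly_change <- diff(tbhiv_totals)

print(peak_year)
print(yearly_change)
```

A positive entry in yearly_change marks a year in which cases rose compared with the year before, and a negative entry marks a fall.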

Question 5: What is the tuberculosis mortality rate across the globe?

# calculating the estimated tuberculosis deaths per country and their ratio to
# total tuberculosis cases; tb_death is the mean of the yearly estimates
tb_mortality <- select(tuberculosis_join, country, e_pop_num, e_mort_num) %>%
  group_by(., country) %>%
  drop_na() %>%
  summarise(., total_population = max(e_pop_num, na.rm = TRUE)
            , tb_death = mean(e_mort_num, na.rm = TRUE)) %>%
  full_join(by_country, by = "country") %>%
  mutate(., mortality_rate = (tb_death/total_cases)*100) %>%
  drop_na() %>%
  arrange(., desc(mortality_rate)) 
kable(tb_mortality)
country total_population tb_death total_cases mortality_rate
Anguilla 15002 1.047619e+00 3 34.9206349
Mozambique 31255435 2.376190e+04 120779 19.6738711
Serbia & Montenegro 10101170 3.780000e+02 2710 13.9483395
Nigeria 206139587 1.382857e+05 1373968 10.0646969
Central African Republic 4829764 9.380952e+03 120664 7.7744417
Ghana 31072945 1.647619e+04 231460 7.1183749
South Sudan 11193729 6.730000e+03 95267 7.0643560
United Republic of Tanzania 59734213 6.595238e+04 1014853 6.4987127
Cameroon 26545864 1.809524e+04 279837 6.4663494
Malawi 19129955 1.479048e+04 235457 6.2816039
Guinea-Bissau 1967998 2.052381e+03 33490 6.1283397
Equatorial Guinea 1402985 8.500000e+02 14133 6.0142928
Nepal 29136808 2.261905e+04 426709 5.3008133
Lao People’s Democratic Republic 7275556 4.261905e+03 81363 5.2381362
Lesotho 2142252 6.990476e+03 135129 5.1731872
Kenya 53771300 6.509524e+04 1476253 4.4094907
Papua New Guinea 8947027 6.323810e+03 144143 4.3871777
Grenada 112519 1.952381e+00 45 4.3386243
Somalia 15893219 9.819048e+03 226946 4.3266009
Côte d’Ivoire 26378275 1.375714e+04 320104 4.2977104
Angola 32866268 2.052381e+04 484618 4.2350490
Ethiopia 114963583 5.547619e+04 1403111 3.9537991
Gambia 2416664 6.061905e+02 15357 3.9473235
Madagascar 27691019 1.271429e+04 326929 3.8890052
Gabon 2225728 2.414286e+03 62658 3.8531165
Congo 5518092 4.180952e+03 110152 3.7956209
Eswatini 1160164 3.306190e+03 88105 3.7525571
Niger 24206636 5.028571e+03 136131 3.6939209
Eritrea 3546427 9.442857e+02 26391 3.5780596
Comoros 869595 5.638095e+01 1623 3.4738726
Burundi 11890781 3.804762e+03 112033 3.3961082
Sudan 44053386 9.619048e+03 283706 3.3904985
Liberia 5057677 3.200000e+03 94522 3.3854552
Myanmar 54409794 5.452381e+04 1611582 3.3832476
Mauritania 4649660 1.154286e+03 34233 3.3718509
Saint Kitts and Nevis 53192 1.142857e+00 34 3.3613445
Zambia 18383956 1.814286e+04 540084 3.3592658
Burkina Faso 20903278 2.447619e+03 73031 3.3514796
Namibia 2540916 4.800000e+03 145488 3.2992412
Chad 16425859 4.866667e+03 149455 3.2562756
United Arab Emirates 9890400 5.071429e+01 1615 3.1402034
Barbados 287371 2.523810e+00 83 3.0407344
Democratic Republic of the Congo 89561404 5.414286e+04 1782554 3.0373754
Turkmenistan 6031187 9.304762e+02 31503 2.9536114
Afghanistan 38928341 1.252381e+04 424368 2.9511673
Dominica 71991 2.523810e+00 87 2.9009305
Guinea 13132792 5.214286e+03 185716 2.8076664
Timor-Leste 1318442 9.210526e+02 33995 2.7093768
India 1380004385 5.642857e+05 21337820 2.6445331
Uganda 45741000 1.861905e+04 722953 2.5754161
Thailand 69799978 2.042857e+04 793538 2.5743659
Sierra Leone 7976985 4.657143e+03 182622 2.5501543
South Africa 59308690 1.308571e+05 5179958 2.5262202
Botswana 2351625 2.438095e+03 101160 2.4101376
Senegal 16743930 2.995238e+03 125159 2.3931464
Ukraine 48838058 1.009524e+04 432085 2.3364010
Bangladesh 164689383 7.328571e+04 3137372 2.3358950
Curaçao 164100 6.363636e-01 28 2.2727273
Mali 20250834 2.076190e+03 97439 2.1307592
Djibouti 988002 5.195238e+02 24853 2.0903867
Benin 12123198 1.366667e+03 66849 2.0444085
Zimbabwe 14862927 1.154762e+04 567028 2.0365165
Algeria 43851043 2.957143e+03 149070 1.9837277
Sao Tome and Principe 219161 4.614286e+01 2412 1.9130538
Togo 8278737 8.190476e+02 43522 1.8819163
Antigua and Barbuda 97928 1.285714e+00 69 1.8633540
Yemen 29825968 2.357143e+03 127975 1.8418776
Indonesia 273523621 1.114286e+05 6311768 1.7654098
Iceland 341250 3.190476e+00 191 1.6704064
Libya 6871287 4.357143e+02 26284 1.6577168
Viet Nam 97338583 2.252381e+04 1369950 1.6441337
Saint Vincent and the Grenadines 110947 2.714286e+00 171 1.5873016
Saudi Arabia 34813867 9.585714e+02 62551 1.5324638
Russian Federation 146404890 2.307143e+04 1526742 1.5111544
Belarus 9871635 7.833333e+02 57333 1.3662870
Vanuatu 307150 2.452381e+01 1813 1.3526646
Azerbaijan 10139175 7.828571e+02 58866 1.3298970
Netherlands Antilles 198662 7.000000e-01 53 1.3207547
Pakistan 220892331 4.885714e+04 3717044 1.3144085
Saint Lucia 183629 2.809524e+00 216 1.3007055
Wallis and Futuna Islands 15098 4.285714e-01 33 1.2987013
Dominican Republic 10847904 8.385714e+02 66243 1.2659019
Rwanda 12952209 1.170952e+03 93814 1.2481638
Guyana 786559 1.469048e+02 11908 1.2336644
Bolivia (Plurinational State of) 11673029 1.833333e+03 151430 1.2106804
Puerto Rico 3670308 1.614286e+01 1337 1.2073940
Chile 19116209 5.533333e+02 45840 1.2070971
Estonia 1399111 5.914286e+01 4949 1.1950466
Haiti 11402533 3.052381e+03 265507 1.1496424
Ecuador 17643060 9.900000e+02 88066 1.1241569
occupied Palestinian territory, including east Jerusalem 5101416 5.142857e+00 459 1.1204482
Latvia 2384150 1.587619e+02 14293 1.1107668
Finland 5540718 5.857143e+01 5301 1.1049128
Japan 128555196 3.914286e+03 360088 1.0870359
Bhutan 771612 1.514286e+02 14613 1.0362593
Cambodia 16718971 5.357143e+03 520488 1.0292539
France 65273512 7.980952e+02 78081 1.0221376
Uzbekistan 33469199 3.080952e+03 301911 1.0204836
Hungary 10220509 1.811429e+02 18091 1.0012871
Greece 11234993 8.380952e+01 8410 0.9965461
Tajikistan 9537642 8.738095e+02 88628 0.9859294
Lithuania 3501842 2.742857e+02 28097 0.9762100
Turks and Caicos Islands 38718 4.761905e-01 49 0.9718173
Croatia 4428075 9.980952e+01 10464 0.9538372
Republic of Moldova 4202659 5.776190e+02 60595 0.9532454
Jamaica 2961161 1.890476e+01 2002 0.9442938
Suriname 586634 2.066667e+01 2217 0.9321906
Guatemala 17915567 5.090476e+02 54672 0.9310938
Panama 4314768 2.671429e+02 29156 0.9162535
Cabo Verde 555988 3.528571e+01 4059 0.8693204
Portugal 10604066 3.876190e+02 44802 0.8651825
Italy 60673694 4.347619e+02 50359 0.8633251
Aruba 106766 9.047619e-01 105 0.8616780
North Macedonia 2083458 5.319048e+01 6343 0.8385697
Honduras 9904608 4.880952e+02 58232 0.8381908
Philippines 109581085 2.942857e+04 3567369 0.8249377
Bahamas 393248 7.380952e+00 895 0.8246874
Mauritius 1271767 1.895238e+01 2322 0.8162093
Bosnia and Herzegovina 3765422 1.635238e+02 20175 0.8105269
Mexico 128932753 3.128571e+03 389743 0.8027268
Kazakhstan 18776707 2.295714e+03 288492 0.7957636
Trinidad and Tobago 1399491 3.342857e+01 4250 0.7865546
Peru 32971846 3.166667e+03 405028 0.7818390
Greenland 56968 7.000000e+00 907 0.7717751
Belize 397621 1.390476e+01 1818 0.7648384
Iraq 40222503 1.104762e+03 145762 0.7579218
Costa Rica 5094114 6.419048e+01 8867 0.7239255
Norway 5421242 3.233333e+01 4490 0.7201188
Marshall Islands 59194 2.204762e+01 3087 0.7142086
Micronesia (Federated States of) 115021 1.723810e+01 2416 0.7134973
Venezuela (Bolivarian Republic of) 30081827 9.666667e+02 139174 0.6945742
Solomon Islands 686878 4.928571e+01 7115 0.6927015
Poland 38556699 7.961905e+02 115385 0.6900294
Kyrgyzstan 6524191 7.090476e+02 102769 0.6899431
Nicaragua 6624554 2.480952e+02 36682 0.6763405
Armenia 3069597 1.391429e+02 20906 0.6655642
Colombia 50882884 1.519048e+03 230162 0.6599906
Serbia 9193818 1.542500e+02 23736 0.6498568
New Caledonia 285491 4.476191e+00 691 0.6477844
Sri Lanka 21413250 1.010476e+03 159305 0.6343029
Slovenia 2078932 1.823810e+01 2939 0.6205544
Austria 9006400 5.785714e+01 9390 0.6161570
Uruguay 3473727 8.914286e+01 14634 0.6091489
Bulgaria 7997951 2.087619e+02 34331 0.6080857
Paraguay 7132530 2.766667e+02 45569 0.6071379
Sint Maarten (Dutch part) 42882 1.818182e-01 30 0.6060606
Fiji 896444 3.128571e+01 5229 0.5983116
French Polynesia 280904 5.333333e+00 900 0.5925926
Slovakia 5459643 4.109524e+01 6944 0.5918093
China, Macao SAR 649342 3.661905e+01 6336 0.5779521
Spain 47084242 5.247619e+02 91350 0.5744520
Mongolia 3278292 3.828571e+02 66759 0.5734914
Germany 83783945 4.361905e+02 76671 0.5689119
Kiribati 119446 4.185714e+01 7376 0.5674775
Seychelles 98340 1.428571e+00 257 0.5558644
Czechia 10708982 5.971429e+01 10832 0.5512766
Sweden 10099270 5.119048e+01 9327 0.5488418
Ireland 4937796 3.295238e+01 6056 0.5441278
Romania 22137423 1.607619e+03 296657 0.5419117
Andorra 84461 5.238095e-01 97 0.5400098
Nauru 10834 6.190476e-01 116 0.5336617
Northern Mariana Islands 58412 4.095238e+00 777 0.5270577
Morocco 36910558 2.785714e+03 530361 0.5252487
Tonga 105697 1.333333e+00 258 0.5167959
Brazil 212559409 7.542857e+03 1488878 0.5066135
Lebanon 6859408 5.414286e+01 10731 0.5045462
Republic of Korea 51269183 3.023810e+03 600809 0.5032897
Tuvalu 11792 1.952381e+00 391 0.4993302
United States of America 331002647 9.514286e+02 192395 0.4945183
Palau 19861 1.095238e+00 224 0.4889456
Malaysia 32365998 1.847619e+03 382075 0.4835750
China 1439323774 6.280952e+04 13048274 0.4813627
Canada 37742157 1.331429e+02 27800 0.4789311
Denmark 5792203 2.647619e+01 5563 0.4759337
Belgium 11589616 7.457143e+01 16070 0.4640412
Argentina 45195777 8.595238e+02 185463 0.4634476
Luxembourg 625976 2.285714e+00 494 0.4626952
Egypt 102334403 7.119048e+02 158614 0.4488285
Cyprus 1207361 3.380952e+00 755 0.4478083
Israel 8655541 3.533333e+01 7941 0.4449482
Iran (Islamic Republic of) 83992953 7.623810e+02 177944 0.4284387
Maldives 540542 9.904762e+00 2326 0.4258281
Brunei Darussalam 437483 1.685714e+01 3963 0.4253632
United Kingdom of Great Britain and Northern Ireland 67886004 4.709524e+02 113020 0.4166983
Guam 168783 6.428571e+00 1546 0.4158196
Netherlands 17134873 6.390476e+01 15594 0.4098035
Tunisia 11818618 1.657143e+02 41880 0.3956884
Switzerland 8654618 3.485714e+01 8880 0.3925354
Cuba 11339255 5.209524e+01 13321 0.3910760
Georgia 4362184 2.342857e+02 60364 0.3881216
New Zealand 4822233 1.809524e+01 5207 0.3475175
Türkiye 84339067 7.809524e+02 228680 0.3415045
Jordan 10203140 2.204762e+01 6639 0.3320925
Cayman Islands 65720 1.428571e-01 44 0.3246753
El Salvador 6486201 1.328571e+02 41875 0.3172708
Oman 5106622 1.938095e+01 6177 0.3137600
Australia 25499881 6.438095e+01 21080 0.3054125
Malta 441539 2.666667e+00 900 0.2962963
Bahrain 1701583 9.142857e+00 3107 0.2942664
China, Hong Kong SAR 7496988 2.285714e+02 83901 0.2724299
Singapore 5850343 8.533333e+01 31895 0.2675445
Bermuda 66260 4.761900e-02 20 0.2380952
Albania 3129701 1.404762e+01 7744 0.1814000
Kuwait 4270563 2.247619e+01 12708 0.1768665
Montenegro 628062 2.250000e+00 1574 0.1429479
Samoa 198410 2.047619e+00 1687 0.1213764
American Samoa 59684 4.761900e-02 45 0.1058201
Syrian Arab Republic 21362541 4.419048e+01 63208 0.0699128
Qatar 2881060 5.809524e+00 8619 0.0674037
British Virgin Islands 30237 0.000000e+00 6 0.0000000
Cook Islands 19094 0.000000e+00 19 0.0000000
Monaco 39244 0.000000e+00 5 0.0000000
Montserrat 4999 0.000000e+00 4 0.0000000
Niue 1902 0.000000e+00 5 0.0000000
San Marino 33938 0.000000e+00 1 0.0000000
Tokelau 1550 0.000000e+00 2 0.0000000

Viewing the countries with the highest tuberculosis mortality rate among those with over 100000 tuberculosis cases

# countries with the highest tb mortality rate among those with
# over 100000 tuberculosis cases
tb_mortality %>% 
  filter(total_cases > 100000) %>%
  head(10) 
## # A tibble: 10 × 5
##    country                     total_population tb_death total_cases mortality…¹
##    <chr>                                  <dbl>    <dbl>       <dbl>       <dbl>
##  1 Mozambique                          31255435   23762.      120779       19.7 
##  2 Nigeria                            206139587  138286.     1373968       10.1 
##  3 Central African Republic             4829764    9381.      120664        7.77
##  4 Ghana                               31072945   16476.      231460        7.12
##  5 United Republic of Tanzania         59734213   65952.     1014853        6.50
##  6 Cameroon                            26545864   18095.      279837        6.47
##  7 Malawi                              19129955   14790.      235457        6.28
##  8 Nepal                               29136808   22619.      426709        5.30
##  9 Lesotho                              2142252    6990.      135129        5.17
## 10 Kenya                               53771300   65095.     1476253        4.41
## # … with abbreviated variable name ¹​mortality_rate

Note: Nine of the top 10 countries with the highest tuberculosis mortality rate are African countries; Nepal is the only exception. This may reflect limited access to quality health care services in many African countries.

#worldwide tuberculosis death rate 
tb_mortality %>% 
  summarise(., total_cases = sum(total_cases, na.rm = TRUE),
            total_death = sum(tb_death, na.rm = TRUE)) %>% 
  mutate(., rate = (total_death/total_cases)*100)
## # A tibble: 1 × 3
##   total_cases total_death  rate
##         <dbl>       <dbl> <dbl>
## 1    87203222    1925091.  2.21

Note: The global tuberculosis mortality rate is 2.21%, computed as the sum of tb_death divided by the sum of total_cases across all countries.
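The global figure is a pooled rate: deaths and cases are summed first and divided afterwards. The base R sketch below recomputes it from the totals in the output above, then uses three rows copied from the tb_mortality table (picked only for illustration) to show how a pooled rate differs from a plain average of per-country rates. All variable names here are illustrative.

```r
# Totals copied from the summary output above
total_cases <- 87203222
total_death <- 1925091
global_rate <- (total_death / total_cases) * 100

# Three rows copied from the tb_mortality table, chosen only for illustration
rows <- data.frame(
  country     = c("India", "China", "Anguilla"),
  tb_death    = c(564285.7, 62809.52, 1.047619),
  total_cases = c(21337820, 13048274, 3)
)

# Pooled rate: every case counts equally
pooled_rate <- sum(rows$tb_death) / sum(rows$total_cases) * 100

# Unweighted mean of per-country rates: every country counts equally,
# so a country with only a handful of cases can dominate the result
mean_of_rates <- mean(rows$tb_death / rows$total_cases * 100)

print(round(global_rate, 2))
print(c(pooled = pooled_rate, mean_of_rates = mean_of_rates))
```

Pooling keeps very small countries such as Anguilla (3 cases) from skewing the result, which is one reason to prefer it for a worldwide figure.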

Conclusion

After analyzing the WHO data set and answering the questions through visualizations, the insights gained from the exploratory analysis are summarised below.
#### The first question
“What has been the trend of tuberculosis cases from 2000 to 2020?” reveals the tuberculosis trend across the years. It was seen that from 2000 to 2020 the trend moved upward, although a drastic drop was recorded in 2013. 2019 is the year with the highest number of recorded tuberculosis cases.
#### The second question
“What is the trend of tuberculosis cases in countries across the globe?” gave us insight into the countries that are most heavily burdened with tuberculosis; we also checked the least burdened countries. The ten countries with the most cases and the ten with the fewest were analyzed and plotted for easy visualization. Nine of the ten countries with the most tuberculosis cases were Asian countries, which is likely due to the large population in the region.
#### The third question
“What is the pattern of tuberculosis cases across age groups and sex” reveals the age groups that are predominant. Tuberculosis is caused by the bacterium Mycobacterium tuberculosis, and a weakened immune system makes individuals more susceptible to it. Middle-aged individuals (20-64) more often engage in activities and habits such as smoking, alcohol intake and substance abuse that weaken the immune system. Comparing the total cases among males and females, males are more likely than their female counterparts to engage in activities that are hazardous to health.
#### The fourth question
“Tuberculosis cases with HIV across the globe”