Thomas Farrar
06/04/2020
South Africa is in the midst of a lockdown in which most of us are confined to our homes. Most of us have some time on our hands and a desire to better understand the global COVID-19 pandemic. One can easily find some nice infographics online such as that of The Financial Times. For budding young data scientists out there, a timely challenge could be to create your own basic COVID-19 infographic. In this post we will learn how to do just that in R.
Getting Started
In order to proceed you will need to download R and ideally RStudio as your programming user interface. This post assumes some basic familiarity with R programming.
Downloading the Data
The first thing we need is data. The European Centre for Disease Prevention and Control has got us covered with a freely downloadable spreadsheet that is updated daily (albeit usually about 24 hours behind the latest numbers available through mainstream media). Now, we may want to update our infographic repeatedly. To save us from having to repeatedly access that website in a browser and download the spreadsheet, we can do the following. First, use setwd() to set the working directory to your folder path of choice. Then, the following code will download today’s version of the data to the folder you specified.
todayurl <- paste0("https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-", Sys.Date(), ".xlsx")
todayfilename <- paste0("COVID-19-geographic-disbtribution-worldwide-", Sys.Date(), ".xlsx")
download.file(url = todayurl, destfile = todayfilename, mode = "wb")
(Note that the above code will probably throw an error if you run it at two minutes past midnight, because the daily spreadsheet is only published sometime in the morning, European time.)
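One way to cope with the spreadsheet not yet being published is to try today's file first and fall back to yesterday's. The following is only a minimal sketch under that assumption: the helper names ecdc_url() and download_latest(), and the downloader argument, are hypothetical and introduced purely for illustration. The URL pattern is the one used above.

```r
# Build the ECDC spreadsheet URL for a given date (same pattern as above).
ecdc_url <- function(date) {
  paste0("https://www.ecdc.europa.eu/sites/default/files/documents/",
         "COVID-19-geographic-disbtribution-worldwide-", date, ".xlsx")
}

# Try each candidate date in turn and return the file name of the first
# download that succeeds. `downloader` is a hypothetical argument that
# defaults to download.file and only exists so the function can be
# tried out without a network connection.
download_latest <- function(dates = Sys.Date() - 0:1,
                            downloader = download.file) {
  for (i in seq_along(dates)) {
    d <- dates[i]  # index into the vector so that d keeps its Date class
    destfile <- paste0("COVID-19-geographic-disbtribution-worldwide-", d, ".xlsx")
    ok <- tryCatch({
      status <- downloader(url = ecdc_url(d), destfile = destfile, mode = "wb")
      status == 0  # download.file returns 0 on success
    }, error = function(e) FALSE)
    if (isTRUE(ok)) return(destfile)
  }
  stop("No spreadsheet could be downloaded for the dates tried.")
}
```

With this in place, todayfilename <- download_latest() could stand in for the three lines above, and the rest of the script would carry on unchanged.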
Next, we need to import the Excel spreadsheet into R. For this, we can use the R package xlsx. If you don’t already have this package installed, make sure you’re connected to the Internet and run the command install.packages("xlsx"). Then, execute the following code:
# Import spreadsheet and save as data.frame object called dat
dat <- as.data.frame(xlsx::read.xlsx(file = todayfilename, sheetIndex = 1))
# Extract South African data and save as data.frame object called zadat
zadat <- dat[dat$countriesAndTerritories == "South_Africa", ]
if (nrow(zadat) == 0) zadat <- dat[dat$Countries.and.territories == "South_Africa", ]
nrow(zadat)
## [1] 30
# Save South African daily number of new cases to a variable newcases
newcases <- rev(zadat$cases)
days <- rev(zadat$dateRep)
The reason for the if statement above is that the exact name of the ‘Countries and Territories’ column seems to change from day to day. If nrow(zadat) returns 0, just type zadat$ and confirm the exact spelling of the ‘Countries and Territories’ column and amend the code accordingly. (Similarly, you may need to change zadat$cases to zadat$Cases if that changes at some point.)
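If the column names keep shifting, one option is to look them up by pattern rather than by exact spelling. The sketch below assumes the names will always start with "countries", "cases" and "dateRep" in some mix of upper and lower case, which may not hold for future spreadsheets. The helper find_col() and the toy data frame dat_demo are hypothetical, and the numbers in it are made up for illustration. It also sorts by date explicitly instead of relying on rev().

```r
# Return the name of the first column whose name matches `pattern`,
# ignoring case; stop with a clear message if nothing matches.
find_col <- function(df, pattern) {
  hits <- grep(pattern, names(df), ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) stop("No column matching '", pattern, "'")
  hits[1]
}

# Toy data standing in for the imported spreadsheet (values are made up).
dat_demo <- data.frame(
  dateRep = as.Date(c("2020-04-05", "2020-04-04", "2020-04-05")),
  Cases = c(70, 43, 12),
  Countries.and.territories = c("South_Africa", "South_Africa", "Elsewhere")
)

# Look the columns up by pattern, whatever their exact spelling today.
country_col <- find_col(dat_demo, "^countries")
cases_col   <- find_col(dat_demo, "^cases$")
date_col    <- find_col(dat_demo, "^daterep$")

# Extract the South African rows and put them in chronological order.
za_demo <- dat_demo[dat_demo[[country_col]] == "South_Africa", ]
za_demo <- za_demo[order(za_demo[[date_col]]), ]

newcases_demo <- za_demo[[cases_col]]
days_demo     <- za_demo[[date_col]]
```

Applied to the real data, dat would take the place of dat_demo, and the resulting vectors would play the roles of newcases and days.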
Line Plot of South African Daily New COVID-19 Cases
One basic infographic we may want is a line plot showing the number of new COVID-19 cases per day over time.
par(mar = c(6, 4, 0.1, 0.1))
plot(seq_along(days), newcases, type = "p", pch = 20, xlab = "", axes = FALSE,
     ylab = "", col = "blue", ylim = c(0, max(newcases) * 1.1))
box()
axis(side = 1, at = seq_along(days), labels = days, las = 2, cex.axis = 0.75)
axis(side = 2, las = 1)
lines(seq_along(days), newcases, col = "blue")
abline(v = which(days == "2020-03-27"), lty = "dashed")
text(23.5, max(newcases) * 1.1, labels = "Lockdown begins", cex = 0.6)
mtext("Date", side = 1, line = 5)
mtext("Number of New Cases", side = 2, line = 3)
The lockdown began at midnight on Friday 27 March. By indicating this on the graph with a vertical line, we can more clearly see the impact the lockdown has apparently had in “flattening the curve.” It is a striking change from what had, up to that point, resembled an exponential growth curve. This appears to vindicate the wisdom of President Ramaphosa’s decision to impose the lockdown.
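To make the phrase "exponential growth" a little more concrete: if daily cases grow by a constant factor each day, then the logarithm of the case count is a straight line in time, and the slope of that line gives the growth factor. The sketch below illustrates the idea on an idealised, made-up series, not on the ECDC data. The function name estimate_daily_growth() is hypothetical, and a log-linear fit is just one simple way of checking for exponential growth.

```r
# Estimate the multiplicative daily growth factor from a vector of
# daily case counts by fitting a straight line to log(cases).
estimate_daily_growth <- function(cases) {
  d <- data.frame(day = seq_along(cases), cases = cases)
  d <- d[d$cases > 0, ]               # log() is undefined for zero counts
  fit <- lm(log(cases) ~ day, data = d)
  unname(exp(coef(fit)["day"]))       # slope on the log scale -> growth factor
}

# Idealised, made-up series: a zero-case day, then 25% growth per day.
demo_cases <- c(0, 5 * 1.25^(0:9))

growth <- estimate_daily_growth(demo_cases)

# Number of days needed for daily cases to double at that growth factor.
doubling_time <- log(2) / log(growth)
```

One could apply estimate_daily_growth() separately to the values of newcases before and after the lockdown date to compare the two periods, bearing in mind that real daily counts are noisy and a few weeks of data give only a rough estimate.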
And there you have it! Without too much effort we have produced an attractive and meaningful COVID-19 infographic. The best part is, if you save your R script and then open it in a week and run it again, your infographic should automatically update to reflect the latest COVID-19 case data.