FIGURES UPDATED 3-15-20
You’ve probably seen various graphs online showing the increase in coronavirus cases in various countries.
I don’t know that I am adding any value here, but I decided to reproduce those graphs using the convenient data from Johns Hopkins Coronavirus Research Center. (I can’t vouch for the reliability of the Johns Hopkins data, but it seems to be what most news organizations are relying on.) Here’s one showing cases for all countries other than China that have reported at least 100 cases. The x axis is days after the 100-case mark was reached.
What we see here is that most countries show a consistent 35-45 percent daily growth in reported cases. Only Japan, at 20 percent, and Singapore and more recently Korea, at around 10 percent, depart significantly from this. It’s also interesting how stable the growth of cases in Japan has been over the past three weeks.
Now here is the same figure, but for US counties that have reported at least 10 cases. The x axis here is days since the first day with a least 10 cases. [UPDATE: I have stopped updating this graph since the Johns Hopkins site now reports cases only for US states.]
What surprises me here is that we basically see the same ~40 percent daily growth rate in cases. I would have thought that given all the insitutional differences and issues around testing, the US picture would have looked somehow different. But it seems like we might reasonably extrapolate from the international experience, that New York or Seattle could reach 10,000 cases in the next two weeks.
I have no expertise whatsoever on infectious diseases, so I am not going to say anything else about this.
Anyway, here is the R code if you want to produce figures like these from the most current data on the Johns Hopkins site. They also give latitude and longitude for every place included, so it wouldn’t be much more work to make an interactive map. Could be an interesting excercise for anyone teaching a statistics or data science course.
UPDATE: I had to change the code because Johns Hopkins is no longer reporting data for US places other than states. Here is the equivalent state-level figure. As you can see, the states with significant numbers of cases all show the same 40 percent daily growth rate.
#install.packages('reshape2')
#install.packages('ggplot2')
library(reshape2)
library(ggplot2)
corona <- read.csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv',
stringsAsFactors = F)
state.abbrevs <- read.csv('https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv', stringsAsFactors = F)
corona[,-1:-4] <- apply(corona[,-1:-4], 1:2, FUN=as.numeric)
US <- corona[corona$Country.Region=='US', ]
USplaces <- US[grep(',',US[,1]),][,-2:-4]
USstates <- US[grep(',',US[,1], invert=T),][,-2:-4]
USstates <- USstates[grep('Princess',USstates[,1], invert=T),]
states.temp <- USstates
for (i in 1:nrow(USstates)){
abbrev <- state.abbrevs[state.abbrevs[,1]==USstates[i,1],2]
rows <- grep(abbrev, USplaces[,1])
states.temp[i, -1] <- colSums(USplaces[rows, -1])
}
states.temp<- states.temp[!is.na(states.temp[,1]),]
USstates[,-1] <- USstates[,-1] + states.temp[,-1]
corona <- rbind(corona, c('', 'United States', NA, NA, colSums(USplaces[,-1])))
corona[,-1:-4] <- apply(corona[,-1:-4], 1:2, FUN=as.numeric)
china <- corona[corona$Country.Region=='China',-2:-4]
corona <- rbind(corona, c('', 'China', NA, NA, colSums(china[,-1])))
corona[,-1:-4] <- apply(corona[,-1:-4], 1:2, FUN=as.numeric)
countries <- corona[corona$Province.State =='', c(-1, -3:-4)]
makePlot <- function (x, threshold){
cases <- redate(x, threshold)
data <- melt(cases, id.vars=1)
names(data) <- c('place', 'day', 'cases')
p <- ggplot(data, aes(x=day, y=cases, group=place)) +
geom_line(aes(color=place))+
geom_point(aes(color=place, shape=place))
p <- p + scale_shape_manual(values=1:length(levels(as.factor(data$place))))
p <- p + theme(axis.text.x = element_text(angle=45))
p <- p + scale_y_continuous(trans='log10')
p
}
redate <- function (x, threshold) {
out <- data.frame(place='', stringsAsFactors = F)
values <- matrix(nrow=99, ncol=99)
n <- 0
for (i in 1:nrow(x)) {
v <- x[i,-1][x[i,-1] > threshold]
l <- length(v)
if (l > 1) {
n <- n + 1
out[n,1] <- x[i,1]
values[n,1:l] <- v
}
}
values <- values[!is.na(values[,1]),]
cols <- length(colSums(values, na.rm=T)[colSums(values, na.rm=T) > 0])
values <- values[,1:cols]
out <- cbind(out, values)
}
makePlot(USplaces, 9)
makePlot(USstates, 10)
makePlot(countries[countries$Country.Region != 'China',], 99)