Covid-19: A case fatality rate app for US counties

I made a web app that estimates the case fatality rate (CFR) of Covid-19 across all US counties. The model is a binomial GLMM with nested random effects (state, state:county), fitted with the R package lme4. Every time you reload the app, it fetches the most recent data and re-estimates the model.

The model “shrinks” the simple CFR estimates (deaths divided by cases) at the county level by “learning” across the other counties within the state and by “learning” across the states themselves. When an estimate looks too large or too small only because it comes from a county with a small sample size, the model pulls it down or pushes it up accordingly. It’s a bit like trying to estimate the true scoring rate of the NHL teams after watching only the first 10 games of the season. There will be a couple of blow-outs and shut-outs, and we need to correct for those atypical results in small samples – but we should probably shrink the Leafs’ ability to score down to zero just to be safe 😉
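
To make the shrinkage idea concrete, here is a minimal Python sketch. It is not the lme4 model behind the app: it swaps in a much simpler pseudo-count (beta-binomial style) estimator that pulls each county toward the pooled rate. The county counts and the PRIOR_STRENGTH value are hypothetical numbers chosen only for illustration.

```python
def naive_cfr(deaths, cases):
    """Simple CFR: deaths divided by cases."""
    return deaths / cases


def shrunk_cfr(deaths, cases, pooled_rate, prior_strength):
    """Pull the naive CFR toward pooled_rate.

    prior_strength acts like a number of pseudo-cases observed at the
    pooled rate, so counties with few real cases are pulled the most.
    """
    return (deaths + prior_strength * pooled_rate) / (cases + prior_strength)


# Hypothetical counties: (deaths, cases). Not real data.
counties = {
    "small": (3, 10),
    "large": (300, 10000),
}

total_deaths = sum(d for d, _ in counties.values())
total_cases = sum(c for _, c in counties.values())
pooled = total_deaths / total_cases

PRIOR_STRENGTH = 50  # assumed value; a GLMM estimates this from the data

results = {
    name: (naive_cfr(d, c), shrunk_cfr(d, c, pooled, PRIOR_STRENGTH))
    for name, (d, c) in counties.items()
}

for name, (naive, shrunk) in results.items():
    print(f"{name}: naive={naive:.4f} shrunk={shrunk:.4f}")
```

The small county's estimate moves a long way toward the pooled rate, while the large county's barely changes. The GLMM does something similar in spirit, except that it works on the logit scale, has two nested levels (state and county), and learns the amount of shrinkage from the data instead of taking it as a fixed constant.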

The CFR data has limitations because testing is biased toward people who are sick, often very sick. The infection fatality rate (IFR), which is what we all really want to know, requires testing far more people. Recent evidence suggests that the IFR will end up much lower than the current CFR estimates.

The app shows how the naive empirical estimate of the CFR compares to the shrunken estimate from the model. I also provide a forest plot showing the prediction intervals of the model estimates, including the contributions from the random effects. The prediction intervals I report are conservative. I use predictInterval() from the R package merTools to include uncertainty from the residual (observation-level) variance and from the grouping factors, the latter by drawing values around the conditional modes of the random effects using their conditional variance-covariance matrix. I partially correct for the correlation between the fixed and random effects. Prediction interval estimation with mixed models is a thorny subject: short of a full Bayesian implementation, a full bootstrap of the lme4 model is required for the best estimates of the prediction interval. Unfortunately, bootstrapping my model takes too long for the purposes of my app (and so does the MCMC in a Bayesian implementation!). For details on the use of merTools::predictInterval(), see Prediction Intervals from merMod Objects by Jared Knowles and Carl Frederick.
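
The simulation idea behind these intervals can be sketched in a few lines of Python. This is a simplified illustration, not merTools' implementation: it treats the fixed intercept, the state effect, and the county effect as independent normal draws on the logit scale, and it leaves out the residual variance and the fixed/random correlation adjustment mentioned above. Every (mean, sd) pair below is a hypothetical value, not output from the app's model.

```python
import math
import random


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_interval(components, n_sims=5000, level=0.95, seed=1):
    """Simulate a CFR interval from (mean, sd) pairs on the logit scale.

    Each simulation draws every component independently from a normal
    distribution, sums the draws into a linear predictor, and converts
    the result to a probability. Empirical quantiles of the simulated
    probabilities give the interval.
    """
    rng = random.Random(seed)
    draws = sorted(
        inv_logit(sum(rng.gauss(mean, sd) for mean, sd in components))
        for _ in range(n_sims)
    )
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * n_sims)]
    upper = draws[int((1.0 - tail) * n_sims) - 1]
    median = draws[n_sims // 2]
    return lower, median, upper


# Hypothetical (mean, sd) values on the logit scale.
fixed_intercept = (-3.0, 0.10)  # estimate and standard error
state_effect = (0.30, 0.20)     # conditional mode and conditional sd
county_effect = (-0.20, 0.40)   # conditional mode and conditional sd

components = [fixed_intercept, state_effect, county_effect]
lower, median, upper = simulate_interval(components)
print(f"CFR interval: {lower:.4f} to {upper:.4f} (median {median:.4f})")
```

A county with few cases has a larger conditional standard deviation for its random effect, which is what widens its interval in the forest plot.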

Hopefully Covid-19 will pass soon. Stay safe.

2 thoughts on “Covid-19: A case fatality rate app for US counties”

  1. Hi David,

    How are you?
    Great job!

    BTW what tutorial would you recommend for R package lme4?
    Or would you rather recommend another programming language and library (especially if this one requires a paid subscription?)?
    I did take an introductory MS ML course with R a few years ago…

    Finally when do we play tennis again? 🙂
    Cheers,
    Marian

    1. Hello Marian,

      A place to start would be the vignettes associated with lme4 on CRAN: https://cran.r-project.org/web/packages/lme4/index.html

      You might also find these articles useful: https://www.jstatsoft.org/article/view/v067i01 and https://arxiv.org/pdf/1406.5823.pdf
      (The last one having a nicer display for code blocks).

      I like R. It’s a bit of a peculiar language, but it’s super powerful and a lot of fun when you get used to it. The tidyverse is a big help. Shiny, for making interactive websites, is really impressive.

      When do we get back to tennis – who knows?! I hope it’s sooner rather than later.
