Serology studies are front-and-center in the news these days. Reports out of Santa Clara county, California, San Miguel County, Colorado, and Los Angeles suggest that a non-trivial fraction, more than 1%, of the population has SARS-CoV-2 antibodies in their bloodstream. European cities are following suit – they too are conducting serology studies and finding important fractions as well. The catch is that many of these studies find an antibody prevalence comparable to the false positive rate of their respective serology tests. The low statistical power associated with each study has invited criticism, in particular, that the results cannot be trusted and that the study authors should temper their conclusions.
But all is not lost. Jerome Levesque (also a federal data scientist and the manager of the PSPC data science team) and I performed a meta-analysis on the results from Santa Clara County (CA), Los Angeles County (CA), San Miguel County (CO), Chelsea (MA), Geneve (Switzerland), and Gangelt (Germany). We used hierarchical Bayesian modelling with Markov Chain Monte Carlo (MCMC), and also generalized linear mixed modelling (GLMM) with bootstrapping. By painstakingly sleuthing through pre-prints, local government websites, scientific briefs, and study spokesperson media interviews, we not only obtained the data from each study, but we also found information on the details of the serology test used in each study. In particular, we obtained data on each serology test’s false positive rate and false negative rate through manufacturer websites and other academic studies. We take the data at face value and we do not correct for any demographic bias that might exist in the studies.
Armed with this data, we build a generalized linear mixed model and a full Bayesian model with a set of hyper-priors. The GLMM does the usual shrinkage estimation across the study results, and across the serology test false positive/negative rates while the Bayesian model ultimately generates a multi-dimensional posterior distribution, including not only the false positive/negative rates but also the prevalence. We use Stan for the MCMC estimation. With the GLMM, we estimate the prevalence by bootstrapping with the shrunk estimators, including the false positive/negative rates. Both methods give similar results.
We find that there is evidence of high levels of antibody prevalence (greater than 1%) across all reported locations, but also that a significant probability mass exists for levels lower than the ones reported in the studies. As an important example, Los Angeles show a mode of approximately 4%, meaning that about 400,000 people in that city have SARS-CoV-2 antibodies. Given the importance of determining societal-wide exposure to SARS-CoV-2 for correct inferences of the infection fatality rate and for support to contact tracing, we feel that the recent serology studies contain an important and strongly suggestive signal.
Our inferred distributions for each location: