My data science team continues to research COVID-19 propagation and measures that we can take in work environments to limit spread. We keep a sharp eye on the literature for interesting and novel statistical techniques applied to COVID-19 data and we recently came across a wonderful paper by Simon N. Wood. Readers of this blog might recognize Professor Wood’s name from a previous blog post where I promoted his book on Generalized Additive Models.
In his new paper, "Did COVID-19 infections decline before the UK lockdown?", Professor Wood examines the arrival dates of fatal infections across England and Wales and determines when fatal infections peaked. He finds that fatal infections were in substantial decline at least five or six days before the lockdowns started. Furthermore, he finds that the fatal infection profile does not exhibit a regime change around the lockdown date and that the profile for England and Wales follows a trajectory similar to Sweden's. The result here is important because Professor Wood focuses on the most reliably collected data: deaths due to COVID-19. Studies that focus on case counts to infer epidemiological parameters are always compromised by data that is highly truncated and censored, often in ways that are largely unknown to the researcher. While we can gain some insight from such data, the results are often informed as much by prior beliefs as by the data itself, leaving us in an unfortunate position for constructing scientifically based policy.
Death data are different. In this case, the clinical data directly measure the epidemiological quantities of interest. Death data from COVID-19, while not perfect, are much better understood and recorded than other COVID-19 quantities. To understand the effect of interventions from lockdowns, what can we learn from the arrival of fatal infections without recourse to strong modelling or data assumptions? This is where Professor Wood’s paper really shines.
Before discussing Professor Wood’s paper and results, let’s take a trip down epidemiological history lane. In September 1854, London experienced an outbreak of cholera. The outbreak was confined to Broad Street, Golden Square, and adjoining streets. Dr. John Snow painstakingly collected data on infections and deaths, and carefully curated the data into geospatial representations. By examining the statistical patterns in the data, questioning outliers, and following up with contacts, Dr. Snow traced the origin of the outbreak to the Broad Street public water pump. He made the remarkable discovery that cholera propagated through a waterborne pathogen. The handle to the pump was removed on September 8, 1854, and the outbreak subsided.
But did removing the pump handle halt the cholera outbreak? On cause and effect, Dr. Snow got it right: cholera transmission occurs through contaminated water. But the time series data do not conclusively link the removal of the Broad Street pump handle to the outbreak subsiding. Edward Tufte has a wonderful discussion of the history of Dr. Snow's statistical work in Visual Explanations, 5th edition (Cheshire, Connecticut: Graphics Press, 1997), Chapter 2, "Visual and Statistical Thinking: Displays of Evidence for Making Decisions". Let's look at the time series of deaths in the area of London afflicted by the cholera outbreak in the plots below.
We clearly see that deaths were on the decline prior to the pump handle’s removal. People left the area, and people modified their behaviour. While the removal of the pump handle probably prevented future outbreaks and Dr. Snow’s analysis certainly contributed heavily to public health, it’s far from clear that the pump handle’s removal was a leading cause in bringing the Broad Street outbreak under control. Now, if we aggregate the data we can make it look like removing the pump handle was the most important effect. See the lower plot in the above figure. Tufte shows what happens if we aggregate on a weekly basis, and the confounding becomes even greater if we move the date ahead by two days to allow for the lag between infection and death. With aggregation we arrive at a very misleading picture, all an artifact of data manipulation. Satirically, Tufte imagines what our modern press would have done with Dr. Snow’s discovery and the public health intervention of removing the handle with the following graphic:
Fast forward to 2020: Professor Wood is our modern-day Dr. Snow. The ultimate question that Professor Wood seeks to answer is: When did the arrival of fatal infections peak? He is looking to reconstruct the time course of infections from the most reliable data sources available. We know from the COVID-19 death data that deaths in the UK eventually declined after the lockdowns came into effect (March 24, 2020), which seems to point to the effectiveness of the intervention. But an infection that leads to a fatality takes time to do so. Professor Wood builds a model, without complex assumptions, to account for this lag and infer the time series of infection dates. He works with COVID-19 death data from the Office for National Statistics for England and Wales, the National Health Service hospital data, and the Folkhälsomyndigheten daily death data for Sweden. In the figure below we see his main result: in the UK, COVID-19 fatal infections were in decline prior to the lockdowns, peaking 5 to 6 days earlier. The UK profile follows that of Sweden, which did not implement a lockdown.
The technique he uses is rather ingenious. He uses penalized smoothing splines with a negative binomial model for the death counts, while allowing for weekly periodicity. The smooth that picks up the trend in deaths is mapped back to the arrival of fatal infections using the distribution of the time from infection to death. Based on hospitalization data and other sources, this distribution is well described by a lognormal with a mean of 26.8 days and a standard deviation of 12.4 days. The matrix that performs the mapping is nearly singular, but the smoothing penalty handles this problem.
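To make the mapping step concrete, here is a minimal Python sketch of the forward direction only: it converts the quoted mean and standard deviation into lognormal parameters, discretises the delay into daily probabilities, and builds a matrix that maps infection days to expected death days. This is not Professor Wood's code. The function names, the 80-day cut-off, the 120-day window, and the half-day binning are my own assumptions for illustration. The cosine similarity between neighbouring columns gives a rough feel for why the matrix is nearly singular: shifting the infection date by one day barely changes the pattern of expected deaths.

```python
import math

# Figures quoted in the text for the infection-to-death delay (in days).
MEAN_DAYS = 26.8
SD_DAYS = 12.4


def lognormal_params(mean, sd):
    """Convert a lognormal's mean and sd into (mu, sigma) of log(X)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF, written with math.erf so only the stdlib is needed."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def delay_pmf(mean, sd, max_lag):
    """Daily probabilities of death d = 0..max_lag days after infection.

    Assumptions (mine, for illustration): day d covers (d - 0.5, d + 0.5],
    and the tail beyond max_lag is dropped and the rest renormalised.
    """
    mu, sigma = lognormal_params(mean, sd)
    raw = [lognormal_cdf(d + 0.5, mu, sigma) - lognormal_cdf(d - 0.5, mu, sigma)
           for d in range(max_lag + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def mapping_matrix(pmf, n_days):
    """B[t][s] = probability that a fatal infection on day s kills on day t."""
    return [[pmf[t - s] if 0 <= t - s < len(pmf) else 0.0
             for s in range(n_days)]
            for t in range(n_days)]


def column_cosine(B, j, k):
    """Cosine similarity between columns j and k of B."""
    dot = sum(row[j] * row[k] for row in B)
    norm_j = math.sqrt(sum(row[j] ** 2 for row in B))
    norm_k = math.sqrt(sum(row[k] ** 2 for row in B))
    return dot / (norm_j * norm_k)


pmf = delay_pmf(MEAN_DAYS, SD_DAYS, max_lag=80)   # 80-day cut-off is an assumption
B = mapping_matrix(pmf, n_days=120)               # 120-day window is an assumption

print("mu, sigma of log-delay:", lognormal_params(MEAN_DAYS, SD_DAYS))
print("cosine of adjacent columns:", column_cosine(B, 0, 1))
```

Multiplying B by a vector of daily fatal infections gives expected daily deaths. Going the other way, from deaths back to infections, means inverting a matrix whose columns are almost parallel, which is where the smoothing penalty earns its keep.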
One might be tempted to think that the reconstruction is biased: that an intervention will always appear to follow the peak, because the distribution of time from fatal infection to death smears the peak backward. If so, a peak followed by a decline through the intervention date might fool us into believing that the intervention did not cause the decline, when in fact the intervention produced a sharp discontinuity. Professor Wood checks the model against simulated data in which fatal infections arrive at a high rate and then plummet at a lockdown intervention, and tests how well the method captures the extreme discontinuity. In the figure below, we can see that the method does very well at picking up the discontinuity.
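To see why such a check is worth doing, here is a small forward simulation of the scenario described above: fatal infections arrive at a high rate and then plummet on a lockdown day. It is a sketch under my own assumptions, not a reproduction of Professor Wood's simulation. The lockdown day, the infection rates, the 140-day window, and the 80-day cut-off are all made-up illustrative values, and the code only pushes infections forward to expected deaths; it does not perform the penalized inversion.

```python
import math


def delay_pmf(mean=26.8, sd=12.4, max_lag=80):
    """Discretised lognormal infection-to-death delay (mean, sd from the text).

    The max_lag cut-off and half-day binning are illustrative assumptions.
    """
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu, sigma = math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)

    def cdf(x):
        if x <= 0.0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

    raw = [cdf(d + 0.5) - cdf(d - 0.5) for d in range(max_lag + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def expected_deaths(infections, pmf):
    """Expected deaths per day: daily infections convolved with the delay pmf."""
    return [sum(pmf[d] * infections[t - d]
                for d in range(min(t, len(pmf) - 1) + 1))
            for t in range(len(infections))]


# Hypothetical scenario: every number here is illustrative, not from the paper.
LOCKDOWN_DAY = 40
N_DAYS = 140
infections = [1000.0 if t < LOCKDOWN_DAY else 100.0 for t in range(N_DAYS)]

deaths = expected_deaths(infections, delay_pmf())
peak_day = max(range(N_DAYS), key=lambda t: deaths[t])
largest_daily_change = max(abs(deaths[t] - deaths[t - 1]) for t in range(1, N_DAYS))

print("infections drop on day", LOCKDOWN_DAY,
      "by", infections[LOCKDOWN_DAY - 1] - infections[LOCKDOWN_DAY])
print("expected deaths peak on day", peak_day)
print("largest day-to-day change in expected deaths:", round(largest_daily_change, 1))
```

In this toy setup the expected death curve peaks after the simulated lockdown day and changes only gradually from one day to the next, even though infections fall off a cliff. The death series alone hides the discontinuity, so the question of whether the inversion can recover it is a real one.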
There are issues that could undermine the conclusions, and Wood expounds on them in his paper. The problem of hospital-acquired infections is important. People already in the hospital are often weak and frail, and thus the duration of COVID-19 until death will be shortened should they become fatally infected. Professor Wood focuses on community transmission, since it is this effect that lockdowns and social distancing target. Hospital-acquired transmissions will bias the inference, but the proportion of hospital-acquired infections in the death data would have to be quite high to radically alter Wood's conclusions. He discusses a mixture model to help understand this effect. There are also problems concerning bias in the community-acquired fatal disease duration, including the possibility of age-dependent effects. Again, to substantially change the conclusions, the effects would have to be large.
Professor Wood is careful to point out that his paper does not prove that peak fatal infections occurred in England and Wales prior to the lockdowns. But the results do show that in the absence of strong assumptions, the most reliable data suggest that fatal infections in England and Wales were in decline before the lockdowns came into effect with a profile similar to that of Sweden. Like Dr. Snow’s pump handle, the leading effects that caused the decline in deaths in the UK may not have been the lockdowns, but the change in behaviour that had already started by early March, well before the lockdowns.
Professor Wood’s results may have policy implications and our decision makers would be wise to include his work in their thinking. We should look to collect better data and use similar analysis to understand what the data tell us about the effectiveness of any public health initiative. At the very least, this paper weakens our belief that the blunt instrument of lockdowns is the primary mechanism by which we can control COVID-19. And given the large public health issues that lockdowns also cause – everything from increased child abuse to future cancer patients who missed routine screening to increasing economic inequality – we must understand the tradeoffs and the benefits of all potential actions to the best of our ability.