AWS EC2 with RStudio login through a browser

Over the course of the last few years, I have become reliant on AWS for running large jobs. The COVID-19 modelling work I was involved with required some high performance computing and AWS is an excellent solution.

There are some great resources for getting started with AWS EC2. I highly recommend the guide Running R on AWS EC2 and Logging into RStudio from Anywhere for a step-by-step walkthrough of getting an instance with R and RStudio up and running.

I have created a Bash script (see below) that you can scp to your instance and run. It will set up an instance with R, an RStudio IDE that you can log into through a browser, and a Shiny server. Make sure to open port 80 (HTTP) and add custom TCP rules for ports 8787 and 3838 in the security group settings through your EC2 dashboard. RStudio will run on port 8787 and Shiny on port 3838. Also, ensure that you make the script executable with the command: chmod +x aws_R_setup.sh (or whatever you choose to call the bash script).

Once you have run the script, add a user (e.g. sudo adduser rstudiouser) and then add the new user to the sudo group. The commands are at the bottom of the script, commented out.

After you have created the user, log in to RStudio through your browser using your EC2 address with port 8787 (e.g., ec2-35-183-78-154.ca-central-1.compute.amazonaws.com:8787). And voilà! You will have the RStudio IDE running through your browser, which you log in to using the username and password you created. For the Shiny server, use port 3838 (e.g., ec2-35-183-78-154.ca-central-1.compute.amazonaws.com:3838).

Of course it is always possible to use an existing AMI on EC2, but having the flexibility to create your own system can sometimes be useful.

#!/bin/bash
# EC2 script for setting up an R instance on AWS
totalmem=$(free -g | grep -oP '\d+' | head -n 1)
if [ "$totalmem" -lt "2" ]; then
	sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=2048
	sudo /sbin/mkswap /var/swap.1
	sudo /sbin/swapon /var/swap.1
	sudo sh -c 'echo "/var/swap.1 swap swap defaults 0 0 " >> /etc/fstab'
	echo "created swap file to avoid needing a bigger instance"
fi

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'

# Update ubuntu package repo, to get latest R
sudo apt -y update
sudo apt -y upgrade

# Install R
sudo apt -y install r-base r-base-dev

# Install shiny before shiny-server
sudo R -e "install.packages('shiny')"

# Install debian package manager, gdebi
sudo apt -y install gdebi-core


# Install Shiny Server 
wget https://download3.rstudio.org/ubuntu-14.04/x86_64/shiny-server-1.5.16.958-amd64.deb
sudo gdebi -n shiny-server-1.5.16.958-amd64.deb
sudo rm shiny-server-1.5.16.958-amd64.deb

sudo apt -y install libcurl4-openssl-dev 

sudo apt -y install libssl-dev libxml2-dev libmariadbclient-dev build-essential libcurl4-gnutls-dev


# Install RStudio

wget https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.4.1717-amd64.deb
sudo gdebi -n rstudio-server-1.4.1717-amd64.deb
sudo rm rstudio-server-1.4.1717-amd64.deb


sudo R -e "install.packages('RCurl', repos='http://cran.rstudio.com')"
sudo R -e "install.packages('devtools', repos='http://cran.rstudio.com')"
sudo R -e "install.packages('tidyverse')"

sudo apt -y install default-jdk
sudo R CMD javareconf


# Change permissions for R library
sudo chmod 777 -R /usr/local/lib/R/site-library

#Now add user info to login RStudio for example:
#sudo adduser rstudiouser

#Then add rstudiouser to sudo group:
#sudo usermod -aG sudo rstudiouser


COVID-19 spring redux: more cowbell

Although we’re more than a year into the COVID-19 pandemic, our ability to protect vulnerable people hasn’t improved much. On a medical front, we’ve advanced with better triage, better treatments, and now a vaccination program – but our public policy looks pretty much the same as it did last spring, except in Ontario, this time with enhanced police powers.

Ontario is seeing a tidal wave of new infections. While new variants apparently increase risk among younger demographics, the 60+ age group and those with comorbidities experience the majority of the severe illnesses and deaths. Earlier this month, Dr. Michael Warner pointed out that a person under 50 dies from COVID-19 in Ontario ICUs every 2.8 days. That rate among the young is higher than in the past, but with a current Ontario COVID-19 death rate of about 20 per day, that means about 60 old people are dying for every person under 50, who almost certainly has comorbidities. It’s pretty clear that COVID-19 and its variants are vastly riskier for the elderly and those with comorbidities. Unfortunately, our vaccination program has a long way to go to protect the vulnerable.

Since this pandemic began, our governments at all levels have focused on lockdowns, school closures, travel restrictions, and curfews as the primary mechanism to control COVID-19 community spread. For the last year these policies have shifted most of the suffering onto the poorest people in our society – our essential workers – who are employed in everything from meat-packing plants to supermarkets. Even the homeless haven’t escaped. In January, Ottawa homeless shelters paused intake due to an outbreak among residents and staff. Not only have the poorest faced the brunt of COVID-19, but the vaccination programs have had the greatest penetration with the well-off. For all the rhetoric and all the treasure we’ve spent, we have not protected the vulnerable.

Is this the best we can do?

Last spring, our governments implemented emergency measures in a state of overwhelming panic. This time around, we appear to be recycling the same emergency policy playbook, and in the process, discarding a year’s worth of hard won knowledge about COVID-19. Public Health Ontario has publicly stated that focused measures to protect the vulnerable are not possible:

While age may be the major driver of differential mortality risk, there are important equity-based considerations in understanding risk of dying from COVID-19 in non-elderly populations if public health measures were to focus on the risk to the elderly only….Public health agencies globally have supported the use of physical distancing measures and lockdowns to control community transmission of COVID-19, recognizing that high community transmission renders protection or shielding of higher-risk populations virtually impossible due to the connections within the broader community in which they live. Further, an approach to strict isolation of at-risk populations can also be considered inhumane and unethical.

I am not sure how Public Health Ontario comes to the conclusion that shielding higher-risk populations is virtually impossible, but it certainly points to a lack of creativity and no sense of resolve. The situation reminds me of the Berlin Airlift.

In 1948, as the Cold War was getting started, the Soviet Union prevented all surface traffic from non-Soviet zones from entering West Berlin. Stalin hoped to drive the Western allies out of Berlin, undermine the Marshall Plan for rebuilding the German economy, and place the Soviet Union as the dominant power in Europe. At the time of the blockade, West Berlin had about a month’s worth of food and fuel. The Soviets figured it was impossible to keep West Berlin supplied and the West would quickly capitulate. We didn’t. We organized an airlift that kept West Berlin provisioned for 15 months. In the end it was the Soviet Union that relented. At its peak, the airlift had a plane landing in Berlin every 30 seconds, planes filled with everything the city needed from coal to potatoes. Our response to COVID-19 looks nothing like this.

Constructing an optimal policy should use all available information. We know who the most vulnerable are: the elderly and those with comorbidities such as obesity, hypertension, and diabetes. We must use all we know about COVID-19, pandemics, and public health more generally. Two points stand out:

1) We now know that, for those over age 70, the survival rate from COVID-19 is approximately 95%. That is a terrible problem. For the vast majority of us who are at little risk, the survival rate is at least 99.95% and most likely higher. Comorbidities also increase risk. Healthy people under age 25 have a survival rate of effectively 100%. That is, the high risk group is thousands of times more likely to suffer a severe outcome. Given the extreme differences in survival rates between different population subgroups, it would be surprising if the optimal policy did not include that information.

2) As a biological fact, all pandemics end through herd immunity, whether by natural infection, vaccination, or some combination of the two. The optimal policy that ends the pandemic gets us to herd immunity with the least total harm. Covid-zero is not possible with such a widespread pathogen, and the social determinants of health stretch well beyond modern medicine.

Implicitly, our governments are telling us that the first point is irrelevant. They are saying that the optimal policy that offers the best protection to the vulnerable from COVID-19 is to treat the disease as though the risk of severe illness or death is exactly the same for all people in society. Our governments have said that for all the money we’ve spent and for all the economic burden that we have placed on our children and the least financially well-off in our society, there is little that we can do differently. In their view, our policies are close to optimal, and our knowledge of who is most affected by the virus is of no practical significance for public health policy development. Is it really true that no other policy which uses this information is feasible, politically or otherwise?

The second point has become hopelessly and needlessly politicized. Herd immunity has somehow been equated to a political statement. It is not; it’s biology. As the Harvard epidemiologist Martin Kulldorff says, “it’s weird and stunning to have this discussion about herd immunity—flockimmunitet in Swedish. You wouldn’t have physicists talking about whether we believe in gravity or not. Or two airline pilots saying, ‘Should we use the gravity strategy to get the airplane down on the ground?’ Whatever way they fly that plane—or not fly it—gravity will ensure eventually that the plane is going to hit the ground.” However we get there, herd immunity is the only thing that will end the pandemic.

So what would an optimal policy look like?

First, anyone who is at high risk and who has to work in settings ripe for spread should immediately be put on long term disability that replaces their wage. We should not be sending pre-diabetic, hypertensive, heavy-set 50+ year-olds to essential workplaces. We need legions of young healthy people to step up, train quickly, and fill in. If 20 year-olds with minimal training – often less than a few hours in a Spitfire or a Hurricane – won the Battle of Britain, we can find a way to use our young healthy population to replace high risk workers in everything from meat-packing plants to hospitals. Second, for those in the high risk group who cannot self-isolate, such as poor elderly immigrants in multi-generational homes, re-purpose our hotels. Again, have legions of young and low risk people isolate with them, on rotation, to run operations. Ensure high levels of testing. For those who can self-isolate, our army of young people can still help by delivering groceries, medications, and other necessities. Third, let everyone else go about their normal lives to the greatest extent possible – especially children in schools. As we vaccinate our high risk population with an ever expanding vaccination program, and as the virus spreads among the low risk group, we will reach herd immunity and the pandemic will end. The current policy looks very much like let it rip in slow motion.

Perhaps these ideas are infeasible; perhaps our current policies of lockdowns and curfews are the best we can do. Maybe ideas like mobilizing our young is impossible. Maybe it’s too late. But I challenge anyone to rigorously show that we could not have done better or that we can’t do better now.

With bold leadership and a call to civic duty, our accomplishments during WWII and at the beginning of the Cold War showed how we can do great things under enormous stress, against terrible odds, at extremely high stakes. From the Dunkirk Evacuation to the Berlin Airlift, we never accepted the answer “it’s virtually impossible”.

We’ve been told that our civic duty is to stay home, wear a mask during essential trips, and wait for a vaccine, all the while pushing Covid-19 onto vulnerable essential workers and their relatives.

I bet we can do better. Let’s focus our protection.

Ontario ICU arrivals and departures as a Skellam process

Figure 1: Net daily change in the Ontario ICU census with Skellam inferred rates. The red lollipop points are days of net inflow and blue lollipop points are days of net outflow. The red ribbon and blue ribbons are the inferred daily ICU arrival and departure rates from the Skellam process. Lockdown imposition and lifting dates indicated.

As Ontario enters a new lockdown, the main concern is a potential overrun of our ICU capacity. A year ago, the Ontario government announced a planned major expansion of provincial ICU capacity, but, given our current limits, it appears that either the plan never came to fruition or that the efforts woefully underestimated possible COVID-19 ICU demand. The Ontario COVID-19 data portal gives the daily ICU COVID-19 census from which we can compute daily changes. As of April 4, Ontario has 451 ICU patients associated with COVID-19. The daily changes give us the net flow but the counting does not tell us how many people actually arrived or departed each day. When we see a daily uptick of 25 people, is that 25 arrivals with no departures or is it 50 arrivals with 25 departures? Both scenarios give the same net change.

Even though the COVID-19 data shows only the net change, we can infer the arrival and departure rates using the Skellam distribution. The Skellam distribution is the distribution of the difference of two independent Poisson random variables, k = n_1 - n_2, obtained by convolving one Poisson with the reflection of the other, namely,

(1)   \begin{equation*} s(k; \mu_1,\mu_2) = \sum_{n=-\infty}^\infty\, p(k+n; \mu_1) p(n; \mu_2),\end{equation*}

where s(k; \mu_1, \mu_2) has support on \{\ldots,-2,-1,0,1,2,\ldots\}, and p(k;\mu) = \frac{\mu^k}{k!}e^{-\mu} is the Poisson pmf. The resulting distribution can be expressed in terms of the modified Bessel function of the first kind,

(2)   \begin{equation*}s(k; \mu_1, \mu_2) = e^{-(\mu_1 + \mu_2)}\left(\frac{\mu_1}{\mu_2}\right)^{k/2} I_{|k|}(2\sqrt{\mu_1\mu_2}) .\end{equation*}
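
For completeness, the Bessel form follows directly by substituting the Poisson pmf into the convolution sum (terms with a negative Poisson argument vanish, so the sum effectively starts at n = \max(0,-k)):

\begin{equation*} s(k; \mu_1, \mu_2) = e^{-(\mu_1+\mu_2)} \sum_{n \ge \max(0,-k)} \frac{\mu_1^{k+n}\,\mu_2^{\,n}}{(k+n)!\,n!} = e^{-(\mu_1+\mu_2)} \left(\frac{\mu_1}{\mu_2}\right)^{k/2} \sum_{n} \frac{\left(\sqrt{\mu_1\mu_2}\right)^{2n+k}}{(k+n)!\,n!}, \end{equation*}

and the remaining sum is the series expansion of I_{|k|}(2\sqrt{\mu_1\mu_2}), using I_{-k}(x) = I_k(x) for integer order.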

I take the daily difference in the Ontario ICU census data and estimate the Skellam distribution over a 21-day sliding window. The estimated Skellam parameters are shown as the ribbons in figure 1, smoothed with cross-validated cubic splines, and superimposed on the daily net flow. The red ribbon is the arrival process and the blue ribbon is the departure process. Each lollipop point is the net change in ICU occupancy on that day. In figure 2 I show a Q-Q plot of the Pearson residuals from the Skellam model – the model is a reasonably good description of the data-generating process. In my analysis I use the R packages skellam and mgcv (for spline estimation). For clarity, I indicate the lockdown imposition and lift dates. As of early April 2021, it looks like we are approaching an average of 50 ICU admissions every day against an average departure of about 40.
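
The estimation itself was done in R with the skellam and mgcv packages. As a rough, self-contained illustration of the idea, here is a pure-Python sketch that recovers arrival and departure rates from net daily changes. Note one deliberate simplification: it uses method-of-moments (for k = n_1 - n_2, E[k] = \mu_1 - \mu_2 and Var[k] = \mu_1 + \mu_2) rather than the maximum-likelihood fit used in the post, and the function names and window length are illustrative.

```python
import math

def poisson_pmf(k, mu):
    """p(k; mu), computed in log space to avoid overflow for large k."""
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def skellam_pmf(k, mu1, mu2, n_max=200):
    """Skellam pmf via the convolution sum in equation (1)."""
    return sum(poisson_pmf(k + n, mu1) * poisson_pmf(n, mu2)
               for n in range(n_max))

def skellam_moment_fit(net_changes):
    """Method-of-moments estimates of (mu1, mu2) from net changes k = n1 - n2:
    the sample mean estimates mu1 - mu2, the sample variance mu1 + mu2."""
    n = len(net_changes)
    m = sum(net_changes) / n
    v = sum((x - m) ** 2 for x in net_changes) / (n - 1)
    return max((v + m) / 2, 0.0), max((v - m) / 2, 0.0)

def sliding_rates(net_changes, window=21):
    """(arrival, departure) rate estimates over a sliding window."""
    return [skellam_moment_fit(net_changes[i:i + window])
            for i in range(len(net_changes) - window + 1)]
```

Smoothing the resulting rate series with splines, as in figure 1, would then be a separate step.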

Figure 2: Regression diagnostics

Causation is hard to determine in regressions such as this model, but we see that after the lockdown date on December 26, 2020, the arrival rate quickly plateaued only to begin rising again near the end of the lockdown period. It appears that the lockdown this winter had the effect of tempering arrivals into the ICU. But as always with statistics, we need to be cautious about making causal statements. The arrivals in the ICU are from people who were infected at an earlier date, probably in the neighbourhood of one week earlier. Thus, we see that the arrival rate remains high from the beginning of the lockdown on December 26 through the end of December and early January. We should expect this inertia from infections that occurred near or before the imposition of the lockdown, although there is a distinct possibility that people started to change their behaviour before the lockdown date. But notice that the arrival rate begins to rise sharply almost exactly coincident with the end of the lockdown on February 16, 2021. This observation suggests that infections were beginning to accelerate at least a week, maybe two, prior to the end of the lockdown. It appears that we would have seen a surge at some level in Ontario’s ICUs even if the lockdown had been maintained through the end of February or longer. Of course it’s quite probable that the resulting surge would have been smaller than the one we are currently witnessing, but to gain insight into such a counterfactual we need to know what caused the acceleration in infections prior to the end of the lockdown. If infections started to rise before the end of the lockdown, is it the result of the new more contagious variants? Is it lockdown fatigue? Is it spread among essential workers? Maybe it’s a combination of effects; I don’t know. Lockdown efficacy rests on that understanding.

Notice that the arrival process turned downward, or at least plateaued, about two weeks after the lockdown date of December 26, 2020, but also notice that the daily number of departures has been steadily increasing throughout the entire winter. Of course we expect that as arrivals increase, departures will eventually increase as well – everyone who enters the ICU eventually leaves. In a steady-state queue, the average departure rate will equal the average arrival rate. With time variation in the arrival process, the departure rate will lag as the arrival rate waxes and wanes. It might be instructive to think about the average server load (ICU occupation) of an M/M/c queue (exponentially distributed inter-arrival times, exponentially distributed service times, with c servers) as an approximation to the ICU dynamics. In an M/M/c queue the average number of busy servers (ICU bed occupation) is,

(3)   \begin{equation*} \bar c = \frac{\mu}{\tau}, \end{equation*}

where \mu is the arrival rate and \tau is the per-bed service rate, i.e. the reciprocal of the average length of stay. For example, given an arrival rate of 50 people per day and an average length of stay of 14 days in the ICU, the steady-state average number of people in the ICU would be 700 patients, provided the number of beds (servers) is larger than 700. I don’t know if the length of stay has been varying over time. It would be great to do a full time-dependent competing-risk survival analysis on that data, but the steady rise in the departure rate that we see over the winter might indicate that the length of stay in the ICU has also been changing with the arrival rate. I don’t know.
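
Equation (3) is just Little’s law applied to the beds, and the worked example in the text can be checked directly (the numbers below are the ones quoted above; the function name is illustrative):

```python
def average_busy_beds(arrival_rate_per_day, mean_stay_days):
    """M/M/c busy-server average, equation (3): c_bar = mu / tau,
    where tau = 1 / mean length of stay, so c_bar = arrival rate x mean stay."""
    return arrival_rate_per_day * mean_stay_days
```

With 50 arrivals per day and a 14-day average stay, the steady-state average occupancy is 700, matching the text, provided capacity exceeds 700 beds.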

There are a number of ways to extend this work. The estimator is crude – instead of a naive 21-day sliding window, it could be built with a Bayesian filter. The Q-Q plot in figure 2 shows that the naive sliding window generates a good fit to the data, but if we want to back out robust estimates of the timing of the infections associated with the arrivals, it might be worth exploring filtering. Second, it is possible to promote the parameters of the Skellam process to functions of other observables – positive test rates, lagged test rates, time, or any other appropriate regressor – and then estimate the model through non-linear minimization. In fact, I initially approached the problem using a polynomial basis over time with regression coefficients, but the sliding window generates a better fit.

Where does this leave us? The precipitous drop in ICU arrivals throughout the summer of 2020 with low social distancing stringency and no lockdowns, along with the steady rise in arrivals beginning in the fall, suggest that Covid-19 will become part of the panoply of seasonal respiratory viruses that appear each year. Hopefully with the vaccines and natural exposure, future seasonal case loads will occur with muted numbers relative to our current experience. Given that the largest mitigating effect, next to vaccines, seems to be the arrival of spring and summer, lockdown or not, the warming weather accompanied by greater opportunities to be outdoors should have a positive effect. Covid-19 cases are plummeting across the southern US states like Texas and Alabama, yet cases are rising in colder northern states like New York and New Jersey. While increased herd immunity in the US from higher levels of vaccination and higher levels of natural exposure must be having an effect on reducing spread, it really does seem like Covid-19 behaves as a seasonal respiratory illness. Hopefully we can expect better days ahead, but, in the end, it is very hard to maintain a large susceptible population in the face of a highly contagious virus that has already become widespread. Perhaps we should have been looking for better public health policies all along.

COVID-19 death count conspiracy theories don’t add up, but won’t someone think of the children?

The Canadian media reminds us daily of the COVID-19 death count. They never fail to mention each time we reach a grim milestone. But I got to thinking, how are COVID-19 deaths really counted? What is the effect size on mortality? In some cases it might be obvious, but I am sure there are many medical cases which are difficult to label, especially since severe illness from COVID-19 strongly tracks the frail, people with comorbidities, and people of advanced age. Are we under-counting or over-counting COVID-19 deaths? Conspiracy theories abound with wild claims about inflated COVID-19 death counts.

One way we might be able to help calibrate our counting is to look at changes in the Canadian mortality rate. Sure, labelling a COVID-19 death might be difficult, but counting deaths themselves should be much easier. Statistics Canada has several weekly mortality time series for the Canadian population. I download the Statistics Canada mortality data from the website https://www.mortality.org/. I have no way of determining the data accuracy, but I will assume that the data are complete enough for my modelling purposes. As always, conclusions are only as good as the data we have.

The first plot shows the raw Statistics Canada mortality rate data. The data are split across cohorts along with the total population. We see some interesting patterns. First, the mortality rate in Canada has been increasing over the last ten years. This increase is not surprising as Canada’s median age is increasing – the effect is probably due to our aging population. Second, the death rate in our oldest cohorts is decreasing. We are probably seeing healthier people reaching these cohorts each year given that smoking rates decreased over the last 30 years and we are probably also seeing the effect of better medicine. That we see declines in the mortality rate of the oldest cohorts but an increasing rate for the entire population is an example of Simpson’s Paradox. The fraction of the Canadian population entering the oldest cohorts is increasing, but we all die at some point!

Annualized Canadian mortality rates by cohort with weekly observations.
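
The Simpson’s Paradox noted above is easy to reproduce with invented numbers: give every cohort a falling mortality rate while shifting population weight toward the oldest cohort, and the aggregate rate still rises. All figures below are hypothetical, chosen only to illustrate the mechanism:

```python
# (mortality rate, population share) per cohort -- invented for illustration
year_1 = {"under_65": (0.0020, 0.85), "over_65": (0.0400, 0.15)}
year_2 = {"under_65": (0.0018, 0.75), "over_65": (0.0380, 0.25)}

def population_rate(cohorts):
    """Aggregate mortality rate: cohort rates weighted by population share."""
    return sum(rate * share for rate, share in cohorts.values())
```

Both cohort rates fall from year 1 to year 2, yet the aggregate rate climbs from 0.77% to about 1.09% purely because the high-mortality cohort’s share of the population grows.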

A casual glance at the plot doesn’t seem to scream a large COVID-19 effect in 2020, but we have to be careful – looks can be deceiving. We have a long term trend superimposed on a noisy periodic time series and we are trying to detect changes at the tail end of the process. A nice way to proceed is to use generalized additive models (GAM). I will be using Simon Wood’s mgcv R package.

The model I have in mind is a GAM with a cyclic cubic spline with a 52 week period, s_1, and a trend spline, s_2,

(1)   \begin{equation*}\text{rate}_i = s_1(\text{week.number}_i) + s_2(\text{time}_i) + \epsilon_i.\end{equation*}

It’s a simple model, but it should be able to pull out a signal or trend, if one exists.
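
The actual fit uses mgcv’s cyclic cubic and thin-plate regression splines. As a self-contained sketch of the same decomposition, here is a crude analogue that replaces each spline with a single Fourier harmonic for the cyclic term and a linear term for the trend, fit by ordinary least squares via the normal equations. This is a deliberate simplification of the GAM, and all function names are illustrative:

```python
import math

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_cyclic_trend(weeks, rates, period=52):
    """Least-squares fit of rate ~ b0 + b1*t + b2*sin(2*pi*t/52) + b3*cos(2*pi*t/52),
    a one-harmonic stand-in for the GAM's cyclic and trend splines."""
    X = [[1.0, t,
          math.sin(2 * math.pi * t / period),
          math.cos(2 * math.pi * t / period)]
         for t in weeks]
    p = 4
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, rates)) for i in range(p)]
    return gauss_solve(XtX, Xty)  # [b0, b1, b2, b3]
```

On synthetic data with a known linear drift and a 52-week cycle, the fit recovers the trend slope and the cyclic amplitude; the real GAM differs in allowing the trend and cycle to be flexible splines rather than fixed basis functions.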

Given that COVID-19 has proven more dangerous to the 75-84 year old cohort, I will start there. The second plot shows the results of the model normalized to the downward linear drift in the data. My model fits the data well – well enough to capture the broad features – but the residuals, while approximately normal in distribution, exhibit some autocorrelation. There is subtle time series structure that my model misses, but for the purposes of detecting the long term trend it does reasonably well. We see that the cyclic part peaks during the fall/winter, and the spline trend term moves up sharply through 2020 – the effect of COVID-19. The trend component also picks out the years with a particularly strong flu season.

GAM: Cyclic and trend components with normalized mortality data for 75-84 year old cohort.

In the next figure I show the long term trend in the mortality rate for the 75-84 year old cohort relative to the background. We see that COVID-19 has lifted the mortality rate for this group by about 4.5%, roughly 4 times the effect of a bad flu year. The final figure shows the long term trend with the original data. Repeating this analysis for the entire population, we see that the overall effect of COVID-19 on the Canadian population is to lift the mortality rate by about 5% above background. We can see the effect in the next two figures.

GAM trend component relative to the background mortality rate in the 75-84 year old cohort.

This helps us answer the question about over-counting. If the effect is about 5% and it is sustained for all of 2020, then, given that about 280,000 Canadians died in 2019, the results suggest a total annual COVID-19 effect of roughly 15,000 excess deaths in Canada. While the current official number of 21,000 COVID-19 deaths might look a little high by comparison, we need to temper that conclusion because of substitution effects. People travelled less in 2020, so some types of accidents are probably down, with deaths from COVID-19 making up part of the difference. Substitution effects of that kind would mask the severity of COVID-19 in the mortality rate data. On the other hand, other substitution effects, such as self-harm, have probably increased over 2020, pushing up the overall mortality rate from non-COVID-19 (or at least indirect) causes. A much more detailed analysis of all the substitution effects would be required to really know how much over- or under-counting is present in the COVID-19 death totals. The official count broadly concords with my simple GAM analysis of the mortality rate. My guess is that if there is bias in the official tally, it probably leans a touch in the over-counting direction, but nothing like the crazy conspiracy theories.

GAM: inferred trend in the mortality rate for the 75-84 year old cohort.
GAM: inferred trend in the mortality rate for all Canadians.

The shocking part of what the data teach us is in the 0-14 year old cohort. I repeat the same analysis and show the long term trend in the figures below. We see that in 2020, the mortality rate for children under 15 years old is up over 20%! That equates to approximately an extra 10 childhood deaths each week across Canada. We know that COVID-19 has a very low mortality rate among children – lower, in fact, than the seasonal flu – and yet we see a dramatic rise in childhood mortality in the data. This analysis is not causal, but if COVID-19 is an unlikely culprit, it must be some other environmental factor. We know that child abuse and domestic violence have increased as a result of stay-at-home orders, and teachers and school staff have a difficult time detecting abuse and neglect in online classes. When this pandemic ends, it will be very much worthwhile combing through the 0-14 year old cohort data and unpacking the effects of COVID-19 mitigation efforts on childhood well-being.

GAM trend component relative to the background mortality rate in the 0-14 year old cohort.
GAM: inferred trend in the mortality rate for children under 15 years old.

The CBC and the chattering class

A few months ago I wrote a blog post about the limits of science in forming public policy. Since then the chattering class has only gotten worse and the CBC cheerleads it all the way.

On Thursday November 26, the CBC ran an article Secret recordings reveal political directives over Alberta’s pandemic response in what can only be described as a form of gutter-level journalism that will serve to damage the working relationships between Alberta’s scientific advisors, specifically Dr. Deena Hinshaw, and government decision makers in the response to COVID-19. The CBC provides a platform to University of Alberta associate law professor Ubaka Ogbogu who, quoted in the article, said “If the government is not following scientific advice, if it is not interested in measures that will effectively control a pandemic that is killing Albertans, then Hinshaw owes us the responsibility of coming out and saying, ‘They are not letting me do my job’…The focus needs to be on the disease, on how you stop it, not the economy. Nothing is more important.” On Friday the CBC followed up with a response from Dr. Hinshaw, who feels personally betrayed, and added, “I was not elected by Albertans. The final decisions are up to elected officials who were chosen by Alberta. This is how democracy works.” Dr. Hinshaw is the only professional in this sordid mess.

With their article, the CBC is appealing to the I-believe-the-science crowd – an intolerant wokeist slogan that reflects a distrust of democracy, incrementalism, and consensus building, and an attitude with little recognition of the difficulties of real decision making. Increasingly, the new chattering class feels that the job of our elected officials is to hand the keys of the state over to the “experts”, obediently accept their “solutions” (I believe the science, after all), and leave the room. Our democracy doesn’t work that way for a reason – experts are not accountable the way our elected officials are. Liberal democracy may not be perfect, but it has given us a quality of life unprecedented in human history, with stability and peace that are the envy of the world. Accountable legislatures are among the institutions that make it all possible. You’d think that this ground would have been covered in middle school. I hope the forthcoming investigation finds those responsible for the leak and, if they are civil servants, that they are fired.

As a scientist, I really detest this trend of looking to science to provide the answer to our social and moral questions. From COVID-19 to climate change, the chattering class appeals to “settled science” as a mechanism to prescribe policy solutions. Science can do no such thing. Scientific discoveries by themselves do not have moral implications. This wokester attitude of believing-the-science for public policy is not unlike the 17th century Catholic Church’s persecution of Galileo in reverse. What we discover about the world does not tell us what we should do or how we should treat our fellow human beings. When it comes to public policy, science cannot make the decision for us. In the public sphere everything is a trade-off. Those trade-off decisions create suffering and make losers out of some people regardless of which choice we make. How we balance that suffering – how we gain the acceptance of those who lose – is part of the grand bargain built into our liberal democracy. It’s how we create legitimacy. That’s not a scientific question! Believe-the-science is nothing but vacuous wokeist sloganeering.

With COVID-19, our elected officials have a monstrously difficult task. Science is an input, but it cannot offer the solution. COVID-19 mitigation strategies will cost lives no matter what we do. Our polices risk a debt crisis, falling productivity, business destruction, and a potential for the gutting of human capital formation with long term consequences. Balancing those choices and risks against protecting society from the immediate harm of COVID-19 is not obvious; it’s not as easy as professor Ogbogu would have us believe. A policy that saves as many near-term lives as possible, damn the consequences, is a moral choice, a political choice, not a scientific one. In a liberal democracy our elected representatives make decisions that best reflect our collective values. The CBC did us all a disservice by reporting these secret recordings. The CBC undermined Alberta’s efforts to build legitimacy and made Dr. Hinshaw’s job even more difficult than it already is.

Shame on the CBC.

I met Ezekiel Bulver last month

Richard Feynman once said, “I’ve concluded that it’s not a scientific world.” He observed that people often believe so many wonderful things that the real message behind the scientific method has failed to percolate through society: a method of systematic doubt, in which nothing is ever certain and concepts only ever reside on a graduated scale somewhere between, but never at the ends of, absolute falsity and absolute truth. He saved his worst scorn for the many scientists who, by their training, are supposed to know better. In my experience, little has changed in the nearly half century since his remarks.

I have had the opportunity to share my work on COVID-19 and to interact with epidemiologists from around the country and around the world, yet the most shocking part of my experience has been running into Ezekiel Bulver. C.S. Lewis met him first:

...Ezekiel Bulver, whose destiny was determined at the age of five when he heard his mother say to his father—who had been maintaining that two sides of a triangle were together greater than a third— “Oh you say that because you are a man.” “At that moment”, E. Bulver assures us, “there flashed across my opening mind the great truth that refutation is no necessary part of argument. Assume that your opponent is wrong, and explain his error, and the world will be at your feet. Attempt to prove that he is wrong or (worse still) try to find out whether he is wrong or right, and the national dynamism of our age will thrust you to the wall.” That is how Bulver became one of the makers of the Twentieth Century.

Of course Bulver never existed; he is a rhetorical device created by C.S. Lewis himself. But this behaviour, bulverism – the logical fallacy of assuming someone’s argument is invalid or false at the outset, and then attempting to explain how the person became so mistaken by hypothesizing about her beliefs, psychology, or motives, without regard to the actual argument itself – has no place in scientific discourse. Bulverism gets us nowhere. The other side could just as easily use the same move against you. Not only is bulverism disrespectful, it’s pure foolishness. C.S. Lewis again,

If you try to find out which [ideas] are tainted by speculating about the wishes of the thinkers, you are merely making a fool of yourself. You must first find out on purely logical grounds which of them do, in fact, break down as arguments. Afterwards, if you like, go on and discover the psychological causes of the error...You must show that a man is wrong before you start explaining why he is wrong.

Bulverism is antithetical to the scientific method, and scientists who use it have stopped being scientists. You would have embarrassed yourself in front of Feynman by practicing it. Arguing that papers or ideas on COVID-19 from highly respected epidemiologists and other researchers, who argue in good faith, should be dismissed out of hand because of perceived ideology or inferred political beliefs is not science. There is plenty of uncertainty around COVID-19 and plenty of room for legitimate scientific disagreement. If we want to serve the public good, let’s stop inviting Mr. Bulver to the conversation.

Are lockdowns effective at stopping Covid-19?

My data science team continues to research COVID-19 propagation and measures that we can take in work environments to limit spread. We keep a sharp eye on the literature for interesting and novel statistical techniques applied to COVID-19 data and we recently came across a wonderful paper by Simon N. Wood. Readers of this blog might recognize Professor Wood’s name from a previous blog post where I promoted his book on Generalized Additive Models.

In his new paper, Did COVID-19 infections decline before the UK lockdown?, Professor Wood examines the arrival dates of fatal infections across England and Wales and determines when fatal infections peaked. He finds that fatal infections were in substantial decline at least five or six days before the lockdowns started. Furthermore, he finds that the fatal infection profile does not exhibit a regime change around the lockdown date and that the profile for England and Wales follows a trajectory similar to Sweden’s. The result is important because Professor Wood focuses on the most reliably collected data – deaths due to COVID-19. Studies that infer epidemiological parameters from case counts are always compromised by data that are highly truncated and censored, often in ways largely unknown to the researcher. While we can gain some insight from such data, the results are often informed as much by prior beliefs as by the data itself, leaving us in an unfortunate position for constructing scientifically based policy.

Death data are different. In this case, the clinical data directly measure the epidemiological quantities of interest. Death data from COVID-19, while not perfect, are much better understood and recorded than other COVID-19 quantities. To understand the effect of interventions from lockdowns, what can we learn from the arrival of fatal infections without recourse to strong modelling or data assumptions? This is where Professor Wood’s paper really shines.

Before discussing Professor Wood’s paper and results, let’s take a trip down epidemiological history lane. In September 1854, London experienced an outbreak of cholera. The outbreak was confined to Broad Street, Golden Square, and adjoining streets. Dr. John Snow painstakingly collected data on infections and deaths, and carefully curated the data into geospatial representations. By examining the statistical patterns in the data, questioning outliers, and following up with contacts, Dr. Snow traced the origin of the outbreak to the Broad Street public water pump. He made the remarkable discovery that cholera propagated through a waterborne pathogen. The handle to the pump was removed on September 8, 1854, and the outbreak subsided.

But did removing the pump handle halt the cholera outbreak? As a cause-and-effect statement, Dr. Snow got it right – cholera transmission occurs through contaminated water – but examination of the time series data shows that the removal of the handle of the Broad Street water pump cannot be conclusively credited with ending the outbreak. Edward Tufte has a wonderful discussion of the history of Dr. Snow’s statistical work in Visual Explanations, 5th edition (Cheshire, Connecticut: Graphics Press, 1997), Chapter 2, “Visual and Statistical Thinking: Displays of Evidence for Making Decisions.” Let’s look at the time series of deaths in the area of London afflicted by the cholera outbreak in the plots below.

From Tufte: Visual Explanations

We clearly see that deaths were on the decline prior to the pump handle’s removal. People left the area, and people modified their behaviour. While the removal of the pump handle probably prevented future outbreaks and Dr. Snow’s analysis certainly contributed heavily to public health, it’s far from clear that the pump handle’s removal was a leading cause in bringing the Broad Street outbreak under control. Now, if we aggregate the data we can make it look like removing the pump handle was the most important effect. See the lower plot in the above figure. Tufte shows what happens if we aggregate on a weekly basis, and the confounding becomes even greater if we move the date ahead by two days to allow for the lag between infection and death. With aggregation we arrive at a very misleading picture, all an artifact of data manipulation. Satirically, Tufte imagines what our modern press would have done with Dr. Snow’s discovery and the public health intervention of removing the handle with the following graphic:

From Tufte: Visual Explanations

Fast forward to 2020 – Professor Wood is our modern-day Dr. Snow. The ultimate question that Professor Wood seeks to answer is: when did the arrival of fatal infections peak? He is looking to reconstruct the time course of infections from the most reliable data sources available. We know from the COVID-19 death data that deaths in the UK eventually declined after the lockdowns came into effect (March 24, 2020), which seems to point to the effectiveness of the intervention. But an infection that leads to a fatality takes time. Professor Wood builds a model, without complex assumptions, to account for this lag and infer the infection date time series. He works with COVID-19 death data from the Office for National Statistics for England and Wales, National Health Service hospital data, and the Folkhälsomyndigheten daily death data for Sweden. In the figure below we see his main result: in the UK, COVID-19 fatal infections were in decline prior to the lockdowns, peaking five to six days earlier. The UK profile follows that of Sweden, which did not implement a lockdown.

From Simon N. Wood: https://arxiv.org/abs/2005.02090

The technique he uses is rather ingenious: penalized smoothing splines with a negative binomial counting process, allowing for weekly periodicity. The smooth that picks up the trend in deaths is mapped back to the arrival of the fatal infections using the distribution of time from infection to death. Based on hospitalization data and other sources, that distribution is well described by a lognormal with a mean of 26.8 days and a standard deviation of 12.4 days. The mapping matrix built from the distribution is nearly singular, but the smoothing penalty regularizes the problem.
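As an aside, the infection-to-death lognormal is reported here by its mean and standard deviation on the day scale, while most software parameterizes a lognormal by the underlying normal's (\mu, \sigma). A quick Python sketch of the conversion (the 26.8 and 12.4 day figures are the ones quoted above):

```python
import math

def lognormal_params(mean, sd):
    """Convert a target mean/sd (in days) into the (mu, sigma) parameters
    of the underlying normal, using mean = exp(mu + sigma^2/2)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

# infection-to-death distribution described above: mean 26.8 days, sd 12.4 days
mu, sigma = lognormal_params(26.8, 12.4)
```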

One might be tempted to think that the time series reconstruction is biased, in the sense that the distribution of time from fatal infection to death smears the peak backward, so that the reconstructed peak will always appear to precede the intervention date. We might then be fooled into believing that a peak and decline preceding the intervention were not caused by the intervention, when in fact the intervention generated them with a sharp discontinuity. Professor Wood checks the model with simulated data in which fatal infections arrive at a high rate and then plummet at a lockdown intervention, and he tests how well the method captures the extreme discontinuity. We can see that the method does very well in picking up the discontinuity in the figure below.

From Simon N. Wood: https://arxiv.org/abs/2005.02090

There are issues that could undermine the conclusions, and Wood expounds on them in his paper. The problem of hospital-acquired infections is important. People already in the hospital are often weak and frail, and thus the duration from COVID-19 infection until death will be shortened should they become fatally infected. Professor Wood focuses on community transmission since it is this effect that lockdowns and social distancing target. Hospital-acquired transmissions will bias the inference, but the proportion of hospital-acquired infections in the death data would have to be quite high to radically alter the conclusions of Wood’s results. He discusses a mixture model to help understand this effect. There are also problems concerning bias in the community-acquired fatal disease duration, including the possibility of age dependent effects. Again, to substantially change the conclusions, the effects would have to be large.

Professor Wood is careful to point out that his paper does not prove that peak fatal infections occurred in England and Wales prior to the lockdowns. But the results do show that in the absence of strong assumptions, the most reliable data suggest that fatal infections in England and Wales were in decline before the lockdowns came into effect with a profile similar to that of Sweden. Like Dr. Snow’s pump handle, the leading effects that caused the decline in deaths in the UK may not have been the lockdowns, but the change in behaviour that had already started by early March, well before the lockdowns.

Professor Wood’s results may have policy implications and our decision makers would be wise to include his work in their thinking. We should look to collect better data and use similar analysis to understand what the data tell us about the effectiveness of any public health initiative. At the very least, this paper weakens our belief that the blunt instrument of lockdowns is the primary mechanism by which we can control COVID-19. And given the large public health issues that lockdowns also cause – everything from increased child abuse to future cancer patients who missed routine screening to increasing economic inequality – we must understand the tradeoffs and the benefits of all potential actions to the best of our ability.

Covid-19 branching model: details and calibration to data

Last month my collaborators, Jerome Levesque and David Shaw, and I built a branching process model for describing Covid-19 propagation in communities. In my previous blog post, I gave a heuristic description of how the model works. In this post, I want to expand on some of the technical aspects of the model and show how the model can be calibrated to data.

The basic idea behind the model is that an infected person creates new infections throughout her communicable period. That is, an infected person “branches” as she generates “offspring”. This model is an approximation of how a communicable disease like Covid-19 spreads. In our model, we assume that we have an infinite reservoir of susceptible people for the virus to infect. In reality, the susceptible population declines – over periods of time that are much longer than the communicable period, the recovered population pushes back on the infection process as herd immunity builds. SIR and other compartmental models capture this effect. But over the short term, and especially when an outbreak first starts, disease propagation really does look like a branching process. The nice thing about branching processes is that they are stochastic, and have lots of amazing and deep relationships that allow you to connect observations back to the underlying mechanisms of propagation in an identifiable way.

In our model, both the number of new infections and the length of the communicable period are random. Given a communicable period, we model the number of infections generated, Q(t), as a compound Poisson process,

(1)   \begin{equation*}Q(t) = \sum_{i=1}^{N(t)} \, Y_i,\end{equation*}

where N(t) is the number of infection events that arrived up to time t, and Y_i is the number infected at each infection event. We model Y_i with the logarithmic distribution,

(2)   \begin{equation*}\mathbb{P}(Y_i =k) = \frac{-1}{\ln(1-p)}\frac{p^k}{k}, \hspace{2em} k \in \{1,2,3,\ldots\},\end{equation*}

which has mean, \mu = -\frac{p}{(1-p)\ln(1-p)}. The infection events arrive exponentially distributed in time with arrival rate \lambda. The characteristic function for Q(t) reads,

(3)   \begin{align*}\phi_{Q(t)}(u) &=\mathbb{E}[e^{iuQ(t)}] \\ &= \exp\left(rt\ln\left(\frac{1-p}{1-pe^{iu}}\right)\right) \\ &= \left(\frac{1-p}{1-pe^{iu}}\right)^{rt},\end{align*}

with \lambda = -r\ln(1-p) and thus Q(t) follows a negative binomial process,

(4)   \begin{equation*}Q(t) \sim \mathrm{NB}(rt,p).\end{equation*}
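As a sanity check on eq. (4), here is a small simulation sketch (in Python, with purely illustrative parameter values): we draw the compound Poisson process of eq. (1) directly, with Poisson arrivals at rate \lambda = -r\ln(1-p) and logarithmic jump sizes, and compare the sample mean to the negative binomial mean rtp/(1-p):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_Q(r, p, t, n_paths=100_000):
    """Simulate Q(t): sum of logarithmic jumps Y_i over Poisson arrivals
    with rate lam = -r*log(1-p), as in eqs. (1)-(3)."""
    lam = -r * np.log1p(-p)                            # arrival rate of infection events
    n_events = rng.poisson(lam * t, size=n_paths)      # N(t) for each path
    jumps = rng.logseries(p, size=n_events.sum())      # Y_i draws, all paths pooled
    owner = np.repeat(np.arange(n_paths), n_events)    # which path each jump belongs to
    return np.bincount(owner, weights=jumps, minlength=n_paths)

r, p, t = 1.5, 0.6, 10.0                 # illustrative values, not calibrated
q = simulate_Q(r, p, t)
nb_mean = r * t * p / (1 - p)            # mean of NB(rt, p)
```

The empirical mean of the simulated paths lands on the NB(rt, p) mean, as eq. (4) says it must.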

The negative binomial process is important here. Clinical observations suggest that Covid-19 is spread mostly by a minority of people in large quantities. Research suggests that the negative binomial distribution describes the number of infections from infected individuals. In our process, during a communicable period, t, an infected individual infects Q(t) people based on a draw from the negative binomial with mean rtp/(1-p). The infection events occur continuously in time according to the Poisson arrivals. However, the communicable period, t, is in actuality a random variable, T, which we model as a gamma process,

(5)   \begin{equation*}f_{T(t)}(x) = \frac{b^{at}}{\Gamma(at)} x^{at-1}e^{-b x},\end{equation*}

which has a mean of \mathbb{E}[T] = at/b. By promoting the communicable period to a random variable, the negative binomial process changes into a Lévy process with characteristic function,

(6)   \begin{align*}\mathbb{E}[e^{iuZ(t)}] &= \exp(-t\psi(-\eta(u))) \\ &= \left(1- \frac{r}{b}\ln\left(\frac{1-p}{1-pe^{iu}}\right)\right)^{-at},\end{align*}

where \eta(u), the Lévy symbol, and \psi(s), the Laplace exponent, are respectively given by,

(7)   \begin{align*}\mathbb{E}[e^{iuQ(t)}] &= \exp(t\,\eta(u)) \\\mathbb{E}[e^{-sT(t)}] &= \exp(-t\,\psi(s)), \end{align*}

and so,

(8)   \begin{align*}\eta(u) &= r\ln\left(\frac{1-p}{1-pe^{iu}}\right), \\\psi(s) &= a\ln\left(1 + \frac{s}{b}\right).\end{align*}

Z(t) is the random number of people infected by a single infected individual over her random communicable period, and it is further over-dispersed relative to a pure negative binomial process, getting us closer to driving propagation through super-spreader events. The characteristic function in eq.(6) for the number of infections from a single infected person gives us the entire model. The basic reproduction number R_0 is,

(9)   \begin{align*}R_0 &= \left(\frac{at\lambda}{b}\right)\left(\frac{-p}{\ln(1-p)(1-p)}\right).\end{align*}
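In code, eq. (9) is simply the mean communicable period at/b multiplied by the mean infection rate \lambda\mu of the compound Poisson process, where \mu is the logarithmic-distribution mean from eq. (2). A small Python sketch (parameter values purely illustrative):

```python
import math

def basic_reproduction_number(a, b, t, lam, p):
    """R0 from eq. (9): mean communicable period (a*t/b) times the mean
    number of infections per unit time, lam * mu, where mu is the mean
    of the logarithmic offspring distribution in eq. (2)."""
    mu = -p / ((1 - p) * math.log(1 - p))
    return (a * t / b) * lam * mu

# illustrative parameters; lam chosen as -r*ln(1-p) with r = 1.5, p = 0.6
p, r = 0.6, 1.5
lam = -r * math.log(1 - p)
R0 = basic_reproduction_number(a=2.0, b=4.0, t=1.0, lam=lam, p=p)
```

With \lambda = -r\ln(1-p), the expression collapses to (at/b) \cdot rtp/(1-p) per unit time, which is a useful consistency check.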

From the characteristic function we can compute the total number of infections in the population through renewal theory. Given a random characteristic \chi(t), such as the number of infectious individuals at time t (e.g., \chi(t) = \mathbb{I}(t \in [0,\lambda_x)), where \lambda_x is the random communicable period), the expectation of the process follows,

(10)   \begin{equation*}\mathbb{E}(n(t)) = \mathbb{E}(\chi(t)) + \int_0^t\mathbb{E}(n(t-u))\mathbb{E}(\xi(du)),\end{equation*}

where \xi(du) is the counting process (see our paper for details). When an outbreak is underway, the asymptotic behaviour for the expected number of counts is,

(11)   \begin{equation*}\mathbb{E}(n_\infty(t)) \sim \frac{e^{\alpha t}}{\alpha\beta},\end{equation*}

where,

(12)   \begin{align*}\alpha &= \lambda\mu \left(1 - \left(\frac{b}{\alpha +b}\right)^{at}\right) \\\beta & = \frac{1}{\alpha}\left(1 - \frac{at\lambda \mu}{b}\left(\frac{b}{\alpha +b}\right)^{at+1}\right).\end{align*}
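Eq. (12) defines \alpha only implicitly, but the positive root is easy to find numerically. A bisection sketch in Python (parameters illustrative; a positive root exists when R_0 > 1, since the fixed-point function starts above the diagonal near zero and falls below it for large \alpha):

```python
import math

def malthusian(lam, mu, a, b, t, lo=1e-9, hi=100.0):
    """Solve alpha = lam*mu*(1 - (b/(alpha+b))**(a*t)) by bisection.
    f(alpha) is positive just above 0 and negative for large alpha
    whenever the process is supercritical."""
    f = lambda x: lam * mu * (1 - (b / (x + b)) ** (a * t)) - x
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

p, r = 0.6, 1.5
lam = -r * math.log(1 - p)                   # arrival rate of infection events
mu = -p / ((1 - p) * math.log(1 - p))        # mean logarithmic jump size
alpha = malthusian(lam, mu, a=2.0, b=0.25, t=1.0)
```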

The parameter \alpha is called the Malthusian parameter; when \alpha > 0 it sets the exponential growth rate of the process. Because the renewal equation gives us eq.(11), we can build a Bayesian hierarchical model for inference with just cumulative count data. We take US county data, curated by the New York Times, to give us an estimate of the Malthusian parameter and therefore the local R-effective across the United States. We use clinical data to set the parameters of the gamma distribution that controls the communicable period. We treat the counties as random effects and estimate the model using Gibbs sampling in JAGS. Our estimation model is,

(13)   \begin{align*}\log(n) &= \alpha t + \gamma + a_i t + g_i + \epsilon \nonumber \\a_i &\sim \text{N}(0,\sigma_1^2) \nonumber \\g_i & \sim \text{N}(0,\sigma_2^2) \nonumber \\\epsilon &\sim \text{N}(0,\sigma^2),\end{align*}

where i is the county label; the variance parameters use half-Cauchy priors and the fixed and random effects use normal priors. We estimate the model and generate posterior distributions for all parameters. The result for the United States using data over the summer is the figure below:

Summer 2020 geographical distribution of R-effective across the United States: 2020-07-01 to 2020-08-20.

Over the mid-summer, we see that the geographical distribution of R_{eff} across the US singles out the Midwestern states and Hawaii as hot-spots, while Arizona sees no county with exponential growth. We have the beginnings of a US county-based app, which we hope to extend to other countries around the world. Unfortunately, count data on their own do not allow us to resolve the parameters of the compound Poisson process separately.

If we have complete information, which might be possible in a small community setting, like northern communities in Canada, prisons, schools, or offices, we can build a Gibbs sampler to estimate all the model parameters from data without having to rely on the asymptotic solution of the renewal equation.

Define a complete history of an outbreak as a set of N observations taking the form of a 6-tuple:

(14)   \begin{equation*}(i,j,B_{i},D_{i},m_{i},o_{i}),\end{equation*}

where,

i: index of individual; j: index of parent; B_{i}: time of birth; D_{i}: time of death; m_{i}: number of offspring birth events; o_{i}: number of offspring.

With the following summary statistics:

(15)   \begin{align*}L & = \sum_{i} D_{i} - B_{i};\,\,  \Lambda  = \prod_{i} (D_{i} - B_{i}) \nonumber \\ M & = \sum_{i} m_{i};\,\,  O = \sum_{i} o_{i} \nonumber \end{align*}

we can build a Gibbs sampler over the model’s parameters as follows:

(16)   \begin{align*}p\,|\,r,L,O & \sim \text{Beta}\left(a_{0} + O,b_{0} + r L\right) \nonumber \\r\,|\,p,L,M & \sim \text{Gamma}\left(\eta_{0}+M,\rho_{0}-L\log(1-p)\right) \nonumber \\b\,|\,a,L,N & \sim \text{Gamma}\left(\gamma_{0}+aN,\delta_{0}+L\right)\nonumber \\a\,|\,b,\Lambda,N & \sim\text{GammaShape}\left(\epsilon_{0}\Lambda,\zeta_{0}+N,\theta_{0}+N\right)\end{align*}

where a_0, b_0, \eta_0, \rho_0, \gamma_0, \delta_0, \epsilon_0, \zeta_0, \theta_0 are hyper-parameters.
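A sketch of one Gibbs sweep (in Python with numpy rather than JAGS; the summary-statistic values and hyper-parameters below are toy numbers, not calibrated). The first three conditionals in eq. (16) are standard Beta and Gamma draws; the GammaShape conditional for a has no off-the-shelf sampler and is left as a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(p, r, a, b, stats, hyper):
    """One sweep of the conjugate updates in eq. (16).
    stats = (L, M, O, N) are the summary statistics of eq. (15).
    Note numpy's gamma takes a scale argument, so we pass 1/rate."""
    L, M, O, N = stats
    a0, b0, eta0, rho0, gamma0, delta0 = hyper
    p = rng.beta(a0 + O, b0 + r * L)
    r = rng.gamma(eta0 + M, 1.0 / (rho0 - L * np.log(1 - p)))
    b = rng.gamma(gamma0 + a * N, 1.0 / (delta0 + L))
    # a | b, Lambda, N follows the non-standard GammaShape conditional;
    # in practice this needs a rejection or Metropolis-within-Gibbs step.
    return p, r, a, b

stats = (120.0, 40, 55, 30)              # toy (L, M, O, N)
hyper = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0)   # toy hyper-parameters
p, r, a, b = 0.5, 1.0, 2.0, 0.5
for _ in range(100):
    p, r, a, b = gibbs_sweep(p, r, a, b, stats, hyper)
```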

Over periods of time that are comparable to the communicable window, such that increasing herd immunity effects are negligible, a pure branching process can describe the propagation of Covid-19. We have built a model that matches the features of this disease – high variance in infection counts from infected individuals with a random communicable period. We see promise in our model’s application to small population settings as an outbreak gets started.

An effective theory of Covid-19 propagation

To fight Covid-19, we need an understanding of how the virus propagates in a community. Right now, the workhorse engines in the literature and in the media are compartmental models: (S)usceptible (I)nfected (R)ecovered and its cousins. The most popular versions of these models start with a set of coupled ordinary differential equations which, when solved, generate paths for each compartment. For example, in the simple SIR model, the infection starts with a large susceptible population, which diminishes as the infected population rises. The infected population eventually declines as the recovered population increases, and it finally goes to zero as the outbreak ends. The differential equations govern the dynamics of each compartment, generating equations of motion for the population.
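For readers who haven't seen one, a minimal Python sketch of the simple SIR system (forward-Euler integration; the \beta and \gamma values are purely illustrative) traces out exactly the deterministic rise-and-fall described above:

```python
def sir_path(beta, gamma, s0, i0, dt=0.1, days=160):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I,
    dR/dt = gamma*I with forward Euler; S, I, R are population fractions."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    path = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        path.append((s, i, r))
    return path

# illustrative: R0 = beta/gamma = 3
path = sir_path(beta=0.3, gamma=0.1, s0=0.999, i0=0.001)
peak_i = max(i for _, i, _ in path)
final_s, final_i, final_r = path[-1]
```

Run twice with the same inputs, it gives the same path every time: there is no randomness anywhere in the trajectory, which is precisely the limitation discussed next.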

Covid-19 propagation as a gamma subordinated negative binomial branching process

SIR models work well when we are discussing large populations, when we are interested in population averages, and when the random nature of transmission doesn’t particularly matter. The solution to the coupled set of differential equations is deterministic. But when an outbreak gets started, or when we are interested in the dynamics in the small population limit, we need more than deterministic models. The SIR compartmental model, with its focus on averages, is not enough when dealing with the very early stages of an outbreak – and it’s the early stages where we really want our mitigation strategies to be the most successful. We need a stochastic model of transmission to understand the early stages of an outbreak.

Together with my colleagues Jerome Levesque and David Shaw, we built a branching model of Covid-19 propagation. The idea is that an infected person randomly infects other people over the course of the communicable period. That is, we model transmission by imagining that an infected person generates “offspring”, continuously in time, during the communicable period, and that each “child” follows the same statistical law for generating more “offspring”. The infections branch out from each infected person into a tree that makes up the infected community. So while on average an infected person will infect R0 other people (the basic reproduction number) during the communicable period, there is a range of possible outcomes. We could get lucky, and an initially infected person might not spread the virus at all, or we could get unlucky, and the initially infected person might become a super-spreader, all in a model with the same R0. In fact, even with R0>1, there can be a substantial probability that the outbreak will go extinct on its own, all depending on the statistical branching nature of transmission.
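To make the extinction point concrete, here is a simulation sketch: a discrete-generation Galton-Watson approximation in Python (not our full continuous-time model) with negative binomial offspring. With R0 = 2.5 and heavy over-dispersion (dispersion k = 0.1, an illustrative super-spreading regime), most introductions die out on their own:

```python
import numpy as np

rng = np.random.default_rng(1)

def extinction_fraction(r0, k, n_trials=10_000, max_gens=50, cap=2_000):
    """Fraction of simulated outbreaks that go extinct when each case
    produces a negative binomial number of offspring per generation,
    with mean r0 and dispersion k."""
    p = k / (k + r0)          # numpy's NB(n=k, p) has mean k*(1-p)/p = r0
    extinct = 0
    for _ in range(n_trials):
        z = 1                 # start from a single infected individual
        for _ in range(max_gens):
            if z == 0:
                extinct += 1
                break
            if z > cap:       # an outbreak this large essentially never dies out
                break
            z = int(rng.negative_binomial(k, p, size=z).sum())
    return extinct / n_trials

q_hat = extinction_fraction(r0=2.5, k=0.1)
```

Despite R0 well above one, roughly 85% of these simulated introductions fizzle, because so much of the transmission is concentrated in rare large events.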

In some research communities, there is a tendency to use agent based models to capture the stochastic nature of an outbreak. Such models simulate the behaviour of many different individuals or “agents” in a population by assigning a complicated set of transmission rules to each person. In the quest for high fidelity, agent based models tend to have lots of parameters. While agent based approaches have merit, and they have enjoyed success in many fields, we feel that in this context these models are often too difficult to interpret, contain many layers of hidden assumptions, are extraordinarily difficult to calibrate to data while containing lots of identifiability issues, are easily susceptible to chaotic outputs, and obscure trade-off analysis for decision makers. In a decision making context we need a parsimonious model, one that gets the essentials correct and generates insight for trade-off analysis that decision makers can use. We need an effective theory of Covid-19 propagation in which we project all the complicated degrees of freedom of the real world down to a limited number of free parameters around which we can build statistical estimators.

The abstract of our paper:

We build a parsimonious Crump-Mode-Jagers continuous time branching process of Covid-19 propagation based on a negative binomial process subordinated by a gamma subordinator. By focusing on the stochastic nature of the process in small populations, our model provides decision making insight into mitigation strategies as an outbreak begins. Our model accommodates contact tracing and isolation, allowing for comparisons between different types of intervention. We emphasize a physical interpretation of the disease propagation throughout which affords analytical results for comparison to simulations. Our model provides a basis for decision makers to understand the likely trade-offs and consequences between alternative outbreak mitigation strategies particularly in office environments and confined work-spaces.

We focus on two parameters that decision makers can use to set policy: the average waiting time between infectious events from an infected individual, and the average number of people infected at an event. We fix the communicable period (distribution) from clinical data. Those two parameters go into the probabilistic model for branching the infection through the population. The decision maker can weigh trade-offs like restricting meeting sizes and interaction rates in the office while examining the extinction probabilities, growth rates, and size distributions for each choice.

You can find our paper here: https://medrxiv.org/cgi/content/short/2020.07.08.20149039v1

Covid-19 and decisions under uncertainty

Three excellent essays recently appeared in the Boston Review by Jonathan Fuller, John Ioannidis, and Marc Lipsitch on the nature of epidemiology and the use of data in making public health decisions. Each essay makes great points, especially Professor Ioannidis’ emphasis that Covid-19 public health decisions constitute trade-offs – other people will die based on our decisions to mitigate Covid-19. But I think all three essays miss the essential question, which transcends Covid-19 and even public health:

What is the optimal way to make irreversible decisions under uncertainty?

The answer to this question is subtle because it involves three competing elements: time, uncertainty, and irreversibility. In a decision making process, time gives us the opportunity to learn more about the problem and remove some of the uncertainty, but it’s irreversibility that makes the problem truly difficult. Most important problems tend to have a component of irreversibility. Once we make the decision, there is no going back or it is prohibitively expensive to do so, and our opportunity to learn more about the problem is over.

Irreversibility coupled with uncertainty and time means there is value in waiting. By acting too soon, we lose the opportunity to make a better decision, and by waiting too long, we miss the opportunity altogether. Waiting and learning incurs a cost, but that cost is often more than offset by the chance to make a better and more informed decision later. The value of waiting in finance is part of option pricing, and that value radically changes optimal decision making. There is an enormous amount of research on option valuation with irreversible decisions. The book Investment Under Uncertainty by Avinash Dixit and Robert Pindyck provides a wonderful introduction to the literature. When faced with an irreversible decision, the option value can be huge, even dwarfing the payoff from immediate action. At times, learning is the most valuable thing we can do. But for the option to have value, we must have time to wait. In now-or-never situations, the option loses its value completely, simply because we have no opportunity to learn. The take-away message is this: the more irreversible the decision, the more time you have, and the more uncertainty you face, the larger the option value. The option value increases along each of these dimensions, thereby increasing the value of waiting and learning.
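A toy two-period example (numbers entirely hypothetical) makes the mechanics concrete: acting now locks in an expected payoff, while waiting costs something but lets us avoid the bad state altogether once it is revealed:

```python
def value_of_waiting(payoff_good, payoff_bad, p_good, wait_cost):
    """Compare acting now against paying to wait, learning the state,
    and then acting only if the payoff is positive."""
    act_now = p_good * payoff_good + (1 - p_good) * payoff_bad
    wait = (p_good * max(payoff_good, 0.0)
            + (1 - p_good) * max(payoff_bad, 0.0)
            - wait_cost)
    return act_now, wait

# hypothetical: 50/50 odds of +100 or -80, waiting costs 10
act_now, wait = value_of_waiting(100.0, -80.0, 0.5, 10.0)
option_value = wait - act_now
```

Here acting now is worth 10 in expectation, waiting is worth 40, so the option to wait is worth 30. Widen the gap between the payoffs (more uncertainty) or lower the cost of waiting (more time) and the option value grows; impose a now-or-never deadline and it vanishes.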

Covid-19 has all of these ingredients – time, uncertainty, and irreversibility. Irreversibility appears through too many deaths if we wait too long, and economic destruction requiring decades of recovery if we are too aggressive in mitigation (while opening up global financial risks). There is a ton of uncertainty surrounding Covid-19 with varying degrees of limited time windows in which to act.

Those who call for immediate and strong Covid-19 mitigation strategies recognize irreversibility – we need to save lives while accepting large economic costs – and that even though we face enormous uncertainty, the costs incurred from waiting are so high compared to immediate action that the situation is ultimately now-or-never. There is no option value. Those who call for a more cautious and nuanced approach also see the irreversibility but feel that while the costs from learning are high and time is short, the option value is rescued by the enormous uncertainty. With high uncertainty, it can be worth a lot to learn even a little. Using the lens of option valuation, read these two articles by Professor Ioannidis and Professor Lipsitch from this March and you can see that the authors are actually arguing over the competing contributions of limited time and high uncertainty to an option’s value in an irreversible environment. They disagree on the value of information given the amount of time to act.

So who’s right? In a sense, both. We are not facing one uncertain irreversible decision; we face a sequence of them. When confronted by a new serious set of problems, like a pandemic, it can be sensible to down-weight the time you have and down-weight the uncertainty (by assuming the worst) at the first stage. Both effects drive the option value to zero – you put yourself in the now-or-never condition and you act. But for the next decision, and the next one after that, with decreasing uncertainty over time, you change course, and you use information differently by recognizing the chain of decisions to come. Johan Giesecke makes a compelling argument about the need for a course change with Covid-19 by thinking along these lines.

While option valuation can help us understand the ingredients that contribute to waiting, the uncertainty must be evaluated over some probability measure, and that measure determines how we weigh consequences. There is no objectively correct answer here. How do we evaluate the expected trade-off between excess Covid-19 deaths among the elderly vs a lifetime of lost opportunities for young people? How much extra child abuse is worth the benefit of lockdowns? That weighing of the complete set of consequences is part of the totality of evidence that Professor Ioannidis emphasizes in his essay.

Not only do time, uncertainty, and irreversibility drive the option value; so does the probability measure. How we measure consequences is a value judgment, and in a democracy that measure must rest with our elected officials. It’s here that I fundamentally disagree with Professor Lipsitch. In his essay, he increasingly frames the philosophy of public health action in terms of purely scientific questions. But public action, the decision to make one kind of costly trade-off against another – and it’s always about trade-offs – is a deeply political issue. In WWII, President Truman made the decision to drop the bomb, not his generals. Science can offer the likely consequences of alternative courses of public health action, but it is largely silent on how society should weigh them. No expert, public health official, or famous epidemiologist has special insight into our collective value judgment.