Become a GAMM-ateur climate scientist with mgcv

I love tennis. I play tennis incessantly. I follow it like a maniac. This January, my wife and I attended the Australian Open, and then after the tournament we played tennis every day for hours in the awesome Australian summer heat. During a water break one afternoon, I checked the weather app on my phone; the mercury reached 44 C!

The Aussie Open 2019: Rafael Nadal prepares to serve in the summer heat.

That got me thinking about climate change and one of the gems in my library, Generalized Additive Models: An Introduction with R by Professor Simon N. Wood – he is also the author of the R package mgcv (Mixed GAM Computation Vehicle with Automatic Smoothness Estimation).

First, Wood’s book on generalized additive models is a fantastic read and I highly recommend it to all data scientists – especially data scientists in government who are helping to shape evidence-based policy. In the preface the author says:

“Life is too short to spend too much time reading statistical texts. This book is of course an exception to this rule and should be read cover to cover.”

I couldn’t agree more. There are many wonderful discussions and examples in this book with breadcrumbs into really deep waters, like the theory of soap film smoothing. Pick it up if you are looking for a nice self-contained treatment of generalized additive models, smoothing, and mixed modelling. One of the examples that Wood works through is the application of generalized additive mixed modelling to daily average temperatures in Cairo, Egypt (section 7.7.2 of his book). I want to expand on that discussion a bit in this post.

Sometimes we hear complaints that climate change isn’t real, that there’s just too much variation to reveal any signal. Let’s see what a bit of generalized additive modelling can do for us.

A generalized linear mixed model (GLMM) takes the standard form:

    \begin{align*}\boldsymbol{\mu}^b &= \mathbb{E}({\bf y}\mid{\bf b}), \\ g(\mu_i^b) &= {\bf X}_i\boldsymbol{\beta}+ {\bf Z}_i{\bf b}, \\ {\bf b} &\sim N({\bf 0}, {\boldsymbol{\psi}}_\theta), \\ y_i\mid{\bf b} &\sim \text{exponential family dist.,}\end{align*}

where g is a monotonic link function and {\bf b} contains the random effects, which have zero expected value and a covariance matrix {\boldsymbol{\psi}}_\theta parameterized by \theta. A generalized additive model uses this same structure, but the columns of the design matrix {\bf X} are spline basis functions evaluated at the regressors rather than the regressors themselves, so \boldsymbol{\beta} holds the spline coefficients, and the fit is subject to a “wiggliness” penalty. For details, see Generalized Additive Models: An Introduction with R, Second Edition.
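To make the idea of a spline-built design matrix concrete, here is a minimal sketch in R. It uses mgcv's `smoothCon()` to construct a cubic regression spline basis and its penalty matrix, then solves the penalized least squares problem by hand for one fixed smoothing parameter. The toy data, the basis dimension `k = 10`, and the value of `lambda` are assumptions chosen purely for illustration; in practice mgcv estimates the smoothing parameter for you.

```r
library(mgcv)

# Toy data (an assumption for illustration): a noisy sine curve.
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# Build a cubic regression spline basis with 10 basis functions.
sm <- smoothCon(s(x, bs = "cr", k = 10), data = data.frame(x = x))[[1]]
X <- sm$X        # 200 x 10 design matrix of basis-function evaluations
S <- sm$S[[1]]   # 10 x 10 penalty matrix that measures wiggliness

# Penalized least squares for a fixed, hand-picked smoothing parameter:
# minimize ||y - X beta||^2 + lambda * beta' S beta
lambda <- 0.1
beta_hat <- solve(crossprod(X) + lambda * S, crossprod(X, y))
fitted_curve <- drop(X %*% beta_hat)
```

Larger values of `lambda` pull the fitted curve toward a straight line, and smaller values let it chase the data more closely.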

The University of Dayton has a website with daily average temperatures from a number of different cities across the world. Let’s take a look at Melbourne, Australia – the host city of the Australian Open. The raw data has untidy bits, and in my R Markdown file I show my code and the clean up choices that I made.
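The snippet below is not the code from my R Markdown file; it is only a rough sketch of the kind of clean up involved. It assumes a whitespace-delimited layout of month, day, year, and mean temperature in Fahrenheit, with -99 marking a missing reading. Both the layout and the sentinel value are assumptions here, and the four rows of data are made up, so check them against the actual file before relying on any of this.

```r
# Made-up excerpt in the ASSUMED layout: month, day, year, mean temp (F).
raw_text <- "
 1  1  1995  68.4
 1  2  1995  -99
 1  3  1995  71.2
 2 29  1996  66.0
"
temps <- read.table(text = raw_text,
                    col.names = c("month", "day", "year", "temp_f"))

# Treat the assumed sentinel value as missing, then convert to Celsius.
temps$temp_f[temps$temp_f == -99] <- NA
temps$temp_c <- (temps$temp_f - 32) * 5 / 9

# Build the two time covariates a seasonal-plus-trend model needs.
temps$date <- as.Date(ISOdate(temps$year, temps$month, temps$day))
temps$time.of.year <- as.numeric(format(temps$date, "%j"))  # day of year
temps$time <- as.numeric(temps$date - min(temps$date)) + 1  # running day index

# Keep only the rows with a usable temperature.
clean <- temps[!is.na(temps$temp_c), ]
```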

The idea is to build an additive mixed model with temporal correlations. Wood’s mgcv package allows us to build rather complicated models quite easily. For details on the theory and its implementation in mgcv, I encourage you to read Wood’s book. The model I’m electing to use is:

    \begin{equation*} \text{temp}_i = s_1(\text{time.of.year}_i) + s_2(\text{time}_i) + e_i,\end{equation*}


where e_i = \phi_1 e_{i-1} + \phi_2 e_{i-2}+ \epsilon_i, \epsilon_i \sim N(0,\sigma^2), s_1(\cdot) is a cyclic cubic smoothing spline that captures seasonal temperature variation on a 365-day cycle, and s_2(\cdot) is a smoothing spline that tracks a temperature trend, if any. I’m not an expert in modelling climate change, but this type of model seems reasonable – we have a seasonal component, a component that captures daily autocorrelations in temperature through an AR(2) process, and a possible trend component if it exists. To speed up the estimation, I nest the AR(2) residual component within year, so residuals are correlated within a year and treated as independent across years.
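A model of this shape can be written down in a few lines with `gamm()`. The sketch below fits it to simulated data rather than the Melbourne series, so the data frame `sim`, the basis dimensions, and the simulated AR coefficients are all assumptions made for illustration. It also ignores leap years and uses only three years of data to keep the fit quick.

```r
library(mgcv)   # also attaches nlme, which supplies corARMA()

set.seed(42)
n_years <- 3
sim <- data.frame(time = 1:(365 * n_years))
sim$year <- (sim$time - 1) %/% 365 + 1
sim$time.of.year <- (sim$time - 1) %% 365 + 1

# Simulated series: seasonal cycle + slow warming + AR(2) noise.
noise <- as.numeric(arima.sim(list(ar = c(0.5, 0.2)), n = nrow(sim), sd = 2))
sim$temp <- 15 + 6 * cos(2 * pi * sim$time.of.year / 365) +
  0.001 * sim$time + noise

fit <- gamm(
  # s_1: cyclic cubic spline for the season; s_2: cubic spline for the trend.
  temp ~ s(time.of.year, bs = "cc", k = 10) + s(time, bs = "cr", k = 5),
  data = sim,
  # End points of the cycle, so that day 365 joins smoothly onto day 1.
  knots = list(time.of.year = c(0.5, 365.5)),
  # AR(2) errors, nested within year.
  correlation = corARMA(form = ~ 1 | year, p = 2)
)

summary(fit$gam)   # the smooth terms
phi <- coef(fit$lme$modelStruct$corStruct, unconstrained = FALSE)
phi                # estimated AR(2) coefficients
```

`gamm()` returns two views of the same fit: `fit$gam` for the smooths and `fit$lme` for the correlation structure. Note that the fitted model also includes an intercept, which the equation above leaves implicit.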

The raw temperature data for Melbourne, Australia is:

Daily mean temperature in Melbourne: 1995 – 2019.

We see a clear seasonal pattern in the data, but there is also a lot of noise. The GAMM reveals the presence of a trend:

Climate change trend in Melbourne: 1995 – 2019.

We can see that Melbourne has warmed over the last two decades (by almost 2 C). Using the Dayton temperature dataset, I created a website based on the same model that shows temperature trends across about 200 different cities. Ottawa, Canada (Canada’s capital city) is included among the list of cities and we can see that the temperature trend in Ottawa is a bit wonky. We’ve had some cold winters in the last five years and while the Dayton data for Ottawa is truncated at 2014, I’m sure the winter of 2018-2019 with its hard cold spells would also show up in the trend. This is why the phenomenon is called climate change – the effect is, and will continue to be, uneven across the planet. If you like, compare different cities around the world using my website.
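For readers who want to pull a number like this out of their own fit, one option is to ask `predict()` for the individual smooth terms and compare the two ends of the trend smooth. The sketch below again uses simulated data, with a built-in warming of roughly 2 C over three years and AR(1) errors to keep it fast. All of those choices, and the names `d`, `trend`, and `change`, are assumptions for illustration rather than the Melbourne analysis.

```r
library(mgcv)   # also attaches nlme, which supplies corAR1()

set.seed(7)
d <- data.frame(time = 1:(365 * 3))
d$year <- (d$time - 1) %/% 365 + 1
d$time.of.year <- (d$time - 1) %% 365 + 1

# Simulated series with a known linear warming and AR(1) noise.
noise <- as.numeric(arima.sim(list(ar = 0.3), n = nrow(d), sd = 1.5))
d$temp <- 15 + 6 * cos(2 * pi * d$time.of.year / 365) + 0.002 * d$time + noise

fit <- gamm(
  temp ~ s(time.of.year, bs = "cc", k = 10) + s(time, bs = "cr", k = 5),
  data = d,
  correlation = corAR1(form = ~ 1 | year)
)

# Contribution of each smooth to the fitted values, one column per term.
terms <- predict(fit$gam, type = "terms", se.fit = TRUE)
trend <- terms$fit[, "s(time)"]

# Estimated change in the trend component from the first day to the last.
change <- unname(trend[length(trend)] - trend[1])
change
```

The trend smooth is centred on zero, so only differences along it are meaningful, not its absolute level.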

As a point of caution, climate change activists should temper their predictions about how exactly climate change will affect local conditions. I recall that in 2013 David Suzuki wrote about what climate change could mean for Ottawa, saying

…one of Canada’s best-loved outdoor skating venues, Ottawa’s Rideau Canal, provides an example of what to expect…with current emissions trends, the canal’s skating season could shrink from the previous average of nine weeks to 6.5 weeks by 2020, less than six weeks by 2050 and just one week by the end of the century. In fact, two winters ago, the season lasted 7.5 weeks, and last year it was down to four. The canal had yet to fully open for skating when this column was written [January 22, 2013].

The year after David Suzuki wrote this article, the Rideau Canal Skateway enjoyed the longest run of consecutive skating days in its history and nearly one of the longest seasons on record. This year (2019) has been another fantastic skating season, lasting 71 days (with a crazy cold winter). My GAMM analysis of Ottawa’s daily average temperature shows just how wild local trends can be. Unfortunately, statements like the one David Suzuki made fuel climate change skepticism. Some people will point to his bold predictions for 2020, see the actual results, and then dismiss climate change altogether. I doubt that David Suzuki intends that kind of outcome! Climate change is complicated: not every place on the planet will see warming, and certainly not evenly. And if the jet stream becomes unstable during the North American winter, climate change may bring bitterly cold winters to eastern Canada on a regular basis – all while the Arctic warms and melts. There are complicated feedback mechanisms at play, so persuading people about the phenomenon of climate change with facts instead of cavalier predictions is probably the best strategy.

Now, establishing that climate change is real and persuading people of its existence is only one issue – what to do about it is an entirely different matter. We can agree that climate change is real and mostly anthropogenic, but that does not mean the climate lobby’s policy agenda inexorably follows. Given the expected impact of climate change on the global economy, and the need to weigh its economic consequences in a world of scarce resources, we should seek the best evidence-based policy solutions available.

Let’s use the best evidence, both from climate science and economics, as our guide for policy in an uncertain future.

7 thoughts on “Become a GAMM-ateur climate scientist with mgcv”

  1. Excellent work, Dave, and an enjoyable read! If you’re interested in acquiring more (Canadian) data to run a GAMM on a more expanded temperature dataset for Canadian cities, I’d encourage the use of the `weathercan` R package, from which you can pull all of ECCC’s weather data.

    1. Thanks, Raman. Some cool packages – really nice to see. Simon Wood in his book uses Canadian weather data to present functional data analysis. Apparently, Canadian weather data sets provide the canonical example for such techniques!

  2. Fantastic post – I’ve been enjoying your entertaining educational data science stories so please keep posting them.
    I’m actually a Federal Public Servant as well; an Enterprise Architect getting into the Data Science space so looking for opportunities to tie in theory with the real world.

    1. Thanks Steven,

      Data Science is very much a team sport and having enterprise architecture/database skills can add a lot of value to data science teams. I’m glad you enjoy the posts!
