# Goodbye 400 ppm: The statistics of the Keeling Curve

There are two iconic graphs of climate science. The first of these is the “hockey-stick” curve of temperature since 1000 CE, originally published by Michael E. Mann and his colleagues in 1998. Despite decades of controversy, the general picture of an anomalously fast rise in temperature beginning in the 20th century still stands. The second iconic graph is that of atmospheric carbon dioxide measured at Mauna Loa since 1958 by the Scripps Institution of Oceanography, under the direction of Charles David Keeling (Keeling passed away in 2005 and was succeeded by his son, Ralph Keeling). When Keeling first measured CO2, the December 1958 value was 314.7 parts per million (ppm); sixty years later, the value has reached 408.9 ppm. This increase of nearly 30% is the most direct and incontrovertible evidence we have of recent human impact on the chemistry of the atmosphere. We long ago passed 400 ppm and there is no looking back.

I teach an introductory data analysis class for Earth and environmental sciences. The more than 700 monthly values of the Keeling Curve are not only highly relevant; they are also an excellent data set for teaching some basics of time series analysis. As my late advisor Jack Sepkoski taught me, the first step in any statistical study is to draw a picture, in this case a time-series plot (Figure 1). At first glance the curve seems to show a nearly linear increase in CO2 over time, so the naïve next step would be to do a linear regression of carbon dioxide on time, where we model the relationship between the two as a straight line.

The regression (Figure 2) yields a correlation coefficient of 0.989 and an R2 of 0.977 (a measure of how well the points fit the line), which seems quite high and is indeed highly significant. But a closer look at the plot of the linear regression line makes it obvious that the best fit to the curve is not a straight line; the beginning and end of the curve are above the line and the middle is below it. This is supported by looking at the residuals (Figure 3), i.e., the vertical deviations of the values from the straight line.
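The same diagnostic can be sketched in a few lines of Python with NumPy. This is only an illustration, not the PAST workflow used for the figures: the series below is synthetic (an accelerating trend plus annual and semiannual cycles, with coefficients chosen by assumption), so its numbers will not match the values quoted above.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the Mauna Loa data):
# an accelerating trend plus annual and semiannual cycles, all values assumed.
t = np.arange(720) / 12.0  # 60 years of monthly samples, in years
co2 = (315.0 + 0.8 * t + 0.012 * t**2
       + 3.0 * np.sin(2 * np.pi * t) + 0.8 * np.sin(4 * np.pi * t))

# Least-squares straight line: co2 ~ slope * t + intercept
slope, intercept = np.polyfit(t, co2, deg=1)
residuals = co2 - (slope * t + intercept)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
r_squared = 1.0 - np.sum(residuals**2) / np.sum((co2 - co2.mean())**2)

# Average residual in the first, middle, and last 20 years
thirds = [residuals[i:i + 240].mean() for i in (0, 240, 480)]

print(f"R^2 = {r_squared:.3f}")
print("mean residual by third:", np.round(thirds, 2))
```

On this synthetic series the mean residual is positive in the first and last thirds and negative in the middle, which is the same signature of curvature that the residual plot reveals: a high R2 alone does not show that a straight line is the right model.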

This strongly suggests that we should try a more complex polynomial fit. Figure 4 shows the result of fitting a second-degree polynomial curve (one of the model terms is squared). The fit is clearly better; this is confirmed by the higher R2 value of 0.993 and the correspondingly smaller residuals. Note: all analyses here are done with the free software package PAST (http://folk.uio.no/ohammer/past/).
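For readers who prefer code to a menu-driven package, a minimal sketch of the linear versus second-degree comparison follows. It assumes a synthetic series (the coefficients are illustrative, not fitted to the real record) and uses NumPy's `polyfit`, which may differ in detail from what PAST does internally.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the Mauna Loa data).
t = np.arange(720) / 12.0  # time in years
co2 = (315.0 + 0.8 * t + 0.012 * t**2
       + 3.0 * np.sin(2 * np.pi * t) + 0.8 * np.sin(4 * np.pi * t))


def r_squared(y, fitted):
    """Fraction of the variance in y explained by the fitted values."""
    return 1.0 - np.sum((y - fitted)**2) / np.sum((y - y.mean())**2)


# Degree 1: straight line. Degree 2: adds a squared term in time.
lin_coeffs = np.polyfit(t, co2, 1)
quad_coeffs = np.polyfit(t, co2, 2)  # [a, b, c] for a*t**2 + b*t + c

r2_lin = r_squared(co2, np.polyval(lin_coeffs, t))
r2_quad = r_squared(co2, np.polyval(quad_coeffs, t))

print(f"linear    R^2 = {r2_lin:.4f}")
print(f"quadratic R^2 = {r2_quad:.4f}")
print(f"coefficient of the squared term = {quad_coeffs[0]:.4f}")
```

A positive coefficient on the squared term means the fitted curve bends upward, i.e., the rate of increase grows with time.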

Of course, we could theoretically fit a polynomial complex enough to pass through every point, so a high R2 value can be misleading. Statisticians, therefore, have developed metrics that balance the goodness-of-fit of data to a numerical model against the complexity of the model. One of the best known of these is the Akaike information criterion (AIC). Without going into detail, when comparing two models, the one with the lower AIC value is better. In the case of the Keeling Curve, the AIC is 12397 for the linear model, whereas for the second-degree polynomial it is 3553. The latter is thus indeed the superior model. This has important consequences for the science: not only is CO2 increasing, the rate of increase is itself increasing.
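To make the idea concrete, here is a hedged sketch of an AIC comparison. It assumes the common least-squares form AIC = n ln(RSS/n) + 2k, where RSS is the residual sum of squares and k the number of fitted coefficients; PAST may use a different variant, and the series is synthetic, so the values will not reproduce the ones quoted above. Only the ordering of the two models is the point.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the Mauna Loa data).
t = np.arange(720) / 12.0  # time in years
co2 = (315.0 + 0.8 * t + 0.012 * t**2
       + 3.0 * np.sin(2 * np.pi * t) + 0.8 * np.sin(4 * np.pi * t))


def aic_least_squares(y, fitted, n_params):
    """AIC for a least-squares fit, assuming Gaussian errors.

    Uses n * ln(RSS / n) + 2 * k; additive constants shared by all
    models are dropped because only differences in AIC matter.
    """
    n = len(y)
    rss = np.sum((y - fitted)**2)
    return n * np.log(rss / n) + 2 * n_params


aic = {}
for degree in (1, 2):
    coeffs = np.polyfit(t, co2, degree)
    # A polynomial of this degree has degree + 1 coefficients.
    aic[degree] = aic_least_squares(co2, np.polyval(coeffs, t), degree + 1)
    print(f"degree {degree}: AIC = {aic[degree]:.1f}")
```

The `2 * n_params` term is the complexity penalty: an extra coefficient is only worth adding if it lowers the first term by more than 2.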

Again looking at Figure 1, you may notice repeated small dips in the curve. To get a better understanding of these, we need to remove the overwhelming influence of the long-term upward trend. One way to do this is to take the first differences of the values; that is, subtract from each value the one before it. We are thus looking at month-to-month changes in carbon dioxide. Figure 5 shows part of the differenced sequence (1970–1980). It shows two distinct features. The first of these is the large cycle with a wavelength of one year; on top of this is a smaller dip, also annual in spacing and about six months long. It was Keeling who first explained these patterns. They represent warm season plant growth, which draws down CO2. The larger amplitude cycle is for the Northern Hemisphere, which has far more land area than the Southern Hemisphere, which produces the smaller dip six months later. We are seeing the seasonal breathing of the biosphere!
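First differencing is a one-liner in most environments. The sketch below uses NumPy's `np.diff` on a synthetic series (illustrative coefficients, not the real record) to show how the operation strips out most of the long-term trend while leaving the seasonal signal.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the Mauna Loa data).
t = np.arange(720) / 12.0  # time in years
co2 = (315.0 + 0.8 * t + 0.012 * t**2
       + 3.0 * np.sin(2 * np.pi * t) + 0.8 * np.sin(4 * np.pi * t))

# First differences: diffs[i] = co2[i + 1] - co2[i],
# i.e., each month's value minus the previous month's value.
diffs = np.diff(co2)

print("number of values:     ", len(co2))
print("number of differences:", len(diffs))
print(f"range of original series:    {np.ptp(co2):.1f} ppm")
print(f"range of differenced series: {np.ptp(diffs):.1f} ppm per month")
```

Note that the differenced series is one value shorter than the original, and that its units are ppm per month rather than ppm.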

The differenced data set can also be used to demonstrate two fundamental techniques for time-series analysis: autocorrelation and Fourier (spectral) analysis. The first of these compares the time series to an identical copy of itself, but one that is shifted (lagged). A correlation is then calculated between the series and its shifted copy. If the correlation is high, then the series repeats its pattern at that amount of shift. Figure 6 shows a plot (autocorrelogram) of correlation versus lag for up to 50 months. Notice that the correlation is high and positive every 12 months; this is the annual cycle for the Northern Hemisphere. There are also high negative correlations every 12 months, but six months delayed from the positive correlations. Note that the absolute values of the negative correlations are less than those of the positive correlations; this is again due to the growth of vegetation in the Southern Hemisphere summer.
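The shift-and-correlate idea can be written out directly. This is a minimal sketch on a synthetic series with an assumed annual cycle and a weaker semiannual one; it uses the plain Pearson correlation between the overlapping parts of the series and its lagged copy, which is close to, but not necessarily identical with, the estimator PAST uses.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the Mauna Loa data).
t = np.arange(720) / 12.0  # time in years
co2 = (315.0 + 0.8 * t + 0.012 * t**2
       + 3.0 * np.sin(2 * np.pi * t) + 0.8 * np.sin(4 * np.pi * t))
diffs = np.diff(co2)  # month-to-month changes


def autocorr(x, lag):
    """Pearson correlation between x and a copy of itself shifted by lag."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]


lags = np.arange(1, 51)  # lags of 1 to 50 months
acf = np.array([autocorr(diffs, k) for k in lags])

for k in (6, 12, 18, 24):
    print(f"lag {k:2d} months: r = {acf[k - 1]:+.2f}")
```

In this synthetic example the correlations at 12 and 24 months are strongly positive, while those at 6 and 18 months are negative but smaller in absolute value, because the semiannual component repeats at a six-month shift and partly offsets the annual one.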

Fourier (spectral) analysis replaces the lags with sinusoidal curves (sines and cosines). It asks which amplitudes and frequencies of sinusoidal curves best model the data. Figure 7 shows a plot of a measure of the amplitudes of these curves versus frequency (a periodogram). There are two strong peaks: the larger has a frequency of about 0.08 cycles per month (wavelength = 12 months), representing the annual cycle of the Northern Hemisphere; the smaller peak, at the higher frequency of about 0.17 cycles per month (wavelength = 6 months), represents the smaller superimposed fluctuations due to the Southern Hemisphere.
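A periodogram can be sketched with NumPy's FFT routines. As before, the series is a synthetic stand-in with assumed annual and semiannual components, and the simple squared-magnitude periodogram here is one common definition rather than necessarily the one PAST implements.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the Mauna Loa data).
# 721 samples, so the differenced series has 720 points (exactly 60 years).
t = np.arange(721) / 12.0  # time in years
co2 = (315.0 + 0.8 * t + 0.012 * t**2
       + 3.0 * np.sin(2 * np.pi * t) + 0.8 * np.sin(4 * np.pi * t))
diffs = np.diff(co2)
x = diffs - diffs.mean()  # remove the mean so frequency 0 does not dominate

# Periodogram: squared magnitude of the Fourier transform at each frequency.
power = np.abs(np.fft.rfft(x))**2 / len(x)
freqs = np.fft.rfftfreq(len(x), d=1.0)  # cycles per month (d = 1 month)

# Strongest peak, then the strongest peak away from the first one.
main = np.argmax(power)
masked = power.copy()
masked[np.abs(freqs - freqs[main]) < 0.02] = 0.0
second = np.argmax(masked)

for label, i in (("main", main), ("second", second)):
    print(f"{label} peak: frequency = {freqs[i]:.3f} cycles/month, "
          f"wavelength = {1.0 / freqs[i]:.1f} months")
```

Frequency and wavelength are reciprocals, so a peak near 0.083 cycles per month corresponds to 12 months and one near 0.167 cycles per month to 6 months.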

This is not meant to be a primer on time series analysis or the global carbon cycle; there are multiple large and often highly technical textbooks on both subjects. The goal, instead, is to show both how a detailed and painstakingly collected dataset can be used to illustrate some basic statistical methods and, more importantly, how the results of these analyses can be used to document some key facts about our climate system. Climate change is not a myth; it is a well-documented and strongly supported fact. Our response is lagging, and it is urgent.