Probabilistic Forecasting of NIÑO3 Using Statistical Models

contributed by Simon J. Mason

Scripps Institution of Oceanography, La Jolla, California



Forecasts of monthly NIÑO3 sea surface temperature anomalies with lead-times of up to 11 months were produced using predictive discriminant analysis, canonical variate analysis, and four forms of generalized linear models. Full details of the models have been submitted to Journal of Climate (Mason and Mimmack 2001). The forecast presented here is the average of the forecast probabilities from these six statistical models. The first five unrotated principal components of gridded monthly sea surface temperatures over the tropical Pacific (25°N-25°S, 110°E-70°W) were used as the only predictors. Skillful forecasts of NIÑO3 sea surface temperature anomalies can be developed relatively simply using only prior temperatures in the region as predictors (Barnston and Ropelewski 1992; Penland and Sardeshmukh 1995; Latif et al. 1998).

The optimal combination of predictors for the discriminant analysis and logistic regression models was identified using a procedure that is similar to the "maximum-posterior-probability/leave-one-out" method of variable selection advocated by Huberty (1994). Model parameters were estimated using all possible combinations of one or two variables (from the five available principal components), and the set of predictors that provided the best cross-validated fit over the training periods was selected. The cross-validation window was defined as 5 years to ensure that the categories remain equi-probable. The goodness of fit was measured by calculating the ranked probability score (Epstein 1969b; Daan 1985; Wilks 1995) over the training period.
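The search over predictor subsets described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the principal components are simply numbered 1 to 5, and the scoring callable stands in for the cross-validated ranked probability score of a fitted model, which is not reproduced here. The toy scores are invented for the example and are not results.

```python
from itertools import combinations


def candidate_predictor_sets(n_components=5, max_size=2):
    """List every subset of one or two principal components (numbered 1..n)."""
    sets = []
    for size in range(1, max_size + 1):
        sets.extend(combinations(range(1, n_components + 1), size))
    return sets


def select_predictors(candidates, cv_score):
    """Return the candidate with the lowest score (lower is better).

    cv_score is a hypothetical callable mapping a predictor subset to its
    cross-validated ranked probability score over the training period.
    """
    return min(candidates, key=cv_score)


candidates = candidate_predictor_sets()

# Invented placeholder scores, used only to show the selection step.
toy_scores = {subset: 1.0 for subset in candidates}
toy_scores[(1, 3)] = 0.6

best = select_predictors(candidates, toy_scores.get)
print(len(candidates), "candidate sets; selected:", best)
```

With five components there are 5 single-variable and 10 two-variable candidates, so each model requires only 15 cross-validated fits per training period.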

For the canonical variate analysis model, all five predictors were included, but the number of retained canonical variates was varied. The selection criteria for the canonical variate analysis model therefore differ slightly from those for the other models. A total of ng - 1 canonical variates, where ng is the number of groups (here the five categories), will explain all the between-group variance, and so the maximum number of canonical variates was four; but to maintain some form of similarity to the other models, the maximum number of canonical variates retained was restricted to two. The first variate was always retained, while the second variate was included only if doing so resulted in an improvement in the ranked probability score.

Careful assessments of the operational levels of forecast skill have been made by using a retroactive forecast procedure: the models were trained over a 30-yr training period, and then used to produce 20 years of retroactive forecasts of monthly NIÑO3 sea surface temperature anomaly categories, with the models being updated every five years. Five categories of anomalies were defined, ranging from "La Niña", through "cool", "normal", and "warm", to "El Niño". Probabilities for each of the categories were calculated over the 20-yr retroactive period January 1981 to December 2000. The training period was initially set as 30 years (1951-80), and retroactive predictions for the following five years were then made using the optimal model. After this 5-yr period the model was retrained over the period 1951-85, possibly selecting different variables and a different number of retained variables, and predictions for 1986-90 were made. This procedure was repeated until a set of 20 years of retroactive predictions had been made. At each stage, the definitions of the five categories were reset to ensure that the categories remained equi-probable a priori. While the categories are defined as equi-probable over the training periods, this is not necessarily the case for the verifications over the independent period. For 1981-85, the verifications were categorized on the basis of the 1951-80 training period; the verifications for 1986-90 were categorized on the basis of the 1951-85 training period, etc. For the forecasts presented here, the training period was 1951-2000, and anomalies are defined with reference to this same period.
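The retraining schedule and the equi-probable category definitions described above can be sketched as follows. The function names are hypothetical, and the quintile estimator (the default of Python's statistics.quantiles) is an assumption; the exact estimator used to define the category boundaries is not stated in the text.

```python
from bisect import bisect_right
from statistics import quantiles

CATEGORIES = ["La Niña", "cool", "normal", "warm", "El Niño"]


def retroactive_schedule(first_year=1951, initial_train_end=1980,
                         last_year=2000, step=5):
    """Return ((train_start, train_end), (forecast_start, forecast_end)) pairs."""
    schedule = []
    train_end = initial_train_end
    while train_end < last_year:
        forecast_start = train_end + 1
        forecast_end = min(train_end + step, last_year)
        schedule.append(((first_year, train_end),
                         (forecast_start, forecast_end)))
        # The training period grows to include the years just forecast.
        train_end = forecast_end
    return schedule


def category_thresholds(training_anomalies):
    """Four cut points that split the training anomalies into quintiles."""
    return quantiles(training_anomalies, n=len(CATEGORIES))


def categorize(anomaly, thresholds):
    """Assign an anomaly to one of the five categories."""
    return CATEGORIES[bisect_right(thresholds, anomaly)]


for training, forecast in retroactive_schedule():
    print("train %d-%d, forecast %d-%d" % (training + forecast))
```

Because the thresholds are recomputed from each training period, an anomaly of a given size can fall into different categories at different stages of the retroactive procedure.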

A combined forecast was calculated by averaging the forecast probabilities from the various models. No attempt was made to weight the probabilities from the different models by a measure of model skill, since ranked model performance is sensitive to the precise skill measure used, and can be conditional upon the actual outcome. Reliability diagrams for the combined forecasts were constructed, as shown in Fig. 1. Good reliability is demonstrated for forecasts of all categories except "normal".
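The unweighted combination amounts to a simple mean of the category probabilities across models. A minimal sketch, with a hypothetical function name and invented probabilities for two models (not values from this forecast):

```python
def combine_forecasts(model_probs):
    """Average category probabilities over models with equal weights.

    model_probs is a list with one entry per model; each entry is a list
    of probabilities for the five categories, in the same order.
    """
    n_models = len(model_probs)
    n_categories = len(model_probs[0])
    if any(len(probs) != n_categories for probs in model_probs):
        raise ValueError("all models must forecast the same categories")
    return [sum(probs[k] for probs in model_probs) / n_models
            for k in range(n_categories)]


# Invented probabilities for two hypothetical models.
example = [
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.3, 0.2, 0.2, 0.2, 0.1],
]
print(combine_forecasts(example))
```

If each model's probabilities sum to one, the averaged probabilities do as well, so no renormalization is needed.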

Ranked probability skill scores (RPSSs) were calculated for each month separately, using the combined forecast probabilities, and comparing the forecasts to a strategy of forecasting climatology. The scores for six months are shown in Fig. 2, where they are compared to the skill of persistence forecasts. The seasonal dependence of skill is clearly apparent for both the model and the persistence forecasts. For forecasts of NIÑO3 at any time of year, skill drops rapidly for forecasts extending through about April and May. Forecasts for January and March are therefore skillful at long lead-times (top panels), whereas forecasts for May and July are skillful only at short lead-times (middle panels). Although detailed comparisons of the skill scores should be avoided because of the small number of forecasts involved, the model forecasts appear to outscore persistence at all times of year for lead-times longer than about 1 month. The strength of the model forecasts is most clearly evident during March. At 0-month lead, the skill of the model and persistence forecasts is about equal, except in March (when the model forecasts are superior), and November (when persistence forecasts are better). Thus forecasts of NIÑO3 anomalies made from May show marginal positive skill out to March of the following year.
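For readers unfamiliar with the score, the following sketch shows one common formulation of the ranked probability score and the corresponding skill score against climatology. It assumes the unnormalized cumulative form of the score (lower is better) and a climatological reference forecast of equal probability for each of the five categories; the normalization used for the published scores may differ, and the function names are hypothetical.

```python
def ranked_probability_score(probs, observed_index):
    """Sum of squared differences between the cumulative forecast
    probabilities and the cumulative observation (0 below the observed
    category, 1 from the observed category upward)."""
    score = 0.0
    cumulative_forecast = 0.0
    for k, p in enumerate(probs):
        cumulative_forecast += p
        cumulative_observed = 1.0 if k >= observed_index else 0.0
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score


def rpss(forecasts, observed, n_categories=5):
    """Skill relative to climatology: 1 is perfect, 0 matches climatology,
    and negative values are worse than climatology."""
    climatology = [1.0 / n_categories] * n_categories
    rps_forecast = sum(ranked_probability_score(p, o)
                       for p, o in zip(forecasts, observed))
    rps_climatology = sum(ranked_probability_score(climatology, o)
                          for o in observed)
    return 1.0 - rps_forecast / rps_climatology
```

Because the score is built from cumulative probabilities, a forecast that puts its weight in a category adjacent to the observed one is penalized less than a forecast that puts the same weight in a distant category.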

References:

Barnston, A. G., and C. F. Ropelewski, 1992: Prediction of ENSO using canonical correlation analysis. J. Climate, 5, 1316-1345.

Daan, H., 1985: Sensitivity of verification scores to the classification of the predictand. Mon. Wea. Rev., 113, 1384-1392.

Epstein, E. S., 1969b: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8, 985-987.

Huberty, C. J., 1994: Applied Discriminant Analysis. Wiley, 466 pp.

Mason, S. J., and G. M. Mimmack, 2001: Comparison of some statistical methods of probabilistic forecasting of ENSO. J. Climate, submitted.

Penland, C., and P. D. Sardeshmukh, 1995: The optimal growth of tropical sea surface temperature anomalies. J. Climate, 8, 1999-2024.

Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.

The forecast probabilities averaged across the six models are presented in the table below. Probabilities are highest for the "normal" category from July to the end of 2001, and there are indications of the beginning of a warm/El Niño episode in the second quarter of 2002.
Month     Lead-time  "La Niña"  "Cool"   "Normal"  "Warm"   "El Niño"
Jun 2001  0          0.290      0.364    0.233     0.101    0.012
Jul 2001  1          0.146      0.218    0.357     0.186    0.092
Aug 2001  2          0.146      0.179    0.302     0.219    0.155
Sep 2001  3          0.110      0.244    0.298     0.209    0.139
Oct 2001  4          0.122      0.260    0.284     0.201    0.134
Nov 2001  5          0.101      0.231    0.296     0.224    0.149
Dec 2001  6          0.122      0.242    0.273     0.271    0.091
Jan 2002  7          0.221      0.261    0.209     0.198    0.110
Feb 2002  8          0.390      0.158    0.152     0.199    0.102
Mar 2002  9          0.319      0.198    0.140     0.211    0.132
Apr 2002  10         0.242      0.139    0.130     0.218    0.272
May 2002  11         0.122      0.219    0.132     0.216    0.312
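
As a reading aid, the most probable category for each month can be extracted from the table with a few lines of code. The probabilities below are copied from the table; the variable and function names are hypothetical.

```python
CATEGORIES = ["La Niña", "Cool", "Normal", "Warm", "El Niño"]

# Combined forecast probabilities copied from the table above.
FORECASTS = {
    "Jun 2001": [0.290, 0.364, 0.233, 0.101, 0.012],
    "Jul 2001": [0.146, 0.218, 0.357, 0.186, 0.092],
    "Aug 2001": [0.146, 0.179, 0.302, 0.219, 0.155],
    "Sep 2001": [0.110, 0.244, 0.298, 0.209, 0.139],
    "Oct 2001": [0.122, 0.260, 0.284, 0.201, 0.134],
    "Nov 2001": [0.101, 0.231, 0.296, 0.224, 0.149],
    "Dec 2001": [0.122, 0.242, 0.273, 0.271, 0.091],
    "Jan 2002": [0.221, 0.261, 0.209, 0.198, 0.110],
    "Feb 2002": [0.390, 0.158, 0.152, 0.199, 0.102],
    "Mar 2002": [0.319, 0.198, 0.140, 0.211, 0.132],
    "Apr 2002": [0.242, 0.139, 0.130, 0.218, 0.272],
    "May 2002": [0.122, 0.219, 0.132, 0.216, 0.312],
}


def most_likely_category(probs):
    """Return the name of the category with the highest probability."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return CATEGORIES[best]


for month, probs in FORECASTS.items():
    print(month, most_likely_category(probs), max(probs))
```

The most probable category is only a summary; at longer lead-times the probabilities are spread across several categories, so the full distribution should be consulted.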



Fig. 1. Reliability diagram for retroactive combined forecasts at increasing lead-times of "La Niña" (solid thin line), "cool" (dashed thin line), "normal" (dotted line), "warm" (dashed thick line), and "El Niño" (solid thick line) conditions for the 20-year period January 1981-December 2000. Forecasts at all lead-times and for all months are pooled. The histograms indicate the frequency of forecasts with probabilities in the ranges 0.0-0.05, 0.05-0.15, 0.15-0.25, ..., 0.95-1.0. The y-axes range to 1700. The top histogram is for "El Niño" conditions, the second top for "warm" conditions, etc.
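
The binning behind a reliability diagram of this kind can be sketched as follows for a single category. The sketch assumes eleven bins centred on 0.0, 0.1, ..., 1.0, which matches the probability ranges listed in the caption; the function name is hypothetical, and the treatment of probabilities falling exactly on a bin edge is an arbitrary choice here.

```python
def reliability_table(probs, outcomes, n_bins=11):
    """Bin forecast probabilities for one category and compare them with
    the observed relative frequency in each bin.

    probs    : forecast probabilities for the category
    outcomes : 1 if the category occurred, 0 otherwise
    Returns a list of (bin_centre, n_forecasts, observed_frequency);
    observed_frequency is None for bins that contain no forecasts.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for p, occurred in zip(probs, outcomes):
        # Round to the nearest bin centre (nearest 0.1 for 11 bins).
        b = min(int(p * (n_bins - 1) + 0.5), n_bins - 1)
        counts[b] += 1
        hits[b] += occurred
    table = []
    for b in range(n_bins):
        frequency = hits[b] / counts[b] if counts[b] else None
        table.append((b / (n_bins - 1), counts[b], frequency))
    return table
```

For a reliable forecast system the observed frequency in each bin lies close to the bin centre, and the counts correspond to the histograms described in the caption.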

Fig. 2. Ranked probability skill scores for retroactive combined forecasts at increasing lead-times of monthly NIÑO3 sea surface temperature anomaly categories for the 20-year period January 1981-December 2000. The skill scores are calculated with reference to a strategy of forecasting climatology. The black bars represent the scores for the models, and the gray bars are for forecasts of persisted anomaly categories. The light gray bands indicate the April-May period, which approximates the "spring barrier" in predictability.