Statistical Prediction of January-June 2001 Rainfall in Northeast Brazil Using Input From Multiple Regression and Discriminant Analysis

contributed by A. Colman, M. Davey

Met Office, Bracknell, United Kingdom

1. Introduction

Statistical real-time forecasts of NE Brazil rainfall have been issued by the Met Office following work by Ward & Folland (1991). Previous forecasts (e.g. Colman, 1998) have been made for February-May rainfall using November-January Sea Surface Temperature Anomalies (SSTA). This is the time of year when predictability is at its maximum. Recent research has shown that the lead time of NE Brazil forecasts can be increased. Predictability from November SSTA is lower than from later SSTA but still significant, so this year we are again issuing a long-lead forecast based on November SSTA.

For this forecast, NE Brazil rainfall is defined as the mean rainfall at four gridpoints located at (41.25W, 5S), (41.25W, 7.5S), (37.5W, 5S) and (37.5W, 7.5S) (fig. 1). Gridpoint values are calculated by interpolating observations from neighboring stations and are then averaged to produce the NE Brazil rainfall index. These gridded values of NE Brazil rainfall have been adopted to enable easy comparison with GCM (Global Circulation Model) forecasts made for this grid. In the future, this will enable these empirical forecasts to be combined with GCM forecasts. The wet season in this region peaks in March and April, but substantial rainfall occurs between January and June. Hindcast experiments show the extended (January-June) season to be slightly more predictable than shorter periods, so this forecast is for January-June.

The amplitude time series of two SST anomaly patterns have been found to be skilful predictors of NE Brazil rainfall. The patterns are (1) the 30N-30S portion of the third covariance-based Empirical Orthogonal Function (EOF) of Atlantic SST for all seasons (fig. 2a), and (2) the first EOF of Pacific SST for Dec-Jan-Feb (fig. 2b). The amplitude time series of these two patterns (fig. 3) are used to predict North Nordeste rainfall with multiple regression (giving a point forecast) and discriminant analysis (giving probabilities for each of five rainfall amount categories that were climatologically equiprobable over 1961-1990). These two predictors have been used to make regular empirical real-time predictions of NE Brazil seasonal rainfall since 1988, some of which appear in previous issues of this bulletin, e.g. Colman et al. (1996, 1998). The Atlantic EOF pattern reflects the SST anomaly immediately off the North Nordeste east coast and the large-scale north-south SST gradient structure, while the Pacific EOF pattern provides an index of the ENSO situation.

Details about the EOF analyses, the physical relevance of the predictors, and the two forecasting methods are given in Ward and Folland (1991). Multiple regression develops optimal weights for each predictor so that the resulting linear equation minimizes the squared errors between forecasts and corresponding observations over the training periods (1912-2000 and 1946-2000). In discriminant analysis, five categories of rainfall amount are defined that were equiprobable over 1961-1990, and, from given values of the predictors, the probability of each rainfall category is determined using Bayes' theorem. Less linear constraint is imposed here than in multiple regression, as the probabilities do not necessarily change smoothly as a function of category.
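The Bayes' theorem step can be illustrated with a minimal sketch. It assumes Gaussian predictor distributions within each rainfall category and a single pooled covariance matrix, which is one common form of discriminant analysis and not necessarily the formulation of Ward and Folland (1991). The data are synthetic, the weights used to generate them are arbitrary, and all names (category_probabilities, predictors, rain) are hypothetical.

```python
import numpy as np

def category_probabilities(x, X_train, cats, n_cats=5):
    """Posterior P(category | predictors x) from Bayes' theorem.

    Illustrative assumption: predictors are Gaussian within each category,
    with one covariance matrix pooled over all categories.
    """
    X_train = np.asarray(X_train, dtype=float)
    cats = np.asarray(cats)
    x = np.asarray(x, dtype=float)
    n, n_pred = X_train.shape
    means, priors = [], []
    pooled = np.zeros((n_pred, n_pred))
    for c in range(n_cats):
        Xc = X_train[cats == c]
        mu = Xc.mean(axis=0)
        means.append(mu)
        priors.append(len(Xc) / n)            # prior = climatological frequency
        pooled += (Xc - mu).T @ (Xc - mu)
    pooled /= n - n_cats
    inv = np.linalg.inv(pooled)
    # log(prior * likelihood); constants shared by all categories cancel
    log_post = np.array([
        np.log(priors[c]) - 0.5 * (x - means[c]) @ inv @ (x - means[c])
        for c in range(n_cats)
    ])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Synthetic training data: two standardized predictors, arbitrary weights.
rng = np.random.default_rng(0)
predictors = rng.standard_normal((90, 2))
rain = (-0.3 * predictors[:, 0] - 0.5 * predictors[:, 1]
        + 0.3 * rng.standard_normal(90))
bounds = np.quantile(rain, [0.2, 0.4, 0.6, 0.8])
cats = np.searchsorted(bounds, rain)          # 0 = very dry ... 4 = very wet

probs_neutral = category_probabilities([0.0, 0.0], predictors, cats)
probs_cool = category_probabilities([-2.0, -2.0], predictors, cats)
print(np.round(probs_neutral, 2), np.round(probs_cool, 2))
```

Because the five probabilities are estimated separately for each category, they need not vary smoothly from one category to the next, in contrast to the single linear equation of multiple regression.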

2. Forecast Skill

Forecast skill is estimated from the performance of multiple regression and discriminant analysis hindcasts. The 1999 forecast (Colman & Davey, 1998) included evaluations of an extensive series of hindcasts, providing a rationale for using November SST to predict NE Brazil rainfall. This year, a smaller number of hindcasts using data up to 1999 have been evaluated. Multiple regression and discriminant analysis jackknife hindcasts based on November SST have been made for the period 1951-2000 (Tables 1 and 2). Jackknife prediction equations are calculated using data for the whole period minus the predicted year and the two years subsequent to the predicted year. The two subsequent years are excluded to remove positive skill bias due to persistence. Hindcasts were also made for the 1981-2000 period using data from 1912-80 (Table 3). There is an overlap between the SST EOF analysis period (1901-80) and the first evaluation period (1951-2000), but the second evaluation period is completely independent of the SST EOF analysis.
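The jackknife procedure described above can be sketched as follows. This is a minimal illustration for the regression case only, using synthetic data with arbitrary weights; the function name jackknife_hindcasts and the variables sst and rain are hypothetical, and the operational scheme may differ in detail.

```python
import numpy as np

def jackknife_hindcasts(X, y, n_after=2):
    """Hindcast each year from a regression fitted without that year
    and without the n_after years that follow it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])   # intercept + predictors
    hindcasts = np.empty(n)
    for i in range(n):
        keep = np.ones(n, dtype=bool)
        keep[i:i + n_after + 1] = False          # drop year i and the next n_after years
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        hindcasts[i] = design[i] @ coef
    return hindcasts

# Synthetic 50-year series standing in for 1951-2000 (values are not real data).
rng = np.random.default_rng(1)
sst = rng.standard_normal((50, 2))               # two standardized predictors
rain = 0.1 - 0.3 * sst[:, 0] - 0.5 * sst[:, 1] + 0.5 * rng.standard_normal(50)
hindcasts = jackknife_hindcasts(sst, rain)
print(np.round(np.corrcoef(hindcasts, rain)[0, 1], 3))
```

Each hindcast is made with coefficients that never saw the predicted year or the two years after it, so the resulting skill estimate is not inflated by year-to-year persistence in the data.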

The regression forecasts were assessed in two ways: by comparing the point estimate of rainfall predicted by the regression with the observed value, and by comparing the category within which the regression forecast lies with the observed category. The point estimates were assessed using: (a) correlations between the forecast and observed series, (b) LEPS (Linear Error in Probability Space), which measures how close the forecasts are to the observations in terms of the probability density function of the observations (Potts et al., 1996), and (c) RMSSS (Root Mean Square error Skill Score), which compares the RMS error of the forecasts with that of persistence.

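RMSSS = 1 - (RMS(forecast) / RMS(persistence))

Here RMS denotes the root mean square error. Tables 1 and 3 give this score as a percentage (%RMSSS).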
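A short sketch of the correlation and RMSSS calculations follows. The numbers are made up for illustration, and the persistence series is simply supplied as a reference array because its exact construction is not specified here; the function names are hypothetical.

```python
import numpy as np

def correlation(forecast, observed):
    """Pearson correlation between forecast and observed series."""
    return float(np.corrcoef(forecast, observed)[0, 1])

def rmsss(forecast, observed, reference):
    """RMSSS = 1 - RMS error of forecast / RMS error of reference forecast."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rms_forecast = np.sqrt(np.mean((forecast - observed) ** 2))
    rms_reference = np.sqrt(np.mean((reference - observed) ** 2))
    return float(1.0 - rms_forecast / rms_reference)

# Illustrative standardized values only (not taken from the hindcasts).
observed = np.array([1.0, -1.0, 1.0, -1.0])
forecast = 0.5 * observed                  # right sign, half the amplitude
persistence = -observed                    # hypothetical reference forecast

score = rmsss(forecast, observed, persistence)      # 1 - 0.5/2 = 0.75
print(correlation(forecast, observed), 100.0 * score)
```

A perfect forecast scores 1 (100%) and a forecast no better than the reference scores 0, which corresponds to the "perfection" and "chance" rows of the skill tables.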

The categorical forecasts are assessed using hit rates, where hit rate = (number of times the correct category is forecast) / (total number of forecasts). For these assessments, the observations for 1961-1990 are divided into 5 equal-sized categories (quints), 3 equal-sized categories (terces) and 2 equal-sized categories (sign), and the forecasts are allocated to these categories. The discriminant analysis forecast skill was assessed by comparing the observed category with the most likely category according to the hindcasts over the period 1951-2000 (Table 2b).
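The category allocation and hit rate can be sketched as below. The 30 reference values are an evenly spaced stand-in for the 1961-1990 observations, and the forecast and observed values are invented for illustration; all function names are hypothetical.

```python
import numpy as np

def category_bounds(reference, n_cats):
    """Boundaries that split the reference values into n_cats equal-sized classes."""
    probs = np.arange(1, n_cats) / n_cats
    return np.quantile(reference, probs)

def categorize(values, bounds):
    """Category number 1..n_cats for each value (1 = driest)."""
    return np.searchsorted(bounds, values) + 1

def hit_rate(forecast_cats, observed_cats):
    """Fraction of forecasts that fall in the observed category."""
    return float(np.mean(np.asarray(forecast_cats) == np.asarray(observed_cats)))

# Stand-in for 30 reference-period values (standardized units, not real data).
reference = np.linspace(-1.45, 1.45, 30)
forecast = np.array([-1.2, -0.5, 0.1, 0.6, 1.1])
observed = np.array([-1.0, -0.1, 0.2, 1.0, 1.3])

rates = {}
for name, n_cats in [("quints", 5), ("terces", 3), ("sign", 2)]:
    bounds = category_bounds(reference, n_cats)
    rates[name] = hit_rate(categorize(forecast, bounds), categorize(observed, bounds))
print(rates)   # quints 0.6, terces 0.8, sign 1.0 for these invented values
```

Coarser categories are easier to hit, which is why the chance hit rate rises from 0.200 for quints to 0.333 for terces and 0.500 for sign.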

The skill measures and the contingency tables show significant (5% level) prediction skill, particularly for the extreme (very dry and very wet) categories, and no deterioration in skill for the fully independent years (1981-2000). These hindcasts show that there is skill from November SSTA, but investigations (Colman & Davey, 1998) show there is no benefit from using SSTA from previous months, so the predictor time series are calculated using November SSTA alone.

Table 1

Jack-knife assessments of regression forecasts 1951-2000.

SSTA        Rainfall                     Hit rate  Hit rate  Hit rate
period      period     Corr.   LEPS     Quints    Terces    Sign      %RMSSS
chance                 0.000   0.000    0.200     0.333     0.500     0.000
perfection             1.000   1.000    1.000     1.000     1.000     100.00
Nov         Jan-June   0.580   0.410    0.380     0.600     0.660     24.169

Table 2.

Contingency table showing performance of a) Regression and b) Discriminant analysis hindcasts of Jan-June rainfall categories during 1951-2000. Q1=very dry, Q2=dry, Q3=average, Q4=wet, Q5=very wet.

a)                 OBSERVED
              Q1  Q2  Q3  Q4  Q5
          Q1   7   4   0   2   0
HINDCAST  Q2   1   1   4   0   4
          Q3   2   3   6   4   1
          Q4   2   2   2   1   1
          Q5   0   0   1   2   4

b)                 OBSERVED
              Q1  Q2  Q3  Q4  Q5
          Q1   6   5   3   2   0
HINDCAST  Q2   2   0   0   0   0
          Q3   4   5   8   5   2
          Q4   0   0   1   0   0
          Q5   0   0   1   2   4
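As a worked example of reading a contingency table, the sketch below takes the discriminant analysis counts of Table 2b and derives the overall hit rate together with the fraction of observed very dry and very wet years that were hindcast in the correct category. The variable names are hypothetical; the counts are copied from the table.

```python
import numpy as np

# Table 2b: rows = hindcast category Q1..Q5, columns = observed category Q1..Q5
table_2b = np.array([
    [6, 5, 3, 2, 0],
    [2, 0, 0, 0, 0],
    [4, 5, 8, 5, 2],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 2, 4],
])

n_hindcasts = int(table_2b.sum())            # 50 hindcast years
n_hits = int(np.trace(table_2b))             # diagonal = correct category, 18
overall_hit_rate = n_hits / n_hindcasts      # 18 / 50 = 0.36

observed_counts = table_2b.sum(axis=0)       # years observed in each category
detected = np.diag(table_2b) / observed_counts
very_dry_detected = float(detected[0])       # 6 of 12 observed very dry years
very_wet_detected = float(detected[4])       # 4 of 6 observed very wet years

print(overall_hit_rate, very_dry_detected, round(very_wet_detected, 3))
```

The overall hit rate of 0.36 compares with a chance value of 0.20 for quints, and the correct hindcasts are concentrated in the Q1, Q3 and Q5 categories.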

Table 3

Verification of hindcasts made using multiple regression against observed (obs.) values 1981-2000

Rainfall                               Hit rate  Hit rate  Hit rate            Bias
period                 Corr.   LEPS    Quints    Terces    Sign      %RMSSS    (SU)
Chance                 0.000   0.000   0.200     0.333     0.500     0.000     0.0
Jan-June               0.685   0.484   0.550     0.700     0.650     33.734    0.248
Mar-April              0.588   0.344   0.400     0.500     0.550     22.086    0.028
Jan-June persistence   0.264   0.112   0.250     0.450     0.700     0.000     -0.030

SU=Standardized units

Predictability is quite high for the January-June (extended season) rainfall total and for March-April (peak season) rainfall but much lower for early and late season (Jan-Feb and May-June) rainfall (not shown) so our forecasts are for January-June and March-April.

3. The 2001 Forecast

3.1. State of predictors:

At the time of the forecast, the tropical Pacific and Atlantic SSTA that are related to NE Brazil rainfall are quite weak, favoring a near-average forecast.

Figures 3a and b show the monthly time series of the Atlantic and Pacific SST anomaly EOF predictors used in the regression and discriminant analysis prediction models.

An example multiple regression equation of the 1-month lead forecast for the January-June standardized rainfall index from standardized SST predictors based on 1912-1999 data is:

R = 0.060 - 0.232 * P - 0.393 * A

where R = standardized January-June rainfall index, A = Atlantic and P = Pacific EOF time coefficients (standardized).
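The example equation can be evaluated directly, as in the sketch below. The coefficients are those quoted above; the predictor values passed in are hypothetical and are not the November 2000 values, and the function name is an illustrative choice.

```python
def rainfall_index(pacific, atlantic):
    """Standardized Jan-June rainfall index from standardized EOF coefficients,
    using the example equation quoted in the text (1912-1999 training data)."""
    return 0.060 - 0.232 * pacific - 0.393 * atlantic

# Hypothetical predictor values, in standardized units.
neutral = rainfall_index(0.0, 0.0)        # both predictors at zero: 0.060
both_positive = rainfall_index(1.0, 1.0)  # 0.060 - 0.232 - 0.393 = -0.565
print(neutral, round(both_positive, 3))
```

Both weights are negative, so positive values of either predictor lower the predicted rainfall index, and the Atlantic predictor carries the larger weight.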

3.2. Empirical Forecasts

For the linear regression predictions, the average of predictions made using training periods 1912-2000 and 1946-2000 is calculated:

Rainfall      Forecast                          Quint Range            Standard
Period        Standardized Units  Quint         Standardized Units     Error
Jan-June      +0.385              3 (average)   0.21 to 0.43           0.75
March-April   +0.347              3 (average)   -0.18 to 0.50          0.75

The following forecast probabilities, produced by discriminant analysis, show the probability of the 2001 January-June and March-April indices being in each of the quintile categories:

            Very Dry   Dry    Average   Wet    Very Wet
Jan-June    0.12       0.16   0.38      0.23   0.11
Mar-April   0.07       0.33   0.27      0.14   0.18
Chance      0.20       0.20   0.20      0.20   0.20
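The probabilities in this table can be summarized with a few lines of code, for instance to pick out the most likely category and the categories whose probability exceeds the chance value of 0.20. The sketch below uses the tabulated values; the function and variable names are hypothetical.

```python
categories = ["Very Dry", "Dry", "Average", "Wet", "Very Wet"]
probabilities = {
    "Jan-June":  [0.12, 0.16, 0.38, 0.23, 0.11],
    "Mar-April": [0.07, 0.33, 0.27, 0.14, 0.18],
}
chance = 0.20

def summarize(probs):
    """Return the most likely category and all categories above chance."""
    most_likely = categories[probs.index(max(probs))]
    above_chance = [c for c, p in zip(categories, probs) if p > chance]
    return most_likely, above_chance

summary = {season: summarize(p) for season, p in probabilities.items()}
for season, (most_likely, above_chance) in summary.items():
    print(season, most_likely, above_chance)
```

For January-June the most likely category is Average, with Wet also above chance. For March-April the largest single probability is for Dry, with Average also above chance. The March-April values as tabulated sum to 0.99, presumably through rounding.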

3.3. Summary:

The linear regression forecasts favor the AVERAGE category for January-June and March-April. The discriminant forecasts show highest probability for the AVERAGE category for January-June and above chance probability for the AVERAGE category for March-April. Hence our best estimate is for the AVERAGE category for January-June and March-April.

References:

Colman, A.W., Davey, M., Harrison, M., and Richardson, D. 1996: Multiple regression, discriminant analysis and unevaluated AGCM Predictions of Mar-Apr-May 1996 Rainfall in Northeast Brazil. Exp. Long-lead Bull., Vol. 5, No. 1, CPC/NOAA/USA.

Colman, A.W., Davey, M., Harrison, M., and Evans, A. 1998: Prediction of March-April-May 1998 Rainfall in Northeast Brazil using input from Multiple Regression, Discriminant Analysis and an Atmospheric Global Circulation model. Exp. Long-lead Bull., Vol. 7, No. 1, COLA/USA.

Colman, A.W., and Davey, M. 1998: Statistical Prediction of January-June 1999 Rainfall in Northeast Brazil using input from Multiple Regression and Discriminant Analysis. Exp. Long-lead Bull., Vol. 7, No. 4, COLA/USA.

Folland, C.K., Colman, A.W., Rowell, D.P., and Davey, M.K. 2001: Predictability of Northeast Brazil Rainfall and Real-time Forecast Skill, 1987-1998. Accepted by J. Climate.

Potts, J.M., Folland, C.K., Jolliffe, I.T., and Sexton, D. 1996: Revised "LEPS" scores for assessing climate model simulations and long-range forecasts. J. Climate, 9, 34-53.

Ward, M.N., and Folland, C.K. 1991: Prediction of seasonal rainfall in the North Nordeste of Brazil using eigenvectors of sea surface temperature. Int. J. Climatol., 11, 711-743.

Figure captions:

Fig. 1: Locations of the 4 grid points (+) and the 27 stations (*) used to compile the NE Brazil time series.

Fig. 2. The (a) Pacific and (b) Atlantic SST Eigenvector patterns, whose amplitude time series are used as predictors.

Fig. 3a: Amplitude timeseries for the Atlantic eigenvector for Jan 1991 to Nov 1999. Positive values (e.g. SST anomalies warm in north tropical Atlantic, cool in south tropical Atlantic) are associated with drier conditions. November values are marked with *

Fig. 3b: Amplitude timeseries for the Pacific eigenvector for Jan 1991 to Nov 1999. Positive values (e.g. SST anomalies warm in the central-east equatorial Pacific, cool in the north-west and south-west Pacific) are associated with drier conditions. November values are marked with *