Statistical Prediction of January-June 2000 Rainfall in Northeast Brazil using input from Multiple Regression and Discriminant Analysis

contributed by Andrew Colman and M. Davey

UK Meteorological Office, Bracknell, United Kingdom

1. Introduction

Statistical real-time forecasts of NE Brazil rainfall have been issued by the UK Meteorological Office following work by Ward and Folland (1991). Previous forecasts (e.g. Colman, 1998) have been made for rainfall between February and May using November-January Sea Surface Temperature Anomalies (SSTA). This is the time of year when predictability is at its maximum. Recent research has shown that the lead time of NE Brazil forecasts can be increased. Predictability from November SSTA is lower than from later SSTA but still significant, so this year we are issuing a long lead forecast based on November SSTA.

For this forecast, NE Brazil rainfall is defined as the mean rainfall at four gridpoints located at (41.25W, 5S), (41.25W, 7.5S), (37.5W, 5S) and (37.5W, 7.5S) (Fig. 1). Gridpoint values are calculated by interpolating observations from neighboring stations and are then averaged to produce the NE Brazil rainfall index. These gridded values of NE Brazil rainfall have been adopted to enable easy comparisons with GCM (Global Circulation Model) forecasts made for this grid. In the future, this will enable these empirical forecasts to be combined with GCM forecasts. The wet season in this region peaks in March and April, but substantial rainfall occurs between January and June. Hindcast experiments show the extended (January-June) season to be slightly more predictable than shorter periods, so this forecast is for January-June.

The amplitude time series of two SST anomaly patterns have been found to be skillful predictors of NE Brazil rainfall. The patterns are (1) the 30N-30S portion of the third covariance-based Empirical Orthogonal Function (EOF) of Atlantic SST for all seasons (Fig. 2a), and (2) the first EOF of Pacific SST for Dec-Jan-Feb (Fig. 2b). The amplitude time series of these two patterns (Fig. 3) are used to predict North Nordeste rainfall with multiple regression (giving a point forecast) and discriminant analysis (giving probabilities for each of five rainfall amount categories that were climatologically equiprobable over 1961-1990). These two predictors have been used to make regular empirical real-time predictions of NE Brazil seasonal rainfall since 1988, some of which appear in previous issues of this Bulletin (e.g. Colman et al., 1996; 1998). The Atlantic EOF pattern reflects the SST anomaly immediately off the North Nordeste east coast and the large scale north-south SST gradient structure, while the Pacific EOF pattern provides an index of the ENSO situation.

Details about the EOF analyses, the physical relevance of the predictors, and the two forecasting methods are given in Ward and Folland (1991). Multiple regression develops optimal weights for each predictor so that the resulting linear equation minimizes squared errors between forecasts and corresponding observations over the training periods (1912-99, 1946-99). In discriminant analysis, 5 categories of rainfall amount are defined that were equiprobable over 1961-1990; given the values of the predictors, the probability of each rainfall category is then determined using Bayes' theorem. Less linear constraint is imposed here than in multiple regression, as the probabilities do not necessarily change smoothly as a function of category.
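The Bayes' theorem step in discriminant analysis can be illustrated with a minimal Python sketch. It is not the procedure of Ward and Folland (1991): it assumes a single predictor, a Gaussian likelihood with a common spread in every category, and made-up category means. The function name `category_probabilities` and all numbers are hypothetical.

```python
import math

def category_probabilities(x, means, spread, priors):
    """Posterior P(category | x) from Bayes' theorem.

    ASSUMPTION: the predictor has a Gaussian likelihood in each category
    with a common spread, so the normalizing constant cancels out.
    """
    likelihoods = [math.exp(-0.5 * ((x - m) / spread) ** 2) for m in means]
    weighted = [p * lk for p, lk in zip(priors, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical mean predictor value in each category, very dry to very wet.
# Positive predictor values are taken to accompany drier conditions.
means = [1.0, 0.5, 0.0, -0.5, -1.0]
priors = [0.2] * 5  # five equiprobable categories

probs = category_probabilities(-0.8, means, 1.0, priors)
```

With these illustrative inputs the largest probability falls in the very wet category, and the five probabilities sum to one.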

2. Forecast Skill

Forecast skill is estimated from the performance of multiple regression and discriminant analysis hindcasts. Last year's forecast (Colman and Davey, 1998) included evaluations of an extensive series of hindcasts, providing a rationale for using November SST to predict NE Brazil rainfall. This year, a smaller number of hindcasts using data up to 1999 have been evaluated. Multiple regression and discriminant analysis jackknife hindcasts based on November SST have been made for the period 1950-1999 (Table 1 and Table 2). Each jackknife prediction equation is calculated using data for the whole period minus the predicted year and the two years following it. The two subsequent years are excluded to remove positive skill bias due to persistence. Hindcasts were also made for the 1981-1999 period using data from 1912-80 (Table 3). The SST EOF analysis period (1901-80) overlaps the first evaluation period (1950-1999), but the second evaluation period is completely independent of the SST EOF analysis.
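The jackknife procedure described above can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it assumes a single predictor rather than the two EOF series, and the data are synthetic. The helper names (`training_years`, `fit_line`, `jackknife_hindcasts`) are hypothetical.

```python
def training_years(years, target):
    """Years used to fit the equation for `target`: every year except
    the target year and the two years that follow it."""
    excluded = {target, target + 1, target + 2}
    return [y for y in years if y not in excluded]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def jackknife_hindcasts(years, predictor, rainfall):
    """One hindcast per year, each from an equation fitted without it."""
    hindcasts = {}
    for target in years:
        train = training_years(years, target)
        intercept, slope = fit_line([predictor[y] for y in train],
                                    [rainfall[y] for y in train])
        hindcasts[target] = intercept + slope * predictor[target]
    return hindcasts

# Synthetic, illustrative series only (not the bulletin's data).
years = list(range(1950, 2000))
predictor = {y: ((y * 7) % 11 - 5) / 5.0 for y in years}
rainfall = {y: 0.1 - 0.5 * predictor[y] for y in years}
hindcasts = jackknife_hindcasts(years, predictor, rainfall)
```

For a 50-year record, most target years leave 47 years for training. Targets near the end of the record leave more, because the excluded following years fall outside the data.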

The regression forecasts were assessed in two ways: by comparing the point estimate of rainfall predicted by the regression with the observed value, and by comparing the category within which the regression forecast lies with the observed category. The point estimates were assessed using: (a) correlations between forecast and observed series, (b) LEPS (Linear Error in Probability Space), which measures how close the forecasts are to the observations in terms of the probability distribution of the observations (Potts et al., 1996) and (c) RMSSS (Root Mean Square error Skill Score), which compares the RMS error of the forecasts with that of persistence.

RMSSS= 1-(RMS(forecast)/RMS(persistence))
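As an illustration of this skill score, the Python sketch below expresses RMSSS as a percentage, which is the form reported in the tables (0 for a forecast no better than the reference, 100 for a perfect forecast). The function names and sample values are hypothetical and are not taken from the hindcast data.

```python
import math

def rmse(forecasts, observations):
    """Root mean square error of a forecast series."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

def rmsss_percent(forecasts, reference, observations):
    """RMSSS as a percentage: 100 * (1 - RMS(forecast) / RMS(reference))."""
    ratio = rmse(forecasts, observations) / rmse(reference, observations)
    return 100.0 * (1.0 - ratio)

# Hypothetical standardized values, for illustration only.
observations = [0.4, -1.2, 0.9, -0.3]
persistence = [0.0, 0.4, -1.2, 0.9]
perfect = list(observations)

perfect_score = rmsss_percent(perfect, persistence, observations)
reference_score = rmsss_percent(persistence, persistence, observations)
```

A forecast with larger errors than the reference gives a negative score.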

The categorical forecasts are assessed using hit rates, where hit rate = (number of times the correct category is forecast) / (total number of forecasts). For these assessments, the observations for 1961-1990 are divided into 5 equal sized categories (quints), 3 equal sized categories (terces) and 2 equal sized categories (sign), and the forecasts are allocated to these categories. The discriminant analysis forecast skill was assessed by comparing the observed category with the most likely category according to the hindcasts over the period 1950-99 (Table 2b).
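The hit rate can be read directly from a contingency table, since the hits are the entries on the diagonal. The short Python sketch below uses the counts given in Table 2a for the regression hindcasts; the function name `hit_rate` is hypothetical.

```python
def hit_rate(table):
    """Fraction of forecasts falling in the observed category.

    Hits are the diagonal entries of a square contingency table.
    """
    hits = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return hits / total

# Counts from Table 2a (regression hindcasts, quint categories Q1 to Q5).
table_2a = [
    [7, 4, 0, 2, 0],
    [1, 1, 4, 0, 0],
    [2, 4, 6, 4, 1],
    [2, 1, 2, 1, 1],
    [0, 0, 1, 2, 4],
]

quint_hit_rate = hit_rate(table_2a)  # 19 hits out of 50 hindcasts
```

The result, 0.38, agrees with the quint hit rate listed in Table 1, against a chance value of 0.20.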

The skill measures and the contingency tables show significant (5% level) prediction skill, particularly for the extreme (very dry and very wet) categories, and no degradation in skill for the fully independent years (1981-1999). These hindcasts show that there is skill from November SSTA, but investigations (Colman and Davey, 1998) show there is no benefit from using SSTA from previous months, so predictor time series are calculated using November SSTA alone.

Predictability is quite high for the January-June (extended season) rainfall total and for March-April (peak season) rainfall but much lower for early and late season (Jan-Feb and May-June) rainfall (not shown) so our forecasts are for January-June and March-April.

3. The 2000 Forecast

3.1 State of predictors:

At the time of forecast, above average SSTA in the Atlantic between 0 and 20S and La Niña conditions in the Pacific favor above average rainfall in NE Brazil. A weaker signal from above average SSTA in the tropical north Atlantic favors below average rainfall.

Figures 3a and 3b show the monthly time series of the Atlantic and Pacific SST anomaly EOF predictors used in the regression and discriminant analysis prediction models.

An example multiple regression equation for the 1-month lead forecast of the January-June standardized rainfall index (R), derived from standardized SST predictors using 1912-1999 data, is:

R = 0.083 - 0.237 * P - 0.439 * A

where A and P are the Atlantic and Pacific EOF time coefficients, respectively.
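To show how the example equation responds to the two predictors, the Python sketch below evaluates it for hypothetical standardized values of P and A. The input values are made up for illustration and are not the November 1999 predictor values. The function name `january_june_index` is also hypothetical.

```python
def january_june_index(pacific, atlantic):
    """Standardized Jan-June rainfall index from the example equation.

    Coefficients are those quoted in the text for 1912-1999 data;
    both inputs are standardized EOF time coefficients.
    """
    return 0.083 - 0.237 * pacific - 0.439 * atlantic

# Hypothetical predictor values, chosen only to show the sign convention.
wet_case = january_june_index(pacific=-1.0, atlantic=-1.0)
dry_case = january_june_index(pacific=1.0, atlantic=1.0)
```

Both coefficients are negative, so negative predictor values raise the rainfall index and positive values lower it. The Atlantic predictor carries roughly twice the weight of the Pacific predictor.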

3.2 Empirical Forecasts

For the linear regression predictions, the average of predictions made using training periods 1912-99 and 1946-99 is calculated:

Rainfall    Forecast         Forecast        Quint Range      Standard
Period      (std. units)     Quint           (std. units)     Error
Jan-June    +0.99            5 (very wet)    > 0.53           0.77
Mar-April   +1.01            5 (very wet)    > 0.66           0.77



The following forecast probabilities produced by discriminant analysis show the probability of the 2000 January-June and March-April indices being in each of the quintiles:

Period      Very Dry   Dry     Average   Wet     Very Wet
Jan-June    0.06       0.09    0.35      0.22    0.28
Mar-Apr     0.02       0.33    0.20      0.09    0.37
Chance      0.20       0.20    0.20      0.20    0.20
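A short Python sketch makes it easy to see which categories exceed the chance probability and which category is most likely in each period. The probabilities are those listed above; the helper names `above_chance` and `most_likely` are hypothetical.

```python
CATEGORIES = ["very dry", "dry", "average", "wet", "very wet"]
CHANCE = 0.20

# Discriminant analysis probabilities from the table above.
forecasts = {
    "Jan-June": [0.06, 0.09, 0.35, 0.22, 0.28],
    "Mar-Apr": [0.02, 0.33, 0.20, 0.09, 0.37],
}

def above_chance(probs, chance=CHANCE):
    """Categories whose forecast probability exceeds the chance value."""
    return [c for c, p in zip(CATEGORIES, probs) if p > chance]

def most_likely(probs):
    """Category with the highest forecast probability."""
    return CATEGORIES[probs.index(max(probs))]

summary = {
    period: (most_likely(probs), above_chance(probs))
    for period, probs in forecasts.items()
}
```

For January-June the average, wet and very wet categories exceed chance, with average the single most likely. For March-April the dry and very wet categories exceed chance, with very wet the most likely.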





Summary:

The regression prediction is for the very wet category for both the January-June and March-April periods. The discriminant analysis gives above-chance probabilities for the very wet category for both forecast periods, and also above-chance probabilities for the average category (January-June) and the dry category (March-April). The discriminant analysis forecasts are bimodal, reflecting differing signals from Pacific and Atlantic SST: Pacific SST anomalies favor the very wet category, but the Atlantic favors the average or dry category. The regression technique produces a best estimate forecast, which cannot display bimodality.

Both methods favor the very wet category for the two forecast periods, so our best estimate forecast is for the VERY WET category (5) for both forecast periods. Confidence in the forecast is low, as the discriminant analysis predictions also favor drier categories.

References:

Colman, A.W., Davey, M., Harrison, M., and Richardson, D., 1996: Multiple regression, discriminant analysis and unevaluated AGCM Predictions of Mar-Apr-May 1996 Rainfall in Northeast Brazil. Exp. Long-lead Bull., Vol. 5, No. 1, CPC/NOAA/USA.

Colman, A.W., Davey, M., Harrison, M., and Evans, A., 1998: Prediction of March-April-May 1998 Rainfall in Northeast Brazil using input from Multiple Regression, Discriminant Analysis and an Atmospheric Global Circulation model. Exp. Long-lead Bull., Vol. 7, No. 1, COLA/USA.

Colman, A.W., and Davey, M., 1998: Statistical Prediction of January-June 1999 Rainfall in Northeast Brazil using input from Multiple Regression and Discriminant Analysis. Exp. Long-lead Bull., Vol. 7, No. 4, COLA/USA.

Potts, J.M., Folland, C.K., Jolliffe, I.T., and Sexton, D., 1996: Revised "LEPS" scores for assessing climate model simulations and long-range forecasts. J. Clim., 9, 34-53.

Ward, M.N., and Folland, C.K., 1991: Prediction of seasonal rainfall in the North Nordeste of Brazil using eigenvectors of sea surface temperature. Int. J. Climatol., 11, 711-743.

Figure captions:

Fig. 1: Locations of the 4 grid points (+) and the 27 stations (*) used to compile the NE Brazil time series.

Fig. 2: The (a) Pacific and (b) Atlantic SST Eigenvector patterns, whose amplitude time series are used as predictors.

Fig. 3a: Amplitude time series for the Atlantic eigenvector for Jan 1991 to Nov 1999. Positive values (e.g. SST anomalies warm in north tropical Atlantic, cool in south tropical Atlantic) are associated with drier conditions. November values are marked with *.

Fig. 3b: Amplitude time series for the Pacific eigenvector for Jan 1991 to Nov 1999. Positive values (e.g. SST anomalies warm in the central-east equatorial Pacific, cool in the north-west and south-west Pacific) are associated with drier conditions. November values are marked with *.



Table 1: Jackknife assessments of regression forecasts 1950-1999.

SSTA        Rainfall                    Hit Rate
period      period      Corr.   LEPS    Quints  Terces  Sign    %RMSSS
chance                  0.000   0.000   0.200   0.333   0.500   0.000
perfection              1.000   1.000   1.000   1.000   1.000   100.00
Nov         Jan-June    0.583   0.414   0.380   0.600   0.600   24.189



Table 2: Contingency tables showing performance of a) regression and b) discriminant analysis hindcasts of Jan-June rainfall categories during 1950-99. Q1=very dry, Q2=dry, Q3=average, Q4=wet, Q5=very wet.

a)

            Observations
Hindcast    Q1   Q2   Q3   Q4   Q5
Q1           7    4    0    2    0
Q2           1    1    4    0    0
Q3           2    4    6    4    1
Q4           2    1    2    1    1
Q5           0    0    1    2    4

b)

            Observations
Hindcast    Q1   Q2   Q3   Q4   Q5
Q1           6    5    3    2    2
Q2           1    0    0    0    0
Q3           5    5    9    5    2
Q4           0    0    0    2    4
Q5           2    2    4    7    7

Table 3: Verification of hindcasts made using multiple regression against observed (obs.) values 1981-1999

Rainfall Period                         Hit Rate
                        Corr.   LEPS    Quints  Terces  Sign    %RMSSS  Bias (SU)
Chance                  0.000   0.000   0.200   0.333   0.500   0.000   0.0
Jan-June                0.698   0.517   0.526   0.737   0.737   37.416  0.148
March-April             0.616   0.374   0.421   0.474   0.579   24.020  -0.089
Jan-June (persistence)  0.268   0.104   0.263   0.474   0.684   0.000   -0.003

SU=Standardized units