Statistical Prediction of January-June 1999 Rainfall in Northeast Brazil using input from Multiple Regression and Discriminant Analysis



contributed by A. Colman, M. Davey



UK Meteorological Office, Bracknell, United Kingdom





1. Introduction

Real-time statistical forecasts of NE Brazil rainfall have been issued by the UK Meteorological Office following work by Ward & Folland (1991). Previous forecasts (Colman et al., 1998) have been made for rainfall between February and May using November-January Sea Surface Temperature Anomalies (SSTA). This is the time of year when predictability is at its maximum. Recent research has shown that the lead time of NE Brazil forecasts can be increased. Predictability from November SSTA is lower than from later SSTA but is still significant, so this year we are issuing a long-lead forecast based on November SSTA.

For this forecast, NE Brazil rainfall is defined as the mean rainfall at four grid points located at 41.25W, 5S; 41.25W, 7.5S; 37.5W, 5S; and 37.5W, 7.5S (Fig. 1). Grid point values are calculated by interpolating observations from neighbouring stations. These gridded values of NE Brazil rainfall have been adopted to allow easy comparison with GCM forecasts made for this grid and, in the future, to allow these empirical forecasts to be combined with GCM forecasts. The wet season in this region peaks in March and April, but substantial rainfall occurs between January and June. Hindcast experiments (Table 1) show the extended (January-June) season to be slightly more predictable than shorter periods, so this forecast is for January-June.

Two predictors have been found to deliver substantial forecast skill: (1) the 30N-30S portion of the third covariance-based Empirical Orthogonal Function (EOF) of Atlantic SST for all seasons, and (2) the first EOF of Pacific SST for Dec-Jan-Feb. These two patterns, shown in Fig. 2, have been used to make regular real-time empirical predictions of NE Brazil seasonal rainfall since 1988, some of which appear in previous issues of this Bulletin (Colman et al., 1996; 1998). The Atlantic EOF pattern reflects the SST anomaly immediately off the east coast of the North Nordeste and the large-scale north-south SST gradient, while the Pacific EOF pattern serves mainly as an index of the ENSO state. The amplitude time series of these two predictors are used to predict North Nordeste rainfall both with multiple regression (giving a point forecast) and with discriminant analysis (giving probabilities for each of five rainfall amount categories that were climatologically equiprobable over 1961-1990).

Details of the EOF analyses, the physical relevance of the predictors, and the two forecasting methods are given in Ward and Folland (1991). Multiple regression derives weights for each predictor such that the resulting linear equation minimizes the squared errors between forecasts and corresponding observations over the training periods (1912-98 and 1946-98). In discriminant analysis, five categories of rainfall amount are defined that were equiprobable over 1961-1990, and the probability of each category is determined from the given predictor values using Bayes' theorem. Discriminant analysis imposes less of a linear constraint than multiple regression, as the probabilities do not necessarily change smoothly from one category to the next.
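As a rough illustration of the Bayes' theorem step, the sketch below converts a pair of predictor values into category probabilities. It assumes Gaussian likelihoods with independent predictors and a common variance per predictor, a simplification chosen for brevity and not necessarily the formulation of Ward and Folland (1991). The function names and all numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def category_probabilities(predictors, class_means, predictor_vars, priors):
    """Posterior probability of each rainfall category given the predictors.

    Assumption: predictors are independent and Gaussian within each category.
    """
    joint = []
    for means, prior in zip(class_means, priors):
        likelihood = 1.0
        for x, mean, var in zip(predictors, means, predictor_vars):
            likelihood *= gaussian_pdf(x, mean, var)
        joint.append(prior * likelihood)  # prior times likelihood
    total = sum(joint)                    # normalising constant in Bayes' theorem
    return [j / total for j in joint]

# Hypothetical per-category means of the (Atlantic, Pacific) predictors,
# ordered very dry ... very wet; equal priors for equiprobable categories.
class_means = [(1.0, 0.8), (0.5, 0.4), (0.0, 0.0), (-0.5, -0.4), (-1.0, -0.8)]
probs = category_probabilities((-0.6, -0.3), class_means, (1.0, 1.0), [0.2] * 5)
print([round(p, 2) for p in probs])
```

Because each category has its own likelihood, the resulting probabilities need not rise or fall steadily across the categories, which is the looser constraint referred to above.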



2. Forecast Skill

To estimate forecast skill, jackknife hindcasts were made with multiple regression and discriminant analysis for the period 1948-1997 (Tables 1 and 2). Each jackknife prediction equation is calculated using data for the whole period except the predicted year and the two years following it. The two following years are excluded to remove the positive skill bias due to persistence. Hindcasts were also made for the 1981-1998 period using data from 1912-80 (Table 3). The SST EOF analysis period (1901-80) overlaps the first evaluation period (1948-1997), but the second evaluation period is completely independent of the SST EOF analysis.
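The jackknife procedure can be sketched as follows. For brevity the example fits a single-predictor least-squares line, whereas the forecasts described here use two predictors. The function names and the synthetic data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (single predictor, for brevity)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def training_indices(years, target, n_following=2):
    """Indices of all years except the target year and the n_following years after it."""
    withheld = set(range(target, target + n_following + 1))
    return [i for i, yr in enumerate(years) if yr not in withheld]

def jackknife_hindcasts(years, predictor, rainfall, n_following=2):
    """Hindcast each year from an equation fitted without the withheld years."""
    hindcasts = {}
    for k, target in enumerate(years):
        train = training_indices(years, target, n_following)
        a, b = fit_line([predictor[i] for i in train], [rainfall[i] for i in train])
        hindcasts[target] = a + b * predictor[k]
    return hindcasts

# Synthetic, exactly linear data, so every hindcast should reproduce the observation.
years = list(range(1948, 1958))
predictor = [0.3, -1.1, 0.8, 1.5, -0.4, 0.0, -0.9, 1.2, 0.6, -1.4]
rainfall = [0.1 - 0.5 * x for x in predictor]
hindcasts = jackknife_hindcasts(years, predictor, rainfall)
print(round(hindcasts[1950], 3))
```

With real data the hindcasts would differ from the observations, and the resulting pairs would feed the skill scores described in the following paragraphs.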

The regression forecasts were assessed in two ways: by comparing the point estimate of rainfall predicted by the regression with the observed value, and by comparing the category within which the regression forecast lies with the observed category. The point estimates were assessed using: (a) correlations between the forecast and observed series; (b) LEPS (Linear Error in Probability Space), which is a measure of how close the forecasts are to the observations in terms of the probability distribution of the observations (Potts et al., 1996); and (c) RMSSS (Root Mean Square error Skill Score), which compares the RMS error of the forecasts with that of persistence and is reported as a percentage in Tables 1 and 3:

RMSSS = 1 - (RMS(forecast) / RMS(persistence))
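A minimal sketch of the RMSSS calculation follows. It assumes that the persistence forecast for a year is the previous year's observed value, which the text does not state explicitly. The function names and data are hypothetical.

```python
import math

def rms_error(forecasts, observations):
    """Root mean square difference between forecasts and observations."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

def rmsss(forecasts, reference, observations):
    """Skill score: 1 is perfect, 0 is no better than the reference forecasts."""
    return 1.0 - rms_error(forecasts, observations) / rms_error(reference, observations)

# Hypothetical standardized rainfall observations for consecutive years.
obs = [0.4, -0.2, 0.9, -0.5, 0.3]
target = obs[1:]         # years that have a preceding year available
persistence = obs[:-1]   # assumption: persistence = previous year's observation
forecasts = [0.0, 0.6, -0.2, 0.1]

score = rmsss(forecasts, persistence, target)
print(round(100.0 * score, 1))  # multiplied by 100 for a percentage
```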

The categorical forecasts were assessed using hit rates (the number of times the correct category was forecast divided by the total number of forecasts). For these assessments, the observations for 1961-1990 are divided into five equal-sized categories (quints), three equal-sized categories (terces) and two equal-sized categories (sign), and the forecasts are allocated to these categories. The skill of the discriminant analysis forecasts was assessed by comparing the observed category with the most likely category according to the hindcasts over the period 1948-97 (Table 2b).
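The allocation of forecasts to categories and the hit rate can be sketched as below. The category boundaries here are hypothetical; in practice they would be derived from the 1961-1990 observations. The function names are also illustrative.

```python
from bisect import bisect_right

def category(value, boundaries):
    """1-based category number; boundaries are the ascending limits between categories."""
    return bisect_right(boundaries, value) + 1

def hit_rate(forecast_categories, observed_categories):
    """Fraction of forecasts that fall in the observed category."""
    hits = sum(f == o for f, o in zip(forecast_categories, observed_categories))
    return hits / len(observed_categories)

# Hypothetical quint boundaries in standardized units (four limits, five categories).
quint_boundaries = [-0.9, -0.4, 0.1, 0.6]
forecasts = [-1.2, 0.3, 0.8, -0.1]
observed = [-1.0, 0.5, 0.2, -0.6]

forecast_cats = [category(v, quint_boundaries) for v in forecasts]
observed_cats = [category(v, quint_boundaries) for v in observed]
print(forecast_cats, observed_cats, hit_rate(forecast_cats, observed_cats))
```

The same functions apply to terces or sign by passing two boundaries or a single boundary, respectively.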

The skill measures and the contingency tables show significant (5% level) prediction skill, particularly for the extreme (very dry and very wet) categories, and no loss of skill for the fully independent years (1981-1998). The hindcasts show that there is skill from November SSTA but no benefit from adding SSTA from earlier months, so the predictor time series are calculated using November SSTA alone.

Predictability is quite high for the January-June (extended season) rainfall total and for March-April (peak season) rainfall, but much lower for early and late season (Jan-Feb and May-June) rainfall, so our forecasts are for January-June and March-April.



3. The 1999 Forecast

3.1. State of predictors:

At the time of the forecast, above-average SSTA in the Atlantic between 0 and 10S and weak La Niña conditions in the Pacific favour above-average rainfall in NE Brazil. A weaker signal, from below-average SSTA in the East Atlantic between 10S and 30S and above-average SSTA in the tropical North Atlantic, favours below-average rainfall.

Figures 3a and 3b show the monthly time series of the Atlantic and Pacific SST anomaly EOF predictors used in the regression and discriminant analysis prediction models.

An example multiple regression equation, for the 1-month lead forecast of the January-June standardized rainfall index from standardized SST predictors based on 1912-1998 data, is:

Rainfall index = 0.089 - 0.238 * P - 0.442 * A

where A and P are the Atlantic and Pacific EOF time coefficients, respectively.
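To show how the example equation is applied, the short sketch below evaluates it for a pair of predictor values. The coefficients are those quoted above; the function name and the predictor values are hypothetical, since the November 1998 coefficients are not listed in the text.

```python
def jan_june_rainfall_index(p, a):
    """Standardized Jan-June rainfall index from the example regression equation.

    p, a: standardized Pacific and Atlantic EOF time coefficients.
    """
    return 0.089 - 0.238 * p - 0.442 * a

# Hypothetical predictor values: both negative, so both terms add to the index.
print(round(jan_june_rainfall_index(p=-0.5, a=-0.3), 3))
```

Both coefficients are negative, so positive values of either predictor lower the predicted rainfall index. The example covers only the 1912-1998 training period and so would not by itself reproduce an average over two training periods.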



3.2. Empirical Forecasts

For the linear regression predictions, the average of predictions made using training periods 1912-98 and 1946-98 is calculated:

Rainfall    Forecast   Quint   Quint range      Standard
period                                          error
Jan-June    +.30       4       -.095 to .53     0.76
Mar-Apr     +.25       4       .105 to .66      0.75



The following forecast probabilities, produced by discriminant analysis, give the probability of the 1999 January-June and March-April indices falling in each of the quintile categories:

Rainfall    Very Dry   Dry    Average   Wet    Very Wet
period
Jan-June    .17        .09    .22       .25    .27
Mar-Apr     .08        .19    .23       .23    .27
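The probabilities above can be checked and summarized with a few lines of code. The sketch below uses the tabulated values; the function names and the choice of summary quantities are illustrative only.

```python
CATEGORIES = ["Very Dry", "Dry", "Average", "Wet", "Very Wet"]
PROBABILITIES = {
    "Jan-June": [0.17, 0.09, 0.22, 0.25, 0.27],
    "Mar-Apr": [0.08, 0.19, 0.23, 0.23, 0.27],
}

def most_likely(probs):
    """Name of the category with the highest probability."""
    return CATEGORIES[probs.index(max(probs))]

def prob_wet_or_wetter(probs):
    """Combined probability of the Wet and Very Wet categories."""
    return probs[3] + probs[4]

for period, probs in PROBABILITIES.items():
    # Each row of probabilities should sum to one.
    assert abs(sum(probs) - 1.0) < 1e-9
    print(period, most_likely(probs), round(prob_wet_or_wetter(probs), 2))
```

The two wettest categories together account for about half of the probability in both periods.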



3.3. Summary:

From the discriminant analysis and regression predictions, our best estimate of the most likely category is WET (category 4) for both January-June and March-April.



References:

Colman, A.W., Davey, M., Harrison, M., and Richardson, D. 1996: Multiple regression, discriminant analysis and unevaluated AGCM Predictions of Mar-Apr-May 1996 Rainfall in NorthEast Brazil. Exp. Long-lead Bull., Vol. 5, no. 1, CPC/NOAA/USA.

Colman, A.W., Davey, M., Harrison, M., and Evans, A. 1998: Prediction of March-April-May 1998 Rainfall in Northeast Brazil using input from Multiple Regression, Discriminant Analysis and an Atmospheric Global Circulation model. Exp. Long-lead Bull., Vol. 1, no. 1, COLA/USA.

Potts, J.M., Folland, C.K., Jolliffe, I.T., and Sexton, D. 1996: Revised "LEPS" scores for assessing climate model simulations and long-range forecasts. J. Clim., 9, 34-53.

Ward, M.N. and Folland, C.K. 1991: Prediction of seasonal rainfall in the North Nordeste of Brazil using eigenvectors of sea surface temperature. Int. J. Climatol., 11, 711-743.



Table 1: Jackknife assessments of regression forecasts 1948-1997

SSTA        Rainfall    Corr.   LEPS    Hit rate  Hit rate  Hit rate  %RMSSS
period      period                      Quints    Terces    Sign
chance                  0.000   0.000   0.200     0.333     0.500     0.000
perfection              1.000   1.000   1.000     1.000     1.000     100.00
Jan         Mar-May     0.704   0.458   0.420     0.600     0.780     39.095
Nov         Mar-May     0.545   0.357   0.460     0.520     0.640     21.865
Nov         Jan-June    0.551   0.388   0.360     0.600     0.680     21.796
Oct-Nov     Jan-June    0.522   0.327   0.320     0.520     0.600     19.156
Sep-Nov     Jan-June    0.538   0.300   0.320     0.600     0.640     20.725
Aug-Nov     Jan-June    0.539   0.303   0.360     0.540     0.640     20.682
Nov         Jan-Feb     0.306   0.152   0.300     0.380     0.620     15.067
Nov         Mar-April   0.539   0.316   0.380     0.500     0.620     23.425
Nov         May-June    0.316   0.189   0.340     0.420     0.600     4.552



Table 2: Contingency tables showing performance of (a) regression and (b) discriminant analysis hindcasts of Jan-June rainfall categories during 1948-97.

Q1 - very dry, Q2 - dry, Q3 - average, Q4 - wet, Q5 - very wet.

a)

              Observed Q1  Observed Q2  Observed Q3  Observed Q4  Observed Q5
Hindcast Q1   7            4            1            2            0
Hindcast Q2   0            1            3            0            0
Hindcast Q3   2            3            6            5            1
Hindcast Q4   2            1            3            0            1
Hindcast Q5   0            1            1            2            4

b)

              Observed Q1  Observed Q2  Observed Q3  Observed Q4  Observed Q5
Hindcast Q1   6            4            4            2            2
Hindcast Q2   0            0            0            0            0
Hindcast Q3   3            1            1            3            1
Hindcast Q4   0            0            1            0            0
Hindcast Q5   2            2            4            7            7
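As a cross-check, the quint hit rate can be recomputed from the contingency tables as the sum of the diagonal entries divided by the total count. The sketch below uses the counts from Table 2; the variable and function names are hypothetical.

```python
# Rows are hindcast categories Q1..Q5, columns are observed categories Q1..Q5.
TABLE_2A = [
    [7, 4, 1, 2, 0],
    [0, 1, 3, 0, 0],
    [2, 3, 6, 5, 1],
    [2, 1, 3, 0, 1],
    [0, 1, 1, 2, 4],
]
TABLE_2B = [
    [6, 4, 4, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 1, 1, 3, 1],
    [0, 0, 1, 0, 0],
    [2, 2, 4, 7, 7],
]

def contingency_hit_rate(table):
    """Diagonal (correct category) count divided by the total number of hindcasts."""
    hits = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return hits / total

print(contingency_hit_rate(TABLE_2A), contingency_hit_rate(TABLE_2B))
```

For the regression hindcasts this gives 18 hits in 50 years, or 0.36, which agrees with the quint hit rate listed in Table 1 for November SSTA and January-June rainfall.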



Table 3: Verification of hindcasts made using multiple regression against observed (obs.) values 1981-1998

Rainfall       Corr.   LEPS    Hit rate  Hit rate  Hit rate  %RMSSS   Bias (SU)
period                         Quints    Terces    Sign
Chance         0.000   0.000   0.200     0.333     0.500     0.000    0.0
Jan-June       0.670   0.437   0.556     0.722     0.778     33.792   0.249
Jan-Feb        0.372   0.290   0.333     0.389     0.556     26.469   0.395
Mar-Apr        0.593   0.360   0.444     0.500     0.667     28.028   0.097
May-June       0.530   0.265   0.278     0.556     0.500     1.047    0.713
Jan-June       0.257   0.078   0.278     0.444     0.667     0.000    0.041
(persistence)

SU - Standardized Units



Fig. 1: Locations of the 4 grid points (+) and the 27 stations (*) used to compile the NE Brazil time series.

Fig. 2: The (a) Pacific and (b) Atlantic SST eigenvector patterns, whose amplitude time series are used as predictors.

Fig. 3a: Amplitude time series for the Atlantic eigenvector for Jan 1991 to Nov 1998. Positive values (e.g. SST anomalies warm in north tropical Atlantic, cool in south tropical Atlantic) are associated with drier conditions. November values are marked with *.

Fig. 3b: Amplitude time series for the Pacific eigenvector for Jan 1991 to Nov 1998. Positive values (e.g. SST anomalies warm in the central-east equatorial Pacific, cool in the north-west and south-west Pacific) are associated with drier conditions. November values are marked with *.