Statistical Prediction of January-June 1999 Rainfall in Northeast Brazil
using input from Multiple Regression and Discriminant Analysis.
contributed by A. Colman, M. Davey
UK Meteorological Office, Bracknell, United Kingdom
1. Introduction
Statistical real-time forecasts of NE Brazil rainfall have been issued by the UK Meteorological Office following work by Ward & Folland (1991). Previous forecasts (Colman et al., 1998) have been made for rainfall between February and May using November-January Sea Surface Temperature Anomalies (SSTA). This is the time of year when predictability is at its maximum. Recent research has shown that the lead time of NE Brazil forecasts can be increased. Predictability from November SSTA is lower than from later SSTA but is still significant, so this year we are issuing a long-lead forecast based on November SSTA.
For this forecast, NE Brazil rainfall is defined as the mean rainfall at four grid points located at 41.25W, 5S; 41.25W, 7.5S; 37.5W, 5S; and 37.5W, 7.5S (Fig. 1). Grid point values are calculated by interpolating observations from neighbouring stations. These gridded values of NE Brazil rainfall have been adopted to enable easy comparisons with GCM forecasts made for this grid. In the future, this will enable these empirical forecasts to be combined with GCM forecasts. The wet season in this region peaks in March and April, but substantial rainfall occurs between January and June. Hindcast experiments (Table 1) show the extended (January-June) season to be slightly more predictable than shorter periods, so this forecast is for January-June.
Two predictors have been found to deliver substantial forecast skill. They are (1) the 30N-30S portion of the third covariance-based Empirical Orthogonal Function (EOF) of Atlantic SST for all seasons, and (2) the first EOF of Pacific SST for Dec-Jan-Feb. These two patterns, shown in Fig. 2, have been used to make regular empirical real-time predictions of NE Brazil seasonal rainfall since 1988, some of which appear in previous issues of this Bulletin (Colman et al., 1996; 1998). The Atlantic EOF pattern reflects the SST anomaly immediately off the North Nordeste east coast and the large-scale north-south SST gradient structure, while the Pacific EOF pattern serves mainly as an index of the ENSO situation. The amplitude time series of these predictors are used to predict North Nordeste rainfall both with multiple regression (giving a point forecast) and with discriminant analysis (giving probabilities for each of five rainfall amount categories that were climatologically equiprobable over 1961-1990).
Details about the EOF analyses, the physical relevance of the predictors, and the two forecasting methods are given in Ward and Folland (1991). Multiple regression develops optimal weights for each predictor in order that the resulting linear equation minimizes squared errors between forecasts and corresponding observations over the training periods (1912-98, 1946-98). In discriminant analysis, 5 categories of rainfall amount are defined which were equiprobable over 1961-1990 and from given values of the predictors, probabilities of each of the rainfall categories are determined using Bayes' theorem. Less linear constraint is imposed here than in multiple regression, as the probabilities do not necessarily change smoothly as a function of category.
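The Bayes' theorem step in the discriminant analysis can be illustrated with a minimal sketch. It is not the operational scheme of Ward and Folland (1991): it assumes a single predictor and Gaussian class-conditional likelihoods, and the function names and the per-category means and standard deviations are hypothetical values chosen only for illustration.

```python
import math

CATEGORIES = ["Very Dry", "Dry", "Average", "Wet", "Very Wet"]


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution (assumed likelihood model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def category_probabilities(predictor, class_stats, priors):
    """Posterior P(category | predictor) from Bayes' theorem.

    class_stats holds one (mean, std) pair per category, describing the
    predictor values seen in training years that fell in that category.
    """
    joint = [p * gaussian_pdf(predictor, m, s)
             for p, (m, s) in zip(priors, class_stats)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical training statistics: a positive predictor goes with dry years.
example_stats = [(1.0, 1.0), (0.5, 1.0), (0.0, 1.0), (-0.5, 1.0), (-1.0, 1.0)]
equal_priors = [0.2] * 5  # five climatologically equiprobable categories

posterior = category_probabilities(-1.2, example_stats, equal_priors)
for name, prob in zip(CATEGORIES, posterior):
    print(f"{name:9s} {prob:.2f}")
```

Because each category gets its own likelihood, the resulting probabilities need not vary smoothly from one category to the next, which is the weaker linear constraint mentioned above.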
2. Forecast Skill
To estimate forecast skill, multiple regression and discriminant analysis Jackknife hindcasts were made for the period 1948-1997 (Tables 1 and 2). Jackknife prediction equations are calculated using data for the whole period minus the predicted year and the two years subsequent to the predicted year. The two subsequent years are excluded to remove positive skill bias due to persistence. Hindcasts were also made for the 1981-1998 period using data from 1912-80 (Table 3). There is an overlap of the SST EOF analysis period (1901-80) and the first evaluation period (1948-1997) but the second evaluation period is completely independent of the SST EOF analysis.
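The jackknife exclusion rule described above can be sketched as follows. This is an illustrative outline only: it uses a single-predictor least-squares line rather than the two-predictor regression of the forecast scheme, and the function names and the dictionary-based data layout are assumptions.

```python
def jackknife_training_years(years, target, n_following=2):
    """Years available for training when hindcasting `target`.

    The target year and the `n_following` years after it are withheld.
    """
    excluded = {target + k for k in range(n_following + 1)}
    return [y for y in years if y not in excluded]


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jackknife_hindcasts(years, predictor, rainfall):
    """Hindcast every year using an equation fitted without that year
    and the two years that follow it."""
    hindcasts = {}
    for target in years:
        train = jackknife_training_years(years, target)
        intercept, slope = fit_line([predictor[y] for y in train],
                                    [rainfall[y] for y in train])
        hindcasts[target] = intercept + slope * predictor[target]
    return hindcasts
```

For the 1948-1997 period, each hindcast equation is therefore fitted on 47 years, except near the end of the record where fewer subsequent years exist to withhold.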
The regression forecasts were assessed by comparing the point estimate of rainfall predicted by the regression with the observed value, and by comparing the category within which the regression forecast lies with the observed category. The point estimates were assessed using: (a) correlations between forecast and observed series, (b) LEPS (Linear Error in Probability Space), which is a measure of how close the forecasts are to observations in terms of the probability density function of the observations (Potts et al., 1996), and (c) RMSSS (Root Mean Square error Skill Score), which compares the RMS error of the forecasts with that of persistence. Tables 1 and 3 report this score as a percentage (%RMSSS).
RMSSS = 1 - (RMS(forecast) / RMS(persistence))
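As a worked illustration of the RMSSS formula, the short sketch below computes the score for a forecast series against a reference series. The function names and the sample numbers are hypothetical, and how the persistence reference is constructed is left to the caller because it is not specified here.

```python
import math


def rms_error(forecasts, observations):
    """Root mean square error of a forecast series."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


def rmsss(forecasts, reference, observations):
    """RMSSS = 1 - RMS(forecast) / RMS(reference).

    Assumes the reference forecasts are not perfect (non-zero RMS error).
    """
    return 1.0 - rms_error(forecasts, observations) / rms_error(reference, observations)


# Made-up standardized values, for illustration only.
observed = [0.0, 1.0, -1.0]
forecast = [0.0, 0.5, -0.5]
reference = [0.0, 0.0, 0.0]
print(f"%RMSSS = {100.0 * rmsss(forecast, reference, observed):.1f}")
```

A score of 0 means the forecasts are no better than the reference, and a score of 1 (100%) means perfect forecasts.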
The categorical forecasts are assessed using hit rates (the number of times the correct category is forecast divided by the total number of forecasts). For these assessments, the observations for 1961-1990 are divided into 5 equal-sized categories (quints), 3 equal-sized categories (terces) and 2 equal-sized categories (sign), and the forecasts are allocated to these categories. The discriminant analysis forecast skill was assessed by comparing the observed category with the most likely category according to the hindcasts over the period 1948-97 (Table 2b).
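The category allocation and hit-rate calculation can be sketched as below. The boundary rule (the midpoint between adjacent sorted reference values) is an assumption made for illustration; the exact method used to define the category boundaries is not stated here, and the function names are hypothetical.

```python
from bisect import bisect_right


def category_boundaries(reference_values, n_categories):
    """Boundaries that split the reference sample into equal-sized groups.

    Assumes len(reference_values) is a multiple of n_categories,
    e.g. 30 reference years split into 5, 3 or 2 categories.
    """
    ordered = sorted(reference_values)
    n = len(ordered)
    bounds = []
    for k in range(1, n_categories):
        i = k * n // n_categories  # first value belonging to the next category
        bounds.append(0.5 * (ordered[i - 1] + ordered[i]))
    return bounds


def categorize(value, bounds):
    """Category number, from 1 (driest) to len(bounds) + 1 (wettest)."""
    return bisect_right(bounds, value) + 1


def hit_rate(forecast_categories, observed_categories):
    """Fraction of forecasts that fall in the observed category."""
    hits = sum(f == o for f, o in zip(forecast_categories, observed_categories))
    return hits / len(observed_categories)
```

The same three functions cover quints, terces and sign by calling `category_boundaries` with 5, 3 or 2 categories.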
The skill measures and the contingency tables show significant (5% level) prediction skill, particularly for the extreme (very dry and very wet) categories, and no degradation in skill for the fully independent years (1981-1998). The hindcasts show that there is skill from November SSTA but no benefit from adding SSTA from earlier months, so predictor time series are calculated using November SSTA alone.
Predictability is quite high for the January-June (extended season) rainfall total and for March-April (peak season) rainfall, but much lower for early and late season (Jan-Feb and May-June) rainfall, so our forecasts are for January-June and March-April.
3. The 1999 Forecast
3.1. State of predictors:
At the time of the forecast, above-average SSTA in the Atlantic between 0 and 10S and weak La Niña conditions in the Pacific favour above-average rainfall in NE Brazil. A weaker signal from below-average SSTA in the East Atlantic between 10 and 30S and above-average SSTA in the tropical north Atlantic favours below-average rainfall.
Figures 3a and b show the monthly timeseries of the Atlantic and Pacific SST anomaly EOF predictors used in the regression and discriminant analysis prediction models.
An example multiple regression equation for the 1-month lead forecast of the January-June standardized rainfall index from standardized SST predictors, based on 1912-1998 data, is:
Rainfall index = 0.089 - 0.238 * P - 0.442 * A
where A and P are the Atlantic and Pacific EOF time coefficients, respectively.
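To make the sign convention of the example equation concrete, the sketch below evaluates it for a pair of predictor values. The coefficients are those quoted above; the function name and the input values are hypothetical and are not the November 1998 predictor values.

```python
def rainfall_index(pacific, atlantic):
    """Standardized Jan-June rainfall index from the example equation.

    Both coefficients are negative, so positive (warm ENSO, warm north /
    cool south tropical Atlantic) predictor values lower the index.
    """
    return 0.089 - 0.238 * pacific - 0.442 * atlantic


# Hypothetical standardized predictor values, for illustration only.
print(round(rainfall_index(pacific=-0.5, atlantic=-0.2), 3))  # negative inputs raise the index
print(round(rainfall_index(pacific=1.0, atlantic=1.0), 3))    # positive inputs lower the index
```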
3.2. Empirical Forecasts
For the linear regression predictions, the average of predictions made using training periods 1912-98 and 1946-98 is calculated:
| Rainfall period | Forecast | Quint | Quint range | Stand. Error |
| Jan-June | +.30 | 4 | -.095 to .53 | 0.76 |
| Mar-Apr | +.25 | 4 | .105 to .66 | 0.75 |
The following forecast probabilities produced by discriminant analysis show the probability of the 1999 January-June and March-April indices being in each of the quintiles:
| Rainfall period | Very Dry | Dry | Average | Wet | Very Wet |
| Jan-June | .17 | .09 | .22 | .25 | .27 |
| Mar-Apr | .08 | .19 | .23 | .23 | .27 |
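The probabilities in the table can be summarized with a few lines of code. The sketch below, with hypothetical function names, checks that each row sums to one, finds the single most probable quintile, and adds the two wettest categories for comparison with their combined climatological probability of 0.40.

```python
CATEGORIES = ["Very Dry", "Dry", "Average", "Wet", "Very Wet"]

# Discriminant analysis probabilities from the table above.
PROBABILITIES = {
    "Jan-June": [0.17, 0.09, 0.22, 0.25, 0.27],
    "Mar-Apr": [0.08, 0.19, 0.23, 0.23, 0.27],
}


def most_likely_category(probs):
    """Name of the quintile with the highest forecast probability."""
    return CATEGORIES[probs.index(max(probs))]


def wet_side_probability(probs):
    """Combined probability of the Wet and Very Wet quintiles."""
    return probs[3] + probs[4]


for period, probs in PROBABILITIES.items():
    assert abs(sum(probs) - 1.0) < 1e-9  # each row is a full distribution
    print(period, most_likely_category(probs),
          f"P(wet or very wet) = {wet_side_probability(probs):.2f}")
```

The single most probable quintile from discriminant analysis alone need not match the final best-estimate category, which also draws on the regression forecast.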
3.3. Summary:
From the discriminant analysis and regression predictions, our best estimate forecast for the most likely category is WET (category 4) for January-June and March-April.
References:
Colman, A.W., Davey, M., Harrison, M., and Richardson, D. 1996: Multiple regression, discriminant analysis and unevaluated AGCM Predictions of Mar-Apr-May 1996 Rainfall in NorthEast Brazil. Exp. Long-lead Bull. Vol 5 no.1 CPC/NOAA/USA
Colman, A.W., Davey, M., Harrison, M., and Evans, A. 1998: Prediction of March-April-May 1998 Rainfall in Northeast Brazil using input from Multiple Regression, Discriminant Analysis and an Atmospheric Global Circulation model. Exp. Long-lead Bull. Vol 1 no.1 COLA/USA
Potts, J.M., Folland, C.K., Jolliffe, I.T., and Sexton, D. 1996: Revised "LEPS" scores for assessing climate model simulations and long-range forecasts. J. Clim., 9, 34-53.
Ward, M.N. and Folland, C.K. 1991: Prediction of seasonal rainfall in the North Nordeste of Brazil using eigenvectors of sea surface temperature. Int. J. Climatol., 11, 711-743.
Table 1: Jackknife assessments of regression forecasts 1948-1997
| SSTA period | Rainfall period | Corr. | LEPS | Hit rate (Quints) | Hit rate (Terces) | Hit rate (Sign) | %RMSSS |
| chance | | 0.000 | 0.000 | 0.200 | 0.333 | 0.500 | 0.000 |
| perfection | | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 100.00 |
| Jan | Mar-May | 0.704 | 0.458 | 0.420 | 0.600 | 0.780 | 39.095 |
| Nov | Mar-May | 0.545 | 0.357 | 0.460 | 0.520 | 0.640 | 21.865 |
| Nov | Jan-June | 0.551 | 0.388 | 0.360 | 0.600 | 0.680 | 21.796 |
| Oct-Nov | Jan-June | 0.522 | 0.327 | 0.320 | 0.520 | 0.600 | 19.156 |
| Sep-Nov | Jan-June | 0.538 | 0.300 | 0.320 | 0.600 | 0.640 | 20.725 |
| Aug-Nov | Jan-June | 0.539 | 0.303 | 0.360 | 0.540 | 0.640 | 20.682 |
| Nov | Jan-Feb | 0.306 | 0.152 | 0.300 | 0.380 | 0.620 | 15.067 |
| Nov | Mar-April | 0.539 | 0.316 | 0.380 | 0.500 | 0.620 | 23.425 |
| Nov | May-June | 0.316 | 0.189 | 0.340 | 0.420 | 0.600 | 4.552 |
Table 2: Contingency table showing performance of a) Regression and b) discriminant analysis hindcasts of Jan-June rainfall categories during 1948-97.
Q1-very dry, Q2-dry, Q3-average, Q4-wet,Q5-very wet.
a)
| | Observed Q1 | Observed Q2 | Observed Q3 | Observed Q4 | Observed Q5 |
| Hindcast Q1 | 7 | 4 | 1 | 2 | 0 |
| Hindcast Q2 | 0 | 1 | 3 | 0 | 0 |
| Hindcast Q3 | 2 | 3 | 6 | 5 | 1 |
| Hindcast Q4 | 2 | 1 | 3 | 0 | 1 |
| Hindcast Q5 | 0 | 1 | 1 | 2 | 4 |
b)
| | Observed Q1 | Observed Q2 | Observed Q3 | Observed Q4 | Observed Q5 |
| Hindcast Q1 | 6 | 4 | 4 | 2 | 2 |
| Hindcast Q2 | 0 | 0 | 0 | 0 | 0 |
| Hindcast Q3 | 3 | 1 | 1 | 3 | 1 |
| Hindcast Q4 | 0 | 0 | 1 | 0 | 0 |
| Hindcast Q5 | 2 | 2 | 4 | 7 | 7 |
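The quintile hit rates implied by the two contingency tables can be recomputed directly from their diagonals. The sketch below transcribes the counts from Table 2 (rows are hindcast categories, columns are observed categories); the variable and function names are hypothetical.

```python
# Table 2a: regression hindcasts, 1948-97.
TABLE_2A = [
    [7, 4, 1, 2, 0],
    [0, 1, 3, 0, 0],
    [2, 3, 6, 5, 1],
    [2, 1, 3, 0, 1],
    [0, 1, 1, 2, 4],
]

# Table 2b: discriminant analysis hindcasts, 1948-97.
TABLE_2B = [
    [6, 4, 4, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 1, 1, 3, 1],
    [0, 0, 1, 0, 0],
    [2, 2, 4, 7, 7],
]


def total_count(table):
    """Number of hindcast years in the table."""
    return sum(sum(row) for row in table)


def hit_rate(table):
    """Diagonal count (hindcast category equals observed) over the total."""
    hits = sum(table[i][i] for i in range(len(table)))
    return hits / total_count(table)


print(total_count(TABLE_2A), round(hit_rate(TABLE_2A), 3))
print(total_count(TABLE_2B), round(hit_rate(TABLE_2B), 3))
```

Both tables contain 50 hindcasts, one per year of 1948-97, and the regression hit rate of 18/50 agrees with the quint hit rate listed in Table 1 for November SSTA and Jan-June rainfall.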
Table 3: Verification of hindcasts made using multiple regression against observed (obs.) values 1981-1998
| Rainfall period | Corr. | LEPS | Hit rate (Quints) | Hit rate (Terces) | Hit rate (Sign) | %RMSSS | Bias (SU) |
| Chance | 0.000 | 0.000 | 0.200 | 0.333 | 0.500 | 0.000 | 0.0 |
| Jan-June | 0.670 | 0.437 | 0.556 | 0.722 | 0.778 | 33.792 | 0.249 |
| Jan-Feb | 0.372 | 0.290 | 0.333 | 0.389 | 0.556 | 26.469 | 0.395 |
| Mar-Apr | 0.593 | 0.360 | 0.444 | 0.500 | 0.667 | 28.028 | 0.097 |
| May-June | 0.530 | 0.265 | 0.278 | 0.556 | 0.500 | 1.047 | 0.713 |
| Jan-June (persistence) | 0.257 | 0.078 | 0.278 | 0.444 | 0.667 | 0.000 | 0.041 |
SU - Standardized Units
Fig. 1: Locations of the 4 grid points (+) and the 27 stations (*) used to compile the NE Brazil time series.
Fig. 2: The (a) Pacific and (b) Atlantic SST eigenvector patterns whose amplitude time series are used as predictors.
Fig. 3a: Amplitude time series for the Atlantic eigenvector for Jan 1991 to Nov 1998. Positive values (e.g. SST anomalies warm in north tropical Atlantic, cool in south tropical Atlantic) are associated with drier conditions. November values are marked with *
Fig. 3b: Amplitude time series for the Pacific eigenvector for Jan 1991 to Nov 1998. Positive values (e.g. SST anomalies warm in the central-east equatorial Pacific, cool in the north-west and south-west Pacific) are associated with drier conditions. November values are marked with *