Forecasts of Surface Temperature and Precipitation Anomalies Over the U.S. Using Screening Multiple Linear Regression

 

contributed by D. Unger

 

Climate Prediction Center, NOAA, Camp Springs, Maryland

 

 

Screening multiple linear regression (SMLR) is used to predict seasonal temperature and precipitation amounts for locations over the mainland United States. Predictor data consist of Northern Hemisphere 700-mb heights, near-global SSTs, and station values of mean temperature and total precipitation amount from the 3-month period prior to the forecast initial time of June 1, 1998. Forecasts of mean temperature and total precipitation are made for a series of 13 overlapping 3-month periods, at one-month intervals, beginning with Jul-Aug-Sep 1998 and extending through Jul-Aug-Sep 1999. Regression relationships were derived from data for the 1955-97 period. Forecasts were produced from single-location equations for each of 102 climatic regions that are approximately equal in area and approximately evenly distributed throughout the U.S.
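The sequence of 13 overlapping target periods can be enumerated with a short sketch such as the one below. The function name and the convention of labeling each season by the year of its first month are assumptions made for illustration, not part of the forecast system described here.

```python
MONTH_INITIALS = "JFMAMJJASOND"

def season_labels(start_month, start_year, count):
    """Return labels such as 'JAS 1998' for overlapping 3-month seasons.

    start_month is the 1-based first month of the first season. Each later
    season starts one month after the previous one. The year in a label is
    the year of the season's first month (an illustrative convention).
    """
    labels = []
    for k in range(count):
        first = (start_month - 1) + k            # months since January of start_year
        year = start_year + first // 12
        letters = "".join(MONTH_INITIALS[(first + j) % 12] for j in range(3))
        labels.append(f"{letters} {year}")
    return labels

# Thirteen seasons at one-month intervals, starting with Jul-Aug-Sep 1998.
targets = season_labels(7, 1998, 13)
```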

 

All predictors and predictands were expressed as standardized anomalies relative to the developmental data. Precipitation amounts were transformed by taking their square roots prior to standardization in order to help normalize their distribution. Twenty-five candidate predictors, selected from gridpoint values in regions of known importance for climate prediction, were offered for screening in the regression development. A few predictor locations were chosen on the basis of an examination of data from the first 20 years of the sample, referred to here as the base period. Information from the most recent 20 years was never used for selection of candidate predictors (Unger, 1996a). One additional predictor, carbon dioxide concentration from Mauna Loa Observatory, was offered in order to capture long-term trends in the data. This crude trend variable gives the screening procedure a convenient predictor with which to identify stations that have simple trends in their predictand values.
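As a minimal sketch of the standardization step, the function below converts a series to standardized anomalies, with an optional square-root transform for precipitation. The function name, the sample values, and the use of the population standard deviation are assumptions; the text does not say which variance estimator was used.

```python
import math
import statistics

def standardized_anomalies(values, sqrt_transform=False):
    """Return (value - mean) / standard deviation for each value.

    With sqrt_transform=True the square root is taken first, as described
    for precipitation. The population standard deviation is an assumption.
    """
    if sqrt_transform:
        values = [math.sqrt(v) for v in values]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical seasonal precipitation totals (arbitrary units).
precip = [0.0, 1.0, 4.0, 9.0, 16.0]
z = standardized_anomalies(precip, sqrt_transform=True)
```

In practice the mean and standard deviation would come from the developmental sample only, so that a new case is standardized without using its own value.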

 

A variation of the retroactive real time (RRT) validation technique was used to estimate forecast skill (Unger, 1996b). To estimate skill by RRT, a forecast equation was derived from the base period and applied to the next year's data to obtain a result on independent data. That case was then added to the developmental sample, and a new relationship was derived and applied to the following year's data. Independent data statistics thus accumulate year by year in exactly the same way as in an operational forecast procedure, except retroactively. Forecasts for the base period years were obtained by applying RRT in reverse. This bi-directional RRT (BRRT) validation technique ensures that each available case contributes to the skill estimate as independent data, in a way similar to cross-validation but with much less of the distortion that redundant sampling introduces in cross-validation (Unger, 1996b).
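The validation scheme described above might be sketched as follows. A one-predictor least-squares line stands in for the full screening regression, and the reverse pass is assumed to train on all later years when forecasting a base period year; the function names, the data, and that reading of "RRT in reverse" are illustrative assumptions rather than the procedure of Unger (1996b).

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rrt_forecasts(xs, ys, n_base):
    """Forward RRT: train on cases [0, t) and forecast case t, for t >= n_base."""
    out = {}
    for t in range(n_base, len(xs)):
        slope, intercept = fit_line(xs[:t], ys[:t])
        out[t] = slope * xs[t] + intercept
    return out

def brrt_forecasts(xs, ys, n_base):
    """Bi-directional RRT: one independent forecast for every case."""
    n = len(xs)
    forecasts = rrt_forecasts(xs, ys, n_base)            # cases n_base .. n-1
    # Reverse pass on the time-reversed series. Its starting sample is the
    # n - n_base most recent cases, and it walks back through the base period.
    reverse = rrt_forecasts(xs[::-1], ys[::-1], n - n_base)
    for r_index, value in reverse.items():
        forecasts[n - 1 - r_index] = value                # map back to original order
    return [forecasts[t] for t in range(n)]

# Hypothetical data with an exact linear relationship.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
independent = brrt_forecasts(xs, ys, n_base=4)
```

No forecast in this sketch is produced by an equation that saw its own target case, which is the property the skill estimate depends on.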

 

A forward selection screening procedure was used for equation development. The top 5 terms were selected for each equation. Separate statistics were accumulated for each equation length, so that results for all the one-, two-, three-, four-, and five-term equations were calculated. The optimum equation length was then estimated by an objective learning procedure that used the past performance at each RRT trial to "predict" which equation would perform best on the next trial. Verification statistics from this "best guess" forecast were also kept separately and were used to obtain the final skill estimate of the forecasts.
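A minimal sketch of forward selection follows, assuming the usual criterion of adding, at each step, the candidate that most reduces the residual sum of squares. The length-selection rule in pick_length (lowest accumulated squared error so far) is only one plausible reading of the "objective learning procedure"; all names, the tolerance guard, and the data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def forward_select(candidates, target, max_terms=5, tol=1e-12):
    """Return candidate names in the order forward selection picks them.

    candidates maps a name to a list of values, one per case.
    """
    residual = centered(target)
    pool = {name: centered(col) for name, col in candidates.items()}
    chosen = []
    while pool and len(chosen) < max_terms:
        # Reduction in residual sum of squares offered by each candidate.
        gains = {}
        for name, col in pool.items():
            norm = dot(col, col)
            gains[name] = dot(col, residual) ** 2 / norm if norm > tol else 0.0
        best = max(gains, key=gains.get)
        if gains[best] <= tol:
            break  # nothing left to explain (a guard, not a rule from the text)
        q = pool.pop(best)
        qq = dot(q, q)
        # Remove the chosen direction from the residual and from every
        # remaining candidate, so later gains are conditional on earlier picks.
        coef = dot(q, residual) / qq
        residual = [r - coef * x for r, x in zip(residual, q)]
        for name, col in pool.items():
            c = dot(q, col) / qq
            pool[name] = [v - c * x for v, x in zip(col, q)]
        chosen.append(best)
    return chosen

def pick_length(past_sq_errors):
    """Pick the equation length with the lowest accumulated squared error.

    past_sq_errors[k] holds the squared errors so far for (k+1)-term equations.
    """
    totals = [sum(errs) for errs in past_sq_errors]
    return totals.index(min(totals)) + 1

# Hypothetical predictors: the target depends on "a" and "b" but not on "c".
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
c = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
target = [3.0 * x - 2.0 * y for x, y in zip(a, b)]
order = forward_select({"a": a, "b": b, "c": c}, target, max_terms=2)
```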

 

The verification is based on the temporal correlation coefficient between forecast and observation over the 40 independent cases at each of the 102 locations. Field significance was measured by comparing spatially averaged correlation coefficients from forecasts applied to the actual target years against those from forecasts applied to 500 randomly shuffled target periods. Field significance expresses the fraction of cases in which the random forecast series outperformed the actual forecasts, so smaller values indicate greater significance.
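The field significance test can be sketched as a permutation test. Two assumptions are made in this sketch that the text does not confirm: the same shuffled year order is applied at every location, and a tie counts as the random series outperforming the actual forecast. The function names and the data are hypothetical.

```python
import random
import statistics

def correlation(a, b):
    """Pearson correlation between two equal-length series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den > 0 else 0.0

def mean_skill(forecasts, observations):
    """Spatial average of per-location temporal correlations.

    Both arguments are lists of per-location series, indexed [location][year].
    """
    return statistics.fmean(
        [correlation(f, o) for f, o in zip(forecasts, observations)]
    )

def field_significance(forecasts, observations, trials=500, seed=0):
    """Fraction of shuffled trials whose mean skill matches or beats the actual."""
    rng = random.Random(seed)
    actual = mean_skill(forecasts, observations)
    n_years = len(observations[0])
    beats = 0
    for _ in range(trials):
        order = list(range(n_years))
        rng.shuffle(order)
        # Same year order at every location (an assumption of this sketch).
        shuffled = [[obs[i] for i in order] for obs in observations]
        if mean_skill(forecasts, shuffled) >= actual:
            beats += 1
    return beats / trials

# Hypothetical data: 3 locations, 12 years, forecasts identical to observations.
obs = [[float((year * (loc + 2) * 5) % 13) for year in range(12)] for loc in range(3)]
fcst = [list(series) for series in obs]
p = field_significance(fcst, obs)
```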

 

The final forecasts are post-processed to obtain an estimate of the likelihood that the above, normal, or below class will be observed, with the classes defined by the terciles of the distribution for each forecast element and location. A forecast is assigned a class on the basis of the forecast distribution and skill. An estimate of the increased likelihood of a given class is made in order to place the forecast in a format similar to the operational long lead forecasts issued by the CPC (O'Lenic, 1994).

 

The probability assignments for temperature forecasts are made by integration of the estimated forecast error distribution against the 1961-90 temperature class limits. An estimate of the skill for low and high temporal frequencies (obtained from the 10-year moving average of the forecasts and the residual of this value from the raw forecast, respectively) is used to estimate the forecast error distribution (Unger, 1997). The class with the largest departure from climatology is displayed, with its probability anomaly contoured. Because precipitation trends are less pronounced, precipitation probabilities are estimated on the basis of the empirical probabilities associated with skill and forecast magnitude as determined from historical forecasts.
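The integration of a forecast error distribution against class limits can be illustrated with a simplified sketch. It assumes a Gaussian error distribution, standardized anomalies with a standard normal climatology, a forecast mean damped by the correlation r, and an error standard deviation of sqrt(1 - r^2). It does not reproduce the separate treatment of low and high frequencies in Unger (1997) or the 1961-90 class limits, and the function name is hypothetical.

```python
from statistics import NormalDist

def tercile_probabilities(forecast_z, correlation):
    """Return (below, normal, above) probabilities for a standardized forecast.

    Requires |correlation| < 1 so that the error distribution has nonzero spread.
    """
    climatology = NormalDist(0.0, 1.0)
    lower = climatology.inv_cdf(1.0 / 3.0)    # boundary between below and normal
    upper = climatology.inv_cdf(2.0 / 3.0)    # boundary between normal and above
    error_sd = (1.0 - correlation ** 2) ** 0.5
    predictive = NormalDist(correlation * forecast_z, error_sd)
    below = predictive.cdf(lower)
    above = 1.0 - predictive.cdf(upper)
    return below, 1.0 - below - above, above

# Hypothetical case: a forecast one standard deviation above normal at a
# location where the forecast-observation correlation is 0.5.
below, normal, above = tercile_probabilities(1.0, 0.5)
anomaly_above = above - 1.0 / 3.0             # probability anomaly of the above class
```

With zero skill the three probabilities return to one third each, which corresponds to a forecast of climatology.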

 

The forecasts initialized from MAM 1998 are shown in Figs. 1, 3, and 5 with the corresponding skill estimates for each station shown in Figs. 2, 4 and 6. Shading on the forecast maps indicates areas of at least 3 percent probability anomaly with darker colors at the 10 and 20 percent contours. Contours within the shaded areas on the forecast maps indicate the probability anomaly at 5 percent intervals.

 

The numbers plotted in Figs. 1, 3 and 5 indicate station values of the post-processed regression forecasts, damped according to the forecast-observation correlation on independent data to minimize the squared error. Non-zero numbers plotted outside of shaded regions indicate forecast anomalies of substantial magnitude at stations with some skill, but with skill below the threshold needed to choose a forecast category with confidence.
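The reason that damping by the correlation minimizes the squared error can be seen from a short derivation. It assumes, for illustration, that the forecast f and the observation o are both standardized (zero mean, unit variance) with correlation r, and that the damped forecast is a f for some constant a. The symbols are introduced here and are not taken from the original text.

```latex
\begin{aligned}
E\left[(o - a f)^2\right] &= E[o^2] - 2a\,E[o f] + a^2 E[f^2] = 1 - 2ar + a^2, \\
\frac{d}{da}\left(1 - 2ar + a^2\right) &= -2r + 2a = 0 \;\Longrightarrow\; a = r, \\
E\left[(o - r f)^2\right] &= 1 - r^2 .
\end{aligned}
```

Under these assumptions a station with r = 0.3 would have its standardized forecast reduced to 30 percent of its raw magnitude, and a station with no skill would be forecast at climatology.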

 

Regression forecasts for JAS 1998 (Fig. 1) show above normal temperatures over southeastern California, southern Nevada, and eastern Arizona, along the Pacific coast north of central California, in Texas, and in the East from Florida to North Carolina extending inland into Ohio and Michigan. The only below normal indication is a very weak one in eastern Nebraska. The field significance for this forecast is 0.03.

 

The precipitation forecast for JAS 1998 (Fig. 3) is weak and not field significant (0.20). It shows only isolated areas of above median precipitation in parts of Montana, Minnesota, and southern Michigan, with below median precipitation amounts in northeastern Florida and in Mississippi.

 

Figure 5 shows the temperature forecast for SON 1998, with the skill estimate shown in Figure 6. The fall forecast is for above normal temperatures in the Pacific Northwest, the desert Southwest, northern California, southern Florida, and portions of the northern high plains from Kansas to eastern Wyoming. There are very weak indications of below average temperatures near southern Wisconsin and in the Mississippi Valley near Tennessee. The field significance for this map is only 0.22, which means that the high frequency signals are likely to be due to noise in the data, leaving only the low frequency signals in Florida and the desert Southwest with any reliability.

 

References:

 

O'Lenic, E., 1994: A new paradigm for production and dissemination of the NWS's long lead-time seasonal climate outlooks. Proceedings of the Nineteenth Annual Climate Diagnostics Workshop. College Park, Maryland, November 14-18, 1994, 408-411.

 

Unger, D. A., 1996a: Long lead climate prediction using screening multiple linear regression. Proceedings of the Twentieth Annual Climate Diagnostics Workshop. Seattle, Washington, October 23-27, 1995, 425-428.

 

Unger, D. A., 1996b: Skill assessment strategies for screening regression predictions based on a small sample size. Preprints, Thirteenth Conference on Probability and Statistics in the Atmospheric Sciences. San Francisco, California, February 21-23, 1996, 260-267.

 

Unger, D. A., 1997: Conversion of long lead climate predictions from continuous to probabilistic form. Proceedings of the Twenty-first Annual Climate Diagnostics and Prediction Workshop. Huntsville, Alabama, October 28-November 1, 1996, 44-47.

 

 

 

Figure Captions:

 

 

Figure 1. A 1-mo lead screening regression-based temperature forecast for JAS 1998. Contours are estimated probability anomalies of the specified tercile. Shaded areas delineate the area of sufficient skill to depart from climatology by at least 3 percent. Plotted numbers are station values of the standardized anomaly.

 

 

 

Figure 2. Distribution of skill for the 1-mo lead regression forecast for JAS 1998 temperatures. Both the plotted values and the contours are the correlation (x100) between forecast and observation for the 1955-1997 period.

 

 

 

Figure 3. Same as Fig. 1 except for precipitation forecasts.

 

 

 

Figure 4. Same as Fig. 2 except for precipitation skill.

 

 

 

Figure 5. Same as Fig. 1 except for a 3-mo lead valid for SON 1998.

 

 

 

Figure 6. Same as Fig. 2 except for a 3-mo lead valid for SON 1998.