occasions. One of the greatest advances in forecast 
verification was made when the limits of the various 
classes were chosen in such a way that each category 
had an equal probability of occurrence, based on the 
past climatological record. Thus there is no incentive 
for a forecaster to choose one class in preference to 
another on probability (climatological) grounds
alone. This principle, with slight modification,
has been used in the verification of the Extended
Forecasts of the U. S. Weather Bureau [11], and during the
last war the Army Air Forces [8] devised a scheme in
which thirty classes (called trentiles) were used, each
class representing 1/30 of the frequency distribution
based on past records of the particular element being 
verified. 
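The equal-probability classes described above can be sketched in a few lines: the class limits are simply the quantiles of the past climatological record. The synthetic "record" below is a stand-in for real climatological data, used only for illustration.

```python
import numpy as np

# Hypothetical climatological record of some element (e.g., daily temperature).
rng = np.random.default_rng(0)
record = rng.normal(loc=15.0, scale=8.0, size=1000)

def equal_probability_limits(record, n_classes):
    """Class boundaries chosen so each class has equal climatological frequency."""
    probs = np.linspace(0, 1, n_classes + 1)[1:-1]  # interior quantile levels
    return np.quantile(record, probs)

# Three equally likely classes (terciles); thirty would give the "trentiles".
limits = equal_probability_limits(record, 3)

# Assigning the record to its own classes should give roughly 1/3 in each.
classes = np.digitize(record, limits)
counts = np.bincount(classes, minlength=3)
```

With limits defined this way, no class is favored a priori, which is the stated point of the scheme.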
Table I. Contingency Table for Precipitation Forecasts

                         Forecast precipitation
Observed precipitation   Heavy  Moderate  Light  Total
Heavy                      5       2        1      8
Moderate                   8       4        1     13
Light                      2       2        5      9
Total                     15       8        7     30
From this table a number of interesting verification 
statistics can be obtained. A comparison of the margins 
reveals whether the various categories were forecast 
with the same frequency as they occurred. Thus it is 
noted that, although heavy precipitation was forecast 
15 times, it occurred only 8 times, the opposite tendency 
being shown for moderate precipitation. This may be
only a sampling difference, or it may be great enough to 
cause the forecaster to reject the hypothesis that he is 
able to distinguish the relative frequencies of the various 
classes. The relative frequency of occurrences of the 
various classes can also be compared with that expected 
on the basis of climatology. 
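The marginal comparison just described amounts to summing the rows and columns of Table I, which can be sketched as follows (rows taken as observed, columns as forecast):

```python
import numpy as np

# Table I from the text: rows = observed, columns = forecast (Heavy, Moderate, Light).
table = np.array([[5, 2, 1],
                  [8, 4, 1],
                  [2, 2, 5]])

observed_totals = table.sum(axis=1)   # how often each category occurred
forecast_totals = table.sum(axis=0)   # how often each category was forecast

# Heavy precipitation was forecast 15 times but observed only 8 times.
```

Comparing `observed_totals` with `forecast_totals` exposes the over-forecasting of heavy precipitation noted in the text.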
Percentage Correct. From the contingency table a 
frequency distribution of errors can easily be obtained. 
In the example given, 14 forecasts are exactly right, 13 
forecasts are wrong by one class, and 3 forecasts are 
wrong by two classes. A commonly used score is the 
per cent right, in this case 14/30 = 47 per cent. More
useful information is provided by constructing two other 
tables, Tables II and III.

Table II. Per Cent of Time Each Forecast Event Occurred for a Particular Category

                  Forecast class
Observed class   Heavy  Moderate  Light
Heavy              33      25       14
Moderate           53      50       14
Light              14      25       72
Total             100     100      100

The extent to which subsequent observations confirm the
prediction when a certain event is forecast is shown by
Table II. The term
post agreement has been suggested for this attribute of
the forecasts [9] and the term prefigurance for the 
extent to which the forecasts give advance notice of the 
occurrence of a certain event (illustrated by Table III). 
Thus it is seen that forecasts of heavy precipitation were 
followed by the occurrence of heavy precipitation 33 
per cent of the time while occurrences of heavy pre- 
cipitation were correctly indicated in advance 62 per 
cent of the time. 
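Tables II and III are just the contingency table normalized by its column totals and row totals, respectively. A minimal sketch, using the Table I counts:

```python
import numpy as np

# Table I: rows = observed, columns = forecast (Heavy, Moderate, Light).
table = np.array([[5, 2, 1],
                  [8, 4, 1],
                  [2, 2, 5]], dtype=float)

# Post agreement (Table II): each column divided by its forecast total --
# how often the observation confirmed a given forecast.
post_agreement = 100 * table / table.sum(axis=0)

# Prefigurance (Table III): each row divided by its observed total --
# how often an observed event was indicated in advance.
prefigurance = 100 * table / table.sum(axis=1, keepdims=True)

# Heavy forecasts verified 5/15 = 33 per cent of the time;
# heavy occurrences were forecast 5/8 = 62 per cent of the time.
```

The two normalizations answer different questions, which is why the text treats them as separate scores.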
Table III. Per Cent of Time Each Observed Category Was Correctly Forecast

                  Forecast class
Observed class   Heavy  Moderate  Light  Total
Heavy              62      25       13     100
Moderate           61      31        8     100
Light              22      22       56     100
For some economic uses these two scores are probably 
the most important verification figures. In planning an 
operation which is influenced in an important way by
heavy rain, for example the operation of a series of 
flood-control dams, it is potentially of value to know 
both the percentage of heavy-rain forecasts which are 
likely to be correct, that is, the reliance to be placed 
on a heavy-rain forecast, and the percentage of heavy- 
rain occurrences which are likely to be correctly forecast. 
Skill Score. The information contained in the con- 
tingency table is often combined into a single index (S), 
called a skill score and apparently first proposed by 
Heidke [4]. It is defined by 
S = (R - E) / (T - E),

where R is the number of correct forecasts, T is the total
number of forecasts, and E is the number expected to be
correct based on some standard such as chance, per- 
sistence, or climatology. This score has a value of unity 
when all forecasts are correct and has a value of zero 
when the number correct is equal to the expected num- 
ber correct. It will be discussed at greater length in the 
section on comparison with control forecasts. 
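The skill score can be computed directly from Table I. One common choice of E, used here for illustration, is the number expected correct by chance as computed from the marginal totals (the usual Heidke construction); other standards such as persistence or climatology would give a different E.

```python
import numpy as np

# Table I: rows = observed, columns = forecast.
table = np.array([[5, 2, 1],
                  [8, 4, 1],
                  [2, 2, 5]])

T = table.sum()       # total number of forecasts (30)
R = np.trace(table)   # number of correct forecasts (14)

# Expected number correct by chance, from the marginal totals.
E = (table.sum(axis=0) * table.sum(axis=1)).sum() / T

S = (R - E) / (T - E)   # roughly 0.22 for this table
```

S would be 1 if all 30 forecasts were correct and 0 if only the chance-expected number were correct, matching the definition above.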
Average Error, Root-Mean-Square-Error, Etc. When 
the values of the forecast element are expressed on a 
continuous numerical scale, as temperature usually is, 
it is often desirable to express a verification score in 
terms of an average absolute error or a root-mean- 
square-error (RMSE). If in a series of N forecasts F_i
represents the i-th forecast and O_i the corresponding
observation, the average absolute error a is given by
the formula

    a = (1/N) Σ_i |F_i - O_i|,

and the RMSE by

    RMSE = [(1/N) Σ_i (F_i - O_i)²]^(1/2).
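Both formulas are direct to compute. The temperature values below are hypothetical, chosen only to illustrate the two scores:

```python
import math

# Hypothetical temperature forecasts F and corresponding observations O.
F = [68, 72, 65, 70, 75]
O = [70, 71, 63, 74, 74]

N = len(F)

# Average absolute error: mean of |F_i - O_i|.
mae = sum(abs(f - o) for f, o in zip(F, O)) / N

# Root-mean-square error: square root of the mean squared error.
rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(F, O)) / N)
```

Because squaring weights large errors more heavily, the RMSE is never smaller than the average absolute error.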
One disadvantage of using either of these scores is that 
it gives the “cautious” forecaster an opportunity to 
