VERIFICATION OF WEATHER FORECASTS 
An extreme view denying the need for objectivity is 
that expressed by Hazen [3], who says, 
... to make a proper verification of a weather forecast, and 
one that shall be rigidly applicable to every case... , it should 
be done by... one thoroughly acquainted with the average 
conditions and he should verify from the map on which the 
prediction was based and not from subsequent maps. If the 
verification is from a subsequent map, care should be taken 
to consider what abnormal conditions have occurred which 
could not have been anticipated. 
Selection of Purposes of Verification. Before any satis- 
factory verification scheme is adopted it is necessary to 
determine the primary purpose or purposes to be served 
by the verification. This may appear to be obvious, but 
the history of the forecast verification controversy dur- 
ing the past half-century makes it clear that this funda- 
mental point has been forgotten again and again. A 
scheme that is adequate for one purpose may be un- 
satisfactory for another, just as an automobile is suited 
for travel over the highway but is quite inefficient for 
flying through the air. Unfortunately, much time has 
been wasted and much confusion has resulted from 
attempts to devise a verification method or single score 
that will serve all purposes. The very nature of weather 
forecasts and verifications and the way they are used 
make one single or absolute standard of evaluation 
impossible. A set of forecasts has many different char- 
acteristics. Some of the forecasts are correct, and 
knowledge of the percentage which are correct may 
serve some purposes. Usually the weather occurrences 
in one part of the range will seem to be more important 
than those in other parts of the range, and this must 
be considered in specifying the purpose of the verifica- 
tion. Thus if the forecasts are forecasts of tem- 
perature in a citrus grove, the characteristic which is 
most important may be the percentage of forecasts of 
temperature below freezing which were actually fol- 
lowed by temperature below freezing, and the percent- 
age of forecasts which were correct may be unimportant. 
The situation is in important respects analogous to 
measuring an object, for instance a table. The table has 
length, breadth, height, smoothness of the top, hardness 
of the top, number of legs, etc. A measure of any such 
characteristic is of no value unless a way of using the 
measurement has first been established. 
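
The citrus-grove case can be made concrete with a short sketch. The paired
forecast and observed minimum temperatures below are invented purely for
illustration, and the function names are hypothetical. The point is only that
the two characteristics named above can differ widely for one and the same
set of forecasts.

```python
# Hypothetical (forecast, observed) minimum temperatures in deg F.
# These ten pairs are invented for illustration; they are not real data.
pairs = [
    (30, 29), (31, 34), (36, 37), (40, 38), (38, 39),
    (45, 44), (50, 49), (42, 42), (35, 33), (41, 40),
]

FREEZING_F = 32  # "below freezing" is taken here to mean strictly below 32 deg F


def is_freeze(temp_f):
    """True when the temperature is below freezing."""
    return temp_f < FREEZING_F


def percent_correct(pairs):
    """Percentage of forecasts on the same side of freezing as the observation."""
    agree = sum(is_freeze(f) == is_freeze(o) for f, o in pairs)
    return 100.0 * agree / len(pairs)


def freeze_forecast_success(pairs):
    """Percentage of below-freezing forecasts followed by below-freezing minima."""
    observed_after_freeze_forecast = [o for f, o in pairs if is_freeze(f)]
    if not observed_after_freeze_forecast:
        return None  # no freeze forecasts were issued
    verified = sum(is_freeze(o) for o in observed_after_freeze_forecast)
    return 100.0 * verified / len(observed_after_freeze_forecast)


print("percent correct:", percent_correct(pairs))
print("freeze forecasts verified:", freeze_forecast_success(pairs))
```

With these invented numbers nine of the ten forecasts fall on the correct
side of freezing, yet only one of the two freeze forecasts was followed by a
freeze, which is the figure the grower would care about.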
In general, if each purpose of verification is exactly 
specified in advance, in the form of a hypothesis, not 
only will it be much easier to select verification scores to 
satisfy each purpose, but there will be no doubt as to 
what action is indicated by any numerical value which 
the verification score may have. It will often be desirable 
to select the purpose and the score in such a way that 
the result will either support or reject an a priori hy- 
pothesis. 
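
One way to read the preceding paragraph is sketched below. The choice of a
one-sided exact binomial test, the count of hits, the reference hit rate and
the significance level are all assumptions introduced here for illustration;
the text does not prescribe any particular test. What the sketch shows is that
once the hypothesis and the decision rule are fixed in advance, the numerical
value of the score points to a definite action.

```python
from math import comb


def binomial_tail(hits, n, p0):
    """P(X >= hits) for X ~ Binomial(n, p0).

    This is the chance of scoring at least `hits` correct forecasts out of `n`
    if the true hit rate were only the reference rate `p0`.
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(hits, n + 1)
    )


# All values below are hypothetical and fixed before looking at the outcome.
n_forecasts = 30       # number of forecasts verified
hits = 21              # number counted as correct
reference_rate = 0.5   # assumed hit rate of some reference method
alpha = 0.05           # significance level chosen in advance

# A priori hypothesis: the forecasts are no better than the reference method.
p_value = binomial_tail(hits, n_forecasts, reference_rate)
reject = p_value < alpha

print(f"p-value = {p_value:.4f}; reject the hypothesis: {reject}")
```

The score used here is simply the count of hits; a different purpose would
call for a different score and a different hypothesis.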
Specifications of a Scale of Goodness. Another essen- 
tial criterion for satisfactory verification is that the 
verification scheme should influence the forecaster in 
no undesirable way. One of the greatest arguments 
raised against forecast verification is that forecasts 
which may be the "best" according to the accepted 
system of arbitrary scores may not be the most useful 
forecasts. Resolving this difficulty becomes a question 
of how to define "best." That there is no unique answer 
to this question can be seen by considering the following 
example in which it is desired to verify some forecasts 
of minimum temperature. Suppose that on some par- 
ticular occasion the probability of occurrence during 
the subsequent night of the various integral degrees of 
minimum temperature is actually known by a fore- 
caster to be as follows: 
Minimum temperature (°F)     31   32   33   34   35   36   37
Probability of occurrence   .05  .10  .15  .25  .30  .10  .05
If the forecaster is required to state a single tempera- 
ture figure in his forecast for verification purposes, what 
figure should he state on this particular occasion so as 
to maximize his score? There are a number of answers. 
If the verification scheme counts as hits only the occa- 
sions when the forecasts are exactly right, he should 
forecast 35F, since this value has the greatest proba- 
bility of occurrence. If he is being verified on the basis 
of mean absolute error, he should say 34F, the median 
of the frequency distribution, since this will minimize 
the sum of the absolute deviations. If he is to be verified 
on the basis of the square or root-mean-square of the 
error, he should forecast 34.15F, which is the mean 
value of the frequency distribution. Any number of such 
arbitrary scoring systems could be devised and they 
will all influence the forecaster, at least to some extent, 
or in effect actually do part of the forecasting. The 
verification scheme may lead the forecaster to forecast 
something other than what he thinks will occur, for it is 
often easier to analyze the effect of different possible 
forecasts on the verification score than it is to analyze 
the weather situation. Some schemes are worse than 
others in this respect, but it is impossible to devise a 
set of scores that is free of this defect. There is one 
possible exception to this which will be discussed in the 
section on verification of probability statements. 
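
The three answers in the minimum-temperature example can be checked
numerically. The sketch below takes the probabilities from the example,
evaluates the expected penalty of every candidate forecast from 31F to 37F in
steps of 0.01 degree, and reports the forecast that minimizes each penalty.
The function names and the 0.01-degree grid are assumptions made here for
illustration only.

```python
# Distribution of minimum temperature from the example in the text.
temps = [31, 32, 33, 34, 35, 36, 37]
probs = [0.05, 0.10, 0.15, 0.25, 0.30, 0.10, 0.05]


def miss(forecast, observed):
    """Penalty of 1 unless the forecast is exactly right."""
    return 0.0 if forecast == observed else 1.0


def absolute_error(forecast, observed):
    return abs(forecast - observed)


def squared_error(forecast, observed):
    return (forecast - observed) ** 2


def expected_penalty(forecast, penalty):
    """Average penalty of one stated forecast over the known distribution."""
    return sum(p * penalty(forecast, t) for t, p in zip(temps, probs))


# Candidate forecasts 31.00, 31.01, ..., 37.00 (grid spacing is an assumption).
candidates = [round(31 + i / 100, 2) for i in range(601)]


def best_forecast(penalty):
    """Candidate forecast with the smallest expected penalty."""
    return min(candidates, key=lambda f: expected_penalty(f, penalty))


print("exact hits only:     ", best_forecast(miss))            # the mode
print("mean absolute error: ", best_forecast(absolute_error))  # the median
print("mean squared error:  ", best_forecast(squared_error))   # the mean
```

The same belief about the weather thus yields three different stated
forecasts, depending solely on the score by which the forecaster expects to
be judged.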
VERIFICATION METHODS AND SCORES 
It has been stated above that, in general, different 
verification statistics will be required for each different 
purpose. The discussion of various verification statistics 
in the following paragraphs can therefore only hint at 
the use which might be made of each of the scores, 
since so few types of scores appear ever to have been of 
real value. 
Contingency-Table Summaries. When forecasts are 
made in categorical classes a useful summary of the 
forecast and observed weather can be presented in the 
form of a contingency table. Such a table does not 
constitute a verification method in itself, but provides 
the basis from which a number of useful pertinent 
scores or indices can easily be obtained. An example is 
given in Table I where precipitation was forecast in 
three classes, heavy, moderate, or light, for thirty 
