When a series of forecasts has been made using 
probability statements, a study can also be made to de- 
termine whether the forecast probabilities are related 
to the relative frequency at which the events occur. An 
example of this type of comparison is shown in Table V 
(based on a more extended series of such forecasts) 
which suggests a relationship between the forecast and 
the observed probabilities, but which indicates that 
the forecaster should modify or adjust his scale to im- 
prove the forecasts. 
Arbitrary Scores for Special Purposes. An infinite 
number of scores can be devised for verification pur- 
poses, but it is beyond the scope of this paper to discuss 
many of them in detail. In particular cases, such as 
verifying the forecasts of cloud heights or visibilities, 
it may seem desirable to express errors in terms of
per cent and compute a score such as

S = \frac{100}{N} \sum_i \frac{|F_i - O_i|}{O_i},

where F_i is the forecast value and O_i is the observed
value.
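A minimal sketch of such a score, assuming it is expressed in per cent and that the observed values are nonzero (the function name and sample data are illustrative, not from the original):

```python
def relative_error_score(forecasts, observations):
    """Mean absolute forecast error expressed as a percentage of the
    observed value: (100/N) * sum(|F_i - O_i| / O_i)."""
    n = len(forecasts)
    return 100.0 / n * sum(
        abs(f - o) / o for f, o in zip(forecasts, observations)
    )

# Illustrative cloud-height forecasts (feet) vs. observations:
score = relative_error_score([1000, 2000, 500], [800, 2000, 1000])
print(score)  # 25.0 per cent
```

Lower values indicate better forecasts; dividing by O_i makes a given absolute error count more heavily when the observed value is low, which suits elements like visibility where low values matter most.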
In other instances, verification of selected cases in 
which either forecast or observation was of particular 
importance may be required, or it may be desired to 
transform the forecasts into forecasts of change before 
verifying them. It is common, for example, to want to
verify only cases when low visibility either is forecast 
or occurs, since high values are much less critical for 
airplane operations. It may be desired to verify only 
the cases when ceiling changes across the critical value, 
either in the forecast or in the occurrence. With more 
intensive study of the decisions which are made on the 
basis of such forecasts and of the consequences of right 
or wrong decisions, it would be possible to devise veri- 
fication scores based on the relative values of different 
forecasts, but as it is, practically all such specialized 
verification systems are arbitrary. This is not intended
to discourage the use of such scores, but it should be 
emphasized that conclusions as to the relative values 
of two sets of forecasts will always be uncertain so long 
as the value of any individual forecast cannot be esti- 
mated. Innumerable other scores to be used for some 
special purpose could be mentioned. 
CONTROL FORECASTS FOR COMPARISON 
Chance or Random Forecasts, Persistence, Clima- 
tology. Up to this point most of the discussion has been 
concerned with methods of comparing the forecast 
weather with the actual weather and obtaining some 
indices or scores. The final phase of the verification 
procedure is the interpretation of these scores, which is 
usually performed by comparing them with some stand- 
ard. It has long been recognized that a figure such as the 
percentage of correct forecasts is often meaningless, 
since the same figure might be obtained by chance or by 
some scheme of forecasting requiring no meteorological 
knowledge. 
It has been suggested that forecast scores be com- 
pared with scores obtained using various “blind” fore- 
casts such as random forecasts, persistence forecasts, or 
climatological forecasts. Many arguments have arisen 
over the proper choice (if any) of the blind forecasts to 
use as a standard. There is no correct answer, of course, 
since the choice depends not only upon the use that is 
made of the forecasts but upon the purpose of the 
verification.
To illustrate these comparisons, let us return now to 
the skill score defined in the earlier section. The ex- 
pected number of forecasts correct Hp, based on chance 
or random forecasts for the data in Table I, is computed 
by the following formula: 
E_R = \frac{1}{N} \sum_i F_i C_i,

where F_i is the total of the i-th row, C_i is the total
of the i-th column, and N is the total number of forecasts.
In the example given, N = 30 and

E_R = 9.6,
so the skill score based upon the margins of the con- 
tingency table is 
S_R = \frac{14 - 9.6}{30 - 9.6} = 0.22.
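These two steps can be sketched as follows; the 3-by-3 table below is illustrative, not Table I itself, but the skill-score call reproduces the figures quoted in the text:

```python
def expected_by_chance(table):
    """Expected number correct by chance from a contingency table's
    margins: sum over i of (row total F_i) * (column total C_i) / N."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # F_i
    col_totals = [sum(col) for col in zip(*table)]  # C_i
    return sum(f * c for f, c in zip(row_totals, col_totals)) / n

def skill_score(num_right, expected, total):
    """Skill score: (R - E) / (N - E)."""
    return (num_right - expected) / (total - expected)

# Illustrative table: rows are forecast classes, columns observed classes.
table = [[5, 2, 3],
         [1, 6, 3],
         [2, 2, 6]]
e_r = expected_by_chance(table)   # 10.0 for this made-up table

# Using the figures from the text: 14 right, 9.6 expected, 30 total.
s_r = skill_score(14, 9.6, 30)    # about 0.22
```

The same function gives the persistence-based comparison discussed in the text: skill_score(14, 12, 30) is about 0.11.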
If, in this same example, persistence forecasts had
been made by predicting a continuation of the pre- 
ceding precipitation, the expected number right would 
have been 12, so the skill score would be 
S_P = \frac{14 - 12}{30 - 12} = 0.11.
If the climatological frequencies of heavy, moderate, 
and light precipitation were P_h, P_m, and P_l, respectively,
the expected number right E_c is given by the formula

E_c = P_h(F_h) + P_m(F_m) + P_l(F_l),

where F_h, F_m, and F_l are the numbers of forecasts of
each class. In the example,

E_c = \tfrac{1}{3}(30) = 10,

since the three classes were defined as equally likely.
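The climatological expectation admits the same treatment; a brief sketch (the class probabilities and forecast counts below are assumed for illustration):

```python
def expected_by_climatology(class_probs, forecast_counts):
    """Expected number right against climatology: sum of P_k * F_k,
    where P_k is the climatological frequency of class k and F_k is
    the number of forecasts of that class."""
    return sum(p * f for p, f in zip(class_probs, forecast_counts))

# Three equally likely classes; 30 forecasts split among them:
e_c = expected_by_climatology([1/3, 1/3, 1/3], [12, 10, 8])  # 10.0
```

When every class has probability 1/3, the result is simply one third of the total number of forecasts, as in the example.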
Generally, any score which may be devised can be 
computed for a set of control forecasts as well as for the
actual forecasts, and a comparison can be made.
Adjustment for Difficulty. When the purpose of veri- 
fication is to compare a number of forecasters, a practi- 
cal difficulty often encountered is that the forecasts 
are not comparable because they were made at dif- 
ferent times and under different conditions. It is known 
that different synoptic situations present varying de- 
grees of difficulty and it is invalid to make comparisons 
between forecasters working at different times unless 
some attempt is made to take account of this source of 
variation. This can be partly accomplished by finding 
parameters dependent on the synoptic situation that 
are related to the errors in the forecasts. 
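One way to look for such a parameter, sketched here with a plain Pearson correlation between two score series (neither the data nor the helper function is from the original):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Forecaster's verification scores vs. persistence scores for the
# same periods (made-up numbers):
r = pearson_r([0.60, 0.72, 0.55, 0.81], [0.50, 0.65, 0.40, 0.70])
```

A strong correlation would suggest using the persistence score for each period as an index of its difficulty, against which individual forecasters can then be compared.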
For example, it might be found that there is a high 
correlation between the verification scores for visibility 
forecasts made by an individual and the scores obtained 
by a visibility forecast based simply on persistence. By 
