By means of a regression analysis the scores $S$ made by a
number of forecasters can then be adjusted to equiva-
lent “persistence” values before making comparisons
by use of the formula
$$S_a = S + b(\bar{S}_p - S_p),$$
where $S_a$ is the adjusted score, $S$ the unadjusted score,
$S_p$ the persistence score corresponding to $S$, $\bar{S}_p$ the
mean persistence score, and $b$ the average within-group
coefficient of the regression of forecast score on persistence
score. This procedure can be generalized to take account
of two or more parameters measuring difficulty.
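As an illustration, a minimal sketch of this adjustment in Python
(the scores, the mean persistence score, and the regression
coefficient $b$ are all hypothetical; in practice $b$ would be
estimated from a large sample of forecasts):

    # Persistence adjustment S_a = S + b*(mean_Sp - S_p): scores earned
    # on "easy" occasions (high persistence score) are adjusted downward,
    # scores earned on "difficult" occasions upward.
    def adjusted_score(s, s_p, mean_s_p, b):
        return s + b * (mean_s_p - s_p)

    # Hypothetical example: a score of 70 earned on an easy occasion
    # whose persistence score (85) exceeds the mean persistence score (60).
    print(adjusted_score(s=70, s_p=85, mean_s_p=60, b=0.5))  # 57.5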
Forecast-Reversal Test. A useful and simple pro- 
cedure that can often be used to advantage might be 
called the “forecast-reversal” test. Suppose that a long-
range forecaster claims he can pick out a year in advance 
the days of the month upon which rain will fall, pro- 
vided a “tolerance” in timing of one day in either 
direction is allowed. On this basis some arbitrary veri- 
fication is proposed which appears to give a relatively 
good score when applied to some actual forecasts. In 
the forecast-reversal test the same arbitrary verification 
procedure would be used, but the forecasts would be 
reversed, with “rain” days being called “no-rain”
days and vice versa. If approximately the same verifi- 
cation score were obtained on the reversed forecasts, 
one would be led to suspect either that the forecasts 
had no merit or that the verification scheme had no 
merit, or both. If this procedure were more generally 
used, especially for long-range forecasts, probably fewer 
claims would be made by long-range forecasters. It 
would also probably result in the selection of more sound 
verification schemes for use in determining the accuracy 
of such forecasts. 
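A minimal sketch of such a test in Python, using invented rain
days and the one-day tolerance described above (the scoring rule
itself is hypothetical):

    # Score a set of forecast "rain" days against observed rain days,
    # counting a forecast as verified if rain fell within one day of it.
    def hit_rate(forecast_days, rain_days, tolerance=1):
        hits = sum(any(abs(f - o) <= tolerance for o in rain_days)
                   for f in forecast_days)
        return hits / len(forecast_days)

    rain_days = {2, 5, 9, 12, 16, 19, 23, 26, 30}   # observed (invented)
    forecast  = {1, 6, 10, 14, 18, 22, 27}          # forecast rain days
    reversed_ = set(range(1, 31)) - forecast        # rain/no-rain swapped

    print(hit_rate(forecast,  rain_days))   # about 0.86
    print(hit_rate(reversed_, rain_days))   # about 0.87

With these toy numbers the reversed forecasts verify almost as
well as the originals, which is exactly the warning sign the test
is designed to give.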
PITFALLS OF VERIFICATION 
There are many pitfalls of verification to entrap both 
the novice and the expert. One of the greatest dangers 
lies in attempts to compare the relative abilities of 
forecasters on the basis of forecasts which are not 
comparable because of differences of location, season, 
time of day, length of forecast interval, etc. The reason 
for this is that the degree of forecasting difficulty 
varies so much from one circumstance to another that 
a very large sample of forecasts is needed to assure that 
the average weather has been approximately the same 
in the two sets of forecasts being compared. Even if the 
forecasts being compared are for the same event, there 
may be other factors to be considered, such as whether 
equal map facilities were available to each forecaster. 
Interpretation is made even more difficult when scores 
on two or more forecast weather elements are arbi- 
trarily combined to form a single index to be used for 
comparison purposes. It may be true, for example, that 
there is some particular use of a forecast for which an 
error of 2° in temperature is equivalent to an error of 
0.25 in. in a precipitation forecast, but it is certain that 
this (or any other) arbitrary weighting is not true in 
general. Those who must make a selection between fore-
casters based on verification scores may demand a 
single index to represent the verification of all forecasts 
made by each man, but since a logical combination of 
scores is in general impossible, it would be far better to
consider each score separately. As pointed out in the 
section on Selection of Purposes of Verification, if it is 
decided ahead of time just what measures of accuracy 
are needed and what will be done with each, the need 
for combining scores could be largely eliminated. 
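A toy illustration in Python of how the choice of weights alone
can reverse a ranking (all errors and weights are invented; the
2°-to-0.25-in. equivalence quoted above corresponds to a
precipitation weight eight times the temperature weight):

    # Hypothetical mean absolute errors for two forecasters.
    errors = {"A": {"temp": 2.0, "precip": 0.40},
              "B": {"temp": 3.0, "precip": 0.20}}

    def index(e, w_temp, w_precip):
        return w_temp * e["temp"] + w_precip * e["precip"]

    # Weighting 2 deg = 0.25 in. (w_precip = 8): B looks better ...
    print({k: index(v, 1.0, 8.0) for k, v in errors.items()})  # A: 5.2, B: 4.6
    # ... but equal weights make A look better.
    print({k: index(v, 1.0, 1.0) for k, v in errors.items()})  # A: 2.4, B: 3.2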
Another practice that may lead to difficulty is the 
use of overlapping classes for verification purposes. 
Thus rules may be set up stating that a ceiling forecast 
of 200 ft will verify if the observation is between 100 
and 400 ft, and that a forecast of 400 ft will verify if the 
observation is between 200 and 800 ft. Although such 
rules may seem reasonable, they tend to encourage
forecasters to hedge by choosing the classes having the 
widest range. Also, the choice of rather wide tolerances 
in verification limits tends to give rather high and uni- 
form percentage scores to all forecasters, thus failing to 
discriminate between the better and poorer forecasters. 
Since the use of a contingency table for comparing 
forecasts and observations presents the data in one of 
the most useful forms, the practice of using overlapping 
classes is to be discouraged because such verification 
cannot be displayed in this way. 
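A minimal sketch in Python of the difficulty, using the ceiling
limits quoted above: with overlapping classes a single observation
can verify two different forecasts at once, so it has no unique
cell in a contingency table:

    # Overlapping verification limits for ceiling forecasts (ft).
    limits = {200: (100, 400),   # 200-ft forecast verifies for obs 100-400
              400: (200, 800)}   # 400-ft forecast verifies for obs 200-800

    obs = 300
    verified = [fc for fc, (lo, hi) in limits.items() if lo <= obs <= hi]
    print(verified)  # [200, 400] -- the observation verifies both classes,
                     # so forecasting the wider 400-ft class is a safe hedge.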
FUTURE RESEARCH 
The preceding discussion has emphasized that verifi- 
cation is less difficult than has been thought and that the 
lack of recognition in the past of the various and 
diverse objectives of verification has been the real 
obstacle to progress. One of the most promising fields 
of study in this connection is that of setting up realistic 
problems to be solved and selecting scores which would 
furnish the desired information. Such studies might 
profitably be conducted in collaboration with adminis- 
trators who need to select or rank forecasters; with 
economists or business advisors who need to know the 
effect of weather on operations and who are in the best 
position to determine what characteristic of the fore- 
casts is related to profit and loss factors; or with meteor-
ologists who know what characteristic of the forecast 
is useful for verifying scientific hypotheses. This appears 
to be a field of study in which private meteorologists 
should play a very important role, for their knowledge of 
the real uses which can be made of weather forecasts 
should be invaluable in specifying the questions which 
need to be answered by verification. On the scientific 
side of verification, the relation of forecast error to 
measures of forecast “difficulty” needs further investi- 
gation. If measures relating the synoptic conditions to 
forecast errors can be found, they would not only assist
the comparison of forecasts made at different times, but 
would also provide information as to where applied 
meteorological research is most needed and might con- 
tribute to fundamental knowledge of atmospheric proc- 
esses. 
Investigations regarding the effect of verification sys- 
tems on the forecaster are needed since very little 
concrete evidence on this point is available. In this 
connection the use of probability statements in fore- 
casting needs to be explored in more detail, for if such 
