For some given classes of patterns the classification provided by functions based on approximations to p(x) may be quite acceptable. In view of their simple implementation, a weighted sum of the x_i, or even an unweighted sum or score of the x_i, are attractive candidates for classification functions. However, their worth as classification functions in a given context can only be evaluated to the extent that the deterioration in the error curve or risk curve with respect to the optimum is assessed.
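As a minimal sketch of the two candidate functions mentioned above (weighted sum and unweighted score, both compared against a threshold; the weights and threshold values here are illustrative, not taken from the paper):

```python
# Illustrative sketch: classify a binary pattern x by a weighted sum
# of its components, or by an unweighted count, against a threshold.
# The weight vector a and threshold t are hypothetical values.

def classify(x, a, t):
    """Return 1 if the weighted sum of the x_i exceeds the threshold t."""
    s = sum(ai * xi for ai, xi in zip(a, x))
    return 1 if s > t else 0

def classify_unweighted(x, t):
    """Unweighted score: simply count the active x_i."""
    return 1 if sum(x) > t else 0
```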
It is known that when using procedures which are not optimum, a classification procedure based on a subset of the x_i may do better than a procedure based on all the x_i: this has been found, for example in a slightly different context, in the application of discriminant functions to speech patterns. It is also known that for some situations, when using approximate procedures, dividing the x_i into a number of (mutually exclusive) subsets s_j, deriving classification functions f_j based separately on the s_j, and using the f_j to obtain a final classification function F can be better than a similar function based directly on all the x_i. This is accomplished to some degree by learning networks such as those of Fig. 1 and Fig. 2, in which subsets of the x_i are selectively connected to summation units. However, the x_i have usually been selected by a random method. Tailoring the grouping of the x_i to given classes of patterns will generally give better results.
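The subset scheme just described can be sketched as follows (the particular grouping and the majority-vote combination rule are illustrative assumptions, not the paper's specific construction):

```python
# Sketch of the subset idea: divide the x_i into mutually exclusive
# subsets, form a sub-function f_j on each subset alone, and combine
# the f_j outputs into a final classification function F.
# Groupings and thresholds below are hypothetical.

def f_sub(x_subset, threshold):
    """Sub-function on one subset: 1 if enough components are active."""
    return 1 if sum(x_subset) > threshold else 0

def F(x, groups, sub_thresholds, final_threshold):
    """Combine sub-function outputs by vote against a final threshold."""
    votes = [f_sub([x[i] for i in g], t)
             for g, t in zip(groups, sub_thresholds)]
    return 1 if sum(votes) > final_threshold else 0
```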
In the network of Fig. 2, subsets of the x_i, selected in a random manner, are connected through fixed weights to summation units with thresholds, the outputs of a number of these summation units being then multiplied by weighting coefficients, summed, and compared against a threshold in the response unit. Let b_ij be the fixed weights between the retina elements and the summation units, where b_ij can be 0. Let T_j be the thresholds for the summation units, w_jk the variable weights between summation units and response units, and T_k the thresholds for the response units. Also, let y_j be the outputs from the summation units, with y_j being 0 when the threshold T_j is not exceeded, and a constant otherwise. Then the classification functions used by the network are
    y_j = c  if  sum_{i=1}^{N} b_ij x_i > T_j,   y_j = 0  otherwise,      j = 1, 2, ..., s

    F_k = sum_j w_jk y_j - T_k,      k = 1, 2, ..., r
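These classification functions can be sketched directly from the definitions above (a minimal reading, with the constant output c taken as 1 and all numeric values illustrative):

```python
# Sketch of the two-layer network: retina x -> threshold summation
# units (fixed weights b_ij, thresholds T_j) -> response unit
# (variable weights w_jk, threshold T_k). Values are hypothetical.

def summation_unit(x, b_col, T_j, c=1):
    """y_j: c if sum_i b_ij * x_i exceeds T_j, else 0."""
    return c if sum(b * xi for b, xi in zip(b_col, x)) > T_j else 0

def response_value(y, w_col, T_k):
    """F_k: weighted sum of the y_j minus the response threshold T_k."""
    return sum(w * yj for w, yj in zip(w_col, y)) - T_k

def net_classify(x, b, T_sum, w, T_resp):
    """Full pass for one response unit: respond 1 if F_k is positive."""
    y = [summation_unit(x, b_col, Tj) for b_col, Tj in zip(b, T_sum)]
    return 1 if response_value(y, w, T_resp) > 0 else 0
```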
Unlike applications of classification theory 
in many fields, in many pattern recognition 
situations it is possible for the experimenter 
to check how well a particular procedure performs 
since independently he knows to which group a 
given pattern belongs. The easy availability of 
additional samples from each group makes it 
possible to introduce an iterative or "adaptive" 
procedure to improve the performance of the 
classification function. 
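The checking step described here, estimating a procedure's performance from independently labelled samples, amounts to a simple empirical error rate (a sketch of the assumed procedure, not a formula from the paper):

```python
# Sketch: since the experimenter knows each sample's true group, the
# error rate of any candidate classifier can be estimated directly
# on fresh labelled samples.

def error_rate(classifier, samples, labels):
    """Fraction of labelled samples the classifier gets wrong."""
    errors = sum(1 for x, lab in zip(samples, labels)
                 if classifier(x) != lab)
    return errors / len(samples)
```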
An evaluation of the worth of the classification function resulting when iteration based on experience is used to modify the state of a learning network is provided by comparing its error curve with error curves obtained from a likelihood ratio procedure using p(x) and various approximations to it, with the error curve resulting when the a_j are obtained according to Fisher's discriminant function, and with the curve obtained when an unweighted score of the x_i is used [10].
3. Iterative procedures for learning. The problem of using experience to go from some arbitrary initial state (a_1^0, a_2^0, ..., a_N^0) to a final state (a_1, a_2, ..., a_N) which will produce a desired result can be approached in many ways. Typical of a number of efforts is the approach used by Gabor [11] of minimizing a mean square error criterion. The problem may also be stated as one of applying a set of transformations T to the state vector. In this form, varying degrees of complexity can be introduced into the formulation of the problem, as is illustrated by the work in Dynamic Programming. Useful iteration procedures can be derived from the simple point of view provided by the techniques used in stochastic models for learning [12,13,14,15] and from the point of view provided by stochastic approximation methods [16].
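One such iteration, in the spirit of the mean-square-error minimization attributed above to Gabor, can be sketched as an incremental gradient (LMS-style) update of the state vector; the step size and sample data below are illustrative assumptions, not taken from the paper:

```python
# Sketch of an iterative mean-square-error procedure: after each
# labelled sample, nudge the state vector a so as to reduce the
# squared error of the weighted sum. Step size is hypothetical.

def lms_step(a, x, desired, rate=0.1):
    """One iteration: move a along the negative error gradient."""
    output = sum(aj * xj for aj, xj in zip(a, x))
    err = desired - output
    return [aj + rate * err * xj for aj, xj in zip(a, x)]

def train(a, data, epochs=50, rate=0.1):
    """Repeat the update over the labelled samples."""
    for _ in range(epochs):
        for x, d in data:
            a = lms_step(a, x, d, rate)
    return a
```

With separable samples such as ([1, 0], 1) and ([0, 1], 0), the state vector converges toward weights that reproduce the desired outputs.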
4. Discussion. The operation of some
proposed networks for pattern recognition has 
been examined in the context of classification 
theory, and related to a class of classification 
procedures. It is clear that the classification 
properties of the networks are acceptable for 
many pattern recognition situations of interest. 
Experimental work on pattern recognition 
and iterative procedures is being carried out 
by the author and his associates C. F. Fey, 
N. J. Molgaard, D. F. Smith and D. F. Parkhill. 
When using various iteration procedures, a number of learning nets have produced learning curves similar to those shown in Figs. 3 and 4.
Fig. 5 shows a device, suggested by 
D. F. Parkhill and designed by D. F. Smith 
to facilitate experimentation on pattern 
recognition networks using simple iterative 
techniques. 
REFERENCES 
1. Kanal, L., "Pattern Recognition Studies". 
I. Use of discriminant functions, distance 
functions and clustering transformations 
in pattern recognition. 
