# Counter propagation Networks

An example of a hybrid network which combine the features of two or more basic network designs. Proposed by Hecht-Nielsen in 1986.

The hidden layer is a Kohonen network with unsupervised learning and the output layer is a Grossberg (outstar) layer fully connected to the hidden layer. The output layer is trained by the Widrow-Hoff rule.

Allows the output of a pattern rather than a simple category number.

Can also be viewed as a bidirectional associative memory.

Fig. above shows a unidirectional counter propagation network used for mapping pattern A of size n to pattern B of size m.

The output of the A subsection of the input layer is fanned out to the competitive middle layer. Each neuron in the output layer receives a signal corresponding to the input pattern’s category along one connection from the middle layer.

The B subsection of the input layer has zero input during actual operation of the network and is used to provide input only during training.

The role of the output layer is to produce the pattern corresponding to the category output by the middle layer. The output layer uses a supervised learning procedure, with direct connection from the input layer’s B subsection providing the correct output.

Training is a two-stage procedure. First, the Kohonen layer is trained on input patterns. No changes are made in the output layer during this step. Once the middle layer is trained to correctly categorise all the input patterns, the weights between the input and middle layers are kept fixed and the output layer is trained to produce correct output patterns by adjusting weights between the middle and output layers.

Training algorithm stage 1:

1.  Apply normalised input vector x to input A.

2.  Determine winning node in the Kohonen layer.

3.  Update winning node’s weight vector –

w(t+1) = w(t)  +  (xw)

4.  Repeat steps 1 through 3 until all vectors have been processed.

5. Repeat steps 1 to 4 until all input vectors have been learned.

## Training algorithm stage 2:

1. Apply normalised input vector x and its corresponding output vector y, to inputs A and B respectively.

2.  Determine winning node in the Kohonen layer.

3. Update weights on the connections from the winning node to the output unit –

wi(t+1) = wi(t)  + (yi – wi)

4. Repeat steps 1 through 3 until all vectors of all classes map to satisfactory outputs.

## Bidirectional Counterpropagation Network

A bidirectional Counter propagation network is capable of a two-way mapping. For example, an A pattern input produces a B pattern output and a B pattern input produces an A pattern output.

The Fig. below illustrates the connections in a bidirectional Counter propagation network. The input and output layers are now of the same size, equal to the sum of the sizes of the A and B patterns. Both A and B sections have full connections to the middle layer, and one-to-one connections to the corresponding neurons in the output layer. The middle layer receives input from all elements of the input layer and transmits its output to the entire output layer.

As a pattern associator the Counter propagation network has the advantage over other networks such as backprop in its ability to do inverse mapping.

## Possible drawback of counter propagation networks

Training a counter propagation network has the same difficulty associated with training a Kohonen network. Counter propagation networks tend to be larger than back propagation networks. If a certain number of mappings are to be learned, the middle layer must have that many number of neurons.

1. Counter propagation network (CPN) ( § 5.3) Basic idea of CPN Purpose: fast and coarse approximation of vector mapping not to map any given x to its with given precision, input vectors x are divided into clusters/classes. each cluster of x has one output y, which is (hopefully) the average of for all x in that class. Architecture: Simple case: FORWARD ONLY CPN , from input to hidden (class) from hidden (class) to output m p n j j,k k k,i i y z x y v z w x y z x 1 1 1

2. Learning in two phases: training sample ( x, d ) where is the desired precise mapping Phase1: weights coming into hidden nodes are trained by competitive learning to become the representative vector of a cluster of input vectors x : (use only x, the input part of ( x, d )) 1. For a chosen x , feedforward to determined the winning 2. 3. Reduce , then repeat steps 1 and 2 until stop condition is met Phase weights going out of hidden nodes are trained by delta rule to be an average output of where x is an input vector that causes to win (use both x and d ). 1. For a chosen x , feedforward to determined the winning 2. (optional) 3. 4. Repeat steps 1 – 3 until stop condition is met

3. A combination of both unsupervised learning (for in phase 1) and supervised learning (for in phase 2). After phase 1, clusters are formed among sample input x , each is a representative of a cluster (average). After phase 2, each cluster k maps to an output vector y, which is the average of View phase 2 learning as following delta rule Notes

4. After training, the network works like a look-up of math table. For any input x, find a region where x falls (represented by the wining z node); use the region as the index to look-up the table for the function value. CPN works in multi-dimensional input space More cluster nodes ( z ), more accurate mapping. Training is much faster than BP May have linear separability problem

5. If both we can establish bi-directional approximation Two pairs of weights matrices: W ( x to z ) and V ( z to y ) for approx. map x to U ( y to z ) and T ( z to x ) for approx. map y to When training sample ( x, y ) is applied ( ), they can jointly determine the winner z k* or separately for Full CPN

6. Adaptive Resonance Theory (ART) ( § 5.4) ART1 : for binary patterns; ART2 : for continuous

7. patterns Motivations: Previous methods have the following problems: Number of class nodes is pre-determined and fixed. Under- and over- classification may result from training Some nodes may have empty classes. no control of the degree of similarity of inputs grouped in one class. Training is non-incremental: with a fixed set of samples, adding new samples often requires re-train the network with the enlarged training set until a new stable state is reached.

8. Ideas of ART model: suppose the input samples have been appropriately classified into k clusters (say by some fashion of competitive learning). each weight vector is a representative (average) of all samples in that cluster. when a new input vector x arrives Find the winner j* among all k cluster nodes Compare with x if they are sufficiently similar ( x resonates with class j*), then update based on else, find/create a free class node and make x as its first member.

9. To achieve these, we need: a mechanism for testing and determining (dis)similarity between x and . a control for finding/creating new class nodes. need to have all operations implemented by units of local computation. Only the basic ideas are presented Simplified from the original ART model Some of the control mechanisms realized by various specialized neurons are done by logic statements of the algorithm

10. ART1 Architecture

11. Working of ART1 3 phases after each input vector x is applied Recognition phase : determine the winner cluster for x Using bottom-up weights b Winner j* with max y j* = b j* ּ x x is tentatively classified to cluster j* the winner may be far away from x (e.g., | t j* – x | is unacceptably large)

12. Working of ART1 (3 phases) Comparison phase : Compute similarity using top-down weights t : vector: If (# of 1’s in s )|/( # of 1’s in x ) > ρ , accept the classification, update b j* and t j* else: remove j* from further consideration, look for other potential winner or create a new node with x as its first patter.

13. Weight update/adaptive phase Initial weight: (no bias) bottom up: top down: When a resonance occurs with If k sample patterns are clustered to node j then = pattern whose 1’s are common to all these k samples

14. Example for input x (1) Node 1 wins

15. increases # of classes learned, and decreases the average class size. Classification may shift during search, will reach stability eventually. There are different versions of ART1 with minor variations ART2 is the same in spirit but different in details. rNotes Classification as a search process No two classes have the same b and t Outliers that do not belong to any cluster will be assigned separate nodes Different ordering of sample input presentations may result in different classification. Increase of
16. ART1 Architecture – + + + + + – + R G2 G1

17. cluster units: competitive, receive input vector x through weights b: to determine winner j . input units: placeholder or external inputs interface units: pass s to x as input vector for classification by compare x and controlled by gain control unit G1 Needs to sequence the three phases (by control units G1, G2, and R )

18. R = 0: resonance occurs, update and R = 1: fails similarity test, inhibits J from further computation