Lecture 5 - Michael Thomas
Activation Functions and XOR Networks

The following is the LMS error plot of one learning session for the XOR network. From this, a number of questions arise.
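The plot itself is not reproduced here, but a minimal sketch of such a learning session, recording the LMS (sum-of-squares) error each epoch, might look as follows. The 2-2-1 architecture is the standard XOR setup; the learning rate, seed, and weight ranges are illustrative assumptions, not the settings behind the original plot.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The four XOR patterns and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Small random weights: two inputs, two hidden units, one output.
W1 = rng.uniform(-0.5, 0.5, (2, 2))   # input -> hidden
b1 = np.zeros(2)
W2 = rng.uniform(-0.5, 0.5, (2, 1))   # hidden -> output
b2 = np.zeros(1)

lr = 0.5
errors = []
for epoch in range(2000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)          # hidden activations
    Y = sigmoid(H @ W2 + b2)          # output activations

    # LMS (sum-of-squares) error over the four patterns.
    errors.append(np.sum((T - Y) ** 2))

    # Backward pass: each delta is an error signal times a sigmoid slope.
    d_out = (T - Y) * Y * (1 - Y)
    d_hid = (d_out @ W2.T) * H * (1 - H)

    # Weight updates.
    W2 += lr * H.T @ d_out
    b2 += lr * d_out.sum(axis=0)
    W1 += lr * X.T @ d_hid
    b1 += lr * d_hid.sum(axis=0)

print(errors[::100])   # error sampled every 100 epochs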

Questions

Explanations

First, we plot the hidden unit space, just as we plotted the input space.

How? Plug each of the four patterns into the hidden units and use the resulting activations as that pattern's coordinates. The output unit then separates these points with a line, y = ax + b.
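As a sketch of that plug-in-the-patterns step (the hidden weights and biases below are hypothetical, chosen only so the four points land in plausible places, with hidden unit 1 acting like AND and hidden unit 2 like OR):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical input-to-hidden weights and biases.
W1 = np.array([[4.0, 6.0],
               [4.0, 6.0]])
b1 = np.array([-6.0, -2.0])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Each pattern becomes a point (h1, h2) in hidden unit space.
for x, (h1, h2) in zip(X, sigmoid(X @ W1 + b1)):
    print(x, "->", (round(h1, 2), round(h2, 2)))

# The output unit's decision boundary in this space is the line
# v1*h1 + v2*h2 + c = 0, i.e. h2 = a*h1 + b -- the y = ax + b above.

Note that with these weights (0,1) and (1,0) land on the same point, which is exactly what makes the patterns linearly separable in hidden unit space.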

[Plots of the hidden unit space at Epoch 0, Epoch 100, Epoch 200, and Epoch 300; axis label: Hidden Unit 1]

Notice that the network changes the hidden-to-output weights more than the input-to-hidden weights. This is because the input-to-hidden weight changes involve a derivative times a derivative: the error signal is multiplied by the slope of the sigmoid at each layer it passes back through. In essence, the error is diluted at every step back.
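A tiny numerical sketch of that dilution, with made-up activations and a made-up hidden-to-output weight: since the sigmoid's slope a(1 - a) never exceeds 0.25, each layer the error passes back through multiplies it by at most 0.25 times a weight.

def dsigmoid(a):
    # Slope of the sigmoid, written in terms of the activation a.
    return a * (1.0 - a)

# Made-up values: output activation, hidden activation, and the
# hidden-to-output weight. None of these come from a real run.
y, h, w = 0.6, 0.55, 0.8
err = 1.0 - y                             # (target - output)

delta_out = err * dsigmoid(y)             # one derivative factor
delta_hid = delta_out * w * dsigmoid(h)   # a derivative times a derivative

print(delta_out)   # 0.096
print(delta_hid)   # ~0.019 -- smaller: the error dilutes each step back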

[Plots of the hidden unit space at Epoch 300 and Epoch 400; axis label: Hidden Unit 1]

[Plots of the hidden unit space at Epoch 450 and Epoch 2000]

Notice again that the network hasn't completely reached the global minimum: the point corresponding to (0,0) is still in the lower left. Clearly, it should be farther to the right for everything to be hunky-dory, BUT the network won't do this, as it has already spent too much time strengthening the weights in the wrong direction.

How is this changing the activation function?

Well, of course, the sigmoid function never actually changes, BUT the connectivity of the network can change the part of the sigmoid function each individual unit uses. For Michael's money, if the output unit responds differently to different inputs (i.e., moving from a linear to a sigmoidal response), then this is effectively changing the activation function.
Indeed, the output unit of the XOR network does change its responding, from linear, to sigmoid, to linear and back again. Why? Because as the net input changes, the slope of the sigmoid function changes: the slope is steepest at a net input of zero and flattens out as the net input grows in magnitude. While the weights are small, every pattern produces a small net input and the unit sits in the near-linear middle of the sigmoid; as the weights grow, the net inputs spread out and the response becomes steeper and more step-like around the decision boundary.
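A quick check of the slope claim (plain Python; nothing here is from the lecture beyond the sigmoid itself):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def slope(x):
    # Derivative of the sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# The slope peaks at 0.25 at a net input of 0 and flattens as the
# net input grows in magnitude -- the saturated tails of the sigmoid.
for net in [0.0, 1.0, 2.0, 4.0, 8.0]:
    print(net, round(slope(net), 4))   # 0.25, 0.1966, 0.105, 0.0177, 0.0003

# Larger weights stretch the net inputs, so the unit's response to its
# *inputs* steepens at the crossover: d/dx sigmoid(w*x) = w * sigma'(w*x).
for w in [0.5, 2.0, 8.0]:
    print(w, w * slope(0.0))           # 0.125, 0.5, 2.0

With small weights all four patterns fall in the near-linear middle of the curve; with large weights the unit behaves almost like a step, which is the movement from linear to sigmoidal responding described above.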

 


Comments to: ghollich@yahoo.com

 Last Modified: Sep 20, 1999