CS 4804 Homework #6

Date Assigned: October 31, 2003
Date Due: November 10, 2003, in class, before class starts
  1. (20 points) Consider a perceptron that takes inputs (x1, x2, x3, x4) and has weights (w1, w2, w3, w4) for these inputs respectively. In addition, it has a "threshold"/constant input always set to 1 with weight "b". Assume that the action of the perceptron is:

    • output zero if (x1*w1 + x2*w2 + x3*w3 + x4*w4 < -b)
    • output one if (x1*w1 + x2*w2 + x3*w3 + x4*w4 > b)
    • output (1/(2b))*(x1*w1 + x2*w2 + x3*w3 + x4*w4 + b) otherwise

    In other words, instead of the usual threshold or sigmoid nonlinearity, we have a "ramp" function. Derive the learning rule for this perceptron. Does the weight space for this perceptron have a single minimum (global) or does it have multiple local minima?
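As a sanity check on the derivation, the ramp unit and its gradient-descent update could be prototyped along the following lines. This is only a sketch, not the required derivation: the squared-error criterion E = (t - y)^2/2, the learning rate eta, and the choice of a zero gradient at the non-differentiable corners are all assumptions.

```python
import numpy as np

def ramp_output(w, b, x):
    """Ramp perceptron: 0 below -b, 1 above b, slope 1/(2b) in between."""
    z = float(np.dot(w, x))
    if z < -b:
        return 0.0
    if z > b:
        return 1.0
    return (z + b) / (2.0 * b)

def update(w, b, x, t, eta=0.1):
    """One incremental gradient step on E = (t - y)^2 / 2.

    The gradient is nonzero only in the linear region, where
    dy/dw_i = x_i / (2b)  and  dy/db = -z / (2b^2).
    """
    z = float(np.dot(w, x))
    y = ramp_output(w, b, x)
    if -b < z < b:                              # linear region
        err = t - y
        w = w + eta * err * x / (2.0 * b)       # w_i <- w_i + eta*err*x_i/(2b)
        b = b + eta * err * (-z) / (2.0 * b * b)
    return w, b, y
```

Note that outside the linear region the gradient vanishes, which is one thing to consider when answering the question about local versus global minima.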

  2. (20 points) Consider the cascade network shown below. There are three inputs to the network (x1, x2, and x3). Sigmoid unit number 1 receives all of these inputs as well as a threshold input (set to 1). Sigmoid unit number 2 has as its inputs the output of sigmoid unit 1 as well as the original inputs (four inputs in total). All links are weighted with adjustable weights. Derive an appropriate backpropagation-style incremental weight adjustment procedure for this network based on minimizing the sum-of-squared error between the network's output and the labels of a set of training input vectors. Then apply this algorithm to the data given in cascadedata and report the weights (the first columns of this file are the inputs x1, x2, and x3; the last column is the desired output). Present your answer as a closed-form expression relating the output of the network to the inputs.
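The incremental procedure you derive could be checked against a sketch like the one below. It assumes unit 2 also receives its own threshold input and uses the online squared-error update with error terms delta2 (output unit) and delta1 (hidden unit); the weight vectors v, w and the learning rate eta are illustrative names, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(v, w, x):
    """v: 4 weights of unit 1 over (x1, x2, x3, 1).
    w: 5 weights of unit 2 over (h1, x1, x2, x3, 1); the extra
    threshold input to unit 2 is an assumption."""
    a1 = np.append(x, 1.0)                  # inputs to unit 1
    h1 = sigmoid(float(np.dot(v, a1)))
    a2 = np.concatenate(([h1], x, [1.0]))   # inputs to unit 2
    y = sigmoid(float(np.dot(w, a2)))
    return h1, y, a1, a2

def backprop_step(v, w, x, t, eta=0.5):
    """One incremental update minimizing (t - y)^2 / 2."""
    h1, y, a1, a2 = forward(v, w, x)
    delta2 = (t - y) * y * (1.0 - y)             # output-unit error term
    delta1 = delta2 * w[0] * h1 * (1.0 - h1)     # back through the h1 -> y link
    w = w + eta * delta2 * a2
    v = v + eta * delta1 * a1
    return v, w
```

The cascade structure means the input-to-output effect flows both directly through unit 2 and indirectly through unit 1, which is exactly what the delta1 term captures.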



  3. (60 points) In this problem, you will train a regular neural network on the Iris dataset (the only files relevant are iris.data and iris.names). First take a look at iris.names and familiarize yourself with the format of the data in iris.data. Notice that there are four input variables (which are continuous) and one class variable (that can take on three possible values).

    • Use at most one hidden layer of nodes. You have to decide for yourself how many nodes you will need to use.

    • Pick either a distributed or a local encoding for the output layer. Explain the reason for your choice.

    • Code up a backpropagation algorithm in a language of your choice. Split the given data into 2/3 training and 1/3 test. Make sure that the distribution of classes in both the training and test sets is the same as in the original data (i.e., the class entropy is preserved).

    • For each iteration of the backpropagation algorithm (i.e., one sweep through the entire training set), compute the sum-of-squared error on both the training and test datasets, and track these metrics.

    For full credit, include an explanation of all the design decisions you made, a printout of your code, graphs displaying the performance, and a list of observations.
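Two of the required pieces, the class-preserving split and the per-epoch error tracking, could be organized along these lines. This is a sketch under assumptions: the function names, the seed, and the per-sample `net_forward` interface are illustrative, and the actual network code is left to you.

```python
import numpy as np

def stratified_split(labels, train_frac=2.0 / 3.0, seed=0):
    """Index split in which every class keeps the same train/test
    proportion, so the class distribution (entropy) is preserved."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        k = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:k])
        test_idx.extend(idx[k:])
    return np.array(train_idx), np.array(test_idx)

def sse(net_forward, X, T):
    """Sum-of-squared errors of a forward function over a dataset.
    net_forward maps one input vector to one output vector."""
    Y = np.array([net_forward(x) for x in X])
    return float(np.sum((Y - T) ** 2))
```

In the main loop, one call to `sse` on the training set and one on the test set after every sweep gives the two curves to plot; comparing them is also a convenient way to spot overfitting.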

