Lab 7
In this machine problem, you will design an LSTM by hand to perform a specified task. Then you will also train it, using gradient descent, to perform the same task.
Useful Files
 Code, test code, data, and solutions.
 Images of the solutions, created using visualize.py.
 Autograder submission site.
Task Description
You have a dataset object with input observations, x[t] (loaded as self.observations[t]), and target outputs y[t] (loaded as self.label[t]). The LSTM should create an activation matrix, self.activation. The last column of self.activation (self.activation[t,4]) contains the LSTM output, h[t].
Your task is to create an LSTM that will perform the following task. For every time step, t,
 If self.observation[t]==0, then output h[t]=0.

 If self.observation[t]>=1, then output a count of the number of time steps since the most recent nonzero observation. More precisely,
 If the preceding nonzero observation was observation[s], then output h[t]=t-s.
 If observation[t] is the very first nonzero observation, then output h[t]=t.
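As a sanity check on the specification, the target sequence can be computed directly from the observations, without any LSTM. The following is a minimal sketch, not part of the distributed code: the function name target_counts is hypothetical, and it assumes the observations are a 1-D sequence of nonnegative integers indexed from t=0.

```python
import numpy as np

def target_counts(observation):
    """Return the target output h[t] for each time step (hypothetical helper)."""
    observation = np.asarray(observation)
    h = np.zeros(len(observation))
    last_nonzero = None  # index s of the most recent nonzero observation, if any
    for t, x in enumerate(observation):
        if x == 0:
            continue  # zero observation: the target stays 0
        # The first nonzero observation outputs t; later ones output t - s.
        h[t] = t if last_nonzero is None else t - last_nonzero
        last_nonzero = t
    return h

targets = target_counts([0, 0, 1, 0, 0, 0, 1, 1, 0])
```

For this input the nonzero observations fall at t=2, 6, and 7, so the targets at those steps are 2, 4, and 1, and every other target is 0.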
Training Epochs
You'll be tested for epochs -1, 0, 50, and 100. If you run visualize.py, it will run epochs -1 through 140, then print an error convergence curve.
 If epoch==-1, create a model, self.model, with knowledge-based weights that will perform the task perfectly, using a CReLU activation function (g(x)=max(0,min(1,x))).
 If epoch==0, create pseudorandom initial weights. Code for this is provided for you.
 If epoch>0, load an existing JSON model file. Code for this is provided for you.
LSTM Definition
We'll use an LSTM defined exactly as in lecture (and on Wikipedia), except that (1) the cell nonlinearity (sigma_h) is the same as the gate nonlinearity (sigma_g), and (2) the error is the mean-squared error, instead of the sum-squared error. Thus:
 c[t] = f[t]*c[t-1] + i[t]*g(wc*x[t]+uc*h[t-1]+bc)
 i[t] = g(wi*x[t]+ui*h[t-1]+bi)
 f[t] = g(wf*x[t]+uf*h[t-1]+bf)
 o[t] = g(wo*x[t]+uo*h[t-1]+bo)
 h[t] = o[t]*c[t]
and
 error = 0.5*np.sum(np.square(self.activation[:,4]-self.label))
where
 self.model = np.array([[bc,wc,uc],[bi,wi,ui],[bf,wf,uf],[bo,wo,uo]])
 self.activation[t,:] = c[t], i[t], f[t], o[t], h[t]
For epoch==-1 (knowledge-based design), use the CReLU activation function, g(x) = max(0,min(1,x)), and limit the weights to [-1,1]. For epoch >= 0 (gradient descent), use the logistic activation function, g(x) = 1/(1+exp(-x)); the weight values are not limited. These two activation functions are provided for you in the function self.activation(x), and their derivatives are provided in the function self.derivative().
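The recurrence above can be sketched as a standalone forward pass. This is an illustrative sketch only, not the skeleton in submitted.py: the function names crelu, logistic, lstm_forward, and half_sum_squared_error are hypothetical, the initial state c[-1]=h[-1]=0 is an assumption, and the error follows the formula as printed above.

```python
import numpy as np

def crelu(x):
    """Clipped ReLU: g(x) = max(0, min(1, x))."""
    return np.clip(x, 0.0, 1.0)

def logistic(x):
    """Logistic sigmoid: g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(model, observation, g):
    """Run the scalar LSTM; return an array with columns c, i, f, o, h."""
    # Each row of model is [bias, input weight, recurrent weight].
    (bc, wc, uc), (bi, wi, ui), (bf, wf, uf), (bo, wo, uo) = model
    activation = np.zeros((len(observation), 5))
    c_prev, h_prev = 0.0, 0.0  # assumption: zero initial cell and output
    for t, x in enumerate(observation):
        i = g(wi * x + ui * h_prev + bi)  # input gate
        f = g(wf * x + uf * h_prev + bf)  # forget gate
        o = g(wo * x + uo * h_prev + bo)  # output gate
        c = f * c_prev + i * g(wc * x + uc * h_prev + bc)  # cell
        h = o * c  # output
        activation[t, :] = c, i, f, o, h
        c_prev, h_prev = c, h
    return activation

def half_sum_squared_error(activation, label):
    """Error as printed above: half the summed squared output error."""
    return 0.5 * np.sum(np.square(activation[:, 4] - np.asarray(label)))

# All-zero weights with the logistic function: every gate equals 0.5.
act = lstm_forward(np.zeros((4, 3)), [1, 0], logistic)
```

Passing g as an argument keeps the same loop usable for both the CReLU (epoch -1) and logistic (epoch >= 0) cases.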
Files included in the distribution
 setup.sh, requirements.txt: define the versions of python and numpy
 submitted.py: skeleton code with comments
 run_tests.py, score.py, tests/test_sequence.py: run the autograder tests
 debug.py: runs submitted.py for whichever epoch you specify. If a solution has been distributed to you for the corresponding epoch, it loads that solution and computes the error between the last step of your output and the corresponding step of the distributed solution.
 visualize.py: makes PNG figures that might be useful to help you debug.
 data/*: training observations and labels
 solutions/*: complete solutions for epoch 0 and epoch -1, scoring hash files for many epochs, and PNG files computed by visualize.py for the correct solution.
What to submit:
The file submitted.py, containing all of the functions that you have written.