Exercise 1: Logistic Regression
This course consists of videos and programming exercises to teach you about unsupervised feature learning and deep learning. The exercises are designed to give you hands-on, practical experience for getting these algorithms to work. To get the most out of this course, you should watch the videos and complete the exercises in the order in which they are listed.
This first exercise will give you practice with logistic regression. These exercises have been extensively tested with Matlab, but they should also work in Octave, which has been called a "free version of Matlab." If you are using Octave, be sure to install the Image package as well (available for Windows as an option in the installer, and available for Linux from Octave-Forge).
Download ex1Data.zip and extract its files. If you're using Matlab or Octave, the data you'll need for this exercise is contained in x.dat and y.dat. Each row of the matrix x corresponds to one training example, and the corresponding row of y is its class label. This dataset has m = 2,000 training examples.
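As a minimal sketch of reading the data, assuming x.dat and y.dat are plain numeric text files extracted into the current directory (the variable name m is only an illustrative choice):

```matlab
% Assumption: x.dat and y.dat sit in the current working directory.
if exist('x.dat', 'file') == 2 && exist('y.dat', 'file') == 2
    x = load('x.dat');    % one training example per row
    y = load('y.dat');    % one class label per row
    m = size(x, 1);       % number of training examples
    % Every example should have exactly one label.
    assert(size(y, 1) == m);
    fprintf('Loaded %d examples with %d features each.\n', m, size(x, 2));
else
    disp('x.dat / y.dat not found; extract ex1Data.zip here first.');
end
```

Reading the row count from the data, rather than hard-coding it, keeps later loops correct if you try a different dataset.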
In this first problem, you'll implement logistic regression using stochastic gradient descent.
Initialize , use a learning rate of , and run stochastic gradient descent so that it loops through your entire training set 5 times (i.e., execute the outer loop above 5 times; since you have 2,000 training examples, this corresponds to 10,000 iterations of stochastic gradient descent).
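For reference, the loop structure described here can be written out as follows. This assumes the usual stochastic gradient update for logistic regression with labels in {0, 1}; the symbols alpha (learning rate) and h_theta (the sigmoid hypothesis) are the conventional ones and are assumptions, not taken from the text above.

```latex
\[
\begin{aligned}
&\text{for } \mathrm{pass} = 1, \dots, 5: \\
&\quad \text{for } i = 1, \dots, m: \\
&\qquad \theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta\!\left(x^{(i)}\right) \right) x_j^{(i)}
  \quad \text{for every } j, \\
&\text{where } h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}} .
\end{aligned}
\]
```

With m = 2,000, the inner update runs 5 x 2,000 = 10,000 times, which matches the iteration count stated above.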
Run stochastic gradient descent, and plot the parameter as a function of the number of iterations taken. In other words, draw a plot with 10,000 points, where the horizontal axis is the number of iterations of stochastic gradient descent taken, and the vertical axis is the value of your parameter after that many iterations. (Note that one "iteration" means performing a single update using a single training example, not taking 5 updates using all of your training set.) Do you see this plot having a "wavy" appearance? What do you think causes this?
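One way to produce such a plot is to store the parameter vector after every single update. The sketch below is illustrative only: it uses a small synthetic dataset in place of x.dat and y.dat, and the intercept column, the zero initialization, the learning rate alpha = 0.1, and the choice to plot the second entry of theta are all assumptions rather than values from the exercise.

```matlab
% Hypothetical stand-in data, ordered by the first feature.
m  = 200;
x1 = linspace(-2, 2, m)';
x  = [x1, sin(3 * x1)];
y  = double(x(:, 1) + 0.5 * x(:, 2) > 0);   % labels in {0, 1}

X         = [ones(m, 1), x];                % assumption: add an intercept column
theta     = zeros(size(X, 2), 1);           % assumption: start from zeros
alpha     = 0.1;                            % hypothetical learning rate
numPasses = 5;
sigmoid   = @(z) 1 ./ (1 + exp(-z));

% One row of history per single-example update.
history = zeros(numPasses * m, numel(theta));
k = 0;
for pass = 1:numPasses
    for i = 1:m
        xi    = X(i, :)';
        h     = sigmoid(theta' * xi);
        theta = theta + alpha * (y(i) - h) * xi;   % single-example update
        k     = k + 1;
        history(k, :) = theta';
    end
end

% Plot one parameter against the number of updates taken.
plot(1:k, history(:, 2));
xlabel('Iterations of stochastic gradient descent');
ylabel('theta(2)');
```

To use the real data, replace the synthetic x and y with the matrices loaded from x.dat and y.dat; the history matrix then has 10,000 rows.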
Next, randomly shuffle the training set, keeping each example paired with its label:
myperm = randperm(2000);
shuffledX = x(myperm, :);
shuffledY = y(myperm);
Re-run stochastic gradient descent using the shuffled data, and replot as a function of the number of iterations. How is this plot different from the previous one? Note down the values of and that you get.
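Before re-running, it can be worth checking that the shuffle reorders examples and labels together. The toy matrices below are hypothetical and exist only to make the check easy to read; the labels are constructed so that each one can be recomputed from its own row.

```matlab
% Hypothetical toy data: the label is 1 exactly when the first feature exceeds 2.
x = [1 10; 2 20; 3 30; 4 40];
y = [0; 0; 1; 1];
m = size(x, 1);

% Apply the same random permutation to examples and labels.
myperm    = randperm(m);
shuffledX = x(myperm, :);
shuffledY = y(myperm);

% Each shuffled row still carries its own label.
assert(isequal(shuffledY, double(shuffledX(:, 1) > 2)));

% Shuffling only reorders: sorting the rows recovers the original data.
assert(isequal(sortrows([shuffledX, shuffledY]), [x, y]));
disp('Examples and labels are still paired after shuffling.');
```

Using size(x, 1) instead of a hard-coded 2000 is a small design choice that keeps the shuffle valid for datasets of any size.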
After you have completed the exercises above, please refer to the solutions below and check that your implementation and your answers are correct. If your implementation does not produce the same parameters or phenomena as described below, debug it until it replicates the behavior of our implementation.
Verify again that you get , , and . If your values are off by 0.1 or so, that's fine. Even after shuffling the data, with stochastic gradient descent the parameters will oscillate slightly in the vicinity of the global minimum. Note that while we asked you to run only 5 full passes through the entire training set, we show the graphs for 10 passes. As the graphs show, 5 passes were enough for convergence.