Exercise 1: Logistic Regression
This course consists of videos and programming exercises to teach you about unsupervised feature learning and deep learning. The exercises are designed to give you hands-on, practical experience for getting these algorithms to work. To get the most out of this course, you should watch the videos and complete the exercises in the order in which they are listed.
This first exercise will give you practice with logistic regression. These exercises have been extensively tested with Matlab, but they should also work in Octave, which has been called a "free version of Matlab." If you are using Octave, be sure to install the Image package as well (available for Windows as an option in the installer, and available for Linux from Octave-Forge).
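For example, one way to install and load the Image package from the Octave prompt (assuming a network connection, since this downloads from Octave-Forge) is:

    pkg install -forge image   % one-time download and install from Octave-Forge
    pkg load image             % load the package into the current session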
Download ex1Data.zip, and extract the files from the zip file. If you're using Matlab or Octave, the data you'll need for this exercise is contained in x.dat and y.dat. Each row of the matrix $x$ corresponds to one training example $x^{(i)}$, and the corresponding row of $y$ is the class label $y^{(i)} \in \{0, 1\}$. This dataset has $m = 2000$ training examples.
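For instance, in Matlab/Octave the data can be loaded as follows (the variable names here are only suggestions):

    x = load('x.dat');    % m-by-n matrix; row i is the training example x^(i)
    y = load('y.dat');    % m-by-1 vector; entry i is the class label y^(i)
    [m, n] = size(x);     % m = 2000 training examples, n input features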
- 1. Batch gradient descent
In this first problem, you'll implement logistic regression using batch gradient descent.
- (a) Recall that the logistic regression model is
$$P(y = 1 \mid x; W, b) \;=\; h_{W,b}(x) \;=\; \frac{1}{1 + \exp\!\big(-(W^\top x + b)\big)}.$$
Here, we are not using weight decay. Implement batch gradient descent, and use it to find a good setting of $W$ and $b$. Choose a reasonable value of the learning rate $\alpha$ yourself, and note down the values of $W$ and $b$ found by your algorithm. (To verify that your implementation is correct, later we'll ask you to check your values of $W$ and $b$ against ours.)
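As a rough sketch (not a reference implementation; the learning rate and iteration count below are placeholders you should tune yourself), a batch gradient descent loop might look like:

    alpha = 0.1;              % learning rate (placeholder; choose your own)
    W = zeros(n, 1);          % initialize the weights to zero
    b = 0;                    % initialize the intercept to zero
    for iter = 1:500          % placeholder; run until the objective converges
        h = 1 ./ (1 + exp(-(x * W + b)));       % h_{W,b}(x^(i)) for every example
        W = W - alpha * (1/m) * x' * (h - y);   % batch gradient step on W
        b = b - alpha * (1/m) * sum(h - y);     % batch gradient step on b
    end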
- (b) Plot $J(W, b)$ as a function of the number of iterations, and verify convergence of your algorithm. The objective function is given by
$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_{W,b}(x^{(i)}) + \big(1 - y^{(i)}\big) \log\!\big(1 - h_{W,b}(x^{(i)})\big) \Big].$$
In addition to the learning rate $\alpha$ that you had chosen, also repeat this exercise (re-initializing gradient descent to $W = 0$, $b = 0$ each time) using one smaller and one larger value of $\alpha$.
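One way to track the objective, assuming the batch loop sketched above, is to evaluate $J$ on every iteration before updating the parameters:

    num_iters = 500;                    % placeholder iteration count
    J = zeros(num_iters, 1);
    W = zeros(n, 1);  b = 0;
    for iter = 1:num_iters
        h = 1 ./ (1 + exp(-(x * W + b)));
        J(iter) = -(1/m) * sum(y .* log(h) + (1 - y) .* log(1 - h));
        W = W - alpha * (1/m) * x' * (h - y);
        b = b - alpha * (1/m) * sum(h - y);
    end
    plot(1:num_iters, J);  xlabel('iterations');  ylabel('J(W, b)');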
- 2. Practice with stochastic gradient descent
- (a) Implement stochastic gradient descent for the same logistic regression model as Question 1. For now, leave the data in the original ordering, and do not shuffle the data. Recall that the stochastic gradient descent learning algorithm is:

    Loop {
        For $i = 1$ to $m$ {
            $W := W + \alpha \big(y^{(i)} - h_{W,b}(x^{(i)})\big)\, x^{(i)}$
            $b := b + \alpha \big(y^{(i)} - h_{W,b}(x^{(i)})\big)$
        }
    }

Initialize $W = 0$, $b = 0$, use a reasonable learning rate $\alpha$ (for example, the one you chose in Question 1), and run stochastic gradient descent so that it loops through your entire training set 5 times (i.e., execute the outer loop above 5 times; since you have 2,000 training examples, this corresponds to 10,000 iterations of stochastic gradient descent).
Run stochastic gradient descent, and plot the parameter $b$ as a function of the number of iterations taken. In other words, draw a plot with 10,000 points, where the horizontal axis is the number of iterations of stochastic gradient descent taken, and the vertical axis is the value of your parameter $b$ after that many iterations. (Note that one "iteration" means performing a single update using a single training example, not taking 5 updates using all of your training set.) Do you see this plot having a "wavy" appearance? What do you think causes this? A minimal implementation sketch is given below.
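As a sketch (again with placeholder choices; b_history is a hypothetical name used only for plotting):

    W = zeros(n, 1);  b = 0;
    b_history = zeros(5 * m, 1);   % one entry per single-example update
    t = 0;
    for pass = 1:5                 % 5 passes over the training set
        for i = 1:m
            h_i = 1 / (1 + exp(-(x(i, :) * W + b)));
            W = W + alpha * (y(i) - h_i) * x(i, :)';   % update using example i only
            b = b + alpha * (y(i) - h_i);
            t = t + 1;
            b_history(t) = b;      % record b after this iteration
        end
    end
    plot(1:5*m, b_history);  xlabel('iterations');  ylabel('b');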
- (b) Repeat the previous problem, but now shuffle the training set first. If you are using Matlab/Octave, you can use the randperm command. Specifically, randperm(2000) generates a random permutation of the integers from 1 to 2000. Thus, this series of commands will shuffle your data:

    myperm = randperm(2000);
    shuffledX = x(myperm, :);
    shuffledY = y(myperm);

Re-run stochastic gradient descent using the shuffled data, and replot $b$ as a function of the number of iterations. How is this plot different from the previous one? Note down the values of $W$ and $b$ that you get.
- (c) Use an exponentially-weighted average to keep track of an online estimate $\tilde{b}$ of $b$ while stochastic gradient descent is running, and plot this online estimate as a function of the number of iterations. Recall that the online estimate is obtained by initializing $\tilde{b} = 0$, and then updating it (on each iteration of stochastic gradient descent) using a formula such as
$$\tilde{b} := 0.999\, \tilde{b} + 0.001\, b.$$
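For instance, extending the sketch from part (a) to the shuffled data (the weights 0.999 and 0.001 here are just one possible choice of smoothing constants):

    W = zeros(n, 1);  b = 0;
    b_avg = 0;                          % online estimate of b, initialized to 0
    b_avg_history = zeros(5 * m, 1);
    t = 0;
    for pass = 1:5
        for i = 1:m
            h_i = 1 / (1 + exp(-(shuffledX(i, :) * W + b)));
            W = W + alpha * (shuffledY(i) - h_i) * shuffledX(i, :)';
            b = b + alpha * (shuffledY(i) - h_i);
            b_avg = 0.999 * b_avg + 0.001 * b;   % exponentially-weighted average
            t = t + 1;
            b_avg_history(t) = b_avg;
        end
    end
    plot(1:5*m, b_avg_history);  xlabel('iterations');  ylabel('online estimate of b');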