**Exercise 1: Logistic Regression**

This course consists of videos and programming exercises to teach you about unsupervised feature learning and deep learning. The exercises are designed to give you hands-on, practical experience for getting these algorithms to work. To get the most out of this course, you should watch the videos and complete the exercises in the order in which they are listed.

This first exercise will give you practice with logistic regression.
These exercises have been extensively tested with Matlab, but they should also work in Octave, which has been called a "free version of Matlab." If you are using Octave, be sure to install the **Image** package as well (available for Windows as an option in the installer, and available for Linux from Octave-Forge).

Download ex1Data.zip, and extract the files from the zip file. If you're using Matlab or Octave, the data you'll need for this exercise is contained in x.dat and y.dat. Each row of the matrix $x$ corresponds to one training example (a feature vector $x^{(i)}$), and the corresponding row of $y$ is the class label $y^{(i)} \in \{0, 1\}$. This dataset has 2,000 training examples.

**1. Batch gradient descent**

In this first problem, you'll implement logistic regression using **batch** gradient descent.

- (a) Recall that the logistic regression model is

$$h_{w,b}(x) = P(y = 1 \mid x) = \frac{1}{1 + \exp\left(-\left(w^\top x + b\right)\right)},$$

where $w$ and $b$ are the parameters of the model.

Here, we are not using weight decay. Implement batch gradient descent, and use it to find a good setting of $w$ and $b$. Choose a reasonable value of the learning rate $\alpha$ yourself, and note down the values of $w$ and $b$ found by your algorithm. (To verify that your implementation is correct, later we'll ask you to check your values of $w$ and $b$ against ours.)

- (b) Plot $J(w, b)$ as a function of the number of iterations, and verify convergence of your algorithm. The objective function is given by

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_{w,b}(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_{w,b}(x^{(i)})\right) \right],$$

where $m$ is the number of training examples.

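The batch update and the objective above can be sketched as follows. This is a Python translation for illustration (the exercise itself is meant for Matlab/Octave); the tiny dataset, learning rate, and iteration count are stand-ins, not the exercise's values:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Tiny synthetic 1-D dataset standing in for x.dat / y.dat (illustrative).
xs = [0.5, 1.0, 2.4, 2.6, 3.5, 4.0]
ys = [0,   0,   1,   0,   1,   1]
m = len(xs)

w, b = 0.0, 0.0   # parameters, initialized to zero
alpha = 0.1       # illustrative learning rate

for _ in range(1000):  # batch gradient descent: one update per full pass
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / m
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / m
    w -= alpha * grad_w
    b -= alpha * grad_b

# Objective J(w, b): average logistic cost over the training set.
J = -sum(y * math.log(sigmoid(w * x + b)) +
         (1 - y) * math.log(1.0 - sigmoid(w * x + b))
         for x, y in zip(xs, ys)) / m
```

Recording J after every update, rather than only once at the end, produces the convergence curve asked for in part (b).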
**2. Practice with stochastic gradient descent**

- (a) Implement stochastic gradient descent for the same logistic regression model as Question 1. For now, leave the data in the original ordering, and do not shuffle the data. Recall that the stochastic gradient descent learning algorithm is:

$$\begin{aligned}
&\text{Repeat } \{ \\
&\quad \text{for } i = 1 \text{ to } m \text{ } \{ \\
&\qquad w := w + \alpha \left( y^{(i)} - h_{w,b}(x^{(i)}) \right) x^{(i)} \\
&\qquad b := b + \alpha \left( y^{(i)} - h_{w,b}(x^{(i)}) \right) \\
&\quad \} \\
&\}
\end{aligned}$$

Initialize $w$ and $b$ to zero, use a reasonable learning rate $\alpha$, and run stochastic gradient descent so that it loops through your entire training set 5 times (i.e., execute the outer loop above 5 times; since you have 2,000 training examples, this corresponds to 10,000 iterations of stochastic gradient descent).

Run stochastic gradient descent, and plot the parameter $b$ as a function of the number of iterations taken. In other words, draw a plot with 10,000 points, where the horizontal axis is the number of iterations of stochastic gradient descent taken, and the vertical axis is the value of the parameter $b$ after that many iterations. (Note that one "iteration" means performing a single update using a single training example, not a full pass through your entire training set.) Do you see this plot having a "wavy" appearance? What do you think causes this?

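The bookkeeping for this plot can be sketched as follows, again in Python for illustration. The deliberately ordered synthetic dataset, the learning rate, and the choice of tracking the intercept term are assumptions made for the sketch:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Synthetic, deliberately *ordered* stand-in for the 2,000-example dataset.
xs = [(i % 20) / 4.0 for i in range(2000)]   # repeating pattern 0.0 .. 4.75
ys = [1 if x > 2.5 else 0 for x in xs]

w, b = 0.0, 0.0
alpha = 0.05        # illustrative learning rate
trajectory = []     # value of the intercept after each single-example update

for _ in range(5):                       # 5 passes over the training set
    for x, y in zip(xs, ys):
        h = sigmoid(w * x + b)           # one "iteration" = one example
        w += alpha * (y - h) * x
        b += alpha * (y - h)
        trajectory.append(b)

print(len(trajectory))   # one point per iteration
```

Because the unshuffled data cycles through the same ordered pattern, consecutive updates systematically pull the parameter back and forth, which is what gives the trajectory its periodic, wavy look.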
- (b) Repeat the previous problem, but now shuffle the training set first. If you are using Matlab/Octave, you can use the `randperm` command. Specifically, `randperm(2000)` generates a random permutation of the integers from 1 to 2000. Thus, this series of commands will shuffle your data:

```matlab
myperm = randperm(2000);
shuffledX = x(myperm, :);
shuffledY = y(myperm);
```

Re-run stochastic gradient descent using the shuffled data, and replot the parameter $b$ as a function of the number of iterations. How is this plot different from the previous one? Note down the values of $w$ and $b$ that you get.

- (c) Use an exponentially-weighted average to keep track of an online estimate $\bar{J}$ of the objective while stochastic gradient descent is running, and plot this online estimate as a function of the number of iterations. Recall that the online estimate is obtained by initializing $\bar{J} = 0$, and then updating it (on each iteration of stochastic gradient descent) using a formula such as

$$\bar{J} := \lambda \bar{J} + (1 - \lambda)\, J^{(i)}(w, b),$$

where $J^{(i)}(w, b) = -\left[ y^{(i)} \log h_{w,b}(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_{w,b}(x^{(i)})\right) \right]$ is the cost on the current training example, and $\lambda \in (0, 1)$ is a constant close to 1 (for example, $\lambda = 0.999$).

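The running-average bookkeeping can be sketched like this (Python for illustration; the stand-in dataset, learning rate, and the smoothing constant $\lambda = 0.999$ are all assumed values for the sketch, not necessarily the ones intended by the exercise):

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def example_cost(w, b, x, y):
    # Cost of a single training example (one term of the objective).
    h = sigmoid(w * x + b)
    h = min(max(h, 1e-12), 1.0 - 1e-12)   # guard against log(0)
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))

xs = [(i % 20) / 4.0 for i in range(2000)]   # ordered synthetic stand-in data
ys = [1 if x > 2.5 else 0 for x in xs]

w, b = 0.0, 0.0
alpha = 0.05       # illustrative learning rate
lam = 0.999        # assumed smoothing constant, close to 1
J_online = 0.0     # online estimate of the objective, initialized to 0
estimates = []     # one estimate per iteration, for plotting

for _ in range(5):
    for x, y in zip(xs, ys):
        # Exponentially-weighted average of single-example costs:
        J_online = lam * J_online + (1.0 - lam) * example_cost(w, b, x, y)
        estimates.append(J_online)
        h = sigmoid(w * x + b)             # then take the SGD step
        w += alpha * (y - h) * x
        b += alpha * (y - h)
```

With $\lambda$ close to 1, each single-example cost contributes only a little, so the average smooths out the per-example noise and tracks the objective much less erratically than the raw costs would.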