\input{macros} % Don't actually need the macros, but latex2html messes up
% The links somehow if this is not provided. Weird
\documentclass[12pt]{article}
\begin{document}
\noindent
{\Huge \bf Exercise 2: Linear Regression}

This course consists of videos and programming exercises to teach you about
machine learning. The exercises are designed to
give you hands-on, practical experience for getting these algorithms to work.
To get the most out of this course, you should watch the videos and complete
the exercises in the order in which they are listed.
This first exercise will give you practice with linear regression.
These exercises have been extensively tested with Matlab, but
they should also work in Octave, which has been called a ``free
version of Matlab.''
If you are using Octave, be sure to install the {\bf Image} package as well
(available for Windows as an option in the installer, and available
for Linux from
Octave-Forge).

{\bf Data}

Download ex2Data.zip, and extract the files from the zip file.
The files contain some example measurements of heights for various boys
between the ages of two and eight. The y-values are the heights measured
in meters, and the x-values are the corresponding ages of the boys.
Each height and age tuple constitutes one training example $(x^{(i)}, y^{(i)})$ in
our dataset. There are $m = 50$ training examples, and you will use them to
develop a linear regression model.
\bigskip
{\Large \bf Supervised learning problem}

In this problem, you'll implement linear regression using gradient descent.
In Matlab/Octave, you can load the training set using the commands
\begin{verbatim}
x = load('ex2x.dat');
y = load('ex2y.dat');
\end{verbatim}
This will be our training set for a supervised learning problem with $n=1$
features (in addition to the usual $x_0 = 1$, so $x \in \Re^2$).
If you're using Matlab/Octave, run the following commands to plot your
training set (and label the axes):
\begin{verbatim}
figure % open a new figure window
plot(x, y, 'o');
ylabel('Height in meters')
xlabel('Age in years')
\end{verbatim}
You should see a series of data points similar to the figure below.
% [Figure: scatter plot of the training data]
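
Before running gradient descent, you will also need to add the intercept
term $x_0 = 1$ to every training example (do this after plotting, since
the plot above uses the raw one-column x). A minimal sketch, which also
assumes $\theta$ is initialized to zero (the first-iteration values in
the solutions below are consistent with this):
\begin{verbatim}
m = length(y);         % number of training examples (m = 50)
x = [ones(m, 1), x];   % add a column of ones for the intercept term
theta = zeros(2, 1);   % initialize the parameters to zero
\end{verbatim}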
\bigskip
{\Huge \bf Solutions}

After you have completed the exercises above, refer to the solutions below
and check that your implementation and your answers are correct. If your
implementation does not produce the same parameters or phenomena described
below, debug it until you replicate the same results as our implementation.
A complete m-file implementation of the solutions can be found
here. Run this m-file in Matlab/Octave to produce all the solutions
and their corresponding graphs.
\bigskip
{\Large \bf Linear Regression}

{\bf 1.} After your first iteration of gradient descent, verify that you get
\begin{eqnarray*}
\theta_0 &=& 0.0745 \\
\theta_1 &=& 0.3800
\end{eqnarray*}
If your answer does not exactly match this solution, you may have implemented something
wrong. Did you get the correct $\theta_0 = 0.0745$, but the wrong answer for
$\theta_1$? (You might have gotten $\theta_1 = 0.4057$.) If so, you
probably updated the $\theta_j$ terms sequentially: you first
updated $\theta_0$, plugged that new value back into $\theta$, and then
used it to update $\theta_1$. Remember that the update must be
simultaneous: compute both new components from the $\theta$ of the
previous iteration, never from intermediate values.
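
As a concrete illustration, here is one vectorized way to perform the
simultaneous update (a sketch, not the only correct implementation; it
assumes x includes the column of ones, and uses the learning rate and
iteration count mentioned in part 2 below):
\begin{verbatim}
alpha = 0.07;                           % learning rate
for iter = 1:1500                       % iterate until convergence
    grad = (1/m) * x' * (x*theta - y);  % gradient of J(theta)
    theta = theta - alpha * grad;       % both components updated together
end
\end{verbatim}
Because the gradient is computed before theta is overwritten, neither
component ever sees an intermediate value of the other.
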
{\bf 2.} After running gradient descent until convergence, verify that your
parameters are approximately equal to the exact closed-form solution (which you
will learn about in the next assignment):
\begin{eqnarray*}
\theta_0 &=& 0.7502 \\
\theta_1 &=& 0.0639
\end{eqnarray*}
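
For reference, one way to compute this closed-form solution directly in
Matlab/Octave is with the normal equations (the subject of the next
assignment); this sketch assumes x already includes the column of ones:
\begin{verbatim}
% Normal equations: theta = (X'X)^(-1) X'y
theta_exact = (x' * x) \ (x' * y);
\end{verbatim}
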
If you run gradient descent in Matlab/Octave for 1500 iterations at a
learning rate of 0.07, you should see exactly these numbers for theta.
If you used fewer iterations, your answer should not differ from these
values by more than 0.01; if it does, you probably did not iterate enough.
For example, running gradient descent for 500 iterations gives
theta = [0.7318, 0.0672]. This is close to convergence, but theta still
gets closer to the exact value if you run gradient descent longer.
If your answer differs drastically from the solutions above, there may be a bug in your
implementation. Check that you used the correct learning rate of 0.07 and that you
defined the gradient descent update correctly. Then, check that your x and y vectors
are indeed what you expect them to be. Remember that x needs an extra column of ones.
{\bf 3.} The predicted height for age 3.5 is 0.9737~meters, and for
age 7 is 1.1975~meters.
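
These predictions come from evaluating the hypothesis
$h_\theta(x) = \theta^T x$ at the given ages; for example, with your
converged theta:
\begin{verbatim}
pred_3_5 = [1, 3.5] * theta   % predicted height at age 3.5
pred_7   = [1, 7.0] * theta   % predicted height at age 7
\end{verbatim}
(The leading 1 is the intercept term $x_0$.)
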
{\bf Plot} A plot of the training data with the best fit from gradient descent
should look like the following graph.
% [Figure: training data with the line of best fit]
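
One way to overlay the fitted line on the scatter plot (a sketch,
assuming x includes the column of ones and theta holds your converged
parameters):
\begin{verbatim}
hold on                        % keep the training-data points
plot(x(:,2), x*theta, '-')     % line through the model's predictions
legend('Training data', 'Linear regression')
\end{verbatim}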
\bigskip
{\Large \bf Understanding $J(\theta)$}

In your surface plot, you should see that the cost function $J(\theta)$ approaches a minimum near the values of $\theta_0$ and $\theta_1$ that you found through
gradient descent. In general, the cost function for a linear regression problem
will be bowl-shaped with a global minimum and no local optima.
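
If you have not yet built the grid of cost values for the surface plot,
a minimal sketch follows; the names theta0_vals, theta1_vals, and J_vals
match the contour command below, but the grid ranges are an assumption:
\begin{verbatim}
theta0_vals = linspace(-3, 3, 100);   % assumed range for theta_0
theta1_vals = linspace(-1, 1, 100);   % assumed range for theta_1
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = (1/(2*m)) * sum((x*t - y).^2);  % J(theta)
    end
end
J_vals = J_vals';  % transpose so the axes line up with contour
\end{verbatim}
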
Depending on the viewing window of your surface plot, it may not be so
apparent that the cost function is bowl-shaped. To see the approach to the
global minimum more clearly, it may be helpful to draw a contour plot.
In Matlab/Octave, this can be done with
\begin{verbatim}
figure;
% Plot the cost function with 15 contours spaced logarithmically
% between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15))
xlabel('\theta_0'); ylabel('\theta_1')
\end{verbatim}
The result is a plot like the following.
% [Figure: contour plot of J(theta)]
Now the location of the minimum is more obvious.
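
To mark the parameters found by gradient descent on the contour plot
(assuming theta holds your converged values):
\begin{verbatim}
hold on
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10)  % mark the minimum
\end{verbatim}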
\end{document}