\documentclass[12pt]{article}
\input{macros}
\begin{document}
\noindent
{\Huge \bf Exercise 4: Logistic Regression and Newton's Method}

In this exercise, you will use Newton's Method to implement logistic regression
on a classification problem.
\bigskip

{\bf Data}

To begin, download ex4Data.zip and extract the files from the zip file.
For this exercise, suppose that a high school has a dataset representing 40
students who were admitted to college and 40 students who were not
admitted. Each~$(x^{(i)}, y^{(i)})$ training example contains a student's score
on two standardized exams and a label of whether the student was admitted.
Your task is to build a binary classification model that estimates college
admission chances based on a student's scores on two exams. In your training
data,
{\bf a. } The first column of your $x$ array represents all Test 1 scores,
and the second column represents all Test 2 scores.

{\bf b. } The $y$ vector uses `1' to label a student who was admitted and `0' to
label a student who was not admitted.
\bigskip
{\Large \bf Plot the data}

Load the data for the training examples into your program and add
the $x_0 = 1$ intercept term into your x matrix.
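As a sketch only, and assuming the extracted archive provides the scores and
labels in plain-text files (the names ex4x.dat and ex4y.dat below are
illustrative --- use whatever file names the zip actually contains), this step
might look like:
\begin{verbatim}
% Sketch: substitute the actual file names from ex4Data.zip
x = load('ex4x.dat');   % m x 2 matrix of exam scores
y = load('ex4y.dat');   % m x 1 vector of 0/1 labels
m = length(y);          % number of training examples
x = [ones(m, 1), x];    % prepend the x0 = 1 intercept column
\end{verbatim}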
Before beginning Newton's Method, we will first plot the data using
different symbols to represent the two classes. In Matlab/Octave,
you can separate
the positive class and the negative class using the find command:
\begin{verbatim}
% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1); neg = find(y == 0);
% Assume the features are in the 2nd and 3rd
% columns of x
plot(x(pos, 2), x(pos,3), '+'); hold on
plot(x(neg, 2), x(neg, 3), 'o')
\end{verbatim}
Your plot should look like the following:

\begin{center}
[Figure: the training data, with +'s marking admitted students and o's
marking students who were not admitted]
\end{center}
\bigskip

{\Large \bf Newton's Method}

Recall that in logistic regression, the hypothesis function is
\begin{eqnarray}
h_{\theta}(x)&=&g(\theta^{T}x)=\frac{{1}}{1+e^{-\theta^{T}x}} \nonumber \\
&=&P(y=1|x;\theta) \nonumber
\end{eqnarray}
In our example, the hypothesis is interpreted as the
probability that a student
will be admitted, given the values of the features in $x$.
Matlab/Octave does not have a library function for the sigmoid, so you will have to
define it yourself. The easiest way to do this is through an inline expression:
\begin{verbatim}
g = inline('1.0 ./ (1.0 + exp(-z))');
% Usage: To find the value of the sigmoid
% evaluated at 2, call g(2)
\end{verbatim}
The cost function $J(\theta)$ is defined as
\begin{equation}
J(\theta)=\frac{{1}}{m}\sum_{i=1}^{m}\left[
-y^{(i)}\log(h_{\theta}(x^{(i)}))-
(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right] \nonumber
\end{equation}
Our goal is to use Newton's method to minimize this function. Recall that the
update rule for Newton's method is
\begin{equation}
\theta^{(t+1)}=\theta^{(t)}-H^{-1}\nabla_{\theta}J \nonumber
\end{equation}
In logistic regression, the gradient and the Hessian are
% For some reason trying to merge these next two equations into an eqn array
% produced horrible rendering errors
\begin{equation}
\nabla_{\theta}J = \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)} \nonumber
\end{equation}
\begin{equation}
H = \frac{1}{m}\sum_{i=1}^{m}\left[h_{\theta}(x^{(i)})\left(1-h_{\theta}(x^{(i)})\right)x^{(i)}\left(x^{(i)}\right)^{T}\right] \nonumber
\end{equation}
Note that the formulas presented above are the vectorized versions.
Specifically, this means that $x^{(i)} \in R^{n+1}$,
$ x^{(i)}\left(x^{(i)}\right)^{T}\in R^{(n+1) \times (n+1)}$, while $h_{\theta}(x^{(i)})$ and $y^{(i)}$ are scalars.
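In Matlab/Octave, both quantities can be computed without an explicit loop over
the training examples. The following is a sketch (it assumes x already
contains the intercept column and that g is the sigmoid defined earlier):
\begin{verbatim}
h = g(x * theta);                % m x 1 vector of hypotheses
grad = (1/m) .* (x' * (h - y));  % (n+1) x 1 gradient
% diag() builds an m x m matrix; fine for m = 80
H = (1/m) .* (x' * diag(h .* (1 - h)) * x);  % (n+1) x (n+1) Hessian
\end{verbatim}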
\bigskip

{\bf Implementation}

Now, implement Newton's Method in your program, starting with the initial value
of $\theta = \vec{0}$. To determine how many iterations
to use, calculate $J(\theta)$ for each iteration and plot your results as you
did in Exercise~2. As mentioned in the lecture videos, Newton's method often
converges in 5-15 iterations. If you find yourself using far more iterations,
you should check for errors in your implementation.
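Putting the pieces together, one possible shape for the loop is sketched below;
the iteration count of 15 and the variable names are illustrative, not
prescribed:
\begin{verbatim}
theta = zeros(size(x, 2), 1);   % start from the zero vector
for t = 1:15
    h = g(x * theta);
    J(t) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
    grad = (1/m) .* (x' * (h - y));
    H = (1/m) .* (x' * diag(h .* (1 - h)) * x);
    theta = theta - H \ grad;   % backslash solves H * step = grad
end
plot(1:15, J, 'o--')            % check that J has leveled off
\end{verbatim}
Note the use of \verb|H \ grad| rather than \verb|inv(H) * grad|: solving the
linear system directly is both faster and numerically better behaved than
forming the explicit inverse.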
After convergence, use your values of $\theta$ to find the decision boundary in
the classification problem. The decision boundary is defined as the line where
\[
P(y=1|x; \theta) = g(\theta^T x) = 0.5
\]
which corresponds to
\[
\theta^T x = 0
\]
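Since $\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ here, setting it to
zero and solving for $x_2$ gives the boundary as a straight line in the plane
of the two exam scores:
\[
x_2 = -\frac{\theta_0 + \theta_1 x_1}{\theta_2}
\]
A sketch of the corresponding Matlab/Octave plotting code (assuming theta is
the column vector found by Newton's method):
\begin{verbatim}
plot_x = [min(x(:,2)) - 2, max(x(:,2)) + 2];   % range of Exam 1 scores
plot_y = (-1 / theta(3)) .* (theta(1) + theta(2) .* plot_x);
plot(plot_x, plot_y)
\end{verbatim}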
Plotting the decision boundary is equivalent to plotting
the $\theta^T x = 0$ line. When
you are finished, your plot should appear like the figure below.

\begin{center}
[Figure: the training data with the decision boundary $\theta^T x = 0$ drawn
as a straight line]
\end{center}
\bigskip

{\bf Questions}

Finally, record your answers to these questions.

{\bf 1.} What values of $\theta$ did you get? How many iterations were
required for convergence?

{\bf 2.} What is the probability that a student with a score of 20 on
Exam~1 and a score of 80 on Exam~2 will not be admitted?
\bigskip

{\Huge \bf Solutions}

After you have completed the exercises above, check your implementation and
your answers against the solutions below. If your implementation does not
produce the parameters and behavior described here, debug your solution until
it does. A complete m-file implementation of the solutions is also available
for download.
\bigskip

{\Large \bf Newton's Method}

{\bf 1.} Your final values of $\theta$ should be
\begin{eqnarray}
\theta_0 &=& -16.38 \nonumber \\
\theta_1 &=& 0.1483 \nonumber \\
\theta_2 &=& 0.1589 \nonumber
\end{eqnarray}
{\bf Plot. } Your plot of the cost function $J(\theta)$ should look similar to the picture below:

\begin{center}
[Figure: the value of $J(\theta)$ at each iteration of Newton's method]
\end{center}
From this plot, you can infer that Newton's Method has converged by around 5 iterations.
In fact, by looking at a printout of the values of J, you will see that J changes
by less than $10^{-7}$ between the 4th and 5th
iterations. Recall that in the previous two exercises, gradient descent took
hundreds or even thousands of iterations to converge. Newton's Method is much
faster in comparison.
{\bf 2.} The probability that a student with a score of 20 on Exam~1 and 80 on
Exam~2 will not be admitted to college is 0.668.
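This value can be checked directly from the fitted parameters, using the
identity $1 - g(z) = g(-z)$ and the rounded values of $\theta$ given above:
\begin{eqnarray}
\theta^T x &=& -16.38 + 0.1483\cdot 20 + 0.1589\cdot 80 \approx -0.70 \nonumber \\
P(y=0\,|\,x;\theta) &=& 1 - g(\theta^T x) = g(0.70) \approx 0.67 \nonumber
\end{eqnarray}
which agrees with 0.668 up to the rounding of the parameters.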
% Marker for end of solutions section
\begin{rawhtml}