Machine Learning
Andrew Ng
Exercise 5: Regularization
In this exercise, you will implement regularized linear regression and regularized logistic regression.
Data
To begin, download ex5Data.zip and extract the files from the zip file. This data bundle contains two sets of data, one for linear regression and the other for logistic regression. It also includes a helper function named 'map_feature.m' which will be used for logistic regression. Make sure that this function's m-file is placed in the same working directory where you plan to write your code.
Regularized linear regression
The first part of this exercise focuses on regularized linear regression and the normal equations.
Plot the data
Load the data files "ex5Linx.dat" and "ex5Liny.dat" into your program. These correspond to the "x" and "y" variables that you will start out with.
Notice that in this data, the input "x" is a single feature, so you can plot y as a function of x on a 2-dimensional graph (try it yourself):
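For example, a minimal sketch (assuming the data files are in your working directory):

x = load('ex5Linx.dat');
y = load('ex5Liny.dat');

figure
plot(x, y, 'o')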

From looking at this plot, it seems that fitting a straight line might be too simple of an approximation. Instead, we will try fitting a higher-order polynomial to the data to capture more of the variations in the points.
Let's try a fifth-order polynomial. Our hypothesis will be

\begin{displaymath}
h_{\theta}(x)=\theta_{0}+\theta_{1}x+\theta_{2}x^{2}+\theta_{3}x^{3}+\theta_{4}x^{4}+\theta_{5}x^{5}
\end{displaymath}

This means that our hypothesis has six features, because $x^{0},x^{1},\ldots,x^{5}$ are now all features of our regression.
Since we are fitting a 5th-order polynomial to a data set of only 7 points, over-fitting is likely to occur. To guard against this, we will use regularization in our model.
Recall that in regularization problems, the goal is to minimize the following cost function with respect to $\theta$:

\begin{displaymath}
J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right]
\end{displaymath}

The regularization parameter $\lambda$ is a control on your fitting parameters: as the magnitudes of the parameters increase, the penalty on the cost function increases as well. Notice that the summation after $\lambda$ starts at $j=1$, so the intercept term $\theta_{0}$ is not regularized.
Normal equations
Now we will find the best parameters of our model using the normal equations.
Recall that the normal equations solution to regularized linear regression is
\begin{displaymath}
\theta=(X^{T}X+\lambda\left[\begin{array}{cccc}
0\\
 & 1\\
 & & \ddots\\
 & & & 1\end{array}\right])^{-1}X^{T}\vec{y}
\end{displaymath}
The matrix following $\lambda$ is an $(n+1)\times(n+1)$ diagonal matrix with a zero in the upper left and ones along the rest of the diagonal. (Remember that $n$ is the number of features, not counting the intercept term.) The vector $\vec{y}$ and the matrix $X$ have the same definitions they had for unregularized regression:

\begin{displaymath}
X=\left[\begin{array}{c}
(x^{(1)})^{T}\\
(x^{(2)})^{T}\\
\vdots\\
(x^{(m)})^{T}\end{array}\right]\qquad
\vec{y}=\left[\begin{array}{c}
y^{(1)}\\
y^{(2)}\\
\vdots\\
y^{(m)}\end{array}\right]
\end{displaymath}
Using this equation, find values for $\theta$ using the three regularization parameters below:

a. $\lambda=0$ (this is the same case as non-regularized linear regression)

b. $\lambda=1$

c. $\lambda=10$
As you are implementing your program, keep in mind that $X$ is an $m\times(n+1)$ matrix, because there are $m$ training examples and $n$ features, plus an intercept term (here, $m=7$ and $n=5$).
In the data provided for this exercise, you were only given the first power of $x$. You will need to include the other powers of $x$ in your feature matrix, which means that the first column will contain all ones, the next column will contain the first powers, the next column will contain the second powers, and so on. You can do this in Matlab/Octave with the command
x = [ones(m, 1), x, x.^2, x.^3, x.^4, x.^5];
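With the feature matrix built, the normal equation above becomes a one-liner. Here is a minimal sketch; the names lambda and L are our own, and the backslash operator solves the system without forming an explicit inverse:

m = length(y);
n = size(x, 2) - 1;      % number of features, not counting the intercept

lambda = 1;              % repeat for each of the three values of lambda above
L = eye(n + 1);
L(1, 1) = 0;             % zero in the upper left: theta_0 is not regularized

theta = (x' * x + lambda * L) \ (x' * y);
norm(theta)              % quick check against the solutions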
When you have found the answers for $\theta$, verify them with the values in the solutions. In addition to listing the values for each element of the $\theta$ vector, we will also provide the L2-norm of $\theta$ so you can quickly check if your answer is correct. In Matlab/Octave, you can calculate the L2-norm of a vector x using the command norm(x).
Also, plot the polynomial fit for each value of $\lambda$. You will get plots similar to these:
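Here is one way to produce such a plot (a sketch, assuming x is the 7-by-6 feature matrix built above, so its second column holds the original inputs, and theta came from the normal equation):

figure
plot(x(:, 2), y, 'o')              % the training data
hold on

% Evaluate the fitted polynomial on a fine grid for a smooth curve
x_vals = linspace(min(x(:, 2)), max(x(:, 2)), 100)';
feats = [ones(100, 1), x_vals, x_vals.^2, x_vals.^3, x_vals.^4, x_vals.^5];
plot(x_vals, feats * theta, '-')

legend('Training data', '5th order fit')
hold off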



From looking at these graphs, what conclusions can you make about how the regularization parameter $\lambda$ affects your model?
Regularized logistic regression
In this second part of the exercise, you will implement regularized logistic regression using Newton's Method. To begin, load the files 'ex5Logx.dat' and 'ex5Logy.dat' into your program. This dataset represents the training set of a logistic regression problem with two features. To avoid confusion later, we will refer to the two input features contained in 'ex5Logx.dat' as $u$ and $v$. So in the 'ex5Logx.dat' file, the first column of numbers represents the feature $u$, which you will plot on the horizontal axis, and the second column represents the feature $v$, which you will plot on the vertical axis.
After loading the data, plot the points using different markers to distinguish between the two classifications. The commands in Matlab/Octave will be:
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');

figure

% Find the indices for the 2 classes
pos = find(y); neg = find(y == 0);

plot(x(pos, 1), x(pos, 2), '+')
hold on
plot(x(neg, 1), x(neg, 2), 'o')
After plotting your image, it should look something like this:

We will now fit a regularized regression model to this data.
Recall that in logistic regression, the hypothesis function is
\begin{displaymath}
h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}=P(y=1\mid x;\theta)
\end{displaymath}
Let's look at the $\theta^{T}x$ parameter in the sigmoid function $g(\theta^{T}x)$. In this exercise, we will assign $x$ to be all monomials (meaning polynomial terms) of $u$ and $v$ up to the sixth power:
\begin{displaymath}
x=\left[\begin{array}{c}
1\\
u\\
v\\
u^{2}\\
uv\\
v^{2}\\
u^{3}\\
\vdots\\
uv^{5}\\
v^{6}\end{array}\right]
\end{displaymath}
To clarify this notation: we have made a 28-feature vector $x$ where $x_{0}=1$, $x_{1}=u$, $x_{2}=v$, and so on. Remember that $u$ was the first column of numbers in your 'ex5Logx.dat' file and $v$ was the second column. From now on, we will just refer to the entries of $x$ as $x_{0}$, $x_{1}$, and so on instead of their values in terms of $u$ and $v$.
To save you the trouble of enumerating all the terms of $x$, we've included a Matlab/Octave helper function named 'map_feature' that maps the original inputs to the feature vector. This function works for a single training example as well as for an entire training set. To use this function, place 'map_feature.m' in your working directory and call
x = map_feature(u, v)
This assumes that the two original features were stored in column vectors named 'u' and 'v.' (If you had only one training example, each column vector would be a scalar.) The function will output a new feature array stored in the variable 'x.' Of course, you can use any names you'd like for the arguments and the output. Just make sure your two arguments are column vectors of the same size.
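For intuition, the mapping itself is just a pair of nested loops over exponents. The sketch below is a hypothetical implementation of such a function, not necessarily the code inside the provided 'map_feature.m':

% Enumerate all monomials u^(i-j) * v^j of total degree <= 6
% (1 constant term + 2 + 3 + ... + 7 = 28 features in all)
degree = 6;
x = ones(size(u, 1), 1);           % the constant feature
for i = 1:degree
    for j = 0:i
        x(:, end + 1) = (u.^(i - j)) .* (v.^j);
    end
end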
Before building this model, recall that our objective is to minimize the cost function in regularized logistic regression:
\begin{displaymath}
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}
\end{displaymath}
Notice that this looks like the cost function for unregularized logistic regression, except that there is a regularization term at the end. We will now minimize this function using Newton's method.
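In Matlab/Octave this cost can be computed in a vectorized way. A minimal sketch, assuming x is the mapped feature matrix (one row per training example, 28 columns), y is the label vector, theta is your current parameter vector, and lambda has been set:

g = @(z) 1.0 ./ (1.0 + exp(-z));   % sigmoid
m = length(y);
h = g(x * theta);                  % h_theta(x^(i)) for every example
J = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) ...
    + (lambda/(2*m)) * sum(theta(2:end).^2);   % theta_0 is not penalized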
Newton's method
Recall that the Newton's Method update rule is

\begin{displaymath}
\theta^{(t+1)}=\theta^{(t)}-H^{-1}\nabla_{\theta}J
\end{displaymath}
This is the same rule that you used for unregularized logistic regression in Exercise 4. But because you are now implementing regularization, the gradient $\nabla_{\theta}J$ and the Hessian $H$ have different forms:
\begin{displaymath}
\nabla_{\theta}J=\left[\begin{array}{c}
\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{0}^{(i)}\\
\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{1}^{(i)}+\frac{\lambda}{m}\theta_{1}\\
\vdots\\
\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{n}^{(i)}+\frac{\lambda}{m}\theta_{n}\end{array}\right]
\end{displaymath}
\begin{displaymath}
H=\frac{1}{m}\left[\sum_{i=1}^{m}h_{\theta}(x^{(i)})\left(1-h_{\theta}(x^{(i)})\right)x^{(i)}\left(x^{(i)}\right)^{T}\right]+\frac{\lambda}{m}\left[\begin{array}{cccc}
0\\
 & 1\\
 & & \ddots\\
 & & & 1\end{array}\right]
\end{displaymath}
Notice that if you substitute $\lambda=0$ into these expressions, you will see the same formulas as unregularized logistic regression. (A vectorized sketch of these formulas appears after the list below.)
Also, remember that in these formulas,

1. $x^{(i)}$ is your feature vector, which is a $28\times1$ vector in this exercise.

2. $\nabla_{\theta}J$ is a $28\times1$ vector.

3. $x^{(i)}\left(x^{(i)}\right)^{T}$ and $H$ are $28\times28$ matrices.

4. $y^{(i)}$ and $h_{\theta}(x^{(i)})$ are scalars.

5. The matrix following $\frac{\lambda}{m}$ in the Hessian formula is a $28\times28$ diagonal matrix with a zero in the upper left and ones on every other diagonal entry.
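Both the gradient and the Hessian vectorize nicely in Matlab/Octave. Below is a minimal sketch under our own naming: it assumes x is the mapped feature matrix (one row per example), h is the vector of hypothesis values, and L is the diagonal matrix described in item 5.

L = eye(size(x, 2));
L(1, 1) = 0;          % zero in the upper left: theta_0 is not penalized

grad = (1/m) * x' * (h - y) + (lambda/m) * (L * theta);
H = (1/m) * x' * diag(h .* (1 - h)) * x + (lambda/m) * L;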
Run Newton's Method
Now run Newton's Method on this dataset using the three values of lambda below:

a. $\lambda=0$ (this is the same case as non-regularized logistic regression)

b. $\lambda=1$

c. $\lambda=10$
To determine whether Newton's Method has converged, it may help to print out the value of $J(\theta)$ during each iteration. $J(\theta)$ should not be increasing at any point during Newton's Method. If it is, check that you have defined $J(\theta)$ correctly. Also check your definitions of the gradient and Hessian to make sure there are no mistakes in the regularization parts.
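Putting the pieces together, here is one possible Newton loop; this is a sketch rather than the official solution, the iteration count MAX_ITR = 15 is our own choice, and x, y, and lambda are assumed to be set up as described above.

% Newton's method for regularized logistic regression
g = @(z) 1.0 ./ (1.0 + exp(-z));   % sigmoid
[m, n1] = size(x);                 % n1 = n + 1 = 28 in this exercise
theta = zeros(n1, 1);
L = eye(n1); L(1, 1) = 0;

MAX_ITR = 15;
J = zeros(MAX_ITR, 1);
for t = 1:MAX_ITR
    h = g(x * theta);
    % Record the cost so you can verify it never increases
    J(t) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) ...
           + (lambda/(2*m)) * sum(theta(2:end).^2);
    grad = (1/m) * x' * (h - y) + (lambda/m) * (L * theta);
    H = (1/m) * x' * diag(h .* (1 - h)) * x + (lambda/m) * L;
    theta = theta - H \ grad;      % Newton update
end
J                                  % display the cost history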
After convergence, use your values of theta to find the decision boundary in the classification problem. The decision boundary is defined as the line where

\begin{displaymath}
P(y=1\mid x;\theta)=0.5,\qquad\textrm{which corresponds to}\qquad\theta^{T}x=0
\end{displaymath}
Plotting the decision boundary here will be trickier than plotting the best-fit curve in linear regression. You will need to plot the $\theta^{T}x=0$ line implicitly, by plotting a contour. This can be done by evaluating $\theta^{T}x$ over a grid of points representing the original $u$ and $v$ inputs, and then plotting the line where $\theta^{T}x$ evaluates to zero.
The plot implementation for Matlab/Octave is given below. To get the best viewing results, use the same plotting ranges that we use here.
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);

% Initialize space for the values to be plotted
z = zeros(length(u), length(v));

% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j, i) = map_feature(u(i), v(j)) * theta;
    end
end

% Because of the way that contour plotting works
% in Matlab, we need to transpose z, or
% else the axis orientation will be flipped!
z = z';

% Plot z = 0 by specifying the range [0, 0]
contour(u, v, z, [0, 0], 'LineWidth', 2)
When you are finished, your plots for the three values of $\lambda$ should look similar to the ones below.



Finally, because there are 28 elements of $\theta$, we will not provide an element-by-element comparison in the solutions. Instead, use norm(theta) to calculate the L2-norm of $\theta$, and check it against the norm in the solutions.