\documentclass[12pt]{article}
\begin{document}
\noindent
{\Huge \bf Exercise 8: Non-linear SVM classification with kernels}
In this exercise, you will use an RBF kernel to classify data that is not
linearly separable. As in the last exercise, you will use the LIBSVM
interface to MATLAB/Octave to build an SVM model.
If you do not already have LIBSVM on your computer, refer to the previous
exercise for directions on installing and running LIBSVM.
To begin, download and unzip the data for this exercise: ex8Data.zip.
\bigskip
{\Large \bf Kernels}
Recall from the lecture videos that linearly non-separable features often become
linearly separable after they are mapped to a high-dimensional feature space.
However, we never need to compute the feature mappings $\phi(x^{(i)})$
explicitly: we only need their inner products, given by a kernel function,
which is much easier to compute. Therefore, it's possible to create a very
complex decision boundary based on a high-dimensional (even infinite-dimensional)
feature mapping while still keeping the computation efficient, thanks to the
kernel representation.
{\bf The RBF kernel}
In this exercise, you will use the Radial Basis Function (RBF) kernel in LIBSVM.
This kernel has the formula
\begin{eqnarray*}
K\left(x^{(i)},x^{(j)}\right) & = & \phi(x^{(i)})^{T}\phi(x^{(j)})\\
& = & \exp\left(-\gamma\left\Vert x^{(i)}-x^{(j)}\right\Vert ^{2}\right),
\qquad\gamma>0\end{eqnarray*}
Notice that this is the same as the Gaussian kernel in the video lectures,
except that the term $\frac{1}{2\sigma^2}$ in the Gaussian kernel has been
replaced by $\gamma$. Once again, remember that at no point will you need
to calculate $\phi(x)$ directly. In fact, $\phi(x)$ is infinite dimensional for
this kernel, so storing it in memory would be impossible.
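As a sanity check on the formula (purely for illustration; LIBSVM computes
kernel values internally, and the function name below is our own, not part of
the LIBSVM API), the RBF kernel value for two column vectors is a one-liner
in MATLAB/Octave:
\begin{verbatim}
% Illustrative sketch only -- not part of the LIBSVM API.
% Computes K(xi, xj) = exp(-gamma * ||xi - xj||^2).
function K = rbf_kernel(xi, xj, gamma)
  K = exp(-gamma * norm(xi - xj)^2);
end
\end{verbatim}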
{\Large \bf Example 1}
Now let's see how an RBF kernel can choose a non-linear decision boundary. Load
the LIBSVM-formatted data "ex8a.txt" into your MATLAB/Octave workspace using
the command
\begin{verbatim}
[train_labels, train_features] = libsvmread('ex8a.txt');
\end{verbatim}
This is a two-dimensional classification problem, and if you plot the
positives and negatives using different colors, you should get
an image like the one below.
% [Figure: scatter plot of the ex8a.txt training data]
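If you'd like to reproduce this plot, here is a minimal sketch. It assumes
the positive and negative classes are labeled $1$ and $-1$; adjust the marker
styles to taste:
\begin{verbatim}
% Plot the two classes with different markers/colors.
% Assumes labels are +1 / -1.
pos = train_labels == 1;
neg = train_labels == -1;
plot(train_features(pos, 1), train_features(pos, 2), 'b+');
hold on;
plot(train_features(neg, 1), train_features(neg, 2), 'ro');
hold off;
\end{verbatim}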
There's no linear decision boundary for this dataset, but we'll now see
how an RBF kernel can automatically produce a non-linear one.
Using the {\tt svmtrain} command that you learned in the last exercise,
train an SVM model with an RBF kernel and $\gamma = 100$. If you don't
remember how to set the parameters for this command, type {\tt svmtrain} at
the MATLAB/Octave console for usage directions.
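For reference, one way to issue this command is shown below: {\tt -t 2}
selects the RBF kernel and {\tt -g} sets $\gamma$ (we leave the cost
parameter $C$ at its default here).
\begin{verbatim}
% Train an SVM with an RBF kernel (-t 2) and gamma = 100 (-g 100).
model = svmtrain(train_labels, train_features, '-t 2 -g 100');
\end{verbatim}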
Once you have the model, you can use the {\tt plotboundary} command that we
have provided to visualize the decision boundary.
\begin{verbatim}
plotboundary(train_labels, train_features, model);
\end{verbatim}
This gives you the following result:
% [Figure: RBF decision boundary for gamma = 100 on ex8a.txt]
The {\tt plotboundary} command includes an option for seeing the filled-in contours
of the $\sum_i \alpha_i K(x^{(i)}, x) + b$ function. Remember that this function
gives the decision values that are used to make a classification. An example
$x$ is classified as a positive example if
$\sum_i \alpha_i K(x^{(i)}, x) + b \geq 0$, and as a negative example otherwise.
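If you want to inspect these decision values directly rather than through the
plot, LIBSVM's {\tt svmpredict} returns them as its third output:
\begin{verbatim}
% decision_values(i) holds sum_j alpha_j K(x_j, x_i) + b
% for the i-th example.
[predicted_labels, accuracy, decision_values] = ...
    svmpredict(train_labels, train_features, model);
\end{verbatim}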
To activate the contour option, type
\begin{verbatim}
plotboundary(train_labels, train_features, model, 't');
\end{verbatim}
The result will look like the image below.
% [Figure: filled contour plot of the decision values]
Intuitively, you can think of the white areas as the
highest-confidence regions for positive classification, and the black areas
as the highest-confidence regions for negative classification. The color
gradient shows how the decision values vary across the plane.
{\Large \bf Example 2}
In this second example, we'll look at how the parameter
$\gamma$ in the RBF kernel affects the decision boundary.
Load the data file "ex8b.txt" into your workspace and plot the data. Your
result should look similar to this:
% [Figure: scatter plot of the ex8b.txt training data]
Now train your model using $\gamma$ values of 1, 10, 100, and 1000 and plot
the decision boundary (with no contour fill) for each model.
How does the fit of the boundary change with $\gamma$?
When you are finished, check your answers with the solutions.
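For reference, one way to train and plot all four models is sketched below
(the {\tt sprintf} call builds the LIBSVM option string for each $\gamma$):
\begin{verbatim}
% Train and plot one model per gamma value.
% -t 2 selects the RBF kernel; -g sets gamma.
for gamma = [1 10 100 1000]
  model = svmtrain(train_labels, train_features, ...
                   sprintf('-t 2 -g %g', gamma));
  figure;
  plotboundary(train_labels, train_features, model);
end
\end{verbatim}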
{\Huge \bf Solutions}
\bigskip
An m-file solution for this exercise can be found here.
Here are the decision boundaries for each model:
% [Figures: decision boundaries for gamma = 1, 10, 100, and 1000]
The classification accuracies on the training data for these models are
91.9\%, 93.3\%, 94.3\%, and 100\%, respectively. This shows that as $\gamma$
increases, the decision boundary becomes more flexible and fits the training
data more and more closely, which eventually leads to overfitting.
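These training accuracies are what {\tt svmpredict} reports when you evaluate
each model on the data it was trained on:
\begin{verbatim}
% svmpredict prints the training accuracy; it is also
% returned as the first element of the accuracy vector.
[predicted_labels, accuracy] = ...
    svmpredict(train_labels, train_features, model);
\end{verbatim}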
It may not be obvious how to choose a value of $\gamma$ that models the
data well without overfitting it.
In practice, there are model selection techniques that can help
you choose; you will learn about them in upcoming
video lectures for this course.
\end{document}