add part of opencv

This commit is contained in:
Tang1705
2020-01-27 20:20:56 +08:00
parent 0c4ac1d8bb
commit a71fa47620
6518 changed files with 3122580 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 8.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 8.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 203 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

View File

@@ -0,0 +1,208 @@
Introduction to Principal Component Analysis (PCA) {#tutorial_introduction_to_pca}
=======================================
Goal
----
In this tutorial you will learn how to:
- Use the OpenCV class @ref cv::PCA to calculate the orientation of an object.
What is PCA?
--------------
Principal Component Analysis (PCA) is a statistical procedure that extracts the most important features of a dataset.
![](images/pca_line.png)
Consider that you have a set of 2D points as it is shown in the figure above. Each dimension corresponds to a feature you are interested in. Here some could argue that the points are set in a random order. However, if you have a better look you will see that there is a linear pattern (indicated by the blue line) which is hard to dismiss. A key point of PCA is the Dimensionality Reduction. Dimensionality Reduction is the process of reducing the number of the dimensions of the given dataset. For example, in the above case it is possible to approximate the set of points to a single line and therefore, reduce the dimensionality of the given points from 2D to 1D.
Moreover, you could also see that the points vary the most along the blue line, more than they vary along the Feature 1 or Feature 2 axes. This means that if you know the position of a point along the blue line you have more information about the point than if you only knew where it was on Feature 1 axis or Feature 2 axis.
Hence, PCA allows us to find the direction along which our data varies the most. In fact, the result of running PCA on the set of points in the diagram consist of 2 vectors called _eigenvectors_ which are the _principal components_ of the data set.
![](images/pca_eigen.png)
The size of each eigenvector is encoded in the corresponding eigenvalue and indicates how much the data vary along the principal component. The beginning of the eigenvectors is the center of all points in the data set. Applying PCA to N-dimensional data set yields N N-dimensional eigenvectors, N eigenvalues and 1 N-dimensional center point. Enough theory, lets see how we can put these ideas into code.
How are the eigenvectors and eigenvalues computed?
--------------------------------------------------
The goal is to transform a given data set __X__ of dimension _p_ to an alternative data set __Y__ of smaller dimension _L_. Equivalently, we are seeking to find the matrix __Y__, where __Y__ is the _KarhunenLoève transform_ (KLT) of matrix __X__:
\f[ \mathbf{Y} = \mathbb{K} \mathbb{L} \mathbb{T} \{\mathbf{X}\} \f]
__Organize the data set__
Suppose you have data comprising a set of observations of _p_ variables, and you want to reduce the data so that each observation can be described with only _L_ variables, _L_ < _p_. Suppose further, that the data are arranged as a set of _n_ data vectors \f$ x_1...x_n \f$ with each \f$ x_i \f$ representing a single grouped observation of the _p_ variables.
- Write \f$ x_1...x_n \f$ as row vectors, each of which has _p_ columns.
- Place the row vectors into a single matrix __X__ of dimensions \f$ n\times p \f$.
__Calculate the empirical mean__
- Find the empirical mean along each dimension \f$ j = 1, ..., p \f$.
- Place the calculated mean values into an empirical mean vector __u__ of dimensions \f$ p\times 1 \f$.
\f[ \mathbf{u[j]} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X[i,j]} \f]
__Calculate the deviations from the mean__
Mean subtraction is an integral part of the solution towards finding a principal component basis that minimizes the mean square error of approximating the data. Hence, we proceed by centering the data as follows:
- Subtract the empirical mean vector __u__ from each row of the data matrix __X__.
- Store mean-subtracted data in the \f$ n\times p \f$ matrix __B__.
\f[ \mathbf{B} = \mathbf{X} - \mathbf{h}\mathbf{u^{T}} \f]
where __h__ is an \f$ n\times 1 \f$ column vector of all 1s:
\f[ h[i] = 1, i = 1, ..., n \f]
__Find the covariance matrix__
- Find the \f$ p\times p \f$ empirical covariance matrix __C__ from the outer product of matrix __B__ with itself:
\f[ \mathbf{C} = \frac{1}{n-1} \mathbf{B^{*}} \cdot \mathbf{B} \f]
where * is the conjugate transpose operator. Note that if B consists entirely of real numbers, which is the case in many applications, the "conjugate transpose" is the same as the regular transpose.
__Find the eigenvectors and eigenvalues of the covariance matrix__
- Compute the matrix __V__ of eigenvectors which diagonalizes the covariance matrix __C__:
\f[ \mathbf{V^{-1}} \mathbf{C} \mathbf{V} = \mathbf{D} \f]
where __D__ is the diagonal matrix of eigenvalues of __C__.
- Matrix __D__ will take the form of an \f$ p \times p \f$ diagonal matrix:
\f[ D[k,l] = \left\{\begin{matrix} \lambda_k, k = l \\ 0, k \neq l \end{matrix}\right. \f]
here, \f$ \lambda_j \f$ is the _j_-th eigenvalue of the covariance matrix __C__
- Matrix __V__, also of dimension _p_ x _p_, contains _p_ column vectors, each of length _p_, which represent the _p_ eigenvectors of the covariance matrix __C__.
- The eigenvalues and eigenvectors are ordered and paired. The _j_ th eigenvalue corresponds to the _j_ th eigenvector.
@note sources [[1]](https://robospace.wordpress.com/2013/10/09/object-orientation-principal-component-analysis-opencv/), [[2]](http://en.wikipedia.org/wiki/Principal_component_analysis) and special thanks to Svetlin Penkov for the original tutorial.
Source Code
-----------
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py)
- **Code at glance:**
@include samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py
@end_toggle
@note Another example using PCA for dimensionality reduction while maintaining an amount of variance can be found at [opencv_source_code/samples/cpp/pca.cpp](https://github.com/opencv/opencv/tree/master/samples/cpp/pca.cpp)
Explanation
-----------
- __Read image and convert it to binary__
Here we apply the necessary pre-processing procedures in order to be able to detect the objects of interest.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp pre-process
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java pre-process
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py pre-process
@end_toggle
- __Extract objects of interest__
Then find and filter contours by size and obtain the orientation of the remaining ones.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp contours
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java contours
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py contours
@end_toggle
- __Extract orientation__
Orientation is extracted by the call of getOrientation() function, which performs all the PCA procedure.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp pca
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java pca
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py pca
@end_toggle
First the data need to be arranged in a matrix with size n x 2, where n is the number of data points we have. Then we can perform that PCA analysis. The calculated mean (i.e. center of mass) is stored in the _cntr_ variable and the eigenvectors and eigenvalues are stored in the corresponding std::vectors.
- __Visualize result__
The final result is visualized through the drawAxis() function, where the principal components are drawn in lines, and each eigenvector is multiplied by its eigenvalue and translated to the mean position.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp visualization
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java visualization
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py visualization
@end_toggle
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp visualization1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java visualization1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py visualization1
@end_toggle
Results
-------
The code opens an image, finds the orientation of the detected objects of interest and then visualizes the result by drawing the contours of the detected objects of interest, the center point, and the x-axis, y-axis regarding the extracted orientation.
![](images/pca_test1.jpg)
![](images/pca_output.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.5 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.8 KiB

View File

@@ -0,0 +1,263 @@
Introduction to Support Vector Machines {#tutorial_introduction_to_svm}
=======================================
Goal
----
In this tutorial you will learn how to:
- Use the OpenCV functions @ref cv::ml::SVM::train to build a classifier based on SVMs and @ref
cv::ml::SVM::predict to test its performance.
What is a SVM?
--------------
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating
hyperplane. In other words, given labeled training data (*supervised learning*), the algorithm
outputs an optimal hyperplane which categorizes new examples.
In which sense is the hyperplane obtained optimal? Let's consider the following simple problem:
For a linearly separable set of 2D-points which belong to one of two classes, find a separating
straight line.
![](images/separating-lines.png)
@note In this example we deal with lines and points in the Cartesian plane instead of hyperplanes
and vectors in a high dimensional space. This is a simplification of the problem.It is important to
understand that this is done only because our intuition is better built from examples that are easy
to imagine. However, the same concepts apply to tasks where the examples to classify lie in a space
whose dimension is higher than two.
In the above picture you can see that there exists multiple lines that offer a solution to the
problem. Is any of them better than the others? We can intuitively define a criterion to estimate
the worth of the lines: <em> A line is bad if it passes too close to the points because it will be
noise sensitive and it will not generalize correctly. </em> Therefore, our goal should be to find
the line passing as far as possible from all points.
Then, the operation of the SVM algorithm is based on finding the hyperplane that gives the largest
minimum distance to the training examples. Twice, this distance receives the important name of
**margin** within SVM's theory. Therefore, the optimal separating hyperplane *maximizes* the margin
of the training data.
![](images/optimal-hyperplane.png)
How is the optimal hyperplane computed?
---------------------------------------
Let's introduce the notation used to define formally a hyperplane:
\f[f(x) = \beta_{0} + \beta^{T} x,\f]
where \f$\beta\f$ is known as the *weight vector* and \f$\beta_{0}\f$ as the *bias*.
@sa A more in depth description of this and hyperplanes you can find in the section 4.5 (*Separating
Hyperplanes*) of the book: *Elements of Statistical Learning* by T. Hastie, R. Tibshirani and J. H.
Friedman (@cite HTF01).
The optimal hyperplane can be represented in an infinite number of different ways by
scaling of \f$\beta\f$ and \f$\beta_{0}\f$. As a matter of convention, among all the possible
representations of the hyperplane, the one chosen is
\f[|\beta_{0} + \beta^{T} x| = 1\f]
where \f$x\f$ symbolizes the training examples closest to the hyperplane. In general, the training
examples that are closest to the hyperplane are called **support vectors**. This representation is
known as the **canonical hyperplane**.
Now, we use the result of geometry that gives the distance between a point \f$x\f$ and a hyperplane
\f$(\beta, \beta_{0})\f$:
\f[\mathrm{distance} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||}.\f]
In particular, for the canonical hyperplane, the numerator is equal to one and the distance to the
support vectors is
\f[\mathrm{distance}_{\text{ support vectors}} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||} = \frac{1}{||\beta||}.\f]
Recall that the margin introduced in the previous section, here denoted as \f$M\f$, is twice the
distance to the closest examples:
\f[M = \frac{2}{||\beta||}\f]
Finally, the problem of maximizing \f$M\f$ is equivalent to the problem of minimizing a function
\f$L(\beta)\f$ subject to some constraints. The constraints model the requirement for the hyperplane to
classify correctly all the training examples \f$x_{i}\f$. Formally,
\f[\min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i,\f]
where \f$y_{i}\f$ represents each of the labels of the training examples.
This is a problem of Lagrangian optimization that can be solved using Lagrange multipliers to obtain
the weight vector \f$\beta\f$ and the bias \f$\beta_{0}\f$ of the optimal hyperplane.
Source Code
-----------
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py)
- **Code at glance:**
@include samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py
@end_toggle
Explanation
-----------
- **Set up the training data**
The training data of this exercise is formed by a set of labeled 2D-points that belong to one of
two different classes; one of the classes consists of one point and the other of three points.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp setup1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java setup1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py setup1
@end_toggle
The function @ref cv::ml::SVM::train that will be used afterwards requires the training data to be
stored as @ref cv::Mat objects of floats. Therefore, we create these objects from the arrays
defined above:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp setup2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java setup2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py setup1
@end_toggle
- **Set up SVM's parameters**
In this tutorial we have introduced the theory of SVMs in the most simple case, when the
training examples are spread into two classes that are linearly separable. However, SVMs can be
used in a wide variety of problems (e.g. problems with non-linearly separable data, a SVM using
a kernel function to raise the dimensionality of the examples, etc). As a consequence of this,
we have to define some parameters before training the SVM. These parameters are stored in an
object of the class @ref cv::ml::SVM.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp init
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java init
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py init
@end_toggle
Here:
- *Type of SVM*. We choose here the type @ref cv::ml::SVM::C_SVC "C_SVC" that can be used for
n-class classification (n \f$\geq\f$ 2). The important feature of this type is that it deals
with imperfect separation of classes (i.e. when the training data is non-linearly separable).
This feature is not important here since the data is linearly separable and we chose this SVM
type only for being the most commonly used.
- *Type of SVM kernel*. We have not talked about kernel functions since they are not
interesting for the training data we are dealing with. Nevertheless, let's explain briefly now
the main idea behind a kernel function. It is a mapping done to the training data to improve
its resemblance to a linearly separable set of data. This mapping consists of increasing the
dimensionality of the data and is done efficiently using a kernel function. We choose here the
type @ref cv::ml::SVM::LINEAR "LINEAR" which means that no mapping is done. This parameter is
defined using cv::ml::SVM::setKernel.
- *Termination criteria of the algorithm*. The SVM training procedure is implemented solving a
constrained quadratic optimization problem in an **iterative** fashion. Here we specify a
maximum number of iterations and a tolerance error so we allow the algorithm to finish in
less number of steps even if the optimal hyperplane has not been computed yet. This
parameter is defined in a structure @ref cv::TermCriteria .
- **Train the SVM**
We call the method @ref cv::ml::SVM::train to build the SVM model.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp train
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java train
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py train
@end_toggle
- **Regions classified by the SVM**
The method @ref cv::ml::SVM::predict is used to classify an input sample using a trained SVM. In
this example we have used this method in order to color the space depending on the prediction done
by the SVM. In other words, an image is traversed interpreting its pixels as points of the
Cartesian plane. Each of the points is colored depending on the class predicted by the SVM; in
green if it is the class with label 1 and in blue if it is the class with label -1.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp show
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java show
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py show
@end_toggle
- **Support vectors**
We use here a couple of methods to obtain information about the support vectors.
The method @ref cv::ml::SVM::getSupportVectors obtain all of the support
vectors. We have used this methods here to find the training examples that are
support vectors and highlight them.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp show_vectors
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java show_vectors
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py show_vectors
@end_toggle
Results
-------
- The code opens an image and shows the training examples of both classes. The points of one class
are represented with white circles and black ones are used for the other class.
- The SVM is trained and used to classify all the pixels of the image. This results in a division
of the image in a blue region and a green region. The boundary between both regions is the
optimal separating hyperplane.
- Finally the support vectors are shown using gray rings around the training examples.
![](images/svm_intro_result.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 10 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

View File

@@ -0,0 +1,278 @@
Support Vector Machines for Non-Linearly Separable Data {#tutorial_non_linear_svms}
=======================================================
Goal
----
In this tutorial you will learn how to:
- Define the optimization problem for SVMs when it is not possible to separate linearly the
training data.
- How to configure the parameters to adapt your SVM for this class of problems.
Motivation
----------
Why is it interesting to extend the SVM optimization problem in order to handle non-linearly separable
training data? Most of the applications in which SVMs are used in computer vision require a more
powerful tool than a simple linear classifier. This stems from the fact that in these tasks __the
training data can be rarely separated using an hyperplane__.
Consider one of these tasks, for example, face detection. The training data in this case is composed
by a set of images that are faces and another set of images that are non-faces (_every other thing
in the world except from faces_). This training data is too complex so as to find a representation
of each sample (_feature vector_) that could make the whole set of faces linearly separable from the
whole set of non-faces.
Extension of the Optimization Problem
-------------------------------------
Remember that using SVMs we obtain a separating hyperplane. Therefore, since the training data is
now non-linearly separable, we must admit that the hyperplane found will misclassify some of the
samples. This _misclassification_ is a new variable in the optimization that must be taken into
account. The new model has to include both the old requirement of finding the hyperplane that gives
the biggest margin and the new one of generalizing the training data correctly by not allowing too
many classification errors.
We start here from the formulation of the optimization problem of finding the hyperplane which
maximizes the __margin__ (this is explained in the previous tutorial (@ref tutorial_introduction_to_svm):
\f[\min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i\f]
There are multiple ways in which this model can be modified so it takes into account the
misclassification errors. For example, one could think of minimizing the same quantity plus a
constant times the number of misclassification errors in the training data, i.e.:
\f[\min ||\beta||^{2} + C \text{(\# misclassication errors)}\f]
However, this one is not a very good solution since, among some other reasons, we do not distinguish
between samples that are misclassified with a small distance to their appropriate decision region or
samples that are not. Therefore, a better solution will take into account the _distance of the
misclassified samples to their correct decision regions_, i.e.:
\f[\min ||\beta||^{2} + C \text{(distance of misclassified samples to their correct regions)}\f]
For each sample of the training data a new parameter \f$\xi_{i}\f$ is defined. Each one of these
parameters contains the distance from its corresponding training sample to their correct decision
region. The following picture shows non-linearly separable training data from two classes, a
separating hyperplane and the distances to their correct regions of the samples that are
misclassified.
![](images/sample-errors-dist.png)
@note Only the distances of the samples that are misclassified are shown in the picture. The
distances of the rest of the samples are zero since they lay already in their correct decision
region.
The red and blue lines that appear on the picture are the margins to each one of the
decision regions. It is very __important__ to realize that each of the \f$\xi_{i}\f$ goes from a
misclassified training sample to the margin of its appropriate region.
Finally, the new formulation for the optimization problem is:
\f[\min_{\beta, \beta_{0}} L(\beta) = ||\beta||^{2} + C \sum_{i} {\xi_{i}} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 - \xi_{i} \text{ and } \xi_{i} \geq 0 \text{ } \forall i\f]
How should the parameter C be chosen? It is obvious that the answer to this question depends on how
the training data is distributed. Although there is no general answer, it is useful to take into
account these rules:
- Large values of C give solutions with _less misclassification errors_ but a _smaller margin_.
Consider that in this case it is expensive to make misclassification errors. Since the aim of
the optimization is to minimize the argument, few misclassifications errors are allowed.
- Small values of C give solutions with _bigger margin_ and _more classification errors_. In this
case the minimization does not consider that much the term of the sum so it focuses more on
finding a hyperplane with big margin.
Source Code
-----------
You may also find the source code in `samples/cpp/tutorial_code/ml/non_linear_svms` folder of the OpenCV source library or
[download it from here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp).
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py)
- **Code at glance:**
@include samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py
@end_toggle
Explanation
-----------
- __Set up the training data__
The training data of this exercise is formed by a set of labeled 2D-points that belong to one of
two different classes. To make the exercise more appealing, the training data is generated
randomly using a uniform probability density functions (PDFs).
We have divided the generation of the training data into two main parts.
In the first part we generate data for both classes that is linearly separable.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp setup1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java setup1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py setup1
@end_toggle
In the second part we create data for both classes that is non-linearly separable, data that
overlaps.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp setup2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java setup2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py setup2
@end_toggle
- __Set up SVM's parameters__
@note In the previous tutorial @ref tutorial_introduction_to_svm there is an explanation of the
attributes of the class @ref cv::ml::SVM that we configure here before training the SVM.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp init
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java init
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py init
@end_toggle
There are just two differences between the configuration we do here and the one that was done in
the previous tutorial (@ref tutorial_introduction_to_svm) that we use as reference.
- _C_. We chose here a small value of this parameter in order not to punish too much the
misclassification errors in the optimization. The idea of doing this stems from the will of
obtaining a solution close to the one intuitively expected. However, we recommend to get a
better insight of the problem by making adjustments to this parameter.
@note In this case there are just very few points in the overlapping region between classes.
By giving a smaller value to __FRAC_LINEAR_SEP__ the density of points can be incremented and the
impact of the parameter _C_ explored deeply.
- _Termination Criteria of the algorithm_. The maximum number of iterations has to be
increased considerably in order to solve correctly a problem with non-linearly separable
training data. In particular, we have increased in five orders of magnitude this value.
- __Train the SVM__
We call the method @ref cv::ml::SVM::train to build the SVM model. Watch out that the training
process may take a quite long time. Have patiance when your run the program.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp train
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java train
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py train
@end_toggle
- __Show the Decision Regions__
The method @ref cv::ml::SVM::predict is used to classify an input sample using a trained SVM. In
this example we have used this method in order to color the space depending on the prediction done
by the SVM. In other words, an image is traversed interpreting its pixels as points of the
Cartesian plane. Each of the points is colored depending on the class predicted by the SVM; in
dark green if it is the class with label 1 and in dark blue if it is the class with label 2.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show
@end_toggle
- __Show the training data__
The method @ref cv::circle is used to show the samples that compose the training data. The samples
of the class labeled with 1 are shown in light green and in light blue the samples of the class
labeled with 2.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show_data
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show_data
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show_data
@end_toggle
- __Support vectors__
We use here a couple of methods to obtain information about the support vectors. The method
@ref cv::ml::SVM::getSupportVectors obtain all support vectors. We have used this methods here
to find the training examples that are support vectors and highlight them.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show_vectors
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show_vectors
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show_vectors
@end_toggle
Results
-------
- The code opens an image and shows the training examples of both classes. The points of one class
are represented with light green and light blue ones are used for the other class.
- The SVM is trained and used to classify all the pixels of the image. This results in a division
of the image in a blue region and a green region. The boundary between both regions is the
separating hyperplane. Since the training data is non-linearly separable, it can be seen that
some of the examples of both classes are misclassified; some green points lay on the blue region
and some blue points lay on the green one.
- Finally the support vectors are shown using gray rings around the training examples.
![](images/svm_non_linear_result.png)
You may observe a runtime instance of this on the [YouTube here](https://www.youtube.com/watch?v=vFv2yPcSo-Q).
@youtube{vFv2yPcSo-Q}

View File

@@ -0,0 +1,36 @@
Machine Learning (ml module) {#tutorial_table_of_content_ml}
============================
Use the powerful machine learning classes for statistical classification, regression and clustering
of data.
- @subpage tutorial_introduction_to_svm
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Fernando Iglesias García
Learn what a Support Vector Machine is.
- @subpage tutorial_non_linear_svms
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Fernando Iglesias García
Here you will learn how to define the optimization problem for SVMs when it is not possible to
separate linearly the training data.
- @subpage tutorial_introduction_to_pca
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Theodore Tsesmelis
Learn what a Principal Component Analysis (PCA) is.