add part of opencv

This commit is contained in:
Tang1705
2020-01-27 19:26:57 +08:00
parent 0c4ac1d8bb
commit 54c5864e7e
6130 changed files with 2631746 additions and 0 deletions


@@ -0,0 +1,308 @@
Camera calibration With OpenCV {#tutorial_camera_calibration}
==============================
Cameras have been around for a long, long time. However, with the introduction of cheap *pinhole*
cameras in the late 20th century, they became a common occurrence in our everyday life.
Unfortunately, this cheapness comes with its price: significant distortion. Luckily, the distortion
parameters are constant for a given camera, so with a calibration and some remapping we can correct it. Furthermore, with
calibration you may also determine the relation between the camera's natural units (pixels) and the
real world units (for example millimeters).
Theory
------
For the distortion OpenCV takes into account the radial and tangential factors. For the radial
factor one uses the following formula:
\f[x_{distorted} = x( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{distorted} = y( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\f]
So for an undistorted pixel point at \f$(x,y)\f$ coordinates, its position on the distorted image
will be \f$(x_{distorted}, y_{distorted})\f$. The presence of the radial distortion manifests in form
of the "barrel" or "fish-eye" effect.
Tangential distortion occurs because the image taking lenses are not perfectly parallel to the
imaging plane. It can be represented via the formulas:
\f[x_{distorted} = x + [ 2p_1xy + p_2(r^2+2x^2)] \\
y_{distorted} = y + [ p_1(r^2+ 2y^2)+ 2p_2xy]\f]
So we have five distortion parameters which in OpenCV are presented as one row matrix with 5
columns:
\f[distortion\_coefficients=(k_1 \hspace{10pt} k_2 \hspace{10pt} p_1 \hspace{10pt} p_2 \hspace{10pt} k_3)\f]
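As a quick illustration of how these formulas are applied (a minimal sketch, not code from the sample; the function name and signature are made up for this example):
@code{.cpp}
// Apply the radial and tangential distortion model above to a normalized
// image point (x, y); k1..k3, p1, p2 are the coefficients from the formulas.
void distortPoint(double x, double y,
                  double k1, double k2, double k3, double p1, double p2,
                  double& x_distorted, double& y_distorted)
{
    double r2 = x*x + y*y;                                   // r^2
    double radial = 1 + k1*r2 + k2*r2*r2 + k3*r2*r2*r2;      // radial factor
    x_distorted = x*radial + 2*p1*x*y + p2*(r2 + 2*x*x);     // radial + tangential
    y_distorted = y*radial + p1*(r2 + 2*y*y) + 2*p2*x*y;
}
@endcode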
Now for the unit conversion we use the following formula:
\f[\left [ \begin{matrix} x \\ y \\ w \end{matrix} \right ] = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} X \\ Y \\ Z \end{matrix} \right ]\f]
Here the presence of \f$w\f$ is explained by the use of the homogeneous coordinate system (and \f$w=Z\f$). The
unknown parameters are \f$f_x\f$ and \f$f_y\f$ (camera focal lengths) and \f$(c_x, c_y)\f$ which are the optical
centers expressed in pixel coordinates. If for both axes a common focal length is used with a given
aspect ratio \f$a\f$ (usually 1), then \f$f_y=f_x*a\f$ and in the upper formula we will have a single focal
length \f$f\f$. The matrix containing these four parameters is referred to as the *camera matrix*. While
the distortion coefficients are the same regardless of the camera resolution used, the camera matrix parameters should be
scaled from the calibrated resolution to the resolution you are currently using.
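For example, if you calibrated at 640x480 and now grab frames at 1280x960, only the camera matrix needs rescaling. The sketch below assumes the image is simply resized (not cropped); the helper name is illustrative:
@code{.cpp}
// Rescale a camera matrix from the calibrated resolution to the current one.
// The distortion coefficients stay unchanged.
cv::Mat scaleCameraMatrix(const cv::Mat& K, cv::Size calibrated, cv::Size current)
{
    cv::Mat Ks = K.clone();
    double sx = (double)current.width  / calibrated.width;
    double sy = (double)current.height / calibrated.height;
    Ks.at<double>(0,0) *= sx;   // f_x
    Ks.at<double>(1,1) *= sy;   // f_y
    Ks.at<double>(0,2) *= sx;   // c_x
    Ks.at<double>(1,2) *= sy;   // c_y
    return Ks;
}
@endcode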
The process of determining these two matrices is the calibration. Calculation of these parameters is
done through basic geometrical equations. The equations used depend on the chosen calibrating
objects. Currently OpenCV supports three types of objects for calibration:
- Classical black-white chessboard
- Symmetrical circle pattern
- Asymmetrical circle pattern
Basically, you need to take snapshots of these patterns with your camera and let OpenCV find them.
Each found pattern results in a new equation. To solve the equation you need at least a
predetermined number of pattern snapshots to form a well-posed equation system. This number is
higher for the chessboard pattern and lower for the circle ones. For example, in theory the
chessboard pattern requires at least two snapshots. However, in practice we have a good amount of
noise present in our input images, so for good results you will probably need at least 10 good
snapshots of the input pattern in different positions.
Goal
----
The sample application will:
- Determine the distortion matrix
- Determine the camera matrix
- Take input from Camera, Video and Image file list
- Read configuration from XML/YAML file
- Save the results into XML/YAML file
- Calculate re-projection error
Source code
-----------
You may also find the source code in the `samples/cpp/tutorial_code/calib3d/camera_calibration/`
folder of the OpenCV source library or [download it from here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp). For the usage of the program, run it with `-h` argument. The program has an
essential argument: the name of its configuration file. If none is given then it will try to open the
one named "default.xml". [Here's a sample configuration file
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/calib3d/camera_calibration/in_VID5.xml) in XML format. In the
configuration file you may choose to use camera as an input, a video file or an image list. If you
opt for the last one, you will need to create a configuration file where you enumerate the images to
use. Here's [an example of this ](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/calib3d/camera_calibration/VID5.xml).
The important part to remember is that the images need to be specified using the absolute path or
the relative one from your application's working directory. You may find all this in the samples
directory mentioned above.
The application starts up with reading the settings from the configuration file. Although this is
an important part of it, it has nothing to do with the subject of this tutorial: *camera
calibration*. Therefore, I've chosen not to post the code for that part here. You can find the technical
background on how to do this in the @ref tutorial_file_input_output_with_xml_yml tutorial.
Explanation
-----------
-# **Read the settings**
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp file_read
For this I've used the simple OpenCV class input operation. After reading the file, an
additional post-processing function checks the validity of the input. Only if all inputs are
good will the *goodInput* variable be true.
-# **Get next input, if it fails or we have enough of them - calibrate**
After this we have a big
loop where we do the following operations: get the next image from the image list, camera or
video file. If this fails or we have enough images, then we run the calibration process. In case
of an image list we step out of the loop; otherwise the remaining frames will be undistorted (if the
option is set) by changing from the *DETECTION* mode to the *CALIBRATED* one.
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp get_input
For some cameras we may need to flip the input image. Here we do this too.
-# **Find the pattern in the current input**
The formation of the equations I mentioned above aims
at finding major patterns in the input: in case of the chessboard these are the corners of the
squares and for the circles, well, the circles themselves. The positions of these will form the
result which will be written into the *pointBuf* vector.
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp find_pattern
Depending on the type of the input pattern you use either the @ref cv::findChessboardCorners or
the @ref cv::findCirclesGrid function. For both of them you pass the current image and the size
of the board and you'll get the positions of the patterns. Furthermore, they return a boolean
variable which states if the pattern was found in the input (we only need to take into account
those images where this is true!).
Then again, in case of cameras, we only take camera images when an input delay time has passed.
This is done in order to allow the user to move the chessboard around and get different images.
Similar images result in similar equations, and similar equations at the calibration step will
form an ill-posed problem, so the calibration will fail. For square images the positions of the
corners are only approximate. We may improve this by calling the @ref cv::cornerSubPix function.
(`winSize` is used to control the side length of the search window. Its default value is 11.
`winSize` may be changed by the command line parameter `--winSize=<number>`.)
It will produce a better calibration result. After this we add the result of a valid input to the
*imagePoints* vector to collect all of the equations into a single container. Finally, for
visualization feedback purposes we will draw the found points on the input image using the @ref
cv::drawChessboardCorners function.
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp pattern_found
-# **Show state and result to the user, plus command line control of the application**
This part shows text output on the image.
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp output_text
If we ran calibration and got camera's matrix with the distortion coefficients we may want to
correct the image using @ref cv::undistort function:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp output_undistorted
Then we show the image and wait for an input key and if this is *u* we toggle the distortion removal,
if it is *g* we start again the detection process, and finally for the *ESC* key we quit the application:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp await_input
-# **Show the distortion removal for the images too**
When you work with an image list it is not
possible to remove the distortion inside the loop. Therefore, you must do this after the loop.
Taking advantage of this, I'll now expand the @ref cv::undistort function, which in fact first
calls @ref cv::initUndistortRectifyMap to find the transformation matrices and then performs the
transformation using the @ref cv::remap function. Because, after a successful calibration, the map
calculation needs to be done only once, using this expanded form you may speed up your
application:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp show_results
The calibration and save
------------------------
Because the calibration needs to be done only once per camera, it makes sense to save it after a
successful calibration. This way later on you can just load these values into your program. Due to
this we first make the calibration, and if it succeeds we save the result into an OpenCV style XML
or YAML file, depending on the extension you give in the configuration file.
Therefore in the first function we just split up these two processes. Because we want to save many
of the calibration variables we'll create these variables here and pass on both of them to the
calibration and saving function. Again, I'll not show the saving part as that has little in common
with the calibration. Explore the source file in order to find out how and what:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp run_and_save
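For orientation only, the saving step boils down to writing the matrices with cv::FileStorage. The sketch below is not the sample's actual saving function (which stores more fields), and the variable names (s.outputFileName, cameraMatrix, distCoeffs, totalAvgErr) are assumed from the surrounding snippets:
@code{.cpp}
// Minimal sketch of saving calibration results to an XML/YAML file.
cv::FileStorage fs(s.outputFileName, cv::FileStorage::WRITE);
fs << "camera_matrix" << cameraMatrix;
fs << "distortion_coefficients" << distCoeffs;
fs << "avg_reprojection_error" << totalAvgErr;
fs.release();
@endcode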
We do the calibration with the help of the @ref cv::calibrateCameraRO function. It has the following
parameters:
- The object points. This is a vector of *Point3f* vectors that for each input image describes how
the pattern should look. If we have a planar pattern (like a chessboard) then we can simply set
all Z coordinates to zero. This is a collection of the points where these important points are
present. Because we use a single pattern for all the input images, we can calculate it just
once and replicate it for all the other input views. We calculate the corner points with the
*calcBoardCornerPositions* function as:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp board_corners
And then replicate it for all views as:
@code{.cpp}
vector<vector<Point3f> > objectPoints(1);
calcBoardCornerPositions(s.boardSize, s.squareSize, objectPoints[0], s.calibrationPattern);
objectPoints[0][s.boardSize.width - 1].x = objectPoints[0][0].x + grid_width;
newObjPoints = objectPoints[0];
objectPoints.resize(imagePoints.size(),objectPoints[0]);
@endcode
@note If your calibration board is an inaccurate, unmeasured, roughly planar target (checkerboard
patterns on paper using off-the-shelf printers are the most convenient calibration targets and
most of them are not accurate enough), a method from @cite strobl2011iccv can be utilized to
dramatically improve the accuracy of the estimated camera intrinsic parameters. This new
calibration method will be used if the command line parameter `-d=<number>` is provided. In the
above code snippet, `grid_width` is actually the value set by `-d=<number>`. It's the measured
distance between the top-left (0, 0, 0) and top-right (s.squareSize*(s.boardSize.width-1), 0, 0)
corners of the pattern grid points. It should be measured precisely with rulers or vernier calipers.
After calibration, newObjPoints will be updated with the refined 3D coordinates of the object points.
- The image points. This is a vector of *Point2f* vector which for each input image contains
coordinates of the important points (corners for chessboard and centers of the circles for the
circle pattern). We have already collected this from @ref cv::findChessboardCorners or @ref
cv::findCirclesGrid function. We just need to pass it on.
- The size of the image acquired from the camera, video file or the images.
- The index of the object point to be fixed. We set it to -1 to request the standard calibration method.
If the new object-releasing method is to be used, set it to the index of the top-right corner point
of the calibration board grid. See cv::calibrateCameraRO for a detailed explanation.
@code{.cpp}
int iFixedPoint = -1;
if (release_object)
iFixedPoint = s.boardSize.width - 1;
@endcode
- The camera matrix. If we used the fixed aspect ratio option we need to set \f$f_x\f$:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp fixed_aspect
- The distortion coefficient matrix. Initialize with zero.
@code{.cpp}
distCoeffs = Mat::zeros(8, 1, CV_64F);
@endcode
- For all the views the function will calculate rotation and translation vectors, which transform
the object points (given in the model coordinate space) into the camera coordinate space, from
where they are projected to the image points. The 7th and 8th parameters are the output vectors
of matrices containing in the i-th position the rotation and translation vector for the i-th
object point to the i-th image point.
- The updated output vector of calibration pattern points. This parameter is ignored with standard
calibration method.
- The final argument is the flag. You need to specify here options like fix the aspect ratio for
the focal length, assume zero tangential distortion or to fix the principal point. Here we use
CALIB_USE_LU to get faster calibration speed.
@code{.cpp}
rms = calibrateCameraRO(objectPoints, imagePoints, imageSize, iFixedPoint,
cameraMatrix, distCoeffs, rvecs, tvecs, newObjPoints,
s.flag | CALIB_USE_LU);
@endcode
- The function returns the average re-projection error. This number gives a good estimation of the
precision of the found parameters. This should be as close to zero as possible. Given the
intrinsic, distortion, rotation and translation matrices we may calculate the error for one view
by using @ref cv::projectPoints to first transform the object points to image points. Then we
calculate the absolute norm between what we got with our transformation and the corner/circle
finding algorithm. To find the average error we calculate the arithmetical mean of the errors
calculated for all the calibration images.
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp compute_errors
Results
-------
Let there be [this input chessboard pattern ](pattern.png) which has a size of 9 X 6. I've used an
AXIS IP camera to create a couple of snapshots of the board and saved them into a VID5 directory. I've
put this inside the `images/CameraCalibration` folder of my working directory and created the
following `VID5.XML` file that describes which images to use:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<images>
images/CameraCalibration/VID5/xx1.jpg
images/CameraCalibration/VID5/xx2.jpg
images/CameraCalibration/VID5/xx3.jpg
images/CameraCalibration/VID5/xx4.jpg
images/CameraCalibration/VID5/xx5.jpg
images/CameraCalibration/VID5/xx6.jpg
images/CameraCalibration/VID5/xx7.jpg
images/CameraCalibration/VID5/xx8.jpg
</images>
</opencv_storage>
@endcode
Then I passed `images/CameraCalibration/VID5/VID5.XML` as an input in the configuration file. Here's a
chessboard pattern found during the runtime of the application:
![](images/fileListImage.jpg)
After applying the distortion removal we get:
![](images/fileListImageUnDist.jpg)
The same works for [this asymmetrical circle pattern ](acircles_pattern.png) by setting the input
width to 4 and height to 11. This time I've used a live camera feed by specifying its ID ("1") for
the input. Here's, how a detected pattern should look:
![](images/asymetricalPattern.jpg)
In both cases in the specified output XML/YAML file you'll find the camera and distortion
coefficients matrices:
@code{.xml}
<camera_matrix type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>d</dt>
<data>
6.5746697944293521e+002 0. 3.1950000000000000e+002 0.
6.5746697944293521e+002 2.3950000000000000e+002 0. 0. 1.</data></camera_matrix>
<distortion_coefficients type_id="opencv-matrix">
<rows>5</rows>
<cols>1</cols>
<dt>d</dt>
<data>
-4.1802327176423804e-001 5.0715244063187526e-001 0. 0.
-5.7843597214487474e-001</data></distortion_coefficients>
@endcode
Add these values as constants to your program, call the @ref cv::initUndistortRectifyMap and the
@ref cv::remap function to remove distortion and enjoy distortion free inputs for cheap and low
quality cameras.
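As a rough sketch of that last step (the file name and the use of cv::getOptimalNewCameraMatrix are illustrative choices; the constants are the abbreviated values from the output above):
@code{.cpp}
// Undistort an image with previously obtained calibration results.
cv::Mat cameraMatrix = (cv::Mat_<double>(3,3) <<
    6.5746697944293521e+002, 0., 3.1950000000000000e+002,
    0., 6.5746697944293521e+002, 2.3950000000000000e+002,
    0., 0., 1.);
cv::Mat distCoeffs = (cv::Mat_<double>(5,1) <<
    -4.1802327176423804e-001, 5.0715244063187526e-001, 0., 0.,
    -5.7843597214487474e-001);

cv::Mat view = cv::imread("input.jpg"), map1, map2, undistorted;
cv::Mat newCamMat = cv::getOptimalNewCameraMatrix(cameraMatrix, distCoeffs,
                                                  view.size(), 1, view.size());
cv::initUndistortRectifyMap(cameraMatrix, distCoeffs, cv::Mat(), newCamMat,
                            view.size(), CV_16SC2, map1, map2);
cv::remap(view, undistorted, map1, map2, cv::INTER_LINEAR);
@endcode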
You may observe a runtime instance of this on the [YouTube
here](https://www.youtube.com/watch?v=ViPN810E0SU).
@youtube{ViPN810E0SU}


@@ -0,0 +1,38 @@
Create calibration pattern {#tutorial_camera_calibration_pattern}
=========================================
The goal of this tutorial is to learn how to create a calibration pattern.
You can find a chessboard pattern in https://github.com/opencv/opencv/blob/master/doc/pattern.png
You can find a circleboard pattern in https://github.com/opencv/opencv/blob/master/doc/acircles_pattern.png
Create your own pattern
---------------
Now, if you want to create your own pattern, you will need Python to use https://github.com/opencv/opencv/blob/master/doc/pattern_tools/gen_pattern.py
Example
create a checkerboard pattern in file chessboard.svg with 9 rows, 6 columns and a square size of 20mm:

    python gen_pattern.py -o chessboard.svg --rows 9 --columns 6 --type checkerboard --square_size 20

create a circle board pattern in file circleboard.svg with 7 rows, 5 columns and a radius of 15mm:

    python gen_pattern.py -o circleboard.svg --rows 7 --columns 5 --type circles --square_size 15

create a circle board pattern in file acircleboard.svg with 7 rows, 5 columns, a square size of 10mm and less spacing between circles:

    python gen_pattern.py -o acircleboard.svg --rows 7 --columns 5 --type acircles --square_size 10 --radius_rate 2
If you want to change the units, use the -u option (mm, inches, px, m).
If you want to change the page size, use the -w and -h options.
@cond HAVE_opencv_aruco
If you want to create a ChArUco board read @ref tutorial_charuco_detection "tutorial Detection of ChArUco Corners" in opencv_contrib tutorial.
@endcond
@cond !HAVE_opencv_aruco
If you want to create a ChArUco board read tutorial Detection of ChArUco Corners in opencv_contrib tutorial.
@endcond


@@ -0,0 +1,55 @@
Camera calibration with square chessboard {#tutorial_camera_calibration_square_chess}
=========================================
The goal of this tutorial is to learn how to calibrate a camera given a set of chessboard images.
*Test data*: use images in your data/chess folder.
- Compile OpenCV with samples by setting BUILD_EXAMPLES to ON in cmake configuration.
- Go to bin folder and use imagelist_creator to create an XML/YAML list of your images.
- Then, run calibration sample to get camera parameters. Use square size equal to 3cm.
Pose estimation
---------------
Now, let us write code that detects a chessboard in an image and finds its distance from the
camera. You can apply this method to any object with known 3D geometry which you can detect in an
image.
*Test data*: use chess_test\*.jpg images from your data folder.
- Create an empty console project. Load a test image:

        Mat img = imread(argv[1], IMREAD_GRAYSCALE);
- Detect a chessboard in this image using the findChessboardCorners function:

        bool found = findChessboardCorners( img, boardSize, ptvec, CALIB_CB_ADAPTIVE_THRESH );
- Now, write a function that generates a vector\<Point3f\> array of the 3D coordinates of a chessboard
in any coordinate system. For simplicity, let us choose a system such that one of the chessboard
corners is at the origin and the board lies in the plane *z = 0* (a sketch of such a function is given after this list).
- Read camera parameters from an XML/YAML file:

        FileStorage fs( filename, FileStorage::READ );
        Mat intrinsics, distortion;
        fs["camera_matrix"] >> intrinsics;
        fs["distortion_coefficients"] >> distortion;
- Now we are ready to find the chessboard pose by running `solvePnP`:

        vector<Point3f> boardPoints;
        // fill the array
        ...
        solvePnP(Mat(boardPoints), Mat(foundBoardCorners), cameraMatrix,
                 distCoeffs, rvec, tvec, false);
- Calculate reprojection error like it is done in calibration sample (see
opencv/samples/cpp/calibration.cpp, function computeReprojectionErrors).
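The board-corner generator mentioned above could look roughly like this (a sketch under the convention chosen above; the sample's calibration.cpp contains a similar but not identical helper):

    // Generate the 3D coordinates of the inner chessboard corners, with the
    // top-left corner at the origin and the board in the z = 0 plane.
    static void calcChessboardCorners(Size boardSize, float squareSize, vector<Point3f>& corners)
    {
        corners.clear();
        for (int i = 0; i < boardSize.height; i++)
            for (int j = 0; j < boardSize.width; j++)
                corners.push_back(Point3f(float(j*squareSize), float(i*squareSize), 0));
    }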
Question: how would you calculate the distance from the camera origin to any one of the corners?
Answer: `solvePnP` gives us the rotation `rvec` and translation `tvec` that map board coordinates into the camera frame. Transforming a corner's 3D coordinates into the camera frame and taking the L2 norm of the resulting vector gives its distance from the camera origin.
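A hedged sketch of that computation, reusing the names from the snippets above:

    // Distance from the camera origin to the first chessboard corner:
    // bring the corner into the camera frame, then take the Euclidean norm.
    Mat R;
    Rodrigues(rvec, R);                              // rotation vector -> 3x3 matrix
    Mat corner = (Mat_<double>(3,1) << boardPoints[0].x, boardPoints[0].y, boardPoints[0].z);
    Mat cornerInCamera = R * corner + tvec;          // corner in camera coordinates
    double distance = norm(cornerInCamera);          // same units as the square size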


@@ -0,0 +1,198 @@
Interactive camera calibration application {#tutorial_interactive_calibration}
==============================
With the classical calibration technique the user must collect all the data first and only then run the @ref cv::calibrateCamera function
to obtain the camera parameters. If the average re-projection error is huge or if the estimated parameters seem to be wrong, the process of
selecting or collecting data and running @ref cv::calibrateCamera is repeated.
The interactive calibration process assumes that after each new data portion the user can see the results and error estimates, and
can delete the last data portion; finally, when the dataset for calibration is big enough, the process of automatic data selection starts.
Main application features
------
The sample application will:
- Determine the distortion matrix and confidence interval for each element
- Determine the camera matrix and confidence interval for each element
- Take input from camera or video file
- Read configuration from XML file
- Save the results into XML file
- Calculate re-projection error
- Reject pattern views at sharp angles to prevent the appearance of ill-conditioned Jacobian blocks
- Auto switch calibration flags (fix aspect ratio and elements of distortion matrix if needed)
- Auto detect when calibration is done by using several criteria
- Auto capture of static patterns (the user doesn't need to press any keys to capture a frame, just don't move the pattern for about a second)
Supported patterns:
- Black-white chessboard
- Asymmetrical circle pattern
- Dual asymmetrical circle pattern
- chAruco (chessboard with Aruco markers)
Description of parameters
------
The application has two groups of parameters: primary (passed through the command line) and advanced (passed through an XML file).
### Primary parameters:
All of these parameters are passed to the application through the command line.
-[parameter]=[default value]: description
- -v=[filename]: get video from filename, default input -- camera with id=0
- -ci=[0]: get video from camera with specified id
- -flip=[false]: vertical flip of input frames
- -t=[circles]: pattern for calibration (circles, chessboard, dualCircles, chAruco)
- -sz=[16.3]: distance between two nearest centers of circles or squares on calibration board
- -dst=[295]: distance between the white and black parts of the dualCircles pattern
- -w=[width]: width of pattern (in corners or circles)
- -h=[height]: height of pattern (in corners or circles)
- -of=[camParams.xml]: output file name
- -ft=[true]: auto tuning of calibration flags
- -vis=[grid]: captured boards visualization (grid, window)
- -d=[0.8]: delay between captures in seconds
- -pf=[defaultConfig.xml]: advanced application parameters file
### Advanced parameters:
By default, the values of the advanced parameters are stored in defaultConfig.xml:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<charuco_dict>0</charuco_dict>
<charuco_square_lenght>200</charuco_square_lenght>
<charuco_marker_size>100</charuco_marker_size>
<calibration_step>1</calibration_step>
<max_frames_num>30</max_frames_num>
<min_frames_num>10</min_frames_num>
<solver_eps>1e-7</solver_eps>
<solver_max_iters>30</solver_max_iters>
<fast_solver>0</fast_solver>
<frame_filter_conv_param>0.1</frame_filter_conv_param>
<camera_resolution>1280 720</camera_resolution>
</opencv_storage>
@endcode
- *charuco_dict*: name of special dictionary, which has been used for generation of chAruco pattern
- *charuco_square_lenght*: size of square on chAruco board (in pixels)
- *charuco_marker_size*: size of Aruco markers on chAruco board (in pixels)
- *calibration_step*: interval in frames between launches of @ref cv::calibrateCamera
- *max_frames_num*: if the number of frames for calibration is greater than this value, the frame filter starts working.
After filtration the size of the calibration dataset equals *max_frames_num*
- *min_frames_num*: if the number of frames is greater than this value, auto flags tuning, the undistorted view and quality evaluation are turned on
- *solver_eps*: precision of the Levenberg-Marquardt solver in @ref cv::calibrateCamera
- *solver_max_iters*: iteration limit of the solver
- *fast_solver*: if this value is nonzero and LAPACK is found, QR decomposition is used instead of SVD in the solver.
QR is faster than SVD, but potentially less precise
- *frame_filter_conv_param*: parameter used in the linear convolution of the bi-criteria frame filter
- *camera_resolution*: resolution of camera which is used for calibration
**Note:** *charuco_dict*, *charuco_square_lenght* and *charuco_marker_size* are used for chAruco pattern generation
(see Aruco module description for details: [Aruco tutorials](https://github.com/opencv/opencv_contrib/tree/master/modules/aruco/tutorials))
Default chAruco pattern:
![](images/charuco_board.png)
Dual circles pattern
------
To make this pattern you need the standard OpenCV circles pattern and a binary inverted one.
Place the two patterns on one plane so that all horizontal lines of circles in one pattern are
continuations of similar lines in the other.
Measure the distance between the patterns as shown in the picture below and pass it as the **dst** command line parameter. Also measure the distance between the centers of the nearest circles and pass
this value as the **sz** command line parameter.
![](images/dualCircles.jpg)
This pattern is very sensitive to the quality of production and measurement.
Data filtration
------
When the size of the calibration dataset is greater than *max_frames_num*, the data filter starts
working. It tries to remove "bad" frames from the dataset. The filter removes the frame
on which \f$loss\_function\f$ takes its maximum:
\f[loss\_function(i)=\alpha RMS(i)+(1-\alpha)reducedGridQuality(i)\f]
**RMS** is the average re-projection error calculated for frame *i*, **reducedGridQuality**
is the scene coverage quality evaluation without frame *i*, and \f$\alpha\f$ equals
**frame_filter_conv_param**.
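For illustration only (this is not the application's actual code, and the per-frame arrays are assumed to be filled elsewhere), the rule above amounts to something like:
@code{.cpp}
// Pick the frame with the largest loss and remove it from the dataset.
// alpha corresponds to frame_filter_conv_param; rms and reducedGridQuality
// hold the two per-frame criteria, assumed to be computed elsewhere.
double alpha = 0.1;
std::vector<double> rms;                 // per-frame re-projection error
std::vector<double> reducedGridQuality;  // per-frame coverage quality without that frame
size_t worstFrame = 0;
double worstLoss = -1.0;
for (size_t i = 0; i < rms.size(); ++i)
{
    double loss = alpha * rms[i] + (1.0 - alpha) * reducedGridQuality[i];
    if (loss > worstLoss) { worstLoss = loss; worstFrame = i; }
}
// erase frame 'worstFrame' from the calibration dataset
@endcode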
Calibration process
------
To start calibration just run the application. Place the pattern in front of the camera and hold it still in some pose.
After that, wait for it to be captured (a message like "Frame #i captured" will be shown).
The current focal distance and re-projection error will be shown on the main screen. Move the pattern to the next position and repeat the procedure. Try to cover the image plane
uniformly and don't show the pattern at sharp angles to the image plane.
![](images/screen_charuco.jpg)
If the calibration seems to be successful (the confidence intervals and average re-projection
error are small, and the frame coverage quality and number of pattern views are big enough) the
application will show a message like in the screen below.
![](images/screen_finish.jpg)
Hot keys:
- Esc -- exit application
- s -- save current data to XML file
- r -- delete last frame
- d -- delete all frames
- u -- enable/disable applying of undistortion
- v -- switch visualization mode
Results
------
As a result you will get the camera parameters and confidence intervals for them.
Example of output XML file:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<calibrationDate>"Thu 07 Apr 2016 04:23:03 PM MSK"</calibrationDate>
<framesCount>21</framesCount>
<cameraResolution>
1280 720</cameraResolution>
<cameraMatrix type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>d</dt>
<data>
1.2519588293098975e+03 0. 6.6684948780852471e+02 0.
1.2519588293098975e+03 3.6298123112613683e+02 0. 0. 1.</data></cameraMatrix>
<cameraMatrix_std_dev type_id="opencv-matrix">
<rows>4</rows>
<cols>1</cols>
<dt>d</dt>
<data>
0. 1.2887048808572649e+01 2.8536856683866230e+00
2.8341737483430314e+00</data></cameraMatrix_std_dev>
<dist_coeffs type_id="opencv-matrix">
<rows>1</rows>
<cols>5</cols>
<dt>d</dt>
<data>
1.3569117181595716e-01 -8.2513063822554633e-01 0. 0.
1.6412101575010554e+00</data></dist_coeffs>
<dist_coeffs_std_dev type_id="opencv-matrix">
<rows>5</rows>
<cols>1</cols>
<dt>d</dt>
<data>
1.5570675523402111e-02 8.7229075437543435e-02 0. 0.
1.8382427901856876e-01</data></dist_coeffs_std_dev>
<avg_reprojection_error>4.2691743074130178e-01</avg_reprojection_error>
</opencv_storage>
@endcode


@@ -0,0 +1,795 @@
Real Time pose estimation of a textured object {#tutorial_real_time_pose}
==============================================
Nowadays, augmented reality is one of the top research topics in computer vision and robotics.
The most elemental problem in augmented reality is the estimation of the camera pose with respect to an
object, either to do some 3D rendering (in computer vision) or to obtain the object pose in order to grasp
and manipulate it (in robotics). However, this is not a trivial
problem to solve, because the most common issue in image processing is the computational
cost of applying a lot of algorithms or mathematical operations to solve a problem that is basic
and immediate for humans.
Goal
----
This tutorial explains how to build a real time application to estimate the camera pose in
order to track a textured object with six degrees of freedom given a 2D image and its 3D textured
model.
The application will have the following parts:
- Read 3D textured object model and object mesh.
- Take input from Camera or Video.
- Extract ORB features and descriptors from the scene.
- Match scene descriptors with model descriptors using Flann matcher.
- Pose estimation using PnP + Ransac.
- Linear Kalman Filter for bad poses rejection.
Theory
------
In computer vision, estimating the camera pose from *n* 3D-to-2D point correspondences is a fundamental
and well understood problem. The most general version of the problem requires estimating the six
degrees of freedom of the pose and five calibration parameters: focal length, principal point,
aspect ratio and skew. It could be established with a minimum of 6 correspondences, using the well
known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications to the
problem which turn into an extensive list of different algorithms that improve the accuracy of the
DLT.
The most common simplification is to assume known calibration parameters which is the so-called
Perspective-*n*-Point problem:
![](images/pnp.jpg)
**Problem Formulation:** Given a set of correspondences between 3D points \f$p_i\f$ expressed in a world
reference frame, and their 2D projections \f$u_i\f$ onto the image, we seek to retrieve the pose (\f$R\f$
and \f$t\f$) of the camera w.r.t. the world and the focal length \f$f\f$.
OpenCV provides four different approaches to solve the Perspective-*n*-Point problem which return
\f$R\f$ and \f$t\f$. Then, using the following formula it's possible to project 3D points into the image
plane:
\f[s\ \left [ \begin{matrix} u \\ v \\ 1 \end{matrix} \right ] = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{matrix} \right ] \left [ \begin{matrix} X \\ Y \\ Z\\ 1 \end{matrix} \right ]\f]
The complete documentation on how to work with these equations is in @ref calib3d.
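In OpenCV terms, this projection is exactly what @ref cv::projectPoints computes once the pose is known; the values below are illustrative, not taken from the tutorial's data:
@code{.cpp}
// Project 3D points given in the world frame onto the image plane
// using a known pose (rvec, tvec) and camera matrix.
std::vector<cv::Point3f> objectPoints = { {0.f, 0.f, 0.f}, {1.f, 0.f, 0.f} };
std::vector<cv::Point2f> imagePoints;
cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1);          // rotation vector (R via Rodrigues)
cv::Mat tvec = (cv::Mat_<double>(3, 1) << 0, 0, 5);     // object 5 units in front of the camera
cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                         0, 800, 240,
                                         0,   0,   1);
cv::projectPoints(objectPoints, rvec, tvec, K, cv::Mat(), imagePoints);
@endcode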
Source code
-----------
You can find the source code of this tutorial in the
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/` folder of the OpenCV source library.
The tutorial consists of two main programs:
-# **Model registration**
This application is intended for those who don't have a 3D textured model of the object to be detected.
You can use this program to create your own textured 3D model. This program only works for planar
objects; if you want to model an object with a complex shape you should use more sophisticated
software to create it.
The application needs an input image of the object to be registered and its 3D mesh. We also have
to provide the intrinsic parameters of the camera with which the input image was taken. All the
files need to be specified using the absolute path or the relative one from your application's
working directory. If no files are specified the program will try to open the provided default
parameters.
The application starts up extracting the ORB features and descriptors from the input image and
then uses the mesh along with the [Möller–Trumbore intersection
algorithm](http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm/)
to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
are stored in different lists in a file with YAML format, in which each row is a different point. The
technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
tutorial.
![](images/registration.png)
-# **Model detection**
The aim of this application is to estimate the object pose in real time given its 3D textured model.
The application starts up loading the 3D textured model in a YAML file with the same
structure explained in the model registration program. From the scene, the ORB features and
descriptors are detected and extracted. Then, @ref cv::FlannBasedMatcher is used together with
@ref cv::flann::GenericIndex to do the matching between the scene descriptors and the model descriptors.
Using the found matches along with the @ref cv::solvePnPRansac function, the `R` and `t` of
the camera are computed. Finally, a KalmanFilter is applied in order to reject bad poses.
In the case that you compiled OpenCV with the samples, you can find it in `opencv/build/bin/cpp-tutorial-pnp_detection`.
Then you can run the application and change some parameters:
@code{.cpp}
This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
Usage:
./cpp-tutorial-pnp_detection -help
Keys:
'esc' - to quit.
--------------------------------------------------------------------------
Usage: cpp-tutorial-pnp_detection [params]
-c, --confidence (value:0.95)
RANSAC confidence
-e, --error (value:2.0)
RANSAC reprojection error
-f, --fast (value:true)
use of robust fast match
-h, --help (value:true)
print this message
--in, --inliers (value:30)
minimum inliers for Kalman update
--it, --iterations (value:500)
RANSAC maximum iterations count
-k, --keypoints (value:2000)
number of keypoints to detect
--mesh
path to ply mesh
--method, --pnp (value:0)
PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
--model
path to yml model
-r, --ratio (value:0.7)
threshold for ratio test
-v, --video
path to recorded video
@endcode
For example, you can run the application changing the pnp method:
@code{.cpp}
./cpp-tutorial-pnp_detection --method=2
@endcode
Explanation
-----------
Here the code for the real time application is explained in detail:
-# **Read 3D textured object model and object mesh.**
In order to load the textured model I implemented the *class* **Model** which has the function
*load()* that opens a YAML file and takes the stored 3D points with their corresponding descriptors.
You can find an example of a 3D textured model in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml`.
@code{.cpp}
/* Load a YAML file using OpenCV */
void Model::load(const std::string path)
{
cv::Mat points3d_mat;
cv::FileStorage storage(path, cv::FileStorage::READ);
storage["points_3d"] >> points3d_mat;
storage["descriptors"] >> descriptors_;
points3d_mat.copyTo(list_points3d_in_);
storage.release();
}
@endcode
In the main program the model is loaded as follows:
@code{.cpp}
Model model; // instantiate Model object
model.load(yml_read_path); // load a 3D textured object model
@endcode
In order to read the model mesh I implemented a *class* **Mesh** which has a function *load()*
that opens a `*.ply` file and stores the 3D points of the object and also the triangles that compose it.
You can find an example of a model mesh in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply`.
@code{.cpp}
/* Load a CSV with *.ply format */
void Mesh::load(const std::string path)
{
// Create the reader
CsvReader csvReader(path);
// Clear previous data
list_vertex_.clear();
list_triangles_.clear();
// Read from .ply file
csvReader.readPLY(list_vertex_, list_triangles_);
// Update mesh attributes
num_vertexs_ = list_vertex_.size();
num_triangles_ = list_triangles_.size();
}
@endcode
In the main program the mesh is loaded as follows:
@code{.cpp}
Mesh mesh; // instantiate Mesh object
mesh.load(ply_read_path); // load an object mesh
@endcode
You can also load a different model and mesh:
@code{.cpp}
./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
@endcode
-# **Take input from Camera or Video**
To detect the object it is necessary to capture video. This is done by loading a recorded video, passing the absolute
path where it is located on your machine. In order to test the application you can find a recorded
video in `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4`.
@code{.cpp}
cv::VideoCapture cap; // instantiate VideoCapture
cap.open(video_read_path); // open a recorded video
if(!cap.isOpened()) // check if we succeeded
{
std::cout << "Could not open the camera device" << std::endl;
return -1;
}
@endcode
Then the algorithm is computed frame by frame:
@code{.cpp}
cv::Mat frame, frame_vis;
while(cap.read(frame) && cv::waitKey(30) != 27) // capture frame until ESC is pressed
{
frame_vis = frame.clone(); // refresh visualisation frame
// MAIN ALGORITHM
}
@endcode
You can also load a different recorded video:
@code{.cpp}
./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
@endcode
-# **Extract ORB features and descriptors from the scene**
The next step is to detect the scene features and extract their descriptors. For this task I
implemented a *class* **RobustMatcher** which has a function for keypoint detection and feature
extraction. You can find it in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobusMatcher.cpp`. In your
*RobusMatch* object you can use any of the 2D feature detectors of OpenCV. In this case I used
@ref cv::ORB features because it is based on @ref cv::FAST to detect the keypoints and on cv::xfeatures2d::BriefDescriptorExtractor
to extract the descriptors, which means that it is fast and robust to rotations. You can find more
detailed information about *ORB* in the documentation.
The following code is how to instantiate and set the features detector and the descriptors
extractor:
@code{.cpp}
RobustMatcher rmatcher; // instantiate RobustMatcher
cv::FeatureDetector * detector = new cv::OrbFeatureDetector(numKeyPoints); // instantiate ORB feature detector
cv::DescriptorExtractor * extractor = new cv::OrbDescriptorExtractor(); // instantiate ORB descriptor extractor
rmatcher.setFeatureDetector(detector); // set feature detector
rmatcher.setDescriptorExtractor(extractor); // set descriptor extractor
@endcode
The features and descriptors will be computed by the *RobustMatcher* inside the matching function.
-# **Match scene descriptors with model descriptors using Flann matcher**
This is the first step in our detection algorithm. The main idea is to match the scene descriptors
with our model descriptors in order to know the 3D coordinates of the found features in the
current scene.
Firstly, we have to set which matcher we want to use. In this case @ref cv::FlannBasedMatcher is
used, which in terms of computational cost is faster than the
@ref cv::BFMatcher matcher as we increase the trained collection of features. Then, for the
FlannBased matcher the index created is *Multi-Probe LSH: Efficient Indexing for High-Dimensional
Similarity Search*, because *ORB* descriptors are binary.
You can tune the *LSH* and search parameters to improve the matching efficiency:
@code{.cpp}
cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50); // instantiate flann search parameters
cv::DescriptorMatcher * matcher = new cv::FlannBasedMatcher(indexParams, searchParams); // instantiate FlannBased matcher
rmatcher.setDescriptorMatcher(matcher); // set matcher
@endcode
Secondly, we have to call the matcher by using the *robustMatch()* or *fastRobustMatch()* function.
The difference between these two functions is their computational cost. The first method is slower
but more robust at filtering good matches because it uses two ratio tests and a symmetry test. In
contrast, the second method is faster but less robust because it only applies a single ratio test to
the matches.
The following code gets the model 3D points and their descriptors and then calls the matcher in
the main program:
@code{.cpp}
// Get the MODEL INFO
std::vector<cv::Point3f> list_points3d_model = model.get_points3d(); // list with model 3D coordinates
cv::Mat descriptors_model = model.get_descriptors(); // list with descriptors of each 3D coordinate
@endcode
@code{.cpp}
// -- Step 1: Robust matching between model descriptors and scene descriptors
std::vector<cv::DMatch> good_matches; // to obtain the model 3D points in the scene
std::vector<cv::KeyPoint> keypoints_scene; // to obtain the 2D points of the scene
if(fast_match)
{
rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
else
{
rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
@endcode
The following code corresponds to the *robustMatch()* function which belongs to the
*RobustMatcher* class. This function uses the given image to detect the keypoints and extract the
descriptors, and matches the extracted descriptors against the given model descriptors using *two
Nearest Neighbours* in both directions. Then, a ratio test is applied to the two-direction matches in
order to remove those matches whose distance ratio between the first and second best match is larger
than a given threshold. Finally, a symmetry test is applied in order to remove non-symmetrical
matches.
@code{.cpp}
void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,
std::vector<cv::KeyPoint>& keypoints_frame,
const std::vector<cv::KeyPoint>& keypoints_model, const cv::Mat& descriptors_model )
{
// 1a. Detection of the ORB features
this->computeKeyPoints(frame, keypoints_frame);
// 1b. Extraction of the ORB descriptors
cv::Mat descriptors_frame;
this->computeDescriptors(frame, keypoints_frame, descriptors_frame);
// 2. Match the two image descriptors
std::vector<std::vector<cv::DMatch> > matches12, matches21;
// 2a. From image 1 to image 2
matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours
// 2b. From image 2 to image 1
matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours
// 3. Remove matches for which NN ratio is > than threshold
// clean image 1 -> image 2 matches
int removed1 = ratioTest(matches12);
// clean image 2 -> image 1 matches
int removed2 = ratioTest(matches21);
// 4. Remove non-symmetrical matches
symmetryTest(matches12, matches21, good_matches);
}
@endcode
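The *ratioTest()* and *symmetryTest()* helpers are not listed here. A hedged sketch of how the ratio test is commonly written (the sample's own RobustMatcher::ratioTest may differ in detail, and `ratio_` is assumed to be a class member holding the threshold) is:
@code{.cpp}
// Clear match pairs whose best/second-best distance ratio exceeds the threshold;
// return the number of rejected matches.
int RobustMatcher::ratioTest(std::vector<std::vector<cv::DMatch> >& matches)
{
    int removed = 0;
    for (std::vector<std::vector<cv::DMatch> >::iterator it = matches.begin();
         it != matches.end(); ++it)
    {
        if (it->size() > 1 && (*it)[0].distance / (*it)[1].distance <= ratio_)
            continue;          // keep: unambiguous match with two neighbours
        it->clear();           // reject: ambiguous or too few neighbours
        removed++;
    }
    return removed;
}
@endcode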
After the match filtering we have to extract the 2D and 3D correspondences from the found scene
keypoints and our 3D model using the obtained *DMatches* vector. For more information about
@ref cv::DMatch check the documentation.
@code{.cpp}
// -- Step 2: Find out the 2D/3D correspondences
std::vector<cv::Point3f> list_points3d_model_match; // container for the model 3D coordinates found in the scene
std::vector<cv::Point2f> list_points2d_scene_match; // container for the model 2D coordinates found in the scene
for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)
{
cv::Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ]; // 3D point from model
cv::Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt; // 2D point from the scene
list_points3d_model_match.push_back(point3d_model); // add 3D point
list_points2d_scene_match.push_back(point2d_scene); // add 2D point
}
@endcode
You can also change the ratio test threshold and the number of keypoints to detect, as well as choose
whether to use the robust matcher:
@code{.cpp}
./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
@endcode
-# **Pose estimation using PnP + Ransac**
Once we have the 2D and 3D correspondences we have to apply a PnP algorithm in order to estimate the
camera pose. The reason why we have to use @ref cv::solvePnPRansac instead of @ref cv::solvePnP is
the fact that after the matching not all the found correspondences are correct; as likely
as not, there are false correspondences, also called *outliers*. The [Random Sample
Consensus](http://en.wikipedia.org/wiki/RANSAC) or *Ransac* is a non-deterministic iterative
method which estimates parameters of a mathematical model from observed data, producing a better
approximation as the number of iterations increases. After applying *Ransac* all the *outliers*
are eliminated, and the camera pose is then estimated with a certain probability of obtaining a good
solution.
For the camera pose estimation I have implemented a *class* **PnPProblem**. This *class* has 4
attributes: a given calibration matrix, the rotation matrix, the translation matrix and the
rotation-translation matrix. The intrinsic calibration parameters of the camera which you are
using to estimate the pose are necessary. In order to obtain the parameters you can check
@ref tutorial_camera_calibration_square_chess and @ref tutorial_camera_calibration tutorials.
The following code is how to declare the *PnPProblem class* in the main program:
@code{.cpp}
// Intrinsic camera parameters: UVC WEBCAM
double f = 55; // focal length in mm
double sx = 22.3, sy = 14.9; // sensor size
double width = 640, height = 480; // image size
double params_WEBCAM[] = { width*f/sx, // fx
height*f/sy, // fy
width/2, // cx
height/2}; // cy
PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
@endcode
The following code is how the *PnPProblem class* initialises its attributes:
@code{.cpp}
// Custom constructor given the intrinsic camera parameters
PnPProblem::PnPProblem(const double params[])
{
_A_matrix = cv::Mat::zeros(3, 3, CV_64FC1); // intrinsic camera parameters
_A_matrix.at<double>(0, 0) = params[0]; // [ fx 0 cx ]
_A_matrix.at<double>(1, 1) = params[1]; // [ 0 fy cy ]
_A_matrix.at<double>(0, 2) = params[2]; // [ 0 0 1 ]
_A_matrix.at<double>(1, 2) = params[3];
_A_matrix.at<double>(2, 2) = 1;
_R_matrix = cv::Mat::zeros(3, 3, CV_64FC1); // rotation matrix
_t_matrix = cv::Mat::zeros(3, 1, CV_64FC1); // translation matrix
_P_matrix = cv::Mat::zeros(3, 4, CV_64FC1); // rotation-translation matrix
}
@endcode
OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Depending on the application type,
the estimation method will be different. In the case that we want to make a real time application,
the more suitable methods are EPNP and P3P since they are faster than ITERATIVE and DLS at
finding an optimal solution. However, EPNP and P3P are not especially robust with planar
surfaces and sometimes the pose estimation shows a mirror effect. Therefore, in this
tutorial the ITERATIVE method is used because the object to be detected has planar surfaces.
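In code, choosing the method simply means choosing the flag passed to @ref cv::solvePnPRansac; the constant names below are as in current OpenCV versions (the tutorial's command line maps 0-3 to the same methods):
@code{.cpp}
// ITERATIVE copes better with the planar faces of the box used in this tutorial.
int pnpMethod = cv::SOLVEPNP_ITERATIVE;   // alternatives: cv::SOLVEPNP_EPNP, cv::SOLVEPNP_P3P, cv::SOLVEPNP_DLS
@endcode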
The OpenCV RANSAC implementation wants you to provide three parameters: 1) the maximum number of
iterations until the algorithm stops, 2) the maximum allowed distance between the observed and
computed point projections to consider it an inlier and 3) the confidence to obtain a good result.
You can tune these parameters in order to improve your algorithm performance. Increasing the
number of iterations gives a more accurate solution, but takes more time to find it.
Increasing the reprojection error will reduce the computation time, but your solution
will be less accurate. Decreasing the confidence makes the algorithm faster, but the obtained
solution will be less accurate.
The following parameters work for this application:
@code{.cpp}
// RANSAC parameters
int iterationsCount = 500; // number of Ransac iterations.
float reprojectionError = 2.0; // maximum allowed distance to consider it an inlier.
float confidence = 0.95; // RANSAC successful confidence.
@endcode
The following code corresponds to the *estimatePoseRANSAC()* function which belongs to the
*PnPProblem class*. This function estimates the rotation and translation matrix given a set of
2D/3D correspondences, the desired PnP method to use, the output inliers container and the Ransac
parameters:
@code{.cpp}
// Estimate the pose given a list of 2D/3D correspondences with RANSAC and the method to use
void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d, // list with model 3D coordinates
const std::vector<cv::Point2f> &list_points2d, // list with scene 2D coordinates
int flags, cv::Mat &inliers, int iterationsCount, // PnP method; inliers container
float reprojectionError, float confidence ) // RANSAC parameters
{
cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1); // vector of distortion coefficients
cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1); // output rotation vector
cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1); // output translation vector
bool useExtrinsicGuess = false; // if true the function uses the provided rvec and tvec values as
// initial approximations of the rotation and translation vectors
cv::solvePnPRansac( list_points3d, list_points2d, _A_matrix, distCoeffs, rvec, tvec,
useExtrinsicGuess, iterationsCount, reprojectionError, confidence,
inliers, flags );
Rodrigues(rvec,_R_matrix); // converts Rotation Vector to Matrix
_t_matrix = tvec; // set translation matrix
this->set_P_matrix(_R_matrix, _t_matrix); // set rotation-translation matrix
}
@endcode
The following code contains the 3rd and 4th steps of the main algorithm: first, calling the
above function, and second, taking the output inliers vector from RANSAC to get the 2D scene
points for drawing purposes. As seen in the code, we must make sure to apply RANSAC only if we have
matches; otherwise, the function @ref cv::solvePnPRansac crashes due to an OpenCV *bug*.
@code{.cpp}
if(good_matches.size() > 0) // None matches, then RANSAC crashes
{
// -- Step 3: Estimate the pose using RANSAC approach
pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );
// -- Step 4: Catch the inliers keypoints to draw
for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
{
int n = inliers_idx.at<int>(inliers_index); // i-inlier
cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D
list_points2d_inliers.push_back(point2d); // add i-inlier to list
}
@endcode
Finally, once the camera pose has been estimated we can use \f$R\f$ and \f$t\f$ in order to compute
the 2D projection onto the image of a given 3D point expressed in a world reference frame, using
the formula shown in *Theory*.
The following code corresponds to the *backproject3DPoint()* function which belongs to the
*PnPProblem class*. The function backprojects a given 3D point expressed in a world reference frame
onto the 2D image:
@code{.cpp}
// Backproject a 3D point to 2D using the estimated pose parameters
cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)
{
// 3D point vector [x y z 1]'
cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);
point3d_vec.at<double>(0) = point3d.x;
point3d_vec.at<double>(1) = point3d.y;
point3d_vec.at<double>(2) = point3d.z;
point3d_vec.at<double>(3) = 1;
// 2D point vector [u v 1]'
cv::Mat point2d_vec = cv::Mat(4, 1, CV_64FC1);
point2d_vec = _A_matrix * _P_matrix * point3d_vec;
// Normalization of [u v]'
cv::Point2f point2d;
point2d.x = point2d_vec.at<double>(0) / point2d_vec.at<double>(2);
point2d.y = point2d_vec.at<double>(1) / point2d_vec.at<double>(2);
return point2d;
}
@endcode
The above function is used to project all the 3D points of the object *Mesh* in order to show the pose of
the object.
You can also change RANSAC parameters and PnP method:
@code{.cpp}
./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
@endcode
-# **Linear Kalman Filter for bad poses rejection**
It is common in the computer vision and robotics fields that after applying detection or tracking
techniques, bad results are obtained due to sensor errors. In order to avoid these bad
detections, this tutorial explains how to implement a Linear Kalman Filter. The Kalman
Filter is applied after a given number of inliers has been detected.
You can find more information about what a [Kalman
Filter](http://en.wikipedia.org/wiki/Kalman_filter) is. In this tutorial the OpenCV
implementation of the @ref cv::KalmanFilter is used, based on
[Linear Kalman Filter for position and orientation tracking](http://campar.in.tum.de/Chair/KalmanFilter),
to set the dynamics and measurement models.
Firstly, we have to define our state vector which will have 18 states: the positional data (x,y,z)
with its first and second derivatives (velocity and acceleration), then rotation is added in the form
of three Euler angles (roll, pitch, yaw) together with their first and second derivatives (angular
velocity and acceleration):
\f[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\f]
Secondly, we have to define the number of measurements which will be 6: from \f$R\f$ and \f$t\f$ we can
extract \f$(x,y,z)\f$ and \f$(\psi,\theta,\phi)\f$. In addition, we have to define the number of control
actions to apply to the system which in this case will be *zero*. Finally, we have to define the
differential time between measurements which in this case is \f$1/T\f$, where *T* is the frame rate of
the video.
@code{.cpp}
cv::KalmanFilter KF; // instantiate Kalman Filter
int nStates = 18; // the number of states
int nMeasurements = 6; // the number of measured states
int nInputs = 0; // the number of action control
double dt = 0.125; // time between measurements (1/FPS)
initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt); // init function
@endcode
The following code corresponds to the *Kalman Filter* initialisation. Firstly, the process
noise, the measurement noise and the error covariance matrix are set. Secondly, the transition
matrix, which is the dynamic model, is set, and finally the measurement matrix, which is the
measurement model.
You can tune the process and measurement noise to improve the *Kalman Filter* performance. The more the
measurement noise is reduced, the faster the filter converges, making the algorithm sensitive to
bad measurements.
@code{.cpp}
void initKalmanFilter(cv::KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)
{
KF.init(nStates, nMeasurements, nInputs, CV_64F); // init Kalman Filter
cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-5)); // set process noise
cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-4)); // set measurement noise
cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1)); // error covariance
/* DYNAMIC MODEL */
// [1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0 0 0]
// [0 1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0 0]
// [0 0 1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0]
// [0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2 0 0]
// [0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2 0]
// [0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2]
// [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0]
// [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0]
// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt]
// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]
// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
// position
KF.transitionMatrix.at<double>(0,3) = dt;
KF.transitionMatrix.at<double>(1,4) = dt;
KF.transitionMatrix.at<double>(2,5) = dt;
KF.transitionMatrix.at<double>(3,6) = dt;
KF.transitionMatrix.at<double>(4,7) = dt;
KF.transitionMatrix.at<double>(5,8) = dt;
KF.transitionMatrix.at<double>(0,6) = 0.5*pow(dt,2);
KF.transitionMatrix.at<double>(1,7) = 0.5*pow(dt,2);
KF.transitionMatrix.at<double>(2,8) = 0.5*pow(dt,2);
// orientation
KF.transitionMatrix.at<double>(9,12) = dt;
KF.transitionMatrix.at<double>(10,13) = dt;
KF.transitionMatrix.at<double>(11,14) = dt;
KF.transitionMatrix.at<double>(12,15) = dt;
KF.transitionMatrix.at<double>(13,16) = dt;
KF.transitionMatrix.at<double>(14,17) = dt;
KF.transitionMatrix.at<double>(9,15) = 0.5*pow(dt,2);
KF.transitionMatrix.at<double>(10,16) = 0.5*pow(dt,2);
KF.transitionMatrix.at<double>(11,17) = 0.5*pow(dt,2);
/* MEASUREMENT MODEL */
// [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
// [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
// [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
// [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
KF.measurementMatrix.at<double>(0,0) = 1; // x
KF.measurementMatrix.at<double>(1,1) = 1; // y
KF.measurementMatrix.at<double>(2,2) = 1; // z
KF.measurementMatrix.at<double>(3,9) = 1; // roll
KF.measurementMatrix.at<double>(4,10) = 1; // pitch
KF.measurementMatrix.at<double>(5,11) = 1; // yaw
}
@endcode
The following code is the 5th step of the main algorithm. When the number of inliers obtained
after *RANSAC* is over the threshold, the measurements vector is filled and the *Kalman
Filter* is updated:
@code{.cpp}
// -- Step 5: Kalman Filter
// GOOD MEASUREMENT
if( inliers_idx.rows >= minInliersKalman )
{
// Get the measured translation
cv::Mat translation_measured(3, 1, CV_64F);
translation_measured = pnp_detection.get_t_matrix();
// Get the measured rotation
cv::Mat rotation_measured(3, 3, CV_64F);
rotation_measured = pnp_detection.get_R_matrix();
// fill the measurements vector
fillMeasurements(measurements, translation_measured, rotation_measured);
}
// Instantiate estimated translation and rotation
cv::Mat translation_estimated(3, 1, CV_64F);
cv::Mat rotation_estimated(3, 3, CV_64F);
// update the Kalman filter with good measurements
updateKalmanFilter( KF, measurements,
translation_estimated, rotation_estimated);
@endcode
The following code corresponds to the *fillMeasurements()* function, which converts the measured
[rotation matrix to Euler
angles](http://euclideanspace.com/maths/geometry/rotations/conversions/matrixToEuler/index.htm)
and fills the measurements vector along with the measured translation vector:
@code{.cpp}
void fillMeasurements( cv::Mat &measurements,
const cv::Mat &translation_measured, const cv::Mat &rotation_measured)
{
// Convert rotation matrix to euler angles
cv::Mat measured_eulers(3, 1, CV_64F);
measured_eulers = rot2euler(rotation_measured);
// Set measurement to predict
measurements.at<double>(0) = translation_measured.at<double>(0); // x
measurements.at<double>(1) = translation_measured.at<double>(1); // y
measurements.at<double>(2) = translation_measured.at<double>(2); // z
measurements.at<double>(3) = measured_eulers.at<double>(0); // roll
measurements.at<double>(4) = measured_eulers.at<double>(1); // pitch
measurements.at<double>(5) = measured_eulers.at<double>(2); // yaw
}
@endcode
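The *rot2euler()* helper used above belongs to the sample's *Utils* file and is not reproduced in this
tutorial. A minimal sketch of such a conversion, assuming the \f$R = R_z(yaw)\,R_y(pitch)\,R_x(roll)\f$
convention (the sample's own implementation may use a different one), could look like this:
@code{.cpp}
// Hypothetical sketch: convert a 3x3 rotation matrix into (roll, pitch, yaw).
cv::Mat rot2euler_sketch(const cv::Mat &R)
{
    cv::Mat euler(3, 1, CV_64F);
    double sy = std::sqrt(R.at<double>(0,0)*R.at<double>(0,0) +
                          R.at<double>(1,0)*R.at<double>(1,0));
    bool singular = sy < 1e-6;                                               // gimbal lock check
    if (!singular)
    {
        euler.at<double>(0) = std::atan2(R.at<double>(2,1), R.at<double>(2,2)); // roll
        euler.at<double>(1) = std::atan2(-R.at<double>(2,0), sy);               // pitch
        euler.at<double>(2) = std::atan2(R.at<double>(1,0), R.at<double>(0,0)); // yaw
    }
    else
    {
        euler.at<double>(0) = std::atan2(-R.at<double>(1,2), R.at<double>(1,1));
        euler.at<double>(1) = std::atan2(-R.at<double>(2,0), sy);
        euler.at<double>(2) = 0;
    }
    return euler;
}
@endcode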
The following code corresponds to the *updateKalmanFilter()* function, which updates the Kalman
Filter and sets the estimated rotation matrix and translation vector. The estimated rotation matrix
is built from the estimated [Euler angles](http://euclideanspace.com/maths/geometry/rotations/conversions/eulerToMatrix/index.htm).
@code{.cpp}
void updateKalmanFilter( cv::KalmanFilter &KF, cv::Mat &measurement,
cv::Mat &translation_estimated, cv::Mat &rotation_estimated )
{
// First predict, to update the internal statePre variable
cv::Mat prediction = KF.predict();
// The "correct" phase that is going to use the predicted value and our measurement
cv::Mat estimated = KF.correct(measurement);
// Estimated translation
translation_estimated.at<double>(0) = estimated.at<double>(0);
translation_estimated.at<double>(1) = estimated.at<double>(1);
translation_estimated.at<double>(2) = estimated.at<double>(2);
// Estimated euler angles
cv::Mat eulers_estimated(3, 1, CV_64F);
eulers_estimated.at<double>(0) = estimated.at<double>(9);
eulers_estimated.at<double>(1) = estimated.at<double>(10);
eulers_estimated.at<double>(2) = estimated.at<double>(11);
// Convert estimated Euler angles to rotation matrix
rotation_estimated = euler2rot(eulers_estimated);
}
@endcode
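Like *rot2euler()*, the *euler2rot()* helper lives in the sample's *Utils* file. A minimal sketch,
assuming the same angle ordering and rotation convention as the sketch above, could be:
@code{.cpp}
// Hypothetical sketch: build a 3x3 rotation matrix from (roll, pitch, yaw).
cv::Mat euler2rot_sketch(const cv::Mat &euler)
{
    double roll  = euler.at<double>(0);
    double pitch = euler.at<double>(1);
    double yaw   = euler.at<double>(2);
    cv::Mat Rx = (cv::Mat_<double>(3,3) <<
        1, 0,               0,
        0, std::cos(roll), -std::sin(roll),
        0, std::sin(roll),  std::cos(roll));
    cv::Mat Ry = (cv::Mat_<double>(3,3) <<
         std::cos(pitch), 0, std::sin(pitch),
         0,               1, 0,
        -std::sin(pitch), 0, std::cos(pitch));
    cv::Mat Rz = (cv::Mat_<double>(3,3) <<
        std::cos(yaw), -std::sin(yaw), 0,
        std::sin(yaw),  std::cos(yaw), 0,
        0,              0,             1);
    return Rz * Ry * Rx;   // same composition order as in the rot2euler sketch
}
@endcode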
The 6th step is to set the estimated rotation-translation matrix:
@code{.cpp}
// -- Step 6: Set estimated projection matrix
pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);
@endcode
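*set_P_matrix()* stores the \f$3 \times 4\f$ rotation-translation matrix \f$[R|t]\f$ inside the
*PnPProblem* class. A minimal sketch of this composition, assuming a *CV_64F* \f$3 \times 3\f$ rotation
matrix and a \f$3 \times 1\f$ translation vector (the sample's own implementation may fill the entries
one by one), could be:
@code{.cpp}
// Hypothetical sketch: compose the 3x4 pose matrix [R | t].
cv::Mat P = cv::Mat::zeros(3, 4, CV_64FC1);
rotation_estimated.copyTo(P(cv::Rect(0, 0, 3, 3)));      // 3x3 rotation block
translation_estimated.copyTo(P(cv::Rect(3, 0, 1, 3)));   // 3x1 translation column
@endcode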
The last and optional step is to draw the estimated pose. To do this, I implemented a function that
draws all the mesh 3D points and an extra reference axis:
@code{.cpp}
// -- Step X: Draw pose
drawObjectMesh(frame_vis, &mesh, &pnp_detection, green); // draw current pose
drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow); // draw estimated pose
double l = 5;
std::vector<cv::Point2f> pose_points2d;
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,0))); // axis center
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(l,0,0))); // axis x
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,l,0))); // axis y
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,l))); // axis z
draw3DCoordinateAxes(frame_vis, pose_points2d); // draw axes
@endcode
You can also modify the minimum number of inliers required to update the Kalman Filter:
@code{.cpp}
./cpp-tutorial-pnp_detection --inliers=20
@endcode
Results
-------
The following videos show the results of real-time pose estimation using the detection algorithm
explained above with the following parameters:
@code{.cpp}
// Robust Matcher parameters
int numKeyPoints = 2000; // number of detected keypoints
float ratio = 0.70f; // ratio test
bool fast_match = true; // fastRobustMatch() or robustMatch()
// RANSAC parameters
int iterationsCount = 500; // number of Ransac iterations.
float reprojectionError = 2.0; // maximum allowed distance to consider it an inlier.
float confidence = 0.95; // ransac successful confidence.
// Kalman Filter parameters
int minInliersKalman = 30; // Kalman threshold updating
@endcode
You can watch the real-time pose estimation on [YouTube
here](http://www.youtube.com/user/opencvdev/videos).
@youtube{XNATklaJlSQ}
@youtube{YLS9bWek78k}

View File

@@ -0,0 +1,50 @@
Camera calibration and 3D reconstruction (calib3d module) {#tutorial_table_of_content_calib3d}
==========================================================
Although we get most of our images in a 2D format, they do come from a 3D world. Here you will learn how to recover 3D world information from 2D images.
- @subpage tutorial_camera_calibration_pattern
*Compatibility:* \> OpenCV 2.0
*Author:* Laurent Berger
You will learn how to create a calibration pattern.
- @subpage tutorial_camera_calibration_square_chess
*Compatibility:* \> OpenCV 2.0
*Author:* Victor Eruhimov
You will use some chessboard images to calibrate your camera.
- @subpage tutorial_camera_calibration
*Compatibility:* \> OpenCV 4.0
*Author:* Bernát Gábor
Camera calibration using either the chessboard, the circle or the asymmetrical circle
pattern. Get the images either from an attached camera, a video file or from an image
collection.
- @subpage tutorial_real_time_pose
*Compatibility:* \> OpenCV 2.0
*Author:* Edgar Riba
Real-time pose estimation of a textured object using ORB features, a FLANN-based matcher, a PnP
approach plus RANSAC, and a Linear Kalman Filter to reject possible bad poses.
- @subpage tutorial_interactive_calibration
*Compatibility:* \> OpenCV 3.1
*Author:* Vladislav Sovrasov
Camera calibration using either the chessboard, ChArUco, asymmetrical circle or dual asymmetrical circle
pattern. The calibration process is continuous, so you can see results after each new pattern shot.
As an output you get the average reprojection error, the intrinsic camera parameters, the distortion
coefficients and confidence intervals for all of the evaluated variables.

View File

@@ -0,0 +1,115 @@
Adding (blending) two images using OpenCV {#tutorial_adding_images}
=========================================
@prev_tutorial{tutorial_mat_operations}
@next_tutorial{tutorial_basic_linear_transform}
Goal
----
In this tutorial you will learn:
- what is *linear blending* and why it is useful;
- how to add two images using **addWeighted()**
Theory
------
@note
The explanation below belongs to the book [Computer Vision: Algorithms and
Applications](http://szeliski.org/Book/) by Richard Szeliski
From our previous tutorial, we know already a bit of *Pixel operators*. An interesting dyadic
(two-input) operator is the *linear blend operator*:
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$ this operator can be used to perform a temporal
*cross-dissolve* between two images or videos, as seen in slide shows and film productions (cool,
eh?)
Source Code
-----------
@add_toggle_cpp
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/AddingImages/AddingImages.cpp).
@include cpp/tutorial_code/core/AddingImages/AddingImages.cpp
@end_toggle
@add_toggle_java
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/AddingImages/AddingImages.java).
@include java/tutorial_code/core/AddingImages/AddingImages.java
@end_toggle
@add_toggle_python
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/AddingImages/adding_images.py).
@include python/tutorial_code/core/AddingImages/adding_images.py
@end_toggle
Explanation
-----------
Since we are going to perform:
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
We need two source images (\f$f_{0}(x)\f$ and \f$f_{1}(x)\f$). So, we load them in the usual way:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp load
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java load
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py load
@end_toggle
We used the following images: [LinuxLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/LinuxLogo.jpg) and [WindowsLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/WindowsLogo.jpg)
@warning Since we are *adding* *src1* and *src2*, they both have to be of the same size
(width and height) and type.
Now we need to generate the `g(x)` image. For this, the function **addWeighted()** comes in quite handy:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp blend_images
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java blend_images
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py blend_images
NumPy version of the line above (but the cv function is around 2x faster):
\code{.py}
dst = np.uint8(alpha*(img1)+beta*(img2))
\endcode
@end_toggle
since **addWeighted()** produces:
\f[dst = \alpha \cdot src1 + \beta \cdot src2 + \gamma\f]
In this case, `gamma` is the argument \f$0.0\f$ in the code above.
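For reference, a minimal sketch of the call in C++ (the tutorial source performs the same call through
the snippets above), assuming *src1*, *src2* and *alpha* are already defined, could be:
@code{.cpp}
// Minimal sketch of the blending call described above.
double beta  = 1.0 - alpha;   // complementary weight
double gamma = 0.0;           // no extra offset added to the sum
cv::Mat dst;
cv::addWeighted(src1, alpha, src2, beta, gamma, dst);
@endcode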
Create windows, show the images and wait for the user to end the program.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp display
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java display
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py display
@end_toggle
Result
------
![](images/Adding_Images_Tutorial_Result_Big.jpg)

View File

@@ -0,0 +1,318 @@
Changing the contrast and brightness of an image! {#tutorial_basic_linear_transform}
=================================================
@prev_tutorial{tutorial_adding_images}
@next_tutorial{tutorial_discrete_fourier_transform}
Goal
----
In this tutorial you will learn how to:
- Access pixel values
- Initialize a matrix with zeros
- Learn what @ref cv::saturate_cast does and why it is useful
- Get some cool info about pixel transformations
- Improve the brightness of an image on a practical example
Theory
------
@note
The explanation below belongs to the book [Computer Vision: Algorithms and
Applications](http://szeliski.org/Book/) by Richard Szeliski
### Image Processing
- A general image processing operator is a function that takes one or more input images and
produces an output image.
- Image transforms can be seen as:
- Point operators (pixel transforms)
- Neighborhood (area-based) operators
### Pixel Transforms
- In this kind of image processing transform, each output pixel's value depends on only the
corresponding input pixel value (plus, potentially, some globally collected information or
parameters).
- Examples of such operators include *brightness and contrast adjustments* as well as color
correction and transformations.
### Brightness and contrast adjustments
- Two commonly used point processes are *multiplication* and *addition* with a constant:
\f[g(x) = \alpha f(x) + \beta\f]
- The parameters \f$\alpha > 0\f$ and \f$\beta\f$ are often called the *gain* and *bias* parameters;
sometimes these parameters are said to control *contrast* and *brightness* respectively.
- You can think of \f$f(x)\f$ as the source image pixels and \f$g(x)\f$ as the output image pixels. Then,
more conveniently we can write the expression as:
\f[g(i,j) = \alpha \cdot f(i,j) + \beta\f]
where \f$i\f$ and \f$j\f$ indicate that the pixel is located in the *i-th* row and *j-th* column.
Code
----
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp)
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
@include samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java)
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
@include samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py)
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
@include samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py
@end_toggle
Explanation
-----------
- We load an image using @ref cv::imread and save it in a Mat object:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-load
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-load
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-load
@end_toggle
- Now, since we will make some transformations to this image, we need a new Mat object to store
it. Also, we want this to have the following features:
- Initial pixel values equal to zero
- Same size and type as the original image
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-output
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-output
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-output
@end_toggle
We observe that @ref cv::Mat::zeros returns a Matlab-style zero initializer based on
*image.size()* and *image.type()*.
- We now ask the user to enter the values of \f$\alpha\f$ and \f$\beta\f$:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-parameters
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-parameters
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-parameters
@end_toggle
- Now, to perform the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ we will access each
pixel in the image. Since we are operating with BGR images, we will have three values per pixel (B,
G and R), so we will also access them separately. Here is the piece of code:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-operation
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-operation
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-operation
@end_toggle
Notice the following (**C++ code only**):
- To access each pixel in the images we are using this syntax: *image.at\<Vec3b\>(y,x)[c]*
where *y* is the row, *x* is the column and *c* is B, G or R (0, 1 or 2).
- Since the operation \f$\alpha \cdot p(i,j) + \beta\f$ can give values out of range or non-integer
values (if \f$\alpha\f$ is a float), we use cv::saturate_cast to make sure the
values are valid.
- Finally, we create windows and show the images, the usual way.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-display
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-display
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-display
@end_toggle
@note
Instead of using the **for** loops to access each pixel, we could have simply used this command:
@add_toggle_cpp
@code{.cpp}
image.convertTo(new_image, -1, alpha, beta);
@endcode
@end_toggle
@add_toggle_java
@code{.java}
image.convertTo(newImage, -1, alpha, beta);
@endcode
@end_toggle
@add_toggle_python
@code{.py}
new_image = cv.convertScaleAbs(image, alpha=alpha, beta=beta)
@endcode
@end_toggle
where @ref cv::Mat::convertTo would effectively perform *new_image = alpha\*image + beta*. However, we
wanted to show you how to access each pixel. In any case, both methods give the same result, but
convertTo is more optimized and works a lot faster.
Result
------
- Running our code and using \f$\alpha = 2.2\f$ and \f$\beta = 50\f$
@code{.bash}
$ ./BasicLinearTransforms lena.jpg
Basic Linear Transforms
-------------------------
* Enter the alpha value [1.0-3.0]: 2.2
* Enter the beta value [0-100]: 50
@endcode
- We get this:
![](images/Basic_Linear_Transform_Tutorial_Result_big.jpg)
Practical example
----
In this paragraph, we will put into practice what we have learned to correct an underexposed image by adjusting the brightness
and the contrast of the image. We will also see another technique to correct the brightness of an image called
gamma correction.
### Brightness and contrast adjustments
Increasing (/ decreasing) the \f$\beta\f$ value will add (/ subtract) a constant value to every pixel. Pixel values outside of the [0 ; 255]
range will be saturated (i.e. a pixel value higher (/ lower) than 255 (/ 0) will be clamped to 255 (/ 0)).
![In light gray, histogram of the original image, in dark gray when brightness = 80 in Gimp](images/Basic_Linear_Transform_Tutorial_hist_beta.png)
The histogram represents for each color level the number of pixels with that color level. A dark image will have many pixels with
low color value and thus the histogram will present a peak in its left part. When adding a constant bias, the histogram is shifted to the
right as we have added a constant bias to all the pixels.
The \f$\alpha\f$ parameter will modify how the levels spread. If \f$ \alpha < 1 \f$, the color levels will be compressed and the result
will be an image with less contrast.
![In light gray, histogram of the original image, in dark gray when contrast < 0 in Gimp](images/Basic_Linear_Transform_Tutorial_hist_alpha.png)
Note that these histograms have been obtained using the Brightness-Contrast tool in the Gimp software. The brightness tool should be
identical to the \f$\beta\f$ bias parameter, but the contrast tool seems to differ from the \f$\alpha\f$ gain, as the output range
appears to be centered with Gimp (as you can notice in the previous histogram).
It can occur that playing with the \f$\beta\f$ bias will improve the brightness, but at the same time the image will appear with a
slight veil as the contrast will be reduced. The \f$\alpha\f$ gain can be used to diminish this effect, but due to the saturation,
we will lose some details in the originally bright regions.
### Gamma correction
[Gamma correction](https://en.wikipedia.org/wiki/Gamma_correction) can be used to correct the brightness of an image by using a non-linear
transformation between the input values and the mapped output values:
\f[O = \left( \frac{I}{255} \right)^{\gamma} \times 255\f]
As this relation is non-linear, the effect will not be the same for all the pixels and will depend on their original value.
![Plot for different values of gamma](images/Basic_Linear_Transform_Tutorial_gamma.png)
When \f$ \gamma < 1 \f$, the original dark regions will be brighter and the histogram will be shifted to the right whereas it will
be the opposite with \f$ \gamma > 1 \f$.
### Correct an underexposed image
The following image has been corrected with: \f$ \alpha = 1.3 \f$ and \f$ \beta = 40 \f$.
![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_linear_transform_correction.jpg)
The overall brightness has been improved but you can notice that the clouds are now greatly saturated due to the numerical saturation
of the implementation used ([highlight clipping](https://en.wikipedia.org/wiki/Clipping_(photography)) in photography).
The following image has been corrected with: \f$ \gamma = 0.4 \f$.
![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_gamma_correction.jpg)
The gamma correction should tend to add less saturation effect as the mapping is non linear and there is no numerical saturation possible as in the previous method.
![Left: histogram after alpha, beta correction ; Center: histogram of the original image ; Right: histogram after the gamma correction](images/Basic_Linear_Transform_Tutorial_histogram_compare.png)
The previous figure compares the histograms for the three images (the y-ranges are not the same between the three histograms).
You can notice that most of the pixel values are in the lower part of the histogram for the original image. After the \f$ \alpha \f$,
\f$ \beta \f$ correction, we can observe a big peak at 255 due to the saturation as well as a shift to the right.
After gamma correction, the histogram is shifted to the right but the pixels in the dark regions are more shifted
(see the gamma curves [figure](Basic_Linear_Transform_Tutorial_gamma.png)) than those in the bright regions.
In this tutorial, you have seen two simple methods to adjust the contrast and the brightness of an image. **They are basic techniques
and are not intended to be used as a replacement of a raster graphics editor!**
### Code
@add_toggle_cpp
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp).
@end_toggle
@add_toggle_java
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java).
@end_toggle
@add_toggle_python
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py).
@end_toggle
Code for the gamma correction:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp changing-contrast-brightness-gamma-correction
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java changing-contrast-brightness-gamma-correction
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py changing-contrast-brightness-gamma-correction
@end_toggle
A look-up table is used to improve the performance of the computation, as only 256 values need to be calculated, and only once.
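A minimal sketch of such a LUT-based gamma correction in C++, assuming an 8-bit image *img* and a
user-chosen *gamma_* value (both hypothetical names), could look like this:
@code{.cpp}
// Hypothetical sketch: gamma correction through a 256-entry lookup table.
cv::Mat lookUpTable(1, 256, CV_8U);
uchar* p = lookUpTable.ptr();
for (int i = 0; i < 256; ++i)
    p[i] = cv::saturate_cast<uchar>(std::pow(i / 255.0, gamma_) * 255.0);

cv::Mat res;
cv::LUT(img, lookUpTable, res);   // applies the table to every channel of img
@endcode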
### Additional resources
- [Gamma correction in graphics rendering](https://learnopengl.com/#!Advanced-Lighting/Gamma-Correction)
- [Gamma correction and images displayed on CRT monitors](http://www.graphics.cornell.edu/~westin/gamma/gamma.html)
- [Digital exposure techniques](http://www.cambridgeincolour.com/tutorials/digital-exposure-techniques.htm)


View File

@@ -0,0 +1,238 @@
Discrete Fourier Transform {#tutorial_discrete_fourier_transform}
==========================
@prev_tutorial{tutorial_basic_linear_transform}
@next_tutorial{tutorial_file_input_output_with_xml_yml}
Goal
----
We'll seek answers for the following questions:
- What is a Fourier transform and why use it?
- How to do it in OpenCV?
- Usage of functions such as: **copyMakeBorder()** , **merge()** , **dft()** ,
**getOptimalDFTSize()** , **log()** and **normalize()** .
Source code
-----------
@add_toggle_cpp
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp) or
find it in the
`samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp` of the
OpenCV source code library.
@end_toggle
@add_toggle_java
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java) or
find it in the
`samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java` of the
OpenCV source code library.
@end_toggle
@add_toggle_python
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py) or
find it in the
`samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py` of the
OpenCV source code library.
@end_toggle
Here's a sample usage of **dft()** :
@add_toggle_cpp
@include cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp
@end_toggle
@add_toggle_java
@include java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java
@end_toggle
@add_toggle_python
@include python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py
@end_toggle
Explanation
-----------
The Fourier Transform will decompose an image into its sine and cosine components. In other words,
it will transform an image from its spatial domain to its frequency domain. The idea is that any
function may be approximated with a sum of infinitely many sine and cosine functions. The
Fourier Transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image
is:
\f[F(k,l) = \displaystyle\sum\limits_{i=0}^{N-1}\sum\limits_{j=0}^{N-1} f(i,j)e^{-i2\pi(\frac{ki}{N}+\frac{lj}{N})}\f]\f[e^{ix} = \cos{x} + i\sin {x}\f]
Here f is the image value in its spatial domain and F in its frequency domain. The result of the
transformation is complex numbers. Displaying this is possible either via a *real* image and a
*complex* image or via a *magnitude* and a *phase* image. However, throughout the image processing
algorithms only the *magnitude* image is interesting, as this contains all the information we need
about the image's geometric structure. Nevertheless, if you intend to make some modifications of the
image in these forms and then retransform it, you'll need to preserve both of them.
In this sample I'll show how to calculate and show the *magnitude* image of a Fourier Transform.
Digital images are discrete, meaning they may only take values from a given domain. For
example, in a basic grayscale image values usually are between zero and 255. Therefore the
Fourier Transform too needs to be of a discrete type, resulting in a Discrete Fourier Transform
(*DFT*). You'll want to use this whenever you need to determine the structure of an image from a
geometrical point of view. Here are the steps to follow (in case of a grayscale input image *I*):
#### Expand the image to an optimal size
The performance of a DFT is dependent on the image
size. It tends to be fastest for image sizes that are multiples of two, three and
five. Therefore, to achieve maximal performance it is generally a good idea to pad the image borders
to get a size with such traits. The **getOptimalDFTSize()** function returns this
optimal size and we can use the **copyMakeBorder()** function to expand the borders of an
image (the appended pixels are initialized with zero):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp expand
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java expand
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py expand
@end_toggle
#### Make place for both the complex and the real values
The result of a Fourier Transform is
complex. This implies that for each image value the result is two image values (one per
component). Moreover, the frequency domain's range is much larger than its spatial counterpart.
Therefore, we usually store these at least in a *float* format. Hence we'll convert our
input image to this type and expand it with another channel to hold the complex values:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp complex_and_real
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java complex_and_real
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py complex_and_real
@end_toggle
#### Make the Discrete Fourier Transform
An in-place calculation is possible (same input as
output):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp dft
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java dft
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py dft
@end_toggle
#### Transform the real and complex values to magnitude
A complex number has a real (*Re*) and an
imaginary (*Im*) part. The results of a DFT are complex numbers. The magnitude of a
DFT is:
\f[M = \sqrt[2]{ {Re(DFT(I))}^2 + {Im(DFT(I))}^2}\f]
Translated to OpenCV code:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp magnitude
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java magnitude
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py magnitude
@end_toggle
#### Switch to a logarithmic scale
It turns out that the dynamic range of the Fourier
coefficients is too large to be displayed on the screen. We have some small and some high
values that we can't observe like this. Therefore the high values will all turn out as
white points, while the small ones as black. To use the grayscale values for visualization
we can transform our linear scale to a logarithmic one:
\f[M_1 = \log{(1 + M)}\f]
Translated to OpenCV code:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp log
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java log
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py log
@end_toggle
#### Crop and rearrange
Remember that, in the first step, we expanded the image? Well, it's time
to throw away the newly introduced values. For visualization purposes we may also rearrange the
quadrants of the result, so that the origin (zero, zero) corresponds to the image center.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp crop_rearrange
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java crop_rearrange
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py crop_rearrange
@end_toggle
#### Normalize
This is done again for visualization purposes. We now have the magnitudes,
however these are still out of our image display range of zero to one. We normalize our values to
this range using the @ref cv::normalize() function.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp normalize
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java normalize
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py normalize
@end_toggle
Result
------
An application idea would be to determine the geometrical orientation present in the image. For
example, let us find out whether a text is horizontal or not. Looking at some text you'll notice that the
text lines sort of form horizontal lines and the letters form sort of vertical lines. These two
main components of a text snippet may also be seen in the Fourier transform. Let us use
[this horizontal](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextN.png) and [this rotated](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextR.png)
image of a text.
In case of the horizontal text:
![](images/result_normal.jpg)
In case of a rotated text:
![](images/result_rotated.jpg)
You can see that the most influential components of the frequency domain (brightest dots on the
magnitude image) follow the geometric rotation of objects in the image. From this we may calculate
the offset and perform an image rotation to correct eventual misalignments.


View File

@@ -0,0 +1,269 @@
File Input and Output using XML and YAML files {#tutorial_file_input_output_with_xml_yml}
==============================================
@prev_tutorial{tutorial_discrete_fourier_transform}
@next_tutorial{tutorial_how_to_use_OpenCV_parallel_for_}
Goal
----
You'll find answers to the following questions:
- How to print and read text entries to a file in OpenCV using YAML or XML files?
- How to do the same for OpenCV data structures?
- How to do this for your data structures?
- Usage of OpenCV data structures such as @ref cv::FileStorage , @ref cv::FileNode or @ref
cv::FileNodeIterator .
Source code
-----------
You can [download this from here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp) or find it in the
`samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp` of the OpenCV source code
library.
Here's a sample code of how to achieve all the stuff enumerated at the goal list.
@include cpp/tutorial_code/core/file_input_output/file_input_output.cpp
Explanation
-----------
Here we talk only about XML and YAML file inputs. Your output (and its respective input) file may
have only one of these extensions, and the structure follows from this. There are two kinds of data
structures you may serialize: *mappings* (like the STL map) and *element sequences* (like the STL
vector). The difference between these is that in a map every element has a unique name through which
you may access it. For sequences you need to go through them to query a specific item.
-# **XML/YAML File Open and Close.** Before you write any content to such a file you need to open it,
and at the end close it. The XML/YAML data structure in OpenCV is @ref cv::FileStorage . To
specify the file to which this structure binds on your hard drive you can use either its
constructor or the *open()* function:
@code{.cpp}
string filename = "I.xml";
FileStorage fs(filename, FileStorage::WRITE);
//...
fs.open(filename, FileStorage::READ);
@endcode
Whichever one you use, the second argument is a constant specifying the type of operations
you'll be able to perform on them: WRITE, READ or APPEND. The extension specified in the file name also
determines the output format that will be used. The output may even be compressed if you
specify an extension such as *.xml.gz*.
The file closes automatically when the @ref cv::FileStorage object is destroyed. However, you
may explicitly call for this by using the *release* function:
@code{.cpp}
fs.release(); // explicit close
@endcode
-# **Input and Output of text and numbers.** The data structure uses the same \<\< output operator
as the STL library. For outputting any type of data structure we first need to specify its
name. We do this by simply printing out its name. For basic types you may follow
this with the print of the value:
@code{.cpp}
fs << "iterationNr" << 100;
@endcode
Reading in is a simple addressing (via the [] operator) and casting operation, or a read via
the \>\> operator:
@code{.cpp}
int itNr;
fs["iterationNr"] >> itNr;
itNr = (int) fs["iterationNr"];
@endcode
-# **Input/Output of OpenCV Data structures.** These behave exactly like the basic C++
types:
@code{.cpp}
Mat R = Mat_<uchar >::eye (3, 3),
T = Mat_<double>::zeros(3, 1);
fs << "R" << R; // Write cv::Mat
fs << "T" << T;
fs["R"] >> R; // Read cv::Mat
fs["T"] >> T;
@endcode
-# **Input/Output of vectors (arrays) and associative maps.** As I mentioned beforehand, we can
output maps and sequences (array, vector) too. Again, we first print the name of the variable and
then we have to specify whether our output is a sequence or a map.
For a sequence, print the "[" character before the first element and the "]"
character after the last one:
@code{.cpp}
fs << "strings" << "["; // text - string sequence
fs << "image1.jpg" << "Awesomeness" << "baboon.jpg";
fs << "]"; // close sequence
@endcode
For maps the drill is the same, however now we use the "{" and "}" delimiter characters:
@code{.cpp}
fs << "Mapping"; // text - mapping
fs << "{" << "One" << 1;
fs << "Two" << 2 << "}";
@endcode
To read from these we use the @ref cv::FileNode and the @ref cv::FileNodeIterator data
structures. The [] operator of the @ref cv::FileStorage class returns a @ref cv::FileNode data
type. If the node is sequential we can use the @ref cv::FileNodeIterator to iterate through the
items:
@code{.cpp}
FileNode n = fs["strings"]; // Read string sequence - Get node
if (n.type() != FileNode::SEQ)
{
cerr << "strings is not a sequence! FAIL" << endl;
return 1;
}
FileNodeIterator it = n.begin(), it_end = n.end(); // Go through the node
for (; it != it_end; ++it)
cout << (string)*it << endl;
@endcode
For maps you can use the [] operator again to access the given item (or the \>\> operator too):
@code{.cpp}
n = fs["Mapping"]; // Read mappings from a sequence
cout << "Two " << (int)(n["Two"]) << "; ";
cout << "One " << (int)(n["One"]) << endl << endl;
@endcode
-# **Read and write your own data structures.** Suppose you have a data structure such as:
@code{.cpp}
class MyData
{
public:
MyData() : A(0), X(0), id() {}
public: // Data Members
int A;
double X;
string id;
};
@endcode
It's possible to serialize this through the OpenCV I/O XML/YAML interface (just as in case of
the OpenCV data structures) by adding a read and a write function inside and outside of your
class. For the inside part:
@code{.cpp}
void write(FileStorage& fs) const //Write serialization for this class
{
fs << "{" << "A" << A << "X" << X << "id" << id << "}";
}
void read(const FileNode& node) //Read serialization for this class
{
A = (int)node["A"];
X = (double)node["X"];
id = (string)node["id"];
}
@endcode
Then you need to add the following function definitions outside the class:
@code{.cpp}
void write(FileStorage& fs, const std::string&, const MyData& x)
{
x.write(fs);
}
void read(const FileNode& node, MyData& x, const MyData& default_value = MyData())
{
if(node.empty())
x = default_value;
else
x.read(node);
}
@endcode
Here you can observe that in the read section we defined what happens if the user tries to read
a non-existing node. In this case we just return the default initialization value, however a
more verbose solution would be to return for instance a minus one value for an object ID.
Once you have added these four functions, use the \<\< operator for writing and the \>\> operator for
reading:
@code{.cpp}
MyData m(1);
fs << "MyData" << m; // your own data structures
fs["MyData"] >> m; // Read your own structure_
@endcode
Or to try out reading a non-existing node:
@code{.cpp}
fs["NonExisting"] >> m; // Do not add a fs << "NonExisting" << m command for this to work
cout << endl << "NonExisting = " << endl << m << endl;
@endcode
Result
------
Mostly we just print out the defined numbers. On your console screen you could see:
@code{.bash}
Write Done.
Reading:
100image1.jpg
Awesomeness
baboon.jpg
Two 2; One 1
R = [1, 0, 0;
0, 1, 0;
0, 0, 1]
T = [0; 0; 0]
MyData =
{ id = mydata1234, X = 3.14159, A = 97}
Attempt to read NonExisting (should initialize the data structure with its default).
NonExisting =
{ id = , X = 0, A = 0}
Tip: Open up output.xml with a text editor to see the serialized data.
@endcode
Nevertheless, what you may see in the output XML file is much more interesting:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<iterationNr>100</iterationNr>
<strings>
image1.jpg Awesomeness baboon.jpg</strings>
<Mapping>
<One>1</One>
<Two>2</Two></Mapping>
<R type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>u</dt>
<data>
1 0 0 0 1 0 0 0 1</data></R>
<T type_id="opencv-matrix">
<rows>3</rows>
<cols>1</cols>
<dt>d</dt>
<data>
0. 0. 0.</data></T>
<MyData>
<A>97</A>
<X>3.1415926535897931e+000</X>
<id>mydata1234</id></MyData>
</opencv_storage>
@endcode
Or the YAML file:
@code{.yaml}
%YAML:1.0
iterationNr: 100
strings:
- "image1.jpg"
- Awesomeness
- "baboon.jpg"
Mapping:
One: 1
Two: 2
R: !!opencv-matrix
rows: 3
cols: 3
dt: u
data: [ 1, 0, 0, 0, 1, 0, 0, 0, 1 ]
T: !!opencv-matrix
rows: 3
cols: 1
dt: d
data: [ 0., 0., 0. ]
MyData:
A: 97
X: 3.1415926535897931e+000
id: mydata1234
@endcode
You may observe a runtime instance of this on the [YouTube
here](https://www.youtube.com/watch?v=A4yqVnByMMM) .
@youtube{A4yqVnByMMM}

View File

@@ -0,0 +1,220 @@
How to scan images, lookup tables and time measurement with OpenCV {#tutorial_how_to_scan_images}
==================================================================
@prev_tutorial{tutorial_mat_the_basic_image_container}
@next_tutorial{tutorial_mat_mask_operations}
Goal
----
We'll seek answers for the following questions:
- How to go through each and every pixel of an image?
- How are OpenCV matrix values stored?
- How to measure the performance of our algorithm?
- What are lookup tables and why use them?
Our test case
-------------
Let us consider a simple color reduction method. By using the unsigned char C and C++ type for
matrix item storage, a pixel channel may have up to 256 different values. For a three-channel
image this can allow the formation of way too many colors (more than 16 million, \f$256^3\f$ to be exact).
Working with so many color shades may give a heavy blow to our algorithm's performance. However,
sometimes it is enough to work with far fewer of them to get the same final result.
In these cases it's common to perform a *color space reduction*. This means that we divide the
current color space value by a new input value to end up with fewer colors. For instance, every
value between zero and nine takes the new value zero, every value between ten and nineteen takes
the value ten, and so on.
When you divide an *uchar* (unsigned char, i.e. values between zero and 255) value by an *int*
value the division is an integer division, so any fraction will be rounded down, and the result again
fits in an *uchar*. Taking advantage of this fact, the above operation in the *uchar* domain may be
expressed as:
\f[I_{new} = (\frac{I_{old}}{10}) * 10\f]
A simple color space reduction algorithm would consist of just passing through every pixel of an
image matrix and applying this formula. It's worth noting that we do a division and a multiplication
operation. These operations are bloody expensive for a system. If possible, it's worth avoiding them
by using cheaper operations such as a few subtractions, additions or, in the best case, a simple
assignment. Furthermore, note that we only have a limited number of input values for the above
operation: in the case of the *uchar* type, 256 to be exact.
Therefore, for larger images it would be wise to calculate all possible values beforehand and, during
the scan, just make the assignment by using a lookup table. Lookup tables are simple arrays
(having one or more dimensions) that hold the final output value for every possible input value.
Their strength is that we do not need to make the calculation, we just need to read the result.
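For the color reduction above, such a table has one entry per possible *uchar* value. A minimal
sketch, assuming the reduction factor is already stored in an integer *divideWith* (the snippet
referenced below additionally parses this value from the command line), could be:
@code{.cpp}
// Hypothetical sketch: precompute the reduced value for every possible uchar.
uchar table[256];
for (int i = 0; i < 256; ++i)
    table[i] = (uchar)(divideWith * (i / divideWith));
@endcode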
Our test case program (and the code sample below) will do the following: read in an image passed
as a command line argument (it may be either color or grayscale) and apply the reduction
with the given command line argument integer value. In OpenCV, at the moment there are
three major ways of going through an image pixel by pixel. To make things a little more interesting
we'll make the scanning of the image using each of these methods, and print out how long it took.
You can download the full source code [here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/how_to_scan_images/how_to_scan_images.cpp) or look it up in
the samples directory of OpenCV at the cpp tutorial code for the core section. Its basic usage is:
@code{.bash}
how_to_scan_images imageName.jpg intValueToReduce [G]
@endcode
The final argument is optional. If given the image will be loaded in grayscale format, otherwise
the BGR color space is used. The first thing is to calculate the lookup table.
@snippet how_to_scan_images.cpp dividewith
Here we first use the C++ *stringstream* class to convert the third command line argument from text
to an integer format. Then we use a simple loop and the above formula to calculate the lookup table.
No OpenCV-specific stuff here.
Another issue is how to measure time. OpenCV offers two simple functions to achieve this:
cv::getTickCount() and cv::getTickFrequency() . The first returns the number of ticks of
your system's CPU since a certain event (such as booting your system). The second returns how
many ticks your CPU emits per second. So, measuring the amount of time elapsed between
two operations is as easy as:
@code{.cpp}
double t = (double)getTickCount();
// do something ...
t = ((double)getTickCount() - t)/getTickFrequency();
cout << "Times passed in seconds: " << t << endl;
@endcode
@anchor tutorial_how_to_scan_images_storing
How is the image matrix stored in memory?
-----------------------------------------
As you could already read in my @ref tutorial_mat_the_basic_image_container tutorial the size of the matrix
depends on the color system used. More accurately, it depends on the number of channels used. In
case of a grayscale image we have something like:
![](tutorial_how_matrix_stored_1.png)
For multichannel images the columns contain as many sub columns as the number of channels. For
example, in case of a BGR color system:
![](tutorial_how_matrix_stored_2.png)
Note that the order of the channels is reversed: BGR instead of RGB. Because in many cases the memory
is large enough to store the rows in a successive fashion, the rows may follow one after another,
creating a single long row. Because everything is in a single place, following one after another, this
may help to speed up the scanning process. We can use the cv::Mat::isContinuous() function to *ask*
the matrix if this is the case. Continue on to the next section to find an example.
The efficient way
-----------------
When it comes to performance you cannot beat the classic C style operator[] (pointer) access.
Therefore, the most efficient method we can recommend for making the assignment is:
@snippet how_to_scan_images.cpp scan-c
Here we basically just acquire a pointer to the start of each row and go through it until it ends.
In the special case that the matrix is stored in a continuous manner we only need to request the
pointer a single time and go all the way to the end. We need to look out for color images: we have
three channels so we need to pass through three times more items in each row.
There's another way of doing this. The *data* member of a *Mat* object returns the pointer to the
first row, first column. If this pointer is null you have no valid input in that object; checking
this is the simplest way to verify that your image loading was successful. If the storage is
continuous, we can use this to iterate over the whole data buffer. In case of a grayscale image this
would look like:
@code{.cpp}
uchar* p = I.data;
for( unsigned int i = 0; i < ncol*nrows; ++i, ++p)
*p = table[*p];
@endcode
You would get the same result. However, this code is a lot harder to read later on. It gets even
harder if you have some more advanced technique in there. Moreover, in practice I've observed that you'll
get the same performance result (as most modern compilers will probably make this small
optimization trick automatically for you).
The iterator (safe) method
--------------------------
In case of the efficient way, making sure that you pass through the right amount of *uchar* fields
and skip the gaps that may occur between the rows was your responsibility. The iterator method is
considered a safer way as it takes over these tasks from the user. All you need to do is ask for the
begin and the end of the image matrix and then just increase the begin iterator until you reach the
end. To acquire the value *pointed to* by the iterator use the \* operator (add it before the iterator).
@snippet how_to_scan_images.cpp scan-iterator
In case of color images we have three uchar items per column. This may be considered a short vector
of uchar items, which has been baptized in OpenCV with the *Vec3b* name. To access the n-th sub
column we use simple operator[] access. It's important to remember that OpenCV iterators go through
the columns and automatically skip to the next row. Therefore, in case of color images, if you use a
simple *uchar* iterator you'll be able to access only the blue channel values.
On-the-fly address calculation with reference returning
-------------------------------------------------------
The final method isn't recommended for scanning. It was made to acquire or modify more or less random
elements in the image. Its basic usage is to specify the row and column number of the item you want
to access. During our earlier scanning methods you could already notice that it is important through
what type we look at the image. It's no different here, as you need to manually specify the
type to use in the lookup. You can observe this for grayscale images in the
following source code (the usage of the cv::Mat::at() function):
@snippet how_to_scan_images.cpp scan-random
The function takes your input type and coordinates and calculates the address of the
queried item. Then it returns a reference to it. This may be constant when you *get* the value and
non-constant when you *set* the value. As a safety step, **in debug mode only**, there is a check
performed that your input coordinates are valid and do exist. If this isn't the case you'll get a
nice output message on the standard error output stream. Compared to the efficient way, in
release mode the only difference in using this is that for every element of the image you'll get a
new row pointer, on which we use the C operator[] to acquire the column element.
If you need to do multiple lookups using this method for an image, it may be troublesome and time
consuming to enter the type and the *at* keyword for each of the accesses. To solve this problem
OpenCV has a cv::Mat_ data type. It's the same as Mat, with the extra requirement that at definition
you need to specify the data type through which to look at the data matrix; in return you can
use operator() for fast access of items. To make things even better, this is easily convertible
from and to the usual cv::Mat data type. A sample usage of this can be seen in the
color image case of the function above. Nevertheless, it's important to note that the same operation
(with the same runtime speed) could have been done with the cv::Mat::at function. It's just a little less
to write for the lazy programmer.
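For example, a minimal sketch of this typed access, assuming *image* is a *CV_8UC3* (BGR) image with
at least 11 rows and 16 columns, might be:
@code{.cpp}
// Hypothetical sketch: typed access through cv::Mat_ instead of Mat::at.
cv::Mat_<cv::Vec3b> typed = image;   // shares the data when the types match
typed(10, 15)[2] = 255;              // set the red channel of the pixel at row 10, column 15
image = typed;                       // converts back to cv::Mat without copying
@endcode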
The Core Function
-----------------
This is a bonus method for achieving lookup-table modification of an image. In image
processing it's quite common that you want to replace all values of a given image with some other value.
OpenCV provides a function for modifying image values, without the need to write the scanning logic
of the image. We use the cv::LUT() function of the core module. First we build a Mat type of the
lookup table:
@snippet how_to_scan_images.cpp table-init
Finally call the function (I is our input image and J the output one):
@snippet how_to_scan_images.cpp table-use
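A minimal sketch of what those two snippets amount to, assuming the *table* array built earlier in
this tutorial, could be:
@code{.cpp}
// Hypothetical sketch of the two snippets referenced above.
cv::Mat lookUpTable(1, 256, CV_8U);
uchar* p = lookUpTable.ptr();
for (int i = 0; i < 256; ++i)
    p[i] = table[i];            // copy the precomputed reduction table

cv::Mat J;
cv::LUT(I, lookUpTable, J);     // J receives the color-reduced copy of I
@endcode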
Performance Difference
----------------------
For the best result compile the program and run it yourself. To make the differences more
clear, I've used a quite large (2560 x 1600) image. The performance presented here is for
color images. For a more accurate value I've averaged the times obtained from a hundred calls of the
function.
Method | Time
--------------- | ----------------------
Efficient Way | 79.4717 milliseconds
Iterator | 83.7201 milliseconds
On-The-Fly RA | 93.7878 milliseconds
LUT function | 32.5759 milliseconds
We can conclude a couple of things. If possible, use the already made functions of OpenCV (instead
of reinventing them). The fastest method turns out to be the LUT function. This is because the OpenCV
library is multi-thread enabled via Intel Threading Building Blocks. However, if you need to write a
simple image scan prefer the pointer method. The iterator is a safer bet, although quite a bit slower.
Using the on-the-fly reference access method for a full image scan is the most costly in debug mode.
In release mode it may or may not beat the iterator approach, but it surely gives up the
safety trait of iterators in exchange.
Finally, you may watch a sample run of the program on the [video posted](https://www.youtube.com/watch?v=fB3AN5fjgwc) on our YouTube channel.
@youtube{fB3AN5fjgwc}

View File

@@ -0,0 +1,190 @@
How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_}
==================================================================
@prev_tutorial{tutorial_file_input_output_with_xml_yml}
Goal
----
The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily
parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set
exploiting almost all of the CPU power available.
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended
to remain simple.
Precondition
----
The first precondition is to have OpenCV built with a parallel framework.
In OpenCV 3.2, the following parallel frameworks are available in that order:
1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled)
3. OpenMP (integrated to compiler, should be explicitly enabled)
4. APPLE GCD (system wide, used automatically (APPLE only))
5. Windows RT concurrency (system wide, used automatically (Windows RT only))
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
7. Pthreads (if available)
As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries
are third party libraries and have to be explicitly built and enabled in CMake (e.g. TBB, C=), others are
automatically available with the platform (e.g. APPLE GCD), but chances are that you will be able to
get access to a parallel framework either directly or by enabling the option in CMake and rebuilding the library.
The second (weak) precondition is more related to the task you want to achieve, as not all computations
are suitable / can be adapted to run in a parallel way. To remain simple, tasks that can be split
into multiple elementary operations with no memory dependency (no possible race condition) are easily
parallelizable. Computer vision processing is often easily parallelizable as most of the time the processing of
one pixel does not depend on the state of other pixels.
Simple example: drawing a Mandelbrot set
----
We will use the example of drawing a Mandelbrot set to show how from a regular sequential code you can easily adapt
the code to parallelize the computation.
Theory
-----------
The Mandelbrot set was named in tribute to the mathematician Benoit Mandelbrot by the mathematician
Adrien Douady. It has become famous outside of the mathematics field as the image representation is an example of a
class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (even more, a
Mandelbrot set is self-similar as the whole shape can be repeatedly seen at different scales). For a more in-depth
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set).
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).
> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration
> of the quadratic map
> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f]
> remains bounded.
> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying
> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets.
> This can also be represented as
> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f]
Pseudocode
-----------
A simple algorithm to generate a representation of the Mandelbrot set is called the
["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm).
For each pixel in the rendered image, we test using the recurrence relation if the complex number is bounded or not
under a maximum number of iterations. Pixels that do not belong to the Mandelbrot set will escape quickly whereas
we assume that the pixel is in the set after a fixed maximum number of iterations. A high value of iterations will
produce a more detailed image but the computation time will increase accordingly. We use the number of iterations
needed to "escape" to depict the pixel value in the image.
```
For each pixel (Px, Py) on the screen, do:
{
x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1))
y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
x = 0.0
y = 0.0
iteration = 0
max_iteration = 1000
while (x*x + y*y < 2*2 AND iteration < max_iteration) {
xtemp = x*x - y*y + x0
y = 2*x*y + y0
x = xtemp
iteration = iteration + 1
}
color = palette[iteration]
plot(Px, Py, color)
}
```
To relate between the pseudocode and the theory, we have:
* \f$ z = x + iy \f$
* \f$ z^2 = x^2 + i2xy - y^2 \f$
* \f$ c = x_0 + iy_0 \f$
![](images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png)
On this figure, we recall that the real part of a complex number is on the x-axis and the imaginary part on the y-axis.
You can see that the whole shape is repeatedly visible if we zoom in at particular locations.
Implementation
-----------
Escape time algorithm implementation
--------------------------
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm
Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a
complex number. This function performs the test to check whether the pixel is in the set or not and returns the "escaped" iteration.
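A minimal sketch of such an escape-time test is given below; the function name is illustrative and the tutorial sample may differ slightly.
@code{.cpp}
#include <complex>

// Returns the iteration at which |z| exceeds 2, or maxIter if the point never escapes.
int mandelbrotEscape(const std::complex<float> &z0, int maxIter)
{
    std::complex<float> z = z0;
    for (int t = 0; t < maxIter; t++)
    {
        if (std::norm(z) > 4.0f) // |z|^2 > 4  <=>  |z| > 2
            return t;
        z = z * z + z0;
    }
    return maxIter; // assumed to belong to the Mandelbrot set
}
@endcode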
Sequential Mandelbrot implementation
--------------------------
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential
In this implementation, we sequentially iterate over the pixels in the rendered image to perform the test to check if the
pixel is likely to belong to the Mandelbrot set or not.
Another thing to do is to transform the pixel coordinate into the Mandelbrot set space with:
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation
Finally, to assign the grayscale value to the pixels, we use the following rule:
* a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set),
* otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range.
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value
Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost
the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his
[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)):
\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$
![](images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png)
The green curve corresponds to a simple linear scale transformation, the blue one to a square root scale transformation
and you can observe how the lowest values will be boosted when looking at the slope at these positions.
Parallel Mandelbrot implementation
--------------------------
When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the
computation, we can perform multiple pixel calculations in parallel, by exploiting the multi-core architecture of modern
processors. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework.
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel
The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the
`virtual void operator ()(const cv::Range& range) const`.
The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread.
This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index coordinate
to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the mat image to be able to modify
the image in place.
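The following is a rough sketch of such a class, assuming the headers `<complex>` and `<cmath>` are included and reusing the illustrative `mandelbrotEscape()` helper from the earlier sketch; it is not the exact class used by the tutorial sample.
@code{.cpp}
class ParallelMandelbrot : public cv::ParallelLoopBody
{
public:
    ParallelMandelbrot(cv::Mat &img, float x1, float y1, float scaleX, float scaleY)
        : m_img(img), m_x1(x1), m_y1(y1), m_scaleX(scaleX), m_scaleY(scaleY) {}

    virtual void operator()(const cv::Range &range) const CV_OVERRIDE
    {
        const int maxIter = 500;
        for (int r = range.start; r < range.end; r++)
        {
            int i = r / m_img.cols;          // row index
            int j = r % m_img.cols;          // column index
            float x0 = j / m_scaleX + m_x1;  // map the pixel into the Mandelbrot plane
            float y0 = i / m_scaleY + m_y1;
            int value = mandelbrotEscape(std::complex<float>(x0, y0), maxIter);
            if (value == maxIter)
                m_img.ptr<uchar>(i)[j] = 0;  // inside the set: black
            else
                m_img.ptr<uchar>(i)[j] = cv::saturate_cast<uchar>(
                    std::sqrt(value / (float)maxIter) * 255);
        }
    }

private:
    cv::Mat &m_img;
    float m_x1, m_y1, m_scaleX, m_scaleY;
};
@endcode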
The parallel execution is called with:
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call
Here, the range represents the total number of operations to be executed, i.e. the total number of pixels in the image.
To set the number of threads, you can use @ref cv::setNumThreads. You can also specify the number of splits using the
nstripes parameter in @ref cv::parallel_for_. For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)`
or setting `nstripes=2` should be the same: by default all the available processor threads are used, but the
workload is then split over only two of them.
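A minimal usage sketch, assuming the `ParallelMandelbrot` class sketched above (the bounds correspond to xMin=-2.1, xMax=0.6, yMin=-1.2, yMax=1.2):
@code{.cpp}
cv::Mat mandelbrotImg(480, 640, CV_8U);
ParallelMandelbrot body(mandelbrotImg, /*x1=*/-2.1f, /*y1=*/-1.2f,
                        /*scaleX=*/640 / 2.7f, /*scaleY=*/480 / 2.4f);
cv::parallel_for_(cv::Range(0, mandelbrotImg.rows * mandelbrotImg.cols), body);

// Optionally restrict the number of worker threads or the number of stripes:
cv::setNumThreads(2);
cv::parallel_for_(cv::Range(0, mandelbrotImg.rows * mandelbrotImg.cols), body, /*nstripes=*/2);
@endcode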
@note
The C++11 standard allows us to simplify the parallel implementation by getting rid of the `ParallelMandelbrot` class and replacing it with a lambda expression:
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call-cxx11
Results
-----------
You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4 cores / 8 threads
CPU, you can expect a speed-up of around 6.9X. There are many factors to explain why we do not achieve a speed-up of almost 8X.
The main reasons are mostly:
* the overhead to create and manage the threads,
* background processes running in parallel,
* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores.
The resulting image produced by the tutorial code (you can modify the code to use more iterations and assign a pixel color
depending on the escaped iteration and using a color palette to get more aesthetic images):
![Mandelbrot set with xMin=-2.1, xMax=0.6, yMin=-1.2, yMax=1.2, maxIterations=500](images/how_to_use_OpenCV_parallel_for_Mandelbrot.png)

View File

@@ -0,0 +1,194 @@
Mask operations on matrices {#tutorial_mat_mask_operations}
===========================
@prev_tutorial{tutorial_how_to_scan_images}
@next_tutorial{tutorial_mat_operations}
Mask operations on matrices are quite simple. The idea is that we recalculate each pixel's value in
an image according to a mask matrix (also known as kernel). This mask holds values that will adjust
how much influence neighboring pixels (and the current pixel) have on the new pixel value. From a
mathematical point of view we make a weighted average, with our specified values.
Our test case
-------------
Let's consider the issue of an image contrast enhancement method. Basically we want to apply for
every pixel of the image the following formula:
\f[I(i,j) = 5*I(i,j) - [ I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)]\f]\f[\iff I(i,j)*M, \text{where }
M = \bordermatrix{ _i\backslash ^j & -1 & 0 & +1 \cr
-1 & 0 & -1 & 0 \cr
0 & -1 & 5 & -1 \cr
+1 & 0 & -1 & 0 \cr
}\f]
The first notation is by using a formula, while the second is a compacted version of the first by
using a mask. You use the mask by putting the center of the mask matrix (in the case above denoted by
the zero-zero index) on the pixel you want to calculate, and sum up the pixel values multiplied by
the overlapping matrix values. It's the same thing, however in case of large matrices the latter
notation is a lot easier to read.
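As a quick illustration, the mask \f$M\f$ above can be written down directly as a small matrix in C++ (a sketch, independent of the sample code below):
@code{.cpp}
cv::Mat kernel = (cv::Mat_<char>(3, 3) <<  0, -1,  0,
                                          -1,  5, -1,
                                           0, -1,  0);
@endcode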
Code
----
@add_toggle_cpp
You can download this source code from [here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp) or look in the
OpenCV source code libraries sample directory at
`samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp`.
@include samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp
@end_toggle
@add_toggle_java
You can download this source code from [here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java) or look in the
OpenCV source code libraries sample directory at
`samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java`.
@include samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java
@end_toggle
@add_toggle_python
You can download this source code from [here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py) or look in the
OpenCV source code libraries sample directory at
`samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py`.
@include samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py
@end_toggle
The Basic Method
----------------
Now let us see how we can make this happen by using the basic pixel access method or by using the
**filter2D()** function.
Here's a function that will do this:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method
At first we make sure that the input image's data is in unsigned char format. For this we use the
@ref cv::CV_Assert function, which throws an error when the expression inside it is false.
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp 8_bit
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method
At first we make sure that the input image's data is in unsigned 8-bit format.
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java 8_bit
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method
At first we make sure that the input image's data is in unsigned 8-bit format.
@code{.py}
# convert to 8-bit unsigned if the loaded image has another depth
if my_image.dtype != np.uint8:
    my_image = my_image.astype(np.uint8)
@endcode
@end_toggle
We create an output image with the same size and the same type as our input. As you can see in the
@ref tutorial_how_to_scan_images_storing "storing" section, depending on the number of channels we may have one or more
subcolumns.
@add_toggle_cpp
We will iterate through them via pointers so the total number of elements depends on
this number.
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp create_channels
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java create_channels
@end_toggle
@add_toggle_python
@code{.py}
height, width, n_channels = my_image.shape
result = np.zeros(my_image.shape, my_image.dtype)
@endcode
@end_toggle
@add_toggle_cpp
We'll use the plain C [] operator to access pixels. Because we need to access multiple rows at the
same time we'll acquire the pointers for each of them (a previous, a current and a next line). We
need another pointer to where we're going to save the calculation. Then we simply access the right
items with the [] operator. To move the output pointer ahead we simply increment it (by one
byte) after each operation:
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method_loop
On the borders of the image the upper notation results in nonexistent pixel locations (like (-1,-1)).
In these points our formula is undefined. A simple solution is to not apply the kernel
in these points and, for example, set the pixels on the borders to zeros:
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp borders
@end_toggle
@add_toggle_java
We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j).
Then we apply the sum and put the new value in the Result matrix.
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method_loop
On the borders of the image the upper notation results in nonexistent pixel locations (like (-1,-1)).
In these points our formula is undefined. A simple solution is to not apply the kernel
in these points and, for example, set the pixels on the borders to zeros:
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java borders
@end_toggle
@add_toggle_python
We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j).
Then we apply the sum and put the new value in the Result matrix.
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method_loop
@end_toggle
The filter2D function
---------------------
Applying such filters is so common in image processing that OpenCV has a function that
will take care of applying the mask (also called a kernel in some places). For this you first need
to define an object that holds the mask:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp kern
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java kern
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py kern
@end_toggle
Then call the **filter2D()** function specifying the input, the output image and the kernel to
use:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp filter2D
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java filter2D
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py filter2D
@end_toggle
The function even has a fifth optional argument to specify the center of the kernel, a sixth
for adding an optional value to the filtered pixels before storing them in the output and a seventh one
for determining what to do in the regions where the operation is undefined (borders).
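A minimal sketch of the full call (assuming `src`, `dst` and `kernel` already exist):
@code{.cpp}
// Point(-1,-1) keeps the anchor at the kernel center, 0 adds nothing to the result,
// and BORDER_DEFAULT selects how pixels outside the image are extrapolated.
cv::filter2D(src, dst, src.depth(), kernel, cv::Point(-1, -1), 0, cv::BORDER_DEFAULT);
@endcode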
This function is shorter, less verbose and, because there are some optimizations, it is usually faster
than the *hand-coded method*. For example, in my test the filter2D variant took only about 13
milliseconds while the hand-coded method took around 31 milliseconds. Quite some difference.
For example:
![](images/resultMatMaskFilter2D.png)
@add_toggle_cpp
Check out an instance of running the program on our [YouTube
channel](http://www.youtube.com/watch?v=7PF1tAU9se4) .
@youtube{7PF1tAU9se4}
@end_toggle

View File

@@ -0,0 +1,264 @@
Operations with images {#tutorial_mat_operations}
======================
@prev_tutorial{tutorial_mat_mask_operations}
@next_tutorial{tutorial_adding_images}
Input/Output
------------
### Images
Load an image from a file:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file
@end_toggle
If you read a jpg file, a 3 channel image is created by default. If you need a grayscale image, use:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file in grayscale
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file in grayscale
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file in grayscale
@end_toggle
@note Format of the file is determined by its content (first few bytes).

To save an image to a file:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Save image
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Save image
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Save image
@end_toggle
@note Format of the file is determined by its extension.
@note Use cv::imdecode and cv::imencode to read and write an image from/to memory rather than a file.
Basic operations with images
----------------------------
### Accessing pixel intensity values
In order to get pixel intensity value, you have to know the type of an image and the number of
channels. Here is an example for a single channel grey scale image (type 8UC1) and pixel coordinates
x and y:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 1
@end_toggle
C++ version only:
intensity.val[0] contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV
images are represented by the same structure as matrices, we use the same convention for both
cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or
x-coordinate) follows it. Alternatively, you can use the following notation (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 2
Now let us consider a 3 channel image with BGR color ordering (the default format returned by
imread):
**C++ code**
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 3
**Python code**
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 3
You can use the same method for floating-point images (for example, you can get such an image by
running Sobel on a 3 channel image) (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 4
The same method can be used to change pixel intensities:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 5
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 5
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 5
@end_toggle
There are functions in OpenCV, especially from calib3d module, such as cv::projectPoints, that take an
array of 2D or 3D points in the form of Mat. Matrix should contain exactly one column, each row
corresponds to a point, matrix type should be 32FC2 or 32FC3 correspondingly. Such a matrix can be
easily constructed from `std::vector` (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Mat from points vector
One can access a point in this matrix using the same method `Mat::at` (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Point access
### Memory management and reference counting
Mat is a structure that keeps matrix/image characteristics (rows and columns number, data type etc)
and a pointer to data. So nothing prevents us from having several instances of Mat corresponding to
the same data. A Mat keeps a reference count that tells if data has to be deallocated when a
particular instance of Mat is destroyed. Here is an example of creating two matrices without copying
data (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 1
As a result, we get a 32FC1 matrix with 3 columns instead of a 32FC3 matrix with 1 column. `pointsMat`
uses data from `points` and will not deallocate the memory when destroyed. In this particular
instance, however, the developer has to make sure that the lifetime of `points` is longer than that of `pointsMat`.
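The idea can be sketched as follows (illustrative names, mirroring the description above):
@code{.cpp}
std::vector<cv::Point3f> points(20);
// ... fill the vector ...
cv::Mat pointsMat = cv::Mat(points).reshape(1); // 20 x 3 CV_32FC1 header, no data copied
// points must outlive pointsMat, since pointsMat does not own the data.
@endcode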
If we need to copy the data, this is done using, for example, cv::Mat::copyTo or cv::Mat::clone:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 2
@end_toggle
An empty output Mat can be supplied to each function.
Each implementation calls Mat::create for a destination matrix.
This method allocates data for a matrix if it is empty.
If it is not empty and has the correct size and type, the method does nothing.
If, however, the size or type is different from the input arguments, the data is deallocated (and lost) and new data is allocated.
For example:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 3
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 3
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 3
@end_toggle
### Primitive operations
There is a number of convenient operators defined on a matrix. For example, here is how we can make
a black image from an existing greyscale image `img`
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Set image to black
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Set image to black
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Set image to black
@end_toggle
Selecting a region of interest:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Select ROI
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Select ROI
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Select ROI
@end_toggle
Conversion from color to greyscale:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp BGR to Gray
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java BGR to Gray
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py BGR to Gray
@end_toggle
Change image type from 8UC1 to 32FC1:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Convert to CV_32F
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Convert to CV_32F
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Convert to CV_32F
@end_toggle
### Visualizing images
It is very useful to see intermediate results of your algorithm during the development process. OpenCV
provides a convenient way of visualizing images. An 8U image can be shown using:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 1
@end_toggle
A call to waitKey() starts a message passing cycle that waits for a key stroke in the "image"
window. A 32F image needs to be converted to 8U type. For example:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 2
@end_toggle
@note Here cv::namedWindow is not necessary since it is immediately followed by cv::imshow.
Nevertheless, it can be used to change the window properties or when using cv::createTrackbar

View File

@@ -0,0 +1,271 @@
Mat - The Basic Image Container {#tutorial_mat_the_basic_image_container}
===============================
@next_tutorial{tutorial_how_to_scan_images}
Goal
----
We have multiple ways to acquire digital images from the real world: digital cameras, scanners,
computed tomography, and magnetic resonance imaging to name a few. In every case what we (humans)
see are images. However, when transforming this to our digital devices what we record are numerical
values for each of the points of the image.
![](images/MatBasicImageForComputer.jpg)
For example in the above image you can see that the mirror of the car is nothing more than a matrix
containing all the intensity values of the pixel points. How we get and store the pixels values may
vary according to our needs, but in the end all images inside a computer world may be reduced to
numerical matrices and other information describing the matrix itself. *OpenCV* is a computer vision
library whose main focus is to process and manipulate this information. Therefore, the first thing
you need to be familiar with is how OpenCV stores and handles images.
Mat
---
OpenCV has been around since 2001. In those days the library was built around a *C* interface and to
store the image in the memory they used a C structure called *IplImage*. This is the one you'll see
in most of the older tutorials and educational materials. The problem with this is that it brings to
the table all the minuses of the C language. The biggest issue is the manual memory management. It
builds on the assumption that the user is responsible for taking care of memory allocation and
deallocation. While this is not a problem with smaller programs, once your code base grows it will
be more of a struggle to handle all this rather than focusing on solving your development goal.
Luckily C++ came around and introduced the concept of classes, making things easier for the user through
automatic memory management (more or less). The good news is that C++ is fully compatible with C so
no compatibility issues can arise from making the change. Therefore, OpenCV 2.0 introduced a new C++
interface which offered a new way of doing things, which means you do not need to fiddle with memory
management, making your code concise (less to write, to achieve more). The main downside of the C++
interface is that many embedded development systems at the moment support only C. Therefore, unless
you are targeting embedded platforms, there's no point in using the *old* methods (unless you're a
masochist programmer and you're asking for trouble).
The first thing you need to know about *Mat* is that you no longer need to manually allocate its
memory and release it as soon as you do not need it. While doing this is still a possibility, most
of the OpenCV functions will allocate its output data automatically. As a nice bonus if you pass on
an already existing *Mat* object, which has already allocated the required space for the matrix,
this will be reused. In other words we use at all times only as much memory as we need to perform
the task.
*Mat* is basically a class with two data parts: the matrix header (containing information such as
the size of the matrix, the method used for storing, at which address is the matrix stored, and so
on) and a pointer to the matrix containing the pixel values (taking any dimensionality depending on
the method chosen for storing). The matrix header size is constant, however the size of the matrix
itself may vary from image to image and usually is larger by orders of magnitude.
OpenCV is an image processing library. It contains a large collection of image processing functions.
To solve a computational challenge, most of the time you will end up using multiple functions of the
library. Because of this, passing images to functions is a common practice. We should not forget
that we are talking about image processing algorithms, which tend to be quite computationally heavy.
The last thing we want to do is further decrease the speed of your program by making unnecessary
copies of potentially *large* images.
To tackle this issue OpenCV uses a reference counting system. The idea is that each *Mat* object has
its own header, however a matrix may be shared between two *Mat* objects by having their matrix
pointers point to the same address. Moreover, the copy operators **will only copy the headers** and
the pointer to the large matrix, not the data itself.
@code{.cpp}
Mat A, C; // creates just the header parts
A = imread(argv[1], IMREAD_COLOR); // here we'll know the method used (allocate matrix)
Mat B(A); // Use the copy constructor
C = A; // Assignment operator
@endcode
All the above objects, in the end, point to the same single data matrix and making a modification
using any of them will affect all the other ones as well. In practice the different objects just
provide different access methods to the same underlying data. Nevertheless, their header parts are
different. The real interesting part is that you can create headers which refer to only a subsection
of the full data. For example, to create a region of interest (*ROI*) in an image you just create
a new header with the new boundaries:
@code{.cpp}
Mat D (A, Rect(10, 10, 100, 100) ); // using a rectangle
Mat E = A(Range::all(), Range(1,3)); // using row and column boundaries
@endcode
Now you may ask -- if the matrix itself may belong to multiple *Mat* objects, who takes responsibility
for cleaning it up when it's no longer needed? The short answer is: the last object that used it.
This is handled by using a reference counting mechanism. Whenever somebody copies a header of a
*Mat* object, a counter is increased for the matrix. Whenever a header is cleaned, this counter
is decreased. When the counter reaches zero the matrix is freed. Sometimes you will want to copy
the matrix itself too, so OpenCV provides @ref cv::Mat::clone() and @ref cv::Mat::copyTo() functions.
@code{.cpp}
Mat F = A.clone();
Mat G;
A.copyTo(G);
@endcode
Now modifying *F* or *G* will not affect the matrix pointed by the *A*'s header. What you need to
remember from all this is that:
- Output image allocation for OpenCV functions is automatic (unless specified otherwise).
- You do not need to think about memory management with OpenCV's C++ interface.
- The assignment operator and the copy constructor only copies the header.
- The underlying matrix of an image may be copied using the @ref cv::Mat::clone() and @ref cv::Mat::copyTo()
functions.
Storing methods
-----------------
This is about how you store the pixel values. You can select the color space and the data type used.
The color space refers to how we combine color components in order to code a given color. The
simplest one is the grayscale where the colors at our disposal are black and white. The combination
of these allows us to create many shades of gray.
For *colorful* ways we have a lot more methods to choose from. Each of them breaks it down to three
or four basic components and we can use the combination of these to create the others. The most
popular one is RGB, mainly because this is also how our eye builds up colors. Its base colors are
red, green and blue. To code the transparency of a color sometimes a fourth element: alpha (A) is
added.
There are, however, many other color systems each with their own advantages:
- RGB is the most common as our eyes use something similar, however keep in mind that OpenCV standard display
system composes colors using the BGR color space (red and blue channels are swapped places).
- The HSV and HLS decompose colors into their hue, saturation and value/luminance components,
which is a more natural way for us to describe colors. You might, for example, dismiss the last
component, making your algorithm less sensitive to the light conditions of the input image.
- YCrCb is used by the popular JPEG image format.
- CIE L\*a\*b\* is a perceptually uniform color space, which comes in handy if you need to measure
the *distance* of a given color to another color.
Each of the building components has its own valid domains. This leads to the data type used. How
we store a component defines the control we have over its domain. The smallest data type possible is
*char*, which means one byte or 8 bits. This may be unsigned (so it can store values from 0 to 255) or
signed (values from -128 to +127). Although in case of three components this already gives 16
million possible colors to represent (like in case of RGB) we may acquire an even finer control by
using the float (4 byte = 32 bit) or double (8 byte = 64 bit) data types for each component.
Nevertheless, remember that increasing the size of a component also increases the size of the whole
picture in the memory.
Creating a Mat object explicitly
----------------------------------
In the @ref tutorial_load_save_image tutorial you have already learned how to write a matrix to an image
file by using the @ref cv::imwrite() function. However, for debugging purposes it's much more
convenient to see the actual values. You can do this using the \<\< operator of *Mat*. Be aware that
this only works for two dimensional matrices.
Although *Mat* works really well as an image container, it is also a general matrix class.
Therefore, it is possible to create and manipulate multidimensional matrices. You can create a Mat
object in multiple ways:
- @ref cv::Mat::Mat Constructor
@snippet mat_the_basic_image_container.cpp constructor
![](images/MatBasicContainerOut1.png)
For two dimensional and multichannel images we first define their size: row and column count wise.
Then we need to specify the data type to use for storing the elements and the number of channels
per matrix point. To do this we have multiple definitions constructed according to the following
convention:
@code
CV_[The number of bits per item][Signed or Unsigned][Type Prefix]C[The channel number]
@endcode
For instance, *CV_8UC3* means we use unsigned char types that are 8 bit long and each pixel has
three of these to form the three channels. There are types predefined for up to four channels. The
@ref cv::Scalar is a four-element short vector. Specify it and you can initialize all matrix
points with a custom value. If you need more channels you can create the type with the upper macro, setting
the channel number in parentheses as you can see below.
- Use C/C++ arrays and initialize via constructor
@snippet mat_the_basic_image_container.cpp init
The upper example shows how to create a matrix with more than two dimensions. Specify its
dimension, then pass a pointer containing the size for each dimension and the rest remains the
same.
- @ref cv::Mat::create function:
@snippet mat_the_basic_image_container.cpp create
![](images/MatBasicContainerOut2.png)
You cannot initialize the matrix values with this construction. It will only reallocate its matrix
data memory if the new size will not fit into the old one.
- MATLAB style initializer: @ref cv::Mat::zeros , @ref cv::Mat::ones , @ref cv::Mat::eye . Specify size and
data type to use:
@snippet mat_the_basic_image_container.cpp matlab
![](images/MatBasicContainerOut3.png)
- For small matrices you may use comma separated initializers or initializer lists (C++11 support is required in the last case):
@snippet mat_the_basic_image_container.cpp comma
@snippet mat_the_basic_image_container.cpp list
![](images/MatBasicContainerOut6.png)
- Create a new header for an existing *Mat* object and @ref cv::Mat::clone or @ref cv::Mat::copyTo it.
@snippet mat_the_basic_image_container.cpp clone
![](images/MatBasicContainerOut7.png)
@note
You can fill out a matrix with random values using the @ref cv::randu() function. You need to
give a lower and upper limit for the random values:
@snippet mat_the_basic_image_container.cpp random
Output formatting
-----------------
In the above examples you could see the default formatting option. OpenCV, however, allows you to
format your matrix output:
- Default
@snippet mat_the_basic_image_container.cpp out-default
![](images/MatBasicContainerOut8.png)
- Python
@snippet mat_the_basic_image_container.cpp out-python
![](images/MatBasicContainerOut16.png)
- Comma separated values (CSV)
@snippet mat_the_basic_image_container.cpp out-csv
![](images/MatBasicContainerOut10.png)
- Numpy
@snippet mat_the_basic_image_container.cpp out-numpy
![](images/MatBasicContainerOut9.png)
- C
@snippet mat_the_basic_image_container.cpp out-c
![](images/MatBasicContainerOut11.png)
Output of other common items
----------------------------
OpenCV offers support for output of other common OpenCV data structures too via the \<\< operator:
- 2D Point
@snippet mat_the_basic_image_container.cpp out-point2
![](images/MatBasicContainerOut12.png)
- 3D Point
@snippet mat_the_basic_image_container.cpp out-point3
![](images/MatBasicContainerOut13.png)
- std::vector via cv::Mat
@snippet mat_the_basic_image_container.cpp out-vector
![](images/MatBasicContainerOut14.png)
- std::vector of points
@snippet mat_the_basic_image_container.cpp out-vector-points
![](images/MatBasicContainerOut15.png)
Most of the samples here have been included in a small console application. You can download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/mat_the_basic_image_container/mat_the_basic_image_container.cpp)
or in the core section of the cpp samples.
You can also find a quick video demonstration of this on
[YouTube](https://www.youtube.com/watch?v=1tibU7vGWpk).
@youtube{1tibU7vGWpk}

View File

@@ -0,0 +1,89 @@
The Core Functionality (core module) {#tutorial_table_of_content_core}
=====================================
Here you will learn about the basic building blocks of the library. A must-read for
understanding how to manipulate images on a pixel level.
- @subpage tutorial_mat_the_basic_image_container
*Compatibility:* \> OpenCV 2.0
*Author:* Bernát Gábor
You will learn how to store images in the memory and how to print out their content to the
console.
- @subpage tutorial_how_to_scan_images
*Compatibility:* \> OpenCV 2.0
*Author:* Bernát Gábor
You'll find out how to scan images (go through each of the image pixels) with OpenCV.
Bonus: time measurement with OpenCV.
- @subpage tutorial_mat_mask_operations
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Bernát Gábor
You'll find out how to scan images with neighbor access and use the @ref cv::filter2D
function to apply kernel filters on images.
- @subpage tutorial_mat_operations
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
Reading/writing images from file, accessing pixels, primitive operations, visualizing images.
- @subpage tutorial_adding_images
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Ana Huamán
We will learn how to blend two images!
- @subpage tutorial_basic_linear_transform
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Ana Huamán
We will learn how to change our image appearance!
- @subpage tutorial_discrete_fourier_transform
*Languages:* C++, Java, Python
*Compatibility:* \> OpenCV 2.0
*Author:* Bernát Gábor
You will see how and why to use the Discrete Fourier transformation with OpenCV.
- @subpage tutorial_file_input_output_with_xml_yml
*Compatibility:* \> OpenCV 2.0
*Author:* Bernát Gábor
You will see how to use the @ref cv::FileStorage data structure of OpenCV to write and read
data to XML or YAML file format.
- @subpage tutorial_how_to_use_OpenCV_parallel_for_
*Compatibility:* \>= OpenCV 2.4.3
You will see how to use the OpenCV parallel_for_ to easily parallelize your code.

View File

@@ -0,0 +1,97 @@
# How to run deep networks on Android device {#tutorial_dnn_android}
## Introduction
In this tutorial you'll learn how to run deep learning networks on an Android device
using the OpenCV deep learning module.
This tutorial was written for the following versions of the corresponding software:
- Android Studio 2.3.3
- OpenCV 3.3.0+
## Requirements
- Download and install Android Studio from https://developer.android.com/studio.
- Get the latest pre-built OpenCV for Android release from https://github.com/opencv/opencv/releases and unpack it (for example, `opencv-4.2.0-android-sdk.zip`).
- Download MobileNet object detection model from https://github.com/chuanqi305/MobileNet-SSD. We need a configuration file `MobileNetSSD_deploy.prototxt` and weights `MobileNetSSD_deploy.caffemodel`.
## Create an empty Android Studio project
- Open Android Studio. Start a new project. Let's call it `opencv_mobilenet`.
![](1_start_new_project.png)
- Keep default target settings.
![](2_start_new_project.png)
- Use "Empty Activity" template. Name activity as `MainActivity` with a
corresponding layout `activity_main`.
![](3_start_new_project.png)
![](4_start_new_project.png)
- Wait until the project is created. Go to `Run->Edit Configurations`.
Choose `USB Device` as the target device for runs.
![](5_setup.png)
Plug in your device and run the project. It should be installed and launched
successfully before we go further.
@note Read @ref tutorial_android_dev_intro in case of problems.
![](6_run_empty_project.png)
## Add OpenCV dependency
- Go to `File->New->Import module` and provide a path to `unpacked_OpenCV_package/sdk/java`. The name of the module is detected automatically.
Disable all features that Android Studio will suggest to you on the next window.
![](7_import_module.png)
![](8_import_module.png)
- Open two files:
1. `AndroidStudioProjects/opencv_mobilenet/app/build.gradle`
2. `AndroidStudioProjects/opencv_mobilenet/openCVLibrary330/build.gradle`
Copy both `compileSdkVersion` and `buildToolsVersion` from the first file to
the second one.
`compileSdkVersion 14` -> `compileSdkVersion 26`
`buildToolsVersion "25.0.0"` -> `buildToolsVersion "26.0.1"`
- Make the project. There should be no errors at this point.
- Go to `File->Project Structure`. Add OpenCV module dependency.
![](9_opencv_dependency.png)
![](10_opencv_dependency.png)
- Install an appropriate OpenCV manager from `unpacked_OpenCV_package/apk`
onto the target device (this only needs to be done once).
@code
adb install OpenCV_3.3.0_Manager_3.30_armeabi-v7a.apk
@endcode
- Congratulations! We're ready now to make a sample using OpenCV.
## Make a sample
Our sample will take pictures from a camera, forward them into a deep network and
receive a set of rectangles, class identifiers and confidence values in the `[0, 1]`
range.
- First of all, we need to add a necessary widget which displays processed
frames. Modify `app/src/main/res/layout/activity_main.xml`:
@include android/mobilenet-objdetect/res/layout/activity_main.xml
- Put downloaded `MobileNetSSD_deploy.prototxt` and `MobileNetSSD_deploy.caffemodel`
into `app/build/intermediates/assets/debug` folder.
- Modify `/app/src/main/AndroidManifest.xml` to enable full-screen mode, set up
a correct screen orientation and allow camera usage.
@include android/mobilenet-objdetect/AndroidManifest.xml
- Replace content of `app/src/main/java/org/opencv/samples/opencv_mobilenet/MainActivity.java`:
@include android/mobilenet-objdetect/src/org/opencv/samples/opencv_mobilenet/MainActivity.java
- Launch the application and have fun!
![](11_demo.jpg)

View File

@@ -0,0 +1,226 @@
# Custom deep learning layers support {#tutorial_dnn_custom_layers}
## Introduction
Deep learning is a fast growing area. New approaches to building neural networks
usually introduce new types of layers. They could be modifications of existing
ones or implement outstanding research ideas.
OpenCV gives an opportunity to import and run networks from different deep learning
frameworks. A number of the most popular layers are implemented. However, you may face
a problem where your network cannot be imported using OpenCV because of unimplemented layers.
The first solution is to create a feature request at https://github.com/opencv/opencv/issues
mentioning details such as the source of the model and the type of the new layer. A new layer could
be implemented if the OpenCV community shares this need.
The second way is to define a **custom layer** so OpenCV's deep learning engine
will know how to use it. This tutorial is dedicated to showing you the process of customizing
the import of deep learning models.
## Define a custom layer in C++
A deep learning layer is a building block of a network's pipeline.
It has connections to **input blobs** and produces results to **output blobs**.
It also has trained **weights** and **hyper-parameters**.
Layers' names, types, weights and hyper-parameters are stored in files generated by the
native frameworks during training. If OpenCV meets an unknown layer type, it throws an
exception while trying to read the model:
```
Unspecified error: Can't create layer "layer_name" of type "MyType" in function getLayerInstance
```
To import the model correctly you have to derive a class from cv::dnn::Layer with
the following methods:
@snippet dnn/custom_layers.hpp A custom layer interface
And register it before the import:
@snippet dnn/custom_layers.hpp Register a custom layer
@note `MyType` is a type of unimplemented layer from the thrown exception.
Let's see what all the methods do:
- Constructor
@snippet dnn/custom_layers.hpp MyLayer::MyLayer
Retrieves hyper-parameters from cv::dnn::LayerParams. If your layer has trainable
weights they will be already stored in the Layer's member cv::dnn::Layer::blobs.
- A static method `create`
@snippet dnn/custom_layers.hpp MyLayer::create
This method should create an instance of your layer and return a cv::Ptr to it.
- Output blobs' shape computation
@snippet dnn/custom_layers.hpp MyLayer::getMemoryShapes
Returns the layer's output shapes depending on the input shapes. You may request extra
memory using `internals`.
- Run a layer
@snippet dnn/custom_layers.hpp MyLayer::forward
Implement a layer's logic here. Compute outputs for given inputs.
@note OpenCV manages the memory allocated for layers. In most cases the same memory
can be reused between layers. So your `forward` implementation should not rely on the
second invocation of `forward` having the same data at `outputs` and `internals`.
- Optional `finalize` method
@snippet dnn/custom_layers.hpp MyLayer::finalize
The chain of methods is the following: the OpenCV deep learning engine calls the `create`
method once, then it calls `getMemoryShapes` for every created layer, then you
can make some preparations that depend on the known input dimensions at cv::dnn::Layer::finalize.
After the network has been initialized, only the `forward` method is called for every network input.
@note Varying the input blobs' sizes, such as height, width or batch size, makes OpenCV
reallocate all the internal memory. That leads to efficiency gaps. Try to initialize
and deploy models using a fixed batch size and image dimensions.
## Example: custom layer from Caffe
Let's create a custom layer `Interp` from https://github.com/cdmh/deeplab-public.
It's just a simple resize that takes an input blob of size `N x C x Hi x Wi` and returns
an output blob of size `N x C x Ho x Wo` where `N` is a batch size, `C` is a number of channels,
`Hi x Wi` and `Ho x Wo` are input and output `height x width` correspondingly.
This layer has no trainable weights, but it has hyper-parameters to specify the output size.
For example,
~~~~~~~~~~~~~
layer {
name: "output"
type: "Interp"
bottom: "input"
top: "output"
interp_param {
height: 9
width: 8
}
}
~~~~~~~~~~~~~
This way our implementation can look like:
@snippet dnn/custom_layers.hpp InterpLayer
Next we need to register a new layer type and try to import the model.
@snippet dnn/custom_layers.hpp Register InterpLayer
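A possible registration sketch (assuming an `InterpLayer` class as described above and the dnn layer registration header; the file names are placeholders):
@code{.cpp}
// requires <opencv2/dnn/layer.details.hpp> for the registration macro
CV_DNN_REGISTER_LAYER_CLASS(Interp, InterpLayer);
cv::dnn::Net net = cv::dnn::readNetFromCaffe("deploy.prototxt", "weights.caffemodel");
@endcode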
## Example: custom layer from TensorFlow
This is an example of how to import a network with [tf.image.resize_bilinear](https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_bilinear)
operation. This is also a resize but with an implementation different from OpenCV's or `Interp` above.
Let's create a single layer network:
~~~~~~~~~~~~~{.py}
inp = tf.placeholder(tf.float32, [2, 3, 4, 5], 'input')
resized = tf.image.resize_bilinear(inp, size=[9, 8], name='resize_bilinear')
~~~~~~~~~~~~~
OpenCV sees TensorFlow's graph in the following way:
```
node {
  name: "input"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "resize_bilinear/size"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_INT32
        tensor_shape {
          dim {
            size: 2
          }
        }
        tensor_content: "\t\000\000\000\010\000\000\000"
      }
    }
  }
}
node {
  name: "resize_bilinear"
  op: "ResizeBilinear"
  input: "input:0"
  input: "resize_bilinear/size"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "align_corners"
    value {
      b: false
    }
  }
}
library {
}
```
Custom layer import from TensorFlow is designed to put all of a layer's `attr` values into
cv::dnn::LayerParams, but input `Const` blobs into cv::dnn::Layer::blobs.
In our case the resize's output shape will be stored in the layer's `blobs[0]`.
@snippet dnn/custom_layers.hpp ResizeBilinearLayer
Next we register a layer and try to import the model.
@snippet dnn/custom_layers.hpp Register ResizeBilinearLayer
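As a short sketch of how registration and import could then look, assuming the `ResizeBilinearLayer` class from the snippet above is defined in the same translation unit (the file name `frozen_resize.pb` is only a placeholder):
@code{.cpp}
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/layer.details.hpp>  // CV_DNN_REGISTER_LAYER_CLASS

int main()
{
    // Register our implementation under the TensorFlow operation name before import.
    CV_DNN_REGISTER_LAYER_CLASS(ResizeBilinear, ResizeBilinearLayer);

    // "frozen_resize.pb" is a hypothetical frozen graph containing the nodes shown above.
    cv::dnn::Net net = cv::dnn::readNetFromTensorflow("frozen_resize.pb");
    return 0;
}
@endcode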
## Define a custom layer in Python
The following example shows how to customize OpenCV's layers in Python.
Let's consider [Holistically-Nested Edge Detection](https://arxiv.org/abs/1504.06375)
deep learning model. It was trained with one and only one difference compared to
the current version of the [Caffe framework](http://caffe.berkeleyvision.org/): the `Crop`
layers, which receive two input blobs and crop the first one to match the spatial dimensions
of the second one, used to crop from the center. Nowadays Caffe's layer does it
from the top-left corner. So, using the latest version of Caffe or OpenCV, you'll
get shifted results with filled borders.
Next we're going to replace OpenCV's `Crop` layer, which crops from the top-left corner,
with a centered one.
- Create a class with `getMemoryShapes` and `forward` methods
@snippet dnn/edge_detection.py CropLayer
@note Both methods should return lists.
- Register a new layer.
@snippet dnn/edge_detection.py Register
That's it! We've replaced an already implemented OpenCV layer with a custom one.
You may find a full script in the [source code](https://github.com/opencv/opencv/tree/master/samples/dnn/edge_detection.py).
<table border="0">
<tr>
<td>![](js_tutorials/js_assets/lena.jpg)</td>
<td>![](images/lena_hed.jpg)</td>
</tr>
</table>

Load Caffe framework models {#tutorial_dnn_googlenet}
===========================
Introduction
------------
In this tutorial you will learn how to use the opencv_dnn module for image classification by using
the GoogLeNet trained network from the [Caffe model zoo](http://caffe.berkeleyvision.org/model_zoo.html).
We will demonstrate results of this example on the following picture.
![Buran space shuttle](images/space_shuttle.jpg)
Source Code
-----------
We will be using snippets from the example application, which can be downloaded [here](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.cpp).
@include dnn/classification.cpp
Explanation
-----------
-# Firstly, download the GoogLeNet model files:
[bvlc_googlenet.prototxt](https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/bvlc_googlenet.prototxt) and
[bvlc_googlenet.caffemodel](http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel)
You also need a file with the names of the [ILSVRC2012](http://image-net.org/challenges/LSVRC/2012/browse-synsets) classes:
[classification_classes_ILSVRC2012.txt](https://github.com/opencv/opencv/blob/master/samples/data/dnn/classification_classes_ILSVRC2012.txt).
Put these files into the working directory of this example program.
-# Read and initialize the network using the paths to the .prototxt and .caffemodel files
@snippet dnn/classification.cpp Read and initialize network
You can skip the `framework` argument if one of the `model` or `config` files has a
`.caffemodel` or `.prototxt` extension.
This way the function cv::dnn::readNet can automatically detect the model's format.
-# Read the input image and convert it to a blob acceptable by GoogLeNet
@snippet dnn/classification.cpp Open a video file or an image file or a camera stream
cv::VideoCapture can load both images and videos.
@snippet dnn/classification.cpp Create a 4D blob from a frame
We convert the image to a 4-dimensional blob (the so-called batch) with `1x3x224x224` shape
after applying the necessary pre-processing, like resizing and mean subtraction
`(-104, -117, -123)` for the blue, green and red channels respectively, using the cv::dnn::blobFromImage function.
-# Pass the blob to the network
@snippet dnn/classification.cpp Set input blob
-# Make forward pass
@snippet dnn/classification.cpp Make forward pass
During the forward pass the output of each network layer is computed, but in this example we need only the output of the last layer.
-# Determine the best class
@snippet dnn/classification.cpp Get a class with a highest score
We put the output of the network, which contains probabilities for each of the 1000 ILSVRC2012 image classes, into the `prob` blob,
and find the index of the element with the maximal value. This index corresponds to the class of the image.
-# Run an example from command line
@code
./example_dnn_classification --model=bvlc_googlenet.caffemodel --config=bvlc_googlenet.prototxt --width=224 --height=224 --classes=classification_classes_ILSVRC2012.txt --input=space_shuttle.jpg --mean="104 117 123"
@endcode
For our image we get a prediction of the class `space shuttle` with more than 99% confidence.
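For convenience, here are the same steps condensed into one short sketch. The file names and pre-processing values follow the command line above; this is a simplified rewrite rather than the sample itself:
@code{.cpp}
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>

int main()
{
    using namespace cv;
    using namespace cv::dnn;

    // The model format is detected automatically from the file extensions.
    Net net = readNet("bvlc_googlenet.caffemodel", "bvlc_googlenet.prototxt");

    Mat frame = imread("space_shuttle.jpg");

    // 4D blob 1x3x224x224: resize plus mean subtraction (104, 117, 123), BGR order.
    Mat blob = blobFromImage(frame, 1.0, Size(224, 224), Scalar(104, 117, 123),
                             /*swapRB=*/false, /*crop=*/false);
    net.setInput(blob);

    // prob holds probabilities for the 1000 ILSVRC2012 classes.
    Mat prob = net.forward();

    Point classIdPoint;
    double confidence;
    minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
    std::cout << "class id: " << classIdPoint.x << ", confidence: " << confidence << std::endl;
    return 0;
}
@endcode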

# How to enable Halide backend for improved efficiency {#tutorial_dnn_halide}
## Introduction
This tutorial describes how to run your models in the OpenCV deep learning module
using the Halide language backend. Halide is an open-source project that lets us
write image processing algorithms in a well-readable format, schedule computations
according to a specific device and evaluate them with quite good efficiency.
An official website of the Halide project: http://halide-lang.org/.
An up to date efficiency comparison: https://github.com/opencv/opencv/wiki/DNN-Efficiency
## Requirements
### LLVM compiler
@note LLVM compilation might take a long time.
- Download LLVM source code from http://releases.llvm.org/4.0.0/llvm-4.0.0.src.tar.xz.
Unpack it. Let **llvm_root** be the root directory of the source code.
- Create directory **llvm_root**/tools/clang
- Download Clang with the same version as LLVM. In our case it will be from
http://releases.llvm.org/4.0.0/cfe-4.0.0.src.tar.xz. Unpack it into
**llvm_root**/tools/clang. Note that it should be the root of the Clang source code.
- Build LLVM on Linux
@code
cd llvm_root
mkdir build && cd build
cmake -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j4
@endcode
- Build LLVM on Windows (Developer Command Prompt)
@code
mkdir \\path-to-llvm-build\\ && cd \\path-to-llvm-build\\
cmake.exe -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=\\path-to-llvm-install\\ -G "Visual Studio 14 Win64" \\path-to-llvm-src\\
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\INSTALL.vcxproj
@endcode
@note `\\path-to-llvm-build\\` and `\\path-to-llvm-install\\` are different directories.
### Halide language.
- Download the source code from the GitHub repository https://github.com/halide/Halide
or clone it using git. The root directory will be **halide_root**.
@code
git clone https://github.com/halide/Halide.git
@endcode
- Build Halide on Linux
@code
cd halide_root
mkdir build && cd build
cmake -DLLVM_DIR=llvm_root/build/lib/cmake/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF ..
make -j4
@endcode
- Build Halide on Windows (Developer Command Prompt)
@code
cd halide_root
mkdir build && cd build
cmake.exe -DLLVM_DIR=\\path-to-llvm-install\\lib\\cmake\\llvm -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 14 Win64" ..
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\ALL_BUILD.vcxproj
@endcode
## Build OpenCV with Halide backend
When you build OpenCV add the following configuration flags:
- `WITH_HALIDE` - enable Halide linkage
- `HALIDE_ROOT_DIR` - path to Halide build directory
## Set Halide as a preferable backend
@code
net.setPreferableBackend(DNN_BACKEND_HALIDE);
@endcode
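A minimal end-to-end sketch might look like the code below; the model file names are placeholders, and note that the first `forward` call also includes Halide compilation time:
@code{.cpp}
#include <opencv2/dnn.hpp>

int main()
{
    using namespace cv::dnn;

    // Any supported format works; Caffe file names are used here as placeholders.
    Net net = readNetFromCaffe("deploy.prototxt", "weights.caffemodel");

    net.setPreferableBackend(DNN_BACKEND_HALIDE);

    int sz[] = {1, 3, 224, 224};
    cv::Mat blob(4, sz, CV_32F, cv::Scalar(0));
    net.setInput(blob);

    // The first call compiles the Halide pipeline; subsequent calls reuse it.
    cv::Mat out = net.forward();
    return 0;
}
@endcode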

# How to schedule your network for Halide backend {#tutorial_dnn_halide_scheduling}
## Introduction
Halide code is the same for every device we use. But to achieve satisfactory
efficiency we should schedule computations properly. In this tutorial we describe
ways to schedule your networks using the Halide backend in the OpenCV deep learning module.
For a better understanding of Halide scheduling you might want to read the tutorials at http://halide-lang.org/tutorials.
If it's your first encounter with Halide in OpenCV, we recommend starting from @ref tutorial_dnn_halide.
## Configuration files
You can schedule the computations of a Halide pipeline by writing textual configuration files.
It means that you can easily vectorize, parallelize and manage the loop order of a
layer's computation. Pass the path to a file with scheduling directives for a specific
device into ```cv::dnn::Net::setHalideScheduler``` before the first ```cv::dnn::Net::forward``` call.
Scheduling configuration files are represented as YAML files where each node is a
scheduled function or a scheduling directive.
@code
relu1:
  reorder: [x, c, y]
  split: { y: 2, c: 8 }
  parallel: [yo, co]
  unroll: yi
  vectorize: { x: 4 }
conv1_constant_exterior:
  compute_at: { relu1: yi }
@endcode
Use the variables `n` for the batch dimension, `c` for channels,
`y` for rows and `x` for columns. Variables produced by a split use names
with the same prefix but `o` and `i` suffixes for the outer and inner variables
respectively. For example, for a variable `x` in the range `[0, 10)` the directive
`split: { x: 2 }` gives new variables `xo` in the range `[0, 5)` and `xi` in the range `[0, 2)`.
The variable name `x` is no longer available in the same scheduling node.
You can find scheduling examples at [opencv_extra/testdata/dnn](https://github.com/opencv/opencv_extra/tree/master/testdata/dnn)
and use them to schedule your networks.
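A short usage sketch follows (the model and scheduler file names are placeholders); as noted above, the scheduler has to be set before the first `forward` call:
@code{.cpp}
#include <opencv2/dnn.hpp>

int main()
{
    using namespace cv::dnn;

    Net net = readNetFromCaffe("deploy.prototxt", "weights.caffemodel");

    // Both calls must happen before the first forward(), which compiles the pipeline.
    net.setPreferableBackend(DNN_BACKEND_HALIDE);
    net.setHalideScheduler("halide_scheduler.yml");

    int sz[] = {1, 3, 224, 224};
    cv::Mat blob(4, sz, CV_32F, cv::Scalar(0));
    net.setInput(blob);
    cv::Mat out = net.forward();
    return 0;
}
@endcode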
## Layers fusing
Thanks to layer fusion we only have to schedule the top layers of fused sets,
because for every output value we use the fused formula.
For example, if you have three consecutive layers Convolution + Scale + ReLU,
@code
conv(x, y, c, n) = sum(...) + bias(c);
scale(x, y, c, n) = conv(x, y, c, n) * weights(c);
relu(x, y, c, n) = max(scale(x, y, c, n), 0);
@endcode
the fused function is something like
@code
relu(x, y, c, n) = max((sum(...) + bias(c)) * weights(c), 0);
@endcode
So only the function called `relu` requires scheduling.
## Scheduling patterns
Sometimes networks are built using a blocked structure, which means some layers are
identical or quite similar. If you want to apply the same scheduling to
different layers up to tiling or vectorization factors, define scheduling
patterns in the `patterns` section at the beginning of the scheduling file.
Your patterns may also use parametric variables.
@code
# At the beginning of the file
patterns:
  fully_connected:
    split: { c: c_split }
    fuse: { src: [x, y, co], dst: block }
    parallel: block
    vectorize: { ci: c_split }
# Somewhere below
fc8:
  pattern: fully_connected
  params: { c_split: 8 }
@endcode
## Automatic scheduling
You can let DNN schedule layers automatically. Just skip the call to ```cv::dnn::Net::setHalideScheduler```. Sometimes it might be even more efficient than manual scheduling.
But if specific layers require manual scheduling, you can
mix manual and automatic scheduling: write a scheduling file
and skip the layers that you want to be scheduled automatically.

# How to run deep networks in browser {#tutorial_dnn_javascript}
## Introduction
This tutorial will show us how to run deep learning models using OpenCV.js right
in a browser. The tutorial refers to a sample pipeline of face detection and face
recognition models.
## Face detection
The face detection network gets a BGR image as input and produces a set of bounding boxes
that might contain faces. All we need to do is select the boxes with a strong
confidence.
## Face recognition
The network is called OpenFace (project https://github.com/cmusatyalab/openface).
The face recognition model receives an RGB face image of size `96x96`. It returns a
`128`-dimensional unit vector that represents the input face as a point on the
multidimensional unit sphere. So the difference between two faces is the angle between the two
output vectors.
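As a small illustration of that idea (written here in C++ for brevity; it is not part of the JavaScript sample), two L2-normalized `128`-dimensional vectors can be compared through their dot product, which is the cosine of the angle between them. The threshold below is only an example value:
@code{.cpp}
#include <opencv2/core.hpp>

// a and b are assumed to be L2-normalized 1x128 CV_32F rows.
bool isSamePerson(const cv::Mat &a, const cv::Mat &b, double threshold = 0.5)
{
    double cosine = a.dot(b);   // cosine of the angle between the two unit vectors
    return cosine > threshold;  // a smaller angle means more similar faces
}
@endcode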
## Sample
The whole sample is an HTML page with JavaScript code that uses OpenCV.js functionality.
You may see this page embedded below. Press the `Start` button to begin the demo.
Press `Add a person` to name a person that is recognized as an unknown one.
Next we'll discuss the main parts of the code.
@htmlinclude js_face_recognition.html
-# Run face detection network to detect faces on input image.
@snippet dnn/js_face_recognition.html Run face detection model
You may play with the input blob sizes to balance detection quality and efficiency.
The bigger the input blob, the smaller the faces that may be detected.
-# Run the face recognition network to obtain a `128`-dimensional unit feature vector from the input face image.
@snippet dnn/js_face_recognition.html Get 128 floating points feature vector
-# Perform a recognition.
@snippet dnn/js_face_recognition.html Recognize
Match the new feature vector against the registered ones and return the name of the best-matched person.
-# The main loop.
@snippet dnn/js_face_recognition.html Define frames processing
The main loop of our application receives frames from a camera and performs recognition
of every detected face in the frame. We start this function once OpenCV.js has been
initialized and the deep learning models have been downloaded.

YOLO DNNs {#tutorial_dnn_yolo}
===============================
Introduction
------------
In this text you will learn how to use the opencv_dnn module with yolo_object_detection (a sample of using the OpenCV dnn module in real time with device capture, video and images).
We will demonstrate results of this example on the following picture.
![Picture example](images/yolo.jpg)
Examples
--------
VIDEO DEMO:
@youtube{NHtRlndE2cg}
Source Code
-----------
Use the universal sample for object detection models, written
[in C++](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp) and
[in Python](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py).
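For orientation, a minimal sketch of loading and running a YOLO model with the dnn module is shown below. The file names and pre-processing constants mirror the command lines in the next section; parsing of the detection output is omitted:
@code{.cpp}
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    using namespace cv;
    using namespace cv::dnn;

    Net net = readNetFromDarknet("yolo.cfg", "yolo.weights");

    Mat frame = imread("example.jpg");  // placeholder input image

    // scale = 0.00392 (1/255), 416x416 input, RGB order as requested by the --rgb flag.
    Mat blob = blobFromImage(frame, 0.00392, Size(416, 416), Scalar(),
                             /*swapRB=*/true, /*crop=*/false);
    net.setInput(blob);
    Mat detections = net.forward();
    return 0;
}
@endcode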
Usage examples
--------------
Execute with a webcam:
@code{.bash}
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --rgb
@endcode
Execute with an image or video file:
@code{.bash}
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --input=[PATH-TO-IMAGE-OR-VIDEO-FILE] --rgb
@endcode
Email questions and suggestions to: Alessandro de Oliveira Faria cabelo@opensuse.org or the OpenCV Team.

Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
=====================================
- @subpage tutorial_dnn_googlenet
*Compatibility:* \> OpenCV 3.3
*Author:* Vitaliy Lyudvichenko
In this tutorial you will learn how to use the opencv_dnn module for image classification by using the GoogLeNet trained network from the Caffe model zoo.
- @subpage tutorial_dnn_halide
*Compatibility:* \> OpenCV 3.3
*Author:* Dmitry Kurtaev
This tutorial describes how to run your models in the OpenCV deep learning module using the Halide language backend.
- @subpage tutorial_dnn_halide_scheduling
*Compatibility:* \> OpenCV 3.3
*Author:* Dmitry Kurtaev
In this tutorial we describe ways to schedule your networks using the Halide backend in the OpenCV deep learning module.
- @subpage tutorial_dnn_android
*Compatibility:* \> OpenCV 3.3
*Author:* Dmitry Kurtaev
This tutorial will show you how to run a deep learning model using OpenCV on an Android device.
- @subpage tutorial_dnn_yolo
*Compatibility:* \> OpenCV 3.3.1
*Author:* Alessandro de Oliveira Faria
In this tutorial you will learn how to use the opencv_dnn module with yolo_object_detection on device capture, a video file or an image.
- @subpage tutorial_dnn_javascript
*Compatibility:* \> OpenCV 3.3.1
*Author:* Dmitry Kurtaev
In this tutorial we'll run deep learning models in a browser using OpenCV.js.
- @subpage tutorial_dnn_custom_layers
*Compatibility:* \> OpenCV 3.4.1
*Author:* Dmitry Kurtaev
How to define custom layers to import networks.

AKAZE local features matching {#tutorial_akaze_matching}
=============================
Introduction
------------
In this tutorial we will learn how to use AKAZE @cite ANB13 local features to detect and match keypoints on
two images.
We will find keypoints on a pair of images with a given homography matrix, match them and count the
number of inliers (i.e. matches that fit within the given homography).
You can find an expanded version of this example here:
<https://github.com/pablofdezalc/test_kaze_akaze_opencv>
Data
----
We are going to use images 1 and 3 from the *Graffiti* sequence of the [Oxford dataset](http://www.robots.ox.ac.uk/~vgg/data/data-aff.html).
![](images/graf.png)
The homography is given by a 3-by-3 matrix:
@code{.none}
7.6285898e-01 -2.9922929e-01 2.2567123e+02
3.3443473e-01 1.0143901e+00 -7.6999973e+01
3.4663091e-04 -1.4364524e-05 1.0000000e+00
@endcode
You can find the images (*graf1.png*, *graf3.png*) and homography (*H1to3p.xml*) in
*opencv/samples/data/*.
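The homography file can be read back with cv::FileStorage; a minimal sketch (not the tutorial code itself) looks like this:
@code{.cpp}
#include <opencv2/core.hpp>

int main()
{
    cv::Mat homography;
    cv::FileStorage fs("H1to3p.xml", cv::FileStorage::READ);
    fs.getFirstTopLevelNode() >> homography;  // the 3x3 matrix shown above
    return 0;
}
@endcode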
### Source Code
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/features2D/AKAZE_match.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/features2D/AKAZE_match.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py)
- **Code at glance:**
@include samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py
@end_toggle
### Explanation
- **Load images and homography**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp load
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java load
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py load
@end_toggle
We are loading grayscale images here. The homography is stored in the XML file created with FileStorage.
- **Detect keypoints and compute descriptors using AKAZE**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp AKAZE
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java AKAZE
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py AKAZE
@end_toggle
We create AKAZE and detect and compute AKAZE keypoints and descriptors. Since we don't need the *mask*
parameter, *noArray()* is used.
- **Use brute-force matcher to find 2-nn matches**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp 2-nn matching
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java 2-nn matching
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py 2-nn matching
@end_toggle
We use the Hamming distance, because AKAZE uses a binary descriptor by default.
- **Use 2-nn matches and ratio criterion to find correct keypoint matches**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp ratio test filtering
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java ratio test filtering
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py ratio test filtering
@end_toggle
If the closest match distance is significantly lower than the second closest one, then the match is correct (match is not ambiguous).
- **Check if our matches fit in the homography model**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp homography check
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java homography check
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py homography check
@end_toggle
If the distance from the first keypoint's projection to the second keypoint is less than the threshold,
then it fits the homography model.
We create a new set of matches for the inliers, because it is required by the drawing function.
- **Output results**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp draw final matches
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java draw final matches
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py draw final matches
@end_toggle
Here we save the resulting image and print some statistics.
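To tie the steps together, here is a condensed C++ sketch of the whole pipeline; the constants are typical values and the linked sample may differ in details:
@code{.cpp}
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    using namespace cv;

    Mat img1 = imread("graf1.png", IMREAD_GRAYSCALE);
    Mat img2 = imread("graf3.png", IMREAD_GRAYSCALE);

    Mat homography;
    FileStorage fs("H1to3p.xml", FileStorage::READ);
    fs.getFirstTopLevelNode() >> homography;

    // Detect AKAZE keypoints and compute binary descriptors.
    Ptr<AKAZE> akaze = AKAZE::create();
    std::vector<KeyPoint> kpts1, kpts2;
    Mat desc1, desc2;
    akaze->detectAndCompute(img1, noArray(), kpts1, desc1);
    akaze->detectAndCompute(img2, noArray(), kpts2, desc2);

    // 2-nn matching with the Hamming distance (binary descriptors).
    BFMatcher matcher(NORM_HAMMING);
    std::vector<std::vector<DMatch> > nn_matches;
    matcher.knnMatch(desc1, desc2, nn_matches, 2);

    // Ratio test: keep a match only if it is clearly better than the runner-up.
    const float nn_match_ratio = 0.8f;
    std::vector<KeyPoint> matched1, matched2;
    for (size_t i = 0; i < nn_matches.size(); i++)
        if (nn_matches[i][0].distance < nn_match_ratio * nn_matches[i][1].distance)
        {
            matched1.push_back(kpts1[nn_matches[i][0].queryIdx]);
            matched2.push_back(kpts2[nn_matches[i][0].trainIdx]);
        }

    // Homography check: a match is an inlier if the projection of the first point
    // by the ground-truth homography is close enough to the second point.
    const double inlier_threshold = 2.5;  // pixels
    int inliers = 0;
    for (size_t i = 0; i < matched1.size(); i++)
    {
        Mat col = (Mat_<double>(3, 1) << matched1[i].pt.x, matched1[i].pt.y, 1.0);
        col = homography * col;
        col /= col.at<double>(2);
        double dist = std::sqrt(std::pow(col.at<double>(0) - matched2[i].pt.x, 2) +
                                std::pow(col.at<double>(1) - matched2[i].pt.y, 2));
        if (dist < inlier_threshold)
            inliers++;
    }
    std::cout << "Matches: " << matched1.size() << " Inliers: " << inliers << std::endl;
    return 0;
}
@endcode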
Results
-------
### Found matches
![](images/res.png)
Depending on your OpenCV version, you should get results consistent with:
@code{.none}
Keypoints 1: 2943
Keypoints 2: 3511
Matches: 447
Inliers: 308
Inlier Ratio: 0.689038
@endcode

AKAZE and ORB planar tracking {#tutorial_akaze_tracking}
=============================
Introduction
------------
In this tutorial we will compare *AKAZE* and *ORB* local features using them to find matches between
video frames and track object movements.
The algorithm is as follows:
- Detect and describe keypoints on the first frame, manually set object boundaries
- For every next frame:
-# Detect and describe keypoints
-# Match them using bruteforce matcher
-# Estimate homography transformation using RANSAC
-# Filter inliers from all the matches
-# Apply homography transformation to the bounding box to find the object
-# Draw bounding box and inliers, compute inlier ratio as evaluation metric
![](images/frame.png)
Data
----
To do the tracking we need a video and object position on the first frame.
You can download our example video and data from
[here](https://docs.google.com/file/d/0B72G7D4snftJandBb0taLVJHMFk).
To run the code you have to specify the input (camera id or video file). Then, select a bounding box with the mouse, and press any key to start tracking.
@code{.none}
./planar_tracking blais.mp4
@endcode
Source Code
-----------
@include cpp/tutorial_code/features2D/AKAZE_tracking/planar_tracking.cpp
Explanation
-----------
### Tracker class
This class implements the algorithm described above using the given feature detector and descriptor
matcher.
- **Setting up the first frame**
@code{.cpp}
void Tracker::setFirstFrame(const Mat frame, vector<Point2f> bb, string title, Stats& stats)
{
    first_frame = frame.clone();
    (*detector)(first_frame, noArray(), first_kp, first_desc);
    stats.keypoints = (int)first_kp.size();
    drawBoundingBox(first_frame, bb);
    putText(first_frame, title, Point(0, 60), FONT_HERSHEY_PLAIN, 5, Scalar::all(0), 4);
    object_bb = bb;
}
@endcode
We compute and store keypoints and descriptors from the first frame and prepare it for the
output.
We need to save the number of detected keypoints to make sure both detectors locate roughly the same
number of them.
- **Processing frames**
-# Locate keypoints and compute descriptors
@code{.cpp}
(*detector)(frame, noArray(), kp, desc);
@endcode
To find matches between frames we have to locate the keypoints first.
In this tutorial detectors are set up to find about 1000 keypoints on each frame.
-# Use 2-nn matcher to find correspondences
@code{.cpp}
matcher->knnMatch(first_desc, desc, matches, 2);
for(unsigned i = 0; i < matches.size(); i++) {
    if(matches[i][0].distance < nn_match_ratio * matches[i][1].distance) {
        matched1.push_back(first_kp[matches[i][0].queryIdx]);
        matched2.push_back(      kp[matches[i][0].trainIdx]);
    }
}
@endcode
If the distance of the closest match is less than *nn_match_ratio* times the distance of the second
closest one, then it's a match.
-# Use *RANSAC* to estimate homography transformation
@code{.cpp}
homography = findHomography(Points(matched1), Points(matched2),
                            RANSAC, ransac_thresh, inlier_mask);
@endcode
If there are at least 4 matches we can use random sample consensus to estimate image
transformation.
-# Save the inliers
@code{.cpp}
for(unsigned i = 0; i < matched1.size(); i++) {
    if(inlier_mask.at<uchar>(i)) {
        int new_i = static_cast<int>(inliers1.size());
        inliers1.push_back(matched1[i]);
        inliers2.push_back(matched2[i]);
        inlier_matches.push_back(DMatch(new_i, new_i, 0));
    }
}
@endcode
Since *findHomography* computes the inliers we only have to save the chosen points and
matches.
-# Project object bounding box
@code{.cpp}
perspectiveTransform(object_bb, new_bb, homography);
@endcode
If there is a reasonable number of inliers we can use the estimated transformation to locate the
object.
Results
-------
You can watch the resulting [video on youtube](http://www.youtube.com/watch?v=LWY-w8AGGhE).
*AKAZE* statistics:
@code{.none}
Matches 626
Inliers 410
Inlier ratio 0.58
Keypoints 1117
@endcode
*ORB* statistics:
@code{.none}
Matches 504
Inliers 319
Inlier ratio 0.56
Keypoints 1112
@endcode

Detection of planar objects {#tutorial_detection_of_planar_objects}
===========================
The goal of this tutorial is to learn how to use *features2d* and *calib3d* modules for detecting
known planar objects in scenes.
*Test data*: use images in your data folder, for instance, box.png and box_in_scene.png.
- Create a new console project. Read two input images:
@code{.cpp}
    Mat img1 = imread(argv[1], IMREAD_GRAYSCALE);
    Mat img2 = imread(argv[2], IMREAD_GRAYSCALE);
@endcode
- Detect keypoints in both images and compute descriptors for each of the keypoints:
@code{.cpp}
    // detecting keypoints and computing descriptors
    Ptr<Feature2D> surf = SURF::create();
    vector<KeyPoint> keypoints1, keypoints2;
    Mat descriptors1, descriptors2;
    surf->detectAndCompute(img1, Mat(), keypoints1, descriptors1);
    surf->detectAndCompute(img2, Mat(), keypoints2, descriptors2);
@endcode
- Now, find the closest matches between the descriptors from the first image and those from the second:
@code{.cpp}
    // matching descriptors (BFMatcher replaces the OpenCV 2.x BruteForceMatcher)
    BFMatcher matcher(NORM_L2);
    vector<DMatch> matches;
    matcher.match(descriptors1, descriptors2, matches);
@endcode
- Visualize the results:
@code{.cpp}
    // drawing the results
    namedWindow("matches", 1);
    Mat img_matches;
    drawMatches(img1, keypoints1, img2, keypoints2, matches, img_matches);
    imshow("matches", img_matches);
    waitKey(0);
@endcode
- Find the homography transformation between the two sets of points:
@code{.cpp}
    vector<Point2f> points1, points2;
    // fill the arrays with the points
    ....
    Mat H = findHomography(Mat(points1), Mat(points2), RANSAC, ransacReprojThreshold);
@endcode
- Create a set of inlier matches and draw them. Use the perspectiveTransform function to map points
with the homography:
@code{.cpp}
    Mat points1Projected;
    perspectiveTransform(Mat(points1), points1Projected, H);
@endcode
- Use drawMatches for drawing inliers.

Feature Description {#tutorial_feature_description}
===================
Goal
----
In this tutorial you will learn how to:
- Use the @ref cv::DescriptorExtractor interface in order to find the feature vector corresponding
to the keypoints. Specifically:
- Use cv::xfeatures2d::SURF and its function cv::xfeatures2d::SURF::compute to perform the
required calculations.
- Use a @ref cv::DescriptorMatcher to match the features vector
- Use the function @ref cv::drawMatches to draw the detected matches.
\warning You need the <a href="https://github.com/opencv/opencv_contrib">OpenCV contrib modules</a> to be able to use the SURF features
(alternatives are ORB, KAZE, ... features).
Theory
------
Code
----
@add_toggle_cpp
This tutorial's code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/features2D/feature_description/SURF_matching_Demo.cpp)
@include samples/cpp/tutorial_code/features2D/feature_description/SURF_matching_Demo.cpp
@end_toggle
@add_toggle_java
This tutorial's code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/features2D/feature_description/SURFMatchingDemo.java)
@include samples/java/tutorial_code/features2D/feature_description/SURFMatchingDemo.java
@end_toggle
@add_toggle_python
This tutorial's code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/features2D/feature_description/SURF_matching_Demo.py)
@include samples/python/tutorial_code/features2D/feature_description/SURF_matching_Demo.py
@end_toggle
Explanation
-----------
Result
------
Here is the result after applying the BruteForce matcher between the two original images:
![](images/Feature_Description_BruteForce_Result.jpg)
