Camera calibration With OpenCV {#tutorial_camera_calibration}
==============================

Cameras have been around for a long time. However, with the introduction of cheap *pinhole*
cameras in the late 20th century, they became a common occurrence in our everyday life.
Unfortunately, this cheapness comes at a price: significant distortion. Luckily, these distortions
are constant, and with a calibration and some remapping we can correct them. Furthermore, with
calibration you may also determine the relation between the camera's natural units (pixels) and
real-world units (for example, millimeters).

Theory
------

For the distortion, OpenCV takes into account the radial and tangential factors. For the radial
factor one uses the following formula:

\f[x_{distorted} = x( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{distorted} = y( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\f]

So for an undistorted pixel point at \f$(x,y)\f$ coordinates, its position on the distorted image
will be \f$(x_{distorted}, y_{distorted})\f$. The presence of radial distortion manifests in the form
of the "barrel" or "fish-eye" effect.

Tangential distortion occurs because the image-taking lenses are not perfectly parallel to the
imaging plane. It can be represented via the formulas:

\f[x_{distorted} = x + [ 2p_1xy + p_2(r^2+2x^2)] \\
y_{distorted} = y + [ p_1(r^2+ 2y^2)+ 2p_2xy]\f]

So we have five distortion parameters, which in OpenCV are presented as one row matrix with 5
columns:

\f[distortion\_coefficients=(k_1 \hspace{10pt} k_2 \hspace{10pt} p_1 \hspace{10pt} p_2 \hspace{10pt} k_3)\f]


Now for the unit conversion we use the following formula:

\f[\left [ \begin{matrix} x \\ y \\ w \end{matrix} \right ] = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} X \\ Y \\ Z \end{matrix} \right ]\f]

Here the presence of \f$w\f$ is explained by the use of homogeneous coordinates (and \f$w=Z\f$). The
unknown parameters are \f$f_x\f$ and \f$f_y\f$ (the camera focal lengths) and \f$(c_x, c_y)\f$, the optical
center expressed in pixel coordinates. If a common focal length is used for both axes with a given
aspect ratio \f$a\f$ (usually 1), then \f$f_y=f_x*a\f$ and in the upper formula we will have a single focal
length \f$f\f$. The matrix containing these four parameters is referred to as the *camera matrix*. While
the distortion coefficients are the same regardless of the camera resolution used, the camera matrix should be
scaled along with the current resolution from the calibrated resolution.

The process of determining these two matrices is the calibration. Calculation of these parameters is
done through basic geometrical equations. The equations used depend on the chosen calibrating
objects. Currently OpenCV supports three types of objects for calibration:

- Classical black-white chessboard
- Symmetrical circle pattern
- Asymmetrical circle pattern

Basically, you need to take snapshots of these patterns with your camera and let OpenCV find them.
Each found pattern results in a new equation. To solve the equation you need at least a
predetermined number of pattern snapshots to form a well-posed equation system. This number is
higher for the chessboard pattern and lower for the circle ones. For example, in theory the
chessboard pattern requires at least two snapshots. However, in practice we have a good amount of
noise present in our input images, so for good results you will probably need at least 10 good
snapshots of the input pattern in different positions.

Goal
----

The sample application will:

- Determine the distortion matrix
- Determine the camera matrix
- Take input from camera, video and image file lists
- Read configuration from XML/YAML file
- Save the results into XML/YAML file
- Calculate re-projection error

Source code
-----------

You may also find the source code in the `samples/cpp/tutorial_code/calib3d/camera_calibration/`
folder of the OpenCV source library or [download it from here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp). For the usage of the program, run it with the `-h` argument. The program has one
essential argument: the name of its configuration file. If none is given, it will try to open the
one named "default.xml". [Here's a sample configuration file
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/calib3d/camera_calibration/in_VID5.xml) in XML format. In the
configuration file you may choose to use a camera, a video file or an image list as input. If you
opt for the last one, you will need to create a configuration file where you enumerate the images to
use. Here's [an example of this](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/calib3d/camera_calibration/VID5.xml).
The important part to remember is that the images need to be specified using an absolute path or
a path relative to your application's working directory. You may find all this in the samples
directory mentioned above.

The application starts up by reading the settings from the configuration file. Although this is
an important part of it, it has nothing to do with the subject of this tutorial: *camera
calibration*. Therefore, I've chosen not to post the code for that part here. You can find the
technical background on how to do this in the @ref tutorial_file_input_output_with_xml_yml tutorial.

Explanation
-----------

-# **Read the settings**
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp file_read

    For this I've used a simple OpenCV class input operation. After reading the file I've an
    additional post-processing function that checks the validity of the input. Only if all inputs
    are good will the *goodInput* variable be true.

-# **Get next input, if it fails or we have enough of them - calibrate**

    After this we have a big
    loop where we do the following operations: get the next image from the image list, camera or
    video file. If this fails or we have enough images, then we run the calibration process. In case
    of an image we step out of the loop; otherwise the remaining frames will be undistorted (if the
    option is set) by changing from *DETECTION* mode to the *CALIBRATED* one.
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp get_input
    For some cameras we may need to flip the input image. Here we do this too.

-# **Find the pattern in the current input**

    The formation of the equations I mentioned above aims
    at finding major patterns in the input: in case of the chessboard these are the corners of the
    squares and for the circles, well, the circles themselves. The position of these will form the
    result, which will be written into the *pointBuf* vector.
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp find_pattern
    Depending on the type of the input pattern, you use either the @ref cv::findChessboardCorners or
    the @ref cv::findCirclesGrid function. For both of them you pass the current image and the size
    of the board and you'll get the positions of the patterns. Furthermore, they return a boolean
    variable which states if the pattern was found in the input (we only need to take into account
    those images where this is true!).

    Then again, in case of cameras, we only take camera images after an input delay time has passed.
    This is done in order to allow the user to move the chessboard around and get different images.
    Similar images result in similar equations, and similar equations at the calibration step will
    form an ill-posed problem, so the calibration will fail. For square images the positions of the
    corners are only approximate. We may improve this by calling the @ref cv::cornerSubPix function.
    (`winSize` is used to control the side length of the search window. Its default value is 11.
    `winSize` may be changed by the command line parameter `--winSize=<number>`.)
    It will produce a better calibration result. After this we add a valid input's result to the
    *imagePoints* vector to collect all of the equations into a single container. Finally, for
    visualization feedback purposes, we will draw the found points on the input image using the @ref
    cv::drawChessboardCorners function.
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp pattern_found
-# **Show state and result to the user, plus command line control of the application**

    This part shows text output on the image.
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp output_text
    If we ran calibration and got the camera's matrix with the distortion coefficients, we may want
    to correct the image using the @ref cv::undistort function:
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp output_undistorted
    Then we show the image and wait for an input key; if this is *u* we toggle the distortion removal,
    if it is *g* we start again the detection process, and finally for the *ESC* key we quit the application:
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp await_input
-# **Show the distortion removal for the images too**

    When you work with an image list it is not
    possible to remove the distortion inside the loop. Therefore, you must do this after the loop.
    Taking advantage of this, I'll now expand the @ref cv::undistort function, which in fact first
    calls @ref cv::initUndistortRectifyMap to find the transformation matrices and then performs the
    transformation using the @ref cv::remap function. Because, after a successful calibration, the
    map calculation needs to be done only once, by using this expanded form you may speed up your
    application:
    @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp show_results

The calibration and save
------------------------

Because the calibration needs to be done only once per camera, it makes sense to save it after a
successful calibration. This way, later on you can just load these values into your program. Due to
this we first make the calibration, and if it succeeds we save the result into an OpenCV style XML
or YAML file, depending on the extension you give in the configuration file.

Therefore in the first function we just split up these two processes. Because we want to save many
of the calibration variables, we'll create these variables here and pass both of them on to the
calibration and saving function. Again, I'll not show the saving part as that has little in common
with the calibration. Explore the source file in order to find out how and what:
@snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp run_and_save
We do the calibration with the help of the @ref cv::calibrateCameraRO function. It has the following
parameters:

- The object points. This is a vector of *Point3f* vectors that for each input image describes how
  the pattern should look. If we have a planar pattern (like a chessboard) then we can simply set
  all Z coordinates to zero. This is a collection of the points where these important points are
  present. Because we use a single pattern for all the input images, we can calculate this just
  once and multiply it for all the other input views. We calculate the corner points with the
  *calcBoardCornerPositions* function as:
  @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp board_corners
  And then multiply it as:
  @code{.cpp}
  vector<vector<Point3f> > objectPoints(1);
  calcBoardCornerPositions(s.boardSize, s.squareSize, objectPoints[0], s.calibrationPattern);
  objectPoints[0][s.boardSize.width - 1].x = objectPoints[0][0].x + grid_width;
  newObjPoints = objectPoints[0];

  objectPoints.resize(imagePoints.size(), objectPoints[0]);
  @endcode
  @note If your calibration board is an inaccurate, unmeasured, roughly planar target (checkerboard
  patterns on paper using off-the-shelf printers are the most convenient calibration targets, and
  most of them are not accurate enough), a method from @cite strobl2011iccv can be utilized to
  dramatically improve the accuracy of the estimated camera intrinsic parameters. This new
  calibration method will be called if the command line parameter `-d=<number>` is provided. In the
  above code snippet, `grid_width` is actually the value set by `-d=<number>`. It's the measured
  distance between the top-left (0, 0, 0) and top-right (s.squareSize*(s.boardSize.width-1), 0, 0)
  corners of the pattern grid points. It should be measured precisely with rulers or vernier
  calipers. After calibration, newObjPoints will be updated with refined 3D coordinates of object
  points.
- The image points. This is a vector of *Point2f* vectors which for each input image contains the
  coordinates of the important points (corners for the chessboard and centers of the circles for
  the circle pattern). We have already collected this from the @ref cv::findChessboardCorners or
  @ref cv::findCirclesGrid function. We just need to pass it on.
- The size of the image acquired from the camera, video file or the images.
- The index of the object point to be fixed. We set it to -1 to request the standard calibration
  method. If the new object-releasing method is to be used, set it to the index of the top-right
  corner point of the calibration board grid. See cv::calibrateCameraRO for a detailed explanation.
  @code{.cpp}
  int iFixedPoint = -1;
  if (release_object)
      iFixedPoint = s.boardSize.width - 1;
  @endcode
- The camera matrix. If we used the fixed aspect ratio option, we need to set \f$f_x\f$:
  @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp fixed_aspect
- The distortion coefficient matrix. Initialize with zeros.
  @code{.cpp}
  distCoeffs = Mat::zeros(8, 1, CV_64F);
  @endcode
- For all the views the function will calculate rotation and translation vectors which transform
  the object points (given in the model coordinate space) to the image points (given in the world
  coordinate space). The 7th and 8th parameters are the output vectors of matrices containing in
  the i-th position the rotation and translation vector for the i-th object point to the i-th
  image point.
- The updated output vector of calibration pattern points. This parameter is ignored with the
  standard calibration method.
- The final argument is the flag. You need to specify here options like fixing the aspect ratio
  for the focal length, assuming zero tangential distortion or fixing the principal point. Here we
  use CALIB_USE_LU to get faster calibration speed.
  @code{.cpp}
  rms = calibrateCameraRO(objectPoints, imagePoints, imageSize, iFixedPoint,
                          cameraMatrix, distCoeffs, rvecs, tvecs, newObjPoints,
                          s.flag | CALIB_USE_LU);
  @endcode
- The function returns the average re-projection error. This number gives a good estimation of the
  precision of the found parameters. This should be as close to zero as possible. Given the
  intrinsic, distortion, rotation and translation matrices, we may calculate the error for one view
  by using @ref cv::projectPoints to first transform the object points to image points. Then we
  calculate the absolute norm between what we got with our transformation and the corner/circle
  finding algorithm. To find the average error we calculate the arithmetical mean of the errors
  calculated for all the calibration images.
  @snippet samples/cpp/tutorial_code/calib3d/camera_calibration/camera_calibration.cpp compute_errors

Results
-------

Let there be [this input chessboard pattern](pattern.png) which has a size of 9 X 6. I've used an
AXIS IP camera to create a couple of snapshots of the board and saved them into a VID5 directory.
I've put this inside the `images/CameraCalibration` folder of my working directory and created the
following `VID5.XML` file that describes which images to use:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<images>
images/CameraCalibration/VID5/xx1.jpg
images/CameraCalibration/VID5/xx2.jpg
images/CameraCalibration/VID5/xx3.jpg
images/CameraCalibration/VID5/xx4.jpg
images/CameraCalibration/VID5/xx5.jpg
images/CameraCalibration/VID5/xx6.jpg
images/CameraCalibration/VID5/xx7.jpg
images/CameraCalibration/VID5/xx8.jpg
</images>
</opencv_storage>
@endcode
Then I passed `images/CameraCalibration/VID5/VID5.XML` as an input in the configuration file.
Here's a chessboard pattern found during the runtime of the application:

After applying the distortion removal we get:

The same works for [this asymmetrical circle pattern](acircles_pattern.png) by setting the input
width to 4 and height to 11. This time I've used a live camera feed by specifying its ID ("1") for
the input. Here's how a detected pattern should look:

In both cases, in the specified output XML/YAML file you'll find the camera matrix and distortion
coefficients matrices:
@code{.xml}
<camera_matrix type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>d</dt>
<data>
6.5746697944293521e+002 0. 3.1950000000000000e+002 0.
6.5746697944293521e+002 2.3950000000000000e+002 0. 0. 1.</data></camera_matrix>
<distortion_coefficients type_id="opencv-matrix">
<rows>5</rows>
<cols>1</cols>
<dt>d</dt>
<data>
-4.1802327176423804e-001 5.0715244063187526e-001 0. 0.
-5.7843597214487474e-001</data></distortion_coefficients>
@endcode
Add these values as constants to your program, call the @ref cv::initUndistortRectifyMap and the
@ref cv::remap functions to remove distortion and enjoy distortion-free inputs from cheap and
low-quality cameras.

You may observe a runtime instance of this on [YouTube
here](https://www.youtube.com/watch?v=ViPN810E0SU).

@youtube{ViPN810E0SU}

Create calibration pattern {#tutorial_camera_calibration_pattern}
=========================================

The goal of this tutorial is to learn how to create a calibration pattern.

You can find a chessboard pattern at https://github.com/opencv/opencv/blob/master/doc/pattern.png

You can find a circleboard pattern at https://github.com/opencv/opencv/blob/master/doc/acircles_pattern.png

Create your own pattern
---------------

Now, if you want to create your own pattern, you will need Python to use https://github.com/opencv/opencv/blob/master/doc/pattern_tools/gen_pattern.py

Example

Create a checkerboard pattern in the file chessboard.svg with 9 rows, 6 columns and a square size of 20mm:

    python gen_pattern.py -o chessboard.svg --rows 9 --columns 6 --type checkerboard --square_size 20

Create a circle board pattern in the file circleboard.svg with 7 rows, 5 columns and a radius of 15mm:

    python gen_pattern.py -o circleboard.svg --rows 7 --columns 5 --type circles --square_size 15

Create an asymmetric circle board pattern in the file acircleboard.svg with 7 rows, 5 columns, a square size of 10mm and less spacing between circles:

    python gen_pattern.py -o acircleboard.svg --rows 7 --columns 5 --type acircles --square_size 10 --radius_rate 2

If you want to change the measurement unit, use the -u option (mm, inches, px, m).

If you want to change the page size, use the -w and -h options.

@cond HAVE_opencv_aruco
If you want to create a ChArUco board, read @ref tutorial_charuco_detection "tutorial Detection of ChArUco Corners" in the opencv_contrib tutorials.
@endcond
@cond !HAVE_opencv_aruco
If you want to create a ChArUco board, read tutorial Detection of ChArUco Corners in the opencv_contrib tutorials.
@endcond

Camera calibration with square chessboard {#tutorial_camera_calibration_square_chess}
=========================================

The goal of this tutorial is to learn how to calibrate a camera given a set of chessboard images.

*Test data*: use images in your data/chess folder.

-   Compile OpenCV with samples by setting BUILD_EXAMPLES to ON in the cmake configuration.
-   Go to the bin folder and use imagelist_creator to create an XML/YAML list of your images.
-   Then, run the calibration sample to get the camera parameters. Use a square size equal to 3cm.

Pose estimation
---------------

Now, let us write code that detects a chessboard in an image and finds its distance from the
camera. You can apply this method to any object with known 3D geometry which you can detect in an
image.

*Test data*: use chess_test\*.jpg images from your data folder.

-   Create an empty console project. Load a test image:

        Mat img = imread(argv[1], IMREAD_GRAYSCALE);

-   Detect a chessboard in this image using the findChessboardCorners function:

        bool found = findChessboardCorners( img, boardSize, ptvec, CALIB_CB_ADAPTIVE_THRESH );

-   Now, write a function that generates a vector\<Point3f\> array of 3d coordinates of a chessboard
    in any coordinate system. For simplicity, let us choose a system such that one of the chessboard
    corners is at the origin and the board is in the plane *z = 0*.

-   Read camera parameters from an XML/YAML file:

        FileStorage fs( filename, FileStorage::READ );
        Mat intrinsics, distortion;
        fs["camera_matrix"] >> intrinsics;
        fs["distortion_coefficients"] >> distortion;

-   Now we are ready to find the chessboard pose by running `solvePnP`:

        vector<Point3f> boardPoints;
        // fill the array
        ...

        solvePnP(Mat(boardPoints), Mat(foundBoardCorners), cameraMatrix,
                 distCoeffs, rvec, tvec, false);

-   Calculate the reprojection error as it is done in the calibration sample (see
    opencv/samples/cpp/calibration.cpp, function computeReprojectionErrors).

Question: how would you calculate the distance from the camera origin to any one of the corners?
Answer: Since our object lies in 3D space, we would first calculate the camera pose relative to the board. This gives us the 3D-to-2D correspondences. Next, we can apply a simple L2 norm to calculate the distance to any point (e.g. a corner).
Interactive camera calibration application {#tutorial_interactive_calibration}
==============================

According to the classical calibration technique, the user must collect all data first and then run the @ref cv::calibrateCamera function
to obtain the camera parameters. If the average re-projection error is huge, or if the estimated parameters seem to be wrong, the process of
selecting or collecting data and running @ref cv::calibrateCamera is repeated.

The interactive calibration process assumes that after each new data portion the user can see the results and error estimates, and
can delete the last data portion; finally, when the dataset for calibration is big enough, the process of automatic data selection starts.

Main application features
------

The sample application will:

- Determine the distortion matrix and a confidence interval for each element
- Determine the camera matrix and a confidence interval for each element
- Take input from camera or video file
- Read configuration from XML file
- Save the results into XML file
- Calculate re-projection error
- Reject pattern views at sharp angles to prevent the appearance of ill-conditioned Jacobian blocks
- Auto switch calibration flags (fix aspect ratio and elements of distortion matrix if needed)
- Auto detect when calibration is done by using several criteria
- Auto capture of static patterns (the user doesn't need to press any keys to capture a frame, just don't move the pattern for a second)

Supported patterns:

- Black-white chessboard
- Asymmetrical circle pattern
- Dual asymmetrical circle pattern
- chAruco (chessboard with Aruco markers)

Description of parameters
------

The application has two groups of parameters: primary (passed through the command line) and advanced (passed through an XML file).

### Primary parameters:

All of these parameters are passed to the application through the command line.

-[parameter]=[default value]: description

- -v=[filename]: get video from filename, default input -- camera with id=0
- -ci=[0]: get video from camera with specified id
- -flip=[false]: vertical flip of input frames
- -t=[circles]: pattern for calibration (circles, chessboard, dualCircles, chAruco)
- -sz=[16.3]: distance between two nearest centers of circles or squares on the calibration board
- -dst=[295]: distance between white and black parts of the dualCircles pattern
- -w=[width]: width of pattern (in corners or circles)
- -h=[height]: height of pattern (in corners or circles)
- -of=[camParams.xml]: output file name
- -ft=[true]: auto tuning of calibration flags
- -vis=[grid]: captured boards visualization (grid, window)
- -d=[0.8]: delay between captures in seconds
- -pf=[defaultConfig.xml]: advanced application parameters file

### Advanced parameters:

By default, values of the advanced parameters are stored in defaultConfig.xml:

@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<charuco_dict>0</charuco_dict>
<charuco_square_lenght>200</charuco_square_lenght>
<charuco_marker_size>100</charuco_marker_size>
<calibration_step>1</calibration_step>
<max_frames_num>30</max_frames_num>
<min_frames_num>10</min_frames_num>
<solver_eps>1e-7</solver_eps>
<solver_max_iters>30</solver_max_iters>
<fast_solver>0</fast_solver>
<frame_filter_conv_param>0.1</frame_filter_conv_param>
<camera_resolution>1280 720</camera_resolution>
</opencv_storage>
@endcode

- *charuco_dict*: name of the special dictionary which has been used for generation of the chAruco pattern
- *charuco_square_lenght*: size of a square on the chAruco board (in pixels)
- *charuco_marker_size*: size of Aruco markers on the chAruco board (in pixels)
- *calibration_step*: interval in frames between launches of @ref cv::calibrateCamera
- *max_frames_num*: if the number of frames for calibration is greater than this value, the frames filter starts working.
  After filtration, the size of the calibration dataset equals *max_frames_num*
- *min_frames_num*: if the number of frames is greater than this value, auto flags tuning, undistorted view and quality evaluation are turned on
- *solver_eps*: precision of the Levenberg-Marquardt solver in @ref cv::calibrateCamera
- *solver_max_iters*: iteration limit of the solver
- *fast_solver*: if this value is nonzero and LAPACK is found, QR decomposition is used instead of SVD in the solver.
  QR is faster than SVD, but potentially less precise
- *frame_filter_conv_param*: parameter used in the linear convolution of the bicriterial frames filter
- *camera_resolution*: resolution of the camera which is used for calibration

**Note:** *charuco_dict*, *charuco_square_lenght* and *charuco_marker_size* are used for chAruco pattern generation
(see the Aruco module description for details: [Aruco tutorials](https://github.com/opencv/opencv_contrib/tree/master/modules/aruco/tutorials))

Default chAruco pattern:

Dual circles pattern
------

To make this pattern you need a standard OpenCV circles pattern and a binary-inverted one.
Place the two patterns on one plane so that all horizontal lines of circles in one pattern are
continuations of the corresponding lines in the other.
Measure the distance between the patterns as shown in the picture below and pass it as the **dst** command line parameter. Also measure the distance between the centers of the nearest circles and pass
this value as the **sz** command line parameter.

This pattern is very sensitive to the quality of production and measurements.

Data filtration
------
When the size of the calibration dataset is greater than *max_frames_num*, the
data filter starts working. It tries to remove "bad" frames from the dataset. The filter removes
the frame on which \f$loss\_function\f$ takes its maximum.

\f[loss\_function(i)=\alpha RMS(i)+(1-\alpha)reducedGridQuality(i)\f]

**RMS** is the average re-projection error calculated for frame *i*, **reducedGridQuality**
is the scene coverage quality evaluation without frame *i*. \f$\alpha\f$ equals
**frame_filter_conv_param**.

Calibration process
------

To start calibration just run the application. Place the pattern in front of the camera and hold it still in some pose.
After that, wait for capturing (a message like "Frame #i captured" will be shown).
The current focal distance and re-projection error will be shown on the main screen. Move the pattern to the next position and repeat the procedure. Try to cover the image plane
uniformly, and don't show the pattern at sharp angles to the image plane.

If calibration seems to be successful (confidence intervals and average re-projection
|
||||
error are small, frame coverage quality and number of pattern views are big enough)
|
||||
application will show a message like on screen below.



Hot keys:

- Esc -- exit application
- s -- save current data to XML file
- r -- delete last frame
- d -- delete all frames
- u -- enable/disable applying of undistortion
- v -- switch visualization mode

Results
------

As a result you will get the camera parameters and confidence intervals for them.

Example of an output XML file:

@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<calibrationDate>"Thu 07 Apr 2016 04:23:03 PM MSK"</calibrationDate>
<framesCount>21</framesCount>
<cameraResolution>
1280 720</cameraResolution>
<cameraMatrix type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>d</dt>
<data>
1.2519588293098975e+03 0. 6.6684948780852471e+02 0.
1.2519588293098975e+03 3.6298123112613683e+02 0. 0. 1.</data></cameraMatrix>
<cameraMatrix_std_dev type_id="opencv-matrix">
<rows>4</rows>
<cols>1</cols>
<dt>d</dt>
<data>
0. 1.2887048808572649e+01 2.8536856683866230e+00
2.8341737483430314e+00</data></cameraMatrix_std_dev>
<dist_coeffs type_id="opencv-matrix">
<rows>1</rows>
<cols>5</cols>
<dt>d</dt>
<data>
1.3569117181595716e-01 -8.2513063822554633e-01 0. 0.
1.6412101575010554e+00</data></dist_coeffs>
<dist_coeffs_std_dev type_id="opencv-matrix">
<rows>5</rows>
<cols>1</cols>
<dt>d</dt>
<data>
1.5570675523402111e-02 8.7229075437543435e-02 0. 0.
1.8382427901856876e-01</data></dist_coeffs_std_dev>
<avg_reprojection_error>4.2691743074130178e-01</avg_reprojection_error>
</opencv_storage>
@endcode

Real Time pose estimation of a textured object {#tutorial_real_time_pose}
==============================================

Nowadays, augmented reality is one of the top research topics in the computer vision and robotics fields.
The most elemental problem in augmented reality is the estimation of the camera pose with respect to an
object, either to do some 3D rendering afterwards (in computer vision) or to obtain the object pose in
order to grasp and manipulate it (in robotics). However, this is not a trivial problem to solve, since
the most common issue in image processing is the computational cost of applying many algorithms or
mathematical operations to solve a problem which is basic and immediate for humans.

Goal
----

In this tutorial it is explained how to build a real time application to estimate the camera pose in
order to track a textured object with six degrees of freedom, given a 2D image and its 3D textured
model.

The application will have the following parts:

- Read 3D textured object model and object mesh.
- Take input from Camera or Video.
- Extract ORB features and descriptors from the scene.
- Match scene descriptors with model descriptors using Flann matcher.
- Pose estimation using PnP + Ransac.
- Linear Kalman Filter for bad poses rejection.

Theory
------

In computer vision, estimating the camera pose from *n* 3D-to-2D point correspondences is a fundamental
and well understood problem. The most general version of the problem requires estimating the six
degrees of freedom of the pose and five calibration parameters: focal length, principal point,
aspect ratio and skew. It can be solved with a minimum of 6 correspondences, using the well
known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications to the
problem which turn into an extensive list of different algorithms that improve the accuracy of the
DLT.

The most common simplification is to assume known calibration parameters, which is the so-called
Perspective-*n*-Point problem:



**Problem Formulation:** Given a set of correspondences between 3D points \f$p_i\f$ expressed in a world
reference frame, and their 2D projections \f$u_i\f$ onto the image, we seek to retrieve the pose (\f$R\f$
and \f$t\f$) of the camera w.r.t. the world and the focal length \f$f\f$.

OpenCV provides four different approaches to solve the Perspective-*n*-Point problem which return
\f$R\f$ and \f$t\f$. Then, using the following formula, it's possible to project 3D points onto the image
plane:

\f[s\ \left [ \begin{matrix} u \\ v \\ 1 \end{matrix} \right ] = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{matrix} \right ] \left [ \begin{matrix} X \\ Y \\ Z\\ 1 \end{matrix} \right ]\f]

The complete documentation of how to manage these equations is in @ref calib3d .
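
As a quick numeric illustration of this formula in plain C++ (matrix values below are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Minimal sketch of the projection formula above: map a homogeneous 3D point
// X = [X Y Z 1]' through K [R|t] and divide by the scale factor s.
// Sizes: K is 3x3, Rt is 3x4, X has 4 entries.
std::pair<double, double> projectPoint(const std::vector<std::vector<double> >& K,
                                       const std::vector<std::vector<double> >& Rt,
                                       const std::vector<double>& X)
{
    double p[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)          // p = K * Rt * X
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 4; ++k)
                p[i] += K[i][j] * Rt[j][k] * X[k];
    return std::make_pair(p[0] / p[2], p[1] / p[2]); // (u, v) after dividing by s
}
```

With an identity rotation and zero translation, a point on the optical axis projects exactly onto the principal point \f$(c_x, c_y)\f$, which is a useful sanity check for any implementation of this formula.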

Source code
-----------

You can find the source code of this tutorial in the
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/` folder of the OpenCV source library.

The tutorial consists of two main programs:

-# **Model registration**

This application is intended for those who don't have a 3D textured model of the object to be detected.
You can use this program to create your own textured 3D model. This program only works for planar
objects; if you want to model an object with a complex shape, you should use sophisticated
software to create it.

The application needs an input image of the object to be registered and its 3D mesh. We also have
to provide the intrinsic parameters of the camera with which the input image was taken. All the
files need to be specified using an absolute path or a path relative to your application's
working directory. If no files are specified, the program will try to open the provided default
parameters.

The application starts by extracting the ORB features and descriptors from the input image and
then uses the mesh along with the [Möller–Trumbore intersection
algorithm](http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm)
to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
are stored in different lists in a file with YAML format, in which each row is a different point. The
technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
tutorial.



-# **Model detection**

The aim of this application is to estimate in real time the object pose given its 3D textured model.

The application starts by loading the 3D textured model as a YAML file with the same
structure explained in the model registration program. From the scene, the ORB features and
descriptors are detected and extracted. Then, @ref cv::FlannBasedMatcher is used with
@ref cv::flann::GenericIndex to do the matching between the scene descriptors and the model descriptors.
Using the found matches along with the @ref cv::solvePnPRansac function, the `R` and `t` of
the camera are computed. Finally, a KalmanFilter is applied in order to reject bad poses.

In the case that you compiled OpenCV with the samples, you can find it in `opencv/build/bin/cpp-tutorial-pnp_detection`.
Then you can run the application and change some parameters:
@code{.cpp}
This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
Usage:
  ./cpp-tutorial-pnp_detection -help
Keys:
  'esc' - to quit.
--------------------------------------------------------------------------

Usage: cpp-tutorial-pnp_detection [params]

  -c, --confidence (value:0.95)
      RANSAC confidence
  -e, --error (value:2.0)
      RANSAC reprojection error
  -f, --fast (value:true)
      use of robust fast match
  -h, --help (value:true)
      print this message
  --in, --inliers (value:30)
      minimum inliers for Kalman update
  --it, --iterations (value:500)
      RANSAC maximum iterations count
  -k, --keypoints (value:2000)
      number of keypoints to detect
  --mesh
      path to ply mesh
  --method, --pnp (value:0)
      PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
  --model
      path to yml model
  -r, --ratio (value:0.7)
      threshold for ratio test
  -v, --video
      path to recorded video
@endcode
For example, you can run the application changing the pnp method:
@code{.cpp}
./cpp-tutorial-pnp_detection --method=2
@endcode

Explanation
-----------

Here is explained in detail the code for the real time application:

-# **Read 3D textured object model and object mesh.**

In order to load the textured model I implemented the *class* **Model** which has the function
*load()* that opens a YAML file and takes the stored 3D points with their corresponding descriptors.
You can find an example of a 3D textured model in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml`.
@code{.cpp}
/* Load a YAML file using OpenCV */
void Model::load(const std::string path)
{
  cv::Mat points3d_mat;

  cv::FileStorage storage(path, cv::FileStorage::READ);
  storage["points_3d"] >> points3d_mat;
  storage["descriptors"] >> descriptors_;

  points3d_mat.copyTo(list_points3d_in_);

  storage.release();
}
@endcode
In the main program the model is loaded as follows:
@code{.cpp}
Model model;               // instantiate Model object
model.load(yml_read_path); // load a 3D textured object model
@endcode
In order to read the model mesh I implemented a *class* **Mesh** which has a function *load()*
that opens a \f$*\f$.ply file and stores the 3D points of the object and also the composed triangles.
You can find an example of a model mesh in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply`.
@code{.cpp}
/* Load a CSV with *.ply format */
void Mesh::load(const std::string path)
{
  // Create the reader
  CsvReader csvReader(path);

  // Clear previous data
  list_vertex_.clear();
  list_triangles_.clear();

  // Read from .ply file
  csvReader.readPLY(list_vertex_, list_triangles_);

  // Update mesh attributes
  num_vertexs_ = list_vertex_.size();
  num_triangles_ = list_triangles_.size();
}
@endcode
In the main program the mesh is loaded as follows:
@code{.cpp}
Mesh mesh;                // instantiate Mesh object
mesh.load(ply_read_path); // load an object mesh
@endcode
You can also load a different model and mesh:
@code{.cpp}
./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
@endcode

-# **Take input from Camera or Video**

To detect it is necessary to capture video. This is done by loading a recorded video, passing the absolute
path where it is located on your machine. In order to test the application you can find a recorded
video in `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4`.
@code{.cpp}
cv::VideoCapture cap;      // instantiate VideoCapture
cap.open(video_read_path); // open a recorded video

if(!cap.isOpened())        // check if we succeeded
{
  std::cout << "Could not open the camera device" << std::endl;
  return -1;
}
@endcode
Then the algorithm is computed frame per frame:
@code{.cpp}
cv::Mat frame, frame_vis;

while(cap.read(frame) && cv::waitKey(30) != 27) // capture frame until ESC is pressed
{
  frame_vis = frame.clone(); // refresh visualisation frame

  // MAIN ALGORITHM
}
@endcode
You can also load a different recorded video:
@code{.cpp}
./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
@endcode

-# **Extract ORB features and descriptors from the scene**

The next step is to detect the scene features and extract their descriptors. For this task I
implemented a *class* **RobustMatcher** which has a function for keypoint detection and feature
extraction. You can find it in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobusMatcher.cpp`. In your
*RobusMatch* object you can use any of the 2D feature detectors of OpenCV. In this case I used
@ref cv::ORB features, because it is based on @ref cv::FAST to detect the keypoints and on cv::xfeatures2d::BriefDescriptorExtractor
to extract the descriptors, which means that it is fast and robust to rotations. You can find more
detailed information about *ORB* in the documentation.

The following code shows how to instantiate and set the feature detector and the descriptor
extractor:
@code{.cpp}
RobustMatcher rmatcher;                                                    // instantiate RobustMatcher

cv::FeatureDetector * detector = new cv::OrbFeatureDetector(numKeyPoints); // instantiate ORB feature detector
cv::DescriptorExtractor * extractor = new cv::OrbDescriptorExtractor();    // instantiate ORB descriptor extractor

rmatcher.setFeatureDetector(detector);                                     // set feature detector
rmatcher.setDescriptorExtractor(extractor);                                // set descriptor extractor
@endcode
The features and descriptors will be computed by the *RobustMatcher* inside the matching function.

-# **Match scene descriptors with model descriptors using Flann matcher**

This is the first step in our detection algorithm. The main idea is to match the scene descriptors
with our model descriptors in order to know the 3D coordinates of the found features in the
current scene.

Firstly, we have to set which matcher we want to use. In this case the
@ref cv::FlannBasedMatcher matcher is used, which in terms of computational cost becomes faster than the
@ref cv::BFMatcher matcher as we increase the trained collection of features. Then, for the
FlannBased matcher, the index created is *Multi-Probe LSH: Efficient Indexing for High-Dimensional
Similarity Search*, because *ORB* descriptors are binary.

You can tune the *LSH* and search parameters to improve the matching efficiency:
@code{.cpp}
cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50);       // instantiate flann search parameters

cv::DescriptorMatcher * matcher = new cv::FlannBasedMatcher(indexParams, searchParams);         // instantiate FlannBased matcher
rmatcher.setDescriptorMatcher(matcher);                                                         // set matcher
@endcode
Secondly, we have to call the matcher by using the *robustMatch()* or *fastRobustMatch()* function.
The difference between these two functions is their computational cost. The first method is slower
but more robust at filtering good matches because it uses two ratio tests and a symmetry test. In
contrast, the second method is faster but less robust because it only applies a single ratio test to
the matches.

The following code gets the model 3D points and descriptors and then calls the matcher in
the main program:
@code{.cpp}
// Get the MODEL INFO

std::vector<cv::Point3f> list_points3d_model = model.get_points3d(); // list with model 3D coordinates
cv::Mat descriptors_model = model.get_descriptors();                 // list with descriptors of each 3D coordinate
@endcode
@code{.cpp}
// -- Step 1: Robust matching between model descriptors and scene descriptors

std::vector<cv::DMatch> good_matches;      // to obtain the model 3D points in the scene
std::vector<cv::KeyPoint> keypoints_scene; // to obtain the 2D points of the scene

if(fast_match)
{
  rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
else
{
  rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
@endcode
The following code corresponds to the *robustMatch()* function, which belongs to the
*RobustMatcher* class. This function uses the given image to detect the keypoints and extract the
descriptors, then matches the extracted descriptors with the given model descriptors using
*two Nearest Neighbours*, and vice versa. Then, a ratio test is applied to the matches in both
directions in order to remove those matches for which the distance ratio between the first and
second best match is larger than a given threshold. Finally, a symmetry test is applied in order
to remove non-symmetrical matches.
@code{.cpp}
void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,
                                 std::vector<cv::KeyPoint>& keypoints_frame,
                                 const std::vector<cv::KeyPoint>& keypoints_model, const cv::Mat& descriptors_model )
{
  // 1a. Detection of the ORB features
  this->computeKeyPoints(frame, keypoints_frame);

  // 1b. Extraction of the ORB descriptors
  cv::Mat descriptors_frame;
  this->computeDescriptors(frame, keypoints_frame, descriptors_frame);

  // 2. Match the two image descriptors
  std::vector<std::vector<cv::DMatch> > matches12, matches21;

  // 2a. From image 1 to image 2
  matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours

  // 2b. From image 2 to image 1
  matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours

  // 3. Remove matches for which NN ratio is > than threshold
  // clean image 1 -> image 2 matches
  int removed1 = ratioTest(matches12);
  // clean image 2 -> image 1 matches
  int removed2 = ratioTest(matches21);

  // 4. Remove non-symmetrical matches
  symmetryTest(matches12, matches21, good_matches);
}
@endcode
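The *ratioTest()* helper used above is not listed in this excerpt; a minimal sketch of what such a ratio test typically does, using a simplified stand-in for cv::DMatch (only the distance field matters here), could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for cv::DMatch; illustrative only.
struct Match { float distance; };

// For each keypoint, keep its two nearest-neighbour matches only if the best
// distance is clearly smaller than the second best; otherwise the match is
// ambiguous and is removed. Returns the number of removed matches.
int ratioTest(std::vector<std::vector<Match> >& matches, float ratio)
{
    int removed = 0;
    for (std::size_t i = 0; i < matches.size(); ++i)
    {
        // A match needs 2 neighbours and must satisfy d1 <= ratio * d2
        if (matches[i].size() < 2 ||
            matches[i][0].distance > ratio * matches[i][1].distance)
        {
            matches[i].clear(); // mark as removed
            ++removed;
        }
    }
    return removed;
}
```

The actual implementation in `RobusMatcher.cpp` operates on `std::vector<std::vector<cv::DMatch> >` in the same spirit.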
After the match filtering we have to extract the 2D and 3D correspondences from the found scene
keypoints and our 3D model using the obtained *DMatches* vector. For more information about
@ref cv::DMatch check the documentation.
@code{.cpp}
// -- Step 2: Find out the 2D/3D correspondences

std::vector<cv::Point3f> list_points3d_model_match; // container for the model 3D coordinates found in the scene
std::vector<cv::Point2f> list_points2d_scene_match; // container for the model 2D coordinates found in the scene

for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)
{
  cv::Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ]; // 3D point from model
  cv::Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt;  // 2D point from the scene
  list_points3d_model_match.push_back(point3d_model);                                    // add 3D point
  list_points2d_scene_match.push_back(point2d_scene);                                    // add 2D point
}
@endcode
You can also change the ratio test threshold and the number of keypoints to detect, as well as
enable or disable the robust matcher:
@code{.cpp}
./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
@endcode

-# **Pose estimation using PnP + Ransac**

Once we have the 2D and 3D correspondences we have to apply a PnP algorithm in order to estimate the
camera pose. The reason why we have to use @ref cv::solvePnPRansac instead of @ref cv::solvePnP is
that after the matching not all the found correspondences are correct and, as likely as not, there
are false correspondences, also called *outliers*. [Random Sample
Consensus](http://en.wikipedia.org/wiki/RANSAC), or *Ransac*, is a non-deterministic iterative
method which estimates parameters of a mathematical model from observed data, producing an
increasingly accurate result as the number of iterations increases. After applying *Ransac* all the *outliers*
will be eliminated, and the camera pose is then estimated with a certain probability of obtaining a good
solution.

For the camera pose estimation I have implemented a *class* **PnPProblem**. This *class* has 4
attributes: a given calibration matrix, the rotation matrix, the translation matrix and the
rotation-translation matrix. The intrinsic calibration parameters of the camera which you are
using to estimate the pose are necessary. In order to obtain the parameters you can check the
@ref tutorial_camera_calibration_square_chess and @ref tutorial_camera_calibration tutorials.

The following code shows how to declare the *PnPProblem class* in the main program:
@code{.cpp}
// Intrinsic camera parameters: UVC WEBCAM

double f = 55;                    // focal length in mm
double sx = 22.3, sy = 14.9;      // sensor size
double width = 640, height = 480; // image size

double params_WEBCAM[] = { width*f/sx,  // fx
                           height*f/sy, // fy
                           width/2,     // cx
                           height/2};   // cy

PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
@endcode
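The conversion `fx = width*f/sx` above simply re-expresses the focal length in pixel units: `f/sx` is the focal length as a fraction of the sensor width, and multiplying by the image width gives pixels. As a standalone check:

```cpp
#include <cassert>
#include <cmath>

// Convert a focal length given in millimetres into pixel units, given the
// image size (pixels) and the sensor size (millimetres) along the same axis.
double focalPixels(double imageSize, double focalMm, double sensorMm)
{
    return imageSize * focalMm / sensorMm;
}
```

For the webcam values above this gives roughly fx ≈ 1578 pixels for a 640-pixel-wide image.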
The following code shows how the *PnPProblem class* initialises its attributes:
@code{.cpp}
// Custom constructor given the intrinsic camera parameters

PnPProblem::PnPProblem(const double params[])
{
  _A_matrix = cv::Mat::zeros(3, 3, CV_64FC1); // intrinsic camera parameters
  _A_matrix.at<double>(0, 0) = params[0];     // [ fx  0 cx ]
  _A_matrix.at<double>(1, 1) = params[1];     // [  0 fy cy ]
  _A_matrix.at<double>(0, 2) = params[2];     // [  0  0  1 ]
  _A_matrix.at<double>(1, 2) = params[3];
  _A_matrix.at<double>(2, 2) = 1;
  _R_matrix = cv::Mat::zeros(3, 3, CV_64FC1); // rotation matrix
  _t_matrix = cv::Mat::zeros(3, 1, CV_64FC1); // translation matrix
  _P_matrix = cv::Mat::zeros(3, 4, CV_64FC1); // rotation-translation matrix
}
@endcode
OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Depending on the application type,
the estimation method will be different. In the case that we want to make a real time application,
the more suitable methods are EPNP and P3P, since they are faster than ITERATIVE and DLS at
finding an optimal solution. However, EPNP and P3P are not especially robust in front of planar
surfaces, and sometimes the pose estimation seems to have a mirror effect. Therefore, in this
tutorial the ITERATIVE method is used, because the object to be detected has planar surfaces.

The OpenCV RANSAC implementation wants you to provide three parameters: 1) the maximum number of
iterations until the algorithm stops, 2) the maximum allowed distance between the observed and
computed point projections to consider it an inlier and 3) the confidence to obtain a good result.
You can tune these parameters in order to improve your algorithm performance. Increasing the
number of iterations will give a more accurate solution, but will take more time to find it.
Increasing the reprojection error will reduce the computation time, but your solution
will be less accurate. Decreasing the confidence will make your algorithm faster, but the obtained
solution will be less accurate.

The following parameters work for this application:
@code{.cpp}
// RANSAC parameters

int iterationsCount = 500;     // number of Ransac iterations.
float reprojectionError = 2.0; // maximum allowed distance to consider it an inlier.
float confidence = 0.95;       // RANSAC successful confidence.
@endcode
The following code corresponds to the *estimatePoseRANSAC()* function, which belongs to the
*PnPProblem class*. This function estimates the rotation and translation matrix given a set of
2D/3D correspondences, the desired PnP method to use, the output inliers container and the Ransac
parameters:
@code{.cpp}
// Estimate the pose given a list of 2D/3D correspondences with RANSAC and the method to use

void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d,    // list with model 3D coordinates
                                     const std::vector<cv::Point2f> &list_points2d,    // list with scene 2D coordinates
                                     int flags, cv::Mat &inliers, int iterationsCount, // PnP method; inliers container
                                     float reprojectionError, float confidence )       // RANSAC parameters
{
  cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1); // vector of distortion coefficients
  cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1);       // output rotation vector
  cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1);       // output translation vector

  bool useExtrinsicGuess = false; // if true the function uses the provided rvec and tvec values as
                                  // initial approximations of the rotation and translation vectors

  cv::solvePnPRansac( list_points3d, list_points2d, _A_matrix, distCoeffs, rvec, tvec,
                      useExtrinsicGuess, iterationsCount, reprojectionError, confidence,
                      inliers, flags );

  Rodrigues(rvec,_R_matrix); // converts Rotation Vector to Matrix
  _t_matrix = tvec;          // set translation matrix

  this->set_P_matrix(_R_matrix, _t_matrix); // set rotation-translation matrix
}
@endcode
The following code contains the 3rd and 4th steps of the main algorithm: first, calling the
above function, and second, taking the output inliers vector from RANSAC to get the 2D scene
points for drawing purposes. As seen in the code we must make sure to apply RANSAC only if we have
matches; otherwise, the function @ref cv::solvePnPRansac crashes due to an OpenCV *bug*.
@code{.cpp}
if(good_matches.size() > 0) // None matches, then RANSAC crashes
{
  // -- Step 3: Estimate the pose using RANSAC approach
  pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
                                    pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );

  // -- Step 4: Catch the inliers keypoints to draw
  for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
  {
    int n = inliers_idx.at<int>(inliers_index);         // i-inlier
    cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D
    list_points2d_inliers.push_back(point2d);           // add i-inlier to list
  }
}
@endcode
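The inlier bookkeeping of Step 4 is plain index selection; with simple standard containers (and a hypothetical stand-in for cv::Point2f) it can be sketched as:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for cv::Point2f, to keep the sketch self-contained.
typedef std::pair<float, float> Point2f;

// Select the matched scene points whose indices RANSAC reported as inliers.
std::vector<Point2f> pickInliers(const std::vector<Point2f>& matched,
                                 const std::vector<int>& inlierIdx)
{
    std::vector<Point2f> inliers;
    for (std::size_t i = 0; i < inlierIdx.size(); ++i)
        inliers.push_back(matched[inlierIdx[i]]); // i-th inlier point
    return inliers;
}
```

In the real application the indices come from the `inliers_idx` matrix filled by @ref cv::solvePnPRansac.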
Finally, once the camera pose has been estimated we can use \f$R\f$ and \f$t\f$ in order to compute
the 2D projection onto the image of a given 3D point expressed in a world reference frame, using
the formula shown in *Theory*.

The following code corresponds to the *backproject3DPoint()* function, which belongs to the
*PnPProblem class*. The function backprojects a given 3D point expressed in a world reference frame
onto a 2D image:
@code{.cpp}
// Backproject a 3D point to 2D using the estimated pose parameters

cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)
{
  // 3D point vector [x y z 1]'
  cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);
  point3d_vec.at<double>(0) = point3d.x;
  point3d_vec.at<double>(1) = point3d.y;
  point3d_vec.at<double>(2) = point3d.z;
  point3d_vec.at<double>(3) = 1;

  // 2D point vector [u v 1]'
  cv::Mat point2d_vec = cv::Mat(3, 1, CV_64FC1);
  point2d_vec = _A_matrix * _P_matrix * point3d_vec;

  // Normalization of [u v]'
  cv::Point2f point2d;
  point2d.x = point2d_vec.at<double>(0) / point2d_vec.at<double>(2);
  point2d.y = point2d_vec.at<double>(1) / point2d_vec.at<double>(2);

  return point2d;
}
@endcode
The above function is used to compute all the 3D points of the object *Mesh* to show the pose of
the object.

You can also change the RANSAC parameters and PnP method:
@code{.cpp}
./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
@endcode

-# **Linear Kalman Filter for bad poses rejection**

It is common in the computer vision or robotics fields that after applying detection or tracking
techniques, bad results are obtained due to some sensor errors. In order to avoid these bad
detections, this tutorial explains how to implement a Linear Kalman Filter. The Kalman
Filter will be applied after a given number of inliers is detected.

You can find more information about what a [Kalman
Filter](http://en.wikipedia.org/wiki/Kalman_filter) is. This tutorial uses the OpenCV
implementation of the @ref cv::KalmanFilter based on
[Linear Kalman Filter for position and orientation tracking](http://campar.in.tum.de/Chair/KalmanFilter)
to set the dynamics and measurement models.

Firstly, we have to define our state vector, which will have 18 states: the positional data (x,y,z)
with its first and second derivatives (velocity and acceleration), then rotation is added in the form
of three euler angles (roll, pitch, yaw) together with their first and second derivatives (angular
velocity and acceleration).

\f[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\f]

Secondly, we have to define the number of measurements, which will be 6: from \f$R\f$ and \f$t\f$ we can
extract \f$(x,y,z)\f$ and \f$(\psi,\theta,\phi)\f$. In addition, we have to define the number of control
actions to apply to the system, which in this case will be *zero*. Finally, we have to define the
differential time between measurements, which in this case is \f$1/T\f$, where *T* is the frame rate of
the video.
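Extracting \f$(\psi,\theta,\phi)\f$ from \f$R\f$ can be sketched as follows; this uses one common yaw-pitch-roll convention, and the helper actually used by the sample is not shown in this excerpt and may differ:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hedged sketch: recover Euler angles (roll, pitch, yaw) from a 3x3 rotation
// matrix, assuming the ZYX (yaw-pitch-roll) convention and no gimbal lock.
std::vector<double> rot2euler(const std::vector<std::vector<double> >& R)
{
    std::vector<double> euler(3);
    euler[0] = std::atan2(R[2][1], R[2][2]); // roll
    euler[1] = std::asin(-R[2][0]);          // pitch
    euler[2] = std::atan2(R[1][0], R[0][0]); // yaw
    return euler;
}
```

These three angles, together with the translation \f$(x,y,z)\f$, form the 6-element measurement vector fed to the filter.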
@code{.cpp}
cv::KalmanFilter KF;   // instantiate Kalman Filter

int nStates = 18;      // the number of states
int nMeasurements = 6; // the number of measured states
int nInputs = 0;       // the number of action control

double dt = 0.125;     // time between measurements (1/FPS)

initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt); // init function
@endcode
The following code corresponds to the *Kalman Filter* initialisation. Firstly, the process
noise, the measurement noise and the error covariance matrix are set. Secondly, the transition
matrix, which is the dynamic model, is set, and finally the measurement matrix, which is the
measurement model.

You can tune the process and measurement noise to improve the *Kalman Filter* performance. The
smaller the measurement noise, the faster the filter converges, but it also becomes more
sensitive to bad measurements.
@code{.cpp}
void initKalmanFilter(cv::KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)
{
  KF.init(nStates, nMeasurements, nInputs, CV_64F);                 // init Kalman Filter

  cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-5));       // set process noise
  cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-4));   // set measurement noise
  cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1));             // error covariance

                 /* DYNAMIC MODEL */

  //  [1 0 0 dt  0  0 dt2  0   0  0 0 0  0  0  0  0   0   0]
  //  [0 1 0  0 dt  0  0  dt2  0  0 0 0  0  0  0  0   0   0]
  //  [0 0 1  0  0 dt  0   0  dt2 0 0 0  0  0  0  0   0   0]
  //  [0 0 0  1  0  0  dt  0   0  0 0 0  0  0  0  0   0   0]
  //  [0 0 0  0  1  0  0   dt  0  0 0 0  0  0  0  0   0   0]
  //  [0 0 0  0  0  1  0   0   dt 0 0 0  0  0  0  0   0   0]
  //  [0 0 0  0  0  0  1   0   0  0 0 0  0  0  0  0   0   0]
  //  [0 0 0  0  0  0  0   1   0  0 0 0  0  0  0  0   0   0]
  //  [0 0 0  0  0  0  0   0   1  0 0 0  0  0  0  0   0   0]
  //  [0 0 0  0  0  0  0   0   0  1 0 0 dt  0  0 dt2  0   0]
  //  [0 0 0  0  0  0  0   0   0  0 1 0  0 dt  0  0  dt2  0]
  //  [0 0 0  0  0  0  0   0   0  0 0 1  0  0 dt  0   0  dt2]
  //  [0 0 0  0  0  0  0   0   0  0 0 0  1  0  0  dt  0   0]
  //  [0 0 0  0  0  0  0   0   0  0 0 0  0  1  0  0   dt  0]
  //  [0 0 0  0  0  0  0   0   0  0 0 0  0  0  1  0   0   dt]
  //  [0 0 0  0  0  0  0   0   0  0 0 0  0  0  0  1   0   0]
  //  [0 0 0  0  0  0  0   0   0  0 0 0  0  0  0  0   1   0]
  //  [0 0 0  0  0  0  0   0   0  0 0 0  0  0  0  0   0   1]

  // position
  KF.transitionMatrix.at<double>(0,3) = dt;
  KF.transitionMatrix.at<double>(1,4) = dt;
  KF.transitionMatrix.at<double>(2,5) = dt;
  KF.transitionMatrix.at<double>(3,6) = dt;
  KF.transitionMatrix.at<double>(4,7) = dt;
  KF.transitionMatrix.at<double>(5,8) = dt;
  KF.transitionMatrix.at<double>(0,6) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(1,7) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(2,8) = 0.5*pow(dt,2);

  // orientation
  KF.transitionMatrix.at<double>(9,12) = dt;
  KF.transitionMatrix.at<double>(10,13) = dt;
  KF.transitionMatrix.at<double>(11,14) = dt;
  KF.transitionMatrix.at<double>(12,15) = dt;
  KF.transitionMatrix.at<double>(13,16) = dt;
  KF.transitionMatrix.at<double>(14,17) = dt;
  KF.transitionMatrix.at<double>(9,15) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(10,16) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(11,17) = 0.5*pow(dt,2);

       /* MEASUREMENT MODEL */

  //  [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  //  [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  //  [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  //  [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
  //  [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
  //  [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]

  KF.measurementMatrix.at<double>(0,0) = 1;  // x
  KF.measurementMatrix.at<double>(1,1) = 1;  // y
  KF.measurementMatrix.at<double>(2,2) = 1;  // z
  KF.measurementMatrix.at<double>(3,9) = 1;  // roll
  KF.measurementMatrix.at<double>(4,10) = 1; // pitch
  KF.measurementMatrix.at<double>(5,11) = 1; // yaw
}
@endcode
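To see how *dt* enters the dynamic model above, here is a minimal, hypothetical Python sketch (plain values, no OpenCV) of one predict step of the same constant-acceleration model restricted to a single axis, i.e. the first, fourth and seventh rows of the transition matrix:

```python
def predict_1d(x, v, a, dt):
    """One predict step of the constant-acceleration model,
    for a single axis: position, velocity, acceleration."""
    x_new = x + v * dt + 0.5 * a * dt ** 2  # row [1 dt dt2] of the transition matrix
    v_new = v + a * dt                      # row [0  1  dt]
    a_new = a                               # row [0  0   1]
    return x_new, v_new, a_new

# with dt = 0.125 (an 8 FPS video), starting at rest with acceleration 2:
state = predict_1d(0.0, 0.0, 2.0, 0.125)
```

The 18x18 matrix in the tutorial simply stacks this pattern for the three position axes and the three Euler angles.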
The following code is the 5th step of the main algorithm. When the number of inliers obtained
after *RANSAC* is over the threshold, the measurements matrix is filled and then the *Kalman
Filter* is updated:
@code{.cpp}
// -- Step 5: Kalman Filter

// GOOD MEASUREMENT
if( inliers_idx.rows >= minInliersKalman )
{
    // Get the measured translation
    cv::Mat translation_measured(3, 1, CV_64F);
    translation_measured = pnp_detection.get_t_matrix();

    // Get the measured rotation
    cv::Mat rotation_measured(3, 3, CV_64F);
    rotation_measured = pnp_detection.get_R_matrix();

    // fill the measurements vector
    fillMeasurements(measurements, translation_measured, rotation_measured);
}

// Instantiate estimated translation and rotation
cv::Mat translation_estimated(3, 1, CV_64F);
cv::Mat rotation_estimated(3, 3, CV_64F);

// update the Kalman filter with good measurements
updateKalmanFilter( KF, measurements,
                    translation_estimated, rotation_estimated);
@endcode
The following code corresponds to the *fillMeasurements()* function, which converts the measured
[Rotation Matrix to Euler
angles](http://euclideanspace.com/maths/geometry/rotations/conversions/matrixToEuler/index.htm)
and fills the measurements matrix along with the measured translation vector:
@code{.cpp}
void fillMeasurements( cv::Mat &measurements,
                       const cv::Mat &translation_measured, const cv::Mat &rotation_measured)
{
    // Convert rotation matrix to euler angles
    cv::Mat measured_eulers(3, 1, CV_64F);
    measured_eulers = rot2euler(rotation_measured);

    // Set measurement to predict
    measurements.at<double>(0) = translation_measured.at<double>(0); // x
    measurements.at<double>(1) = translation_measured.at<double>(1); // y
    measurements.at<double>(2) = translation_measured.at<double>(2); // z
    measurements.at<double>(3) = measured_eulers.at<double>(0);      // roll
    measurements.at<double>(4) = measured_eulers.at<double>(1);      // pitch
    measurements.at<double>(5) = measured_eulers.at<double>(2);      // yaw
}
@endcode
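The helper *rot2euler()* is defined in the sample's utilities, and its exact angle convention is fixed there. Purely as an illustration of what such a conversion involves, here is a hypothetical Python sketch for the common ZYX (yaw-pitch-roll) ordering; it is a stand-in, not the sample's code:

```python
import math

def rot2euler_sketch(R):
    """Rotation matrix (3x3 nested list) -> (roll, pitch, yaw), ZYX convention.
    Illustrative only; the tutorial's rot2euler() may use a different ordering."""
    sy = math.hypot(R[0][0], R[1][0])
    if sy > 1e-6:
        roll  = math.atan2(R[2][1], R[2][2])
        pitch = math.atan2(-R[2][0], sy)
        yaw   = math.atan2(R[1][0], R[0][0])
    else:  # gimbal lock: pitch is +-90 degrees, yaw becomes unobservable
        roll  = math.atan2(-R[1][2], R[1][1])
        pitch = math.atan2(-R[2][0], sy)
        yaw   = 0.0
    return roll, pitch, yaw
```

For the identity matrix all three angles are zero, and a pure rotation of 90 degrees about the Z axis yields yaw = pi/2.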
The following code corresponds to the *updateKalmanFilter()* function, which updates the Kalman
Filter and sets the estimated rotation matrix and translation vector. The estimated rotation
matrix is computed from the estimated [Euler angles to Rotation
Matrix](http://euclideanspace.com/maths/geometry/rotations/conversions/eulerToMatrix/index.htm).
@code{.cpp}
void updateKalmanFilter( cv::KalmanFilter &KF, cv::Mat &measurement,
                         cv::Mat &translation_estimated, cv::Mat &rotation_estimated )
{
    // First predict, to update the internal statePre variable
    cv::Mat prediction = KF.predict();

    // The "correct" phase that is going to use the predicted value and our measurement
    cv::Mat estimated = KF.correct(measurement);

    // Estimated translation
    translation_estimated.at<double>(0) = estimated.at<double>(0);
    translation_estimated.at<double>(1) = estimated.at<double>(1);
    translation_estimated.at<double>(2) = estimated.at<double>(2);

    // Estimated euler angles
    cv::Mat eulers_estimated(3, 1, CV_64F);
    eulers_estimated.at<double>(0) = estimated.at<double>(9);
    eulers_estimated.at<double>(1) = estimated.at<double>(10);
    eulers_estimated.at<double>(2) = estimated.at<double>(11);

    // Convert estimated euler angles to rotation matrix
    rotation_estimated = euler2rot(eulers_estimated);
}
@endcode
The 6th step is to set the estimated rotation-translation matrix:
@code{.cpp}
// -- Step 6: Set estimated projection matrix
pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);
@endcode
The last and optional step is to draw the found pose. To do so, I implemented a function to draw
all the mesh 3D points and an extra reference axis:
@code{.cpp}
// -- Step X: Draw pose

drawObjectMesh(frame_vis, &mesh, &pnp_detection, green);       // draw current pose
drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow);  // draw estimated pose

double l = 5;
std::vector<cv::Point2f> pose_points2d;
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,0))); // axis center
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(l,0,0))); // axis x
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,l,0))); // axis y
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,l))); // axis z
draw3DCoordinateAxes(frame_vis, pose_points2d);                // draw axes
@endcode
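*backproject3DPoint()* projects a 3D mesh point into the image using the estimated projection matrix. Under the standard pinhole model this amounts to \f$u = K [R|t] X\f$ followed by a perspective divide. A minimal Python sketch, with hypothetical intrinsics (the focal length and principal point below are made up for illustration):

```python
def backproject_sketch(K, R, t, X):
    """Project 3D point X to pixel coordinates with intrinsics K,
    rotation R and translation t (plain nested lists)."""
    # camera coordinates: Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # apply intrinsics and divide by depth Z
    u = (K[0][0] * Xc[0] + K[0][2] * Xc[2]) / Xc[2]
    v = (K[1][1] * Xc[1] + K[1][2] * Xc[2]) / Xc[2]
    return u, v

# identity rotation, object 5 units in front of a camera with f=800, c=(320,240)
K = [[800, 0, 320], [0, 800, 240], [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 5]
```

The world origin lands on the principal point, and points off-axis are scaled by the focal length over the depth.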
You can also modify the minimum number of inliers required to update the Kalman Filter:
@code{.cpp}
./cpp-tutorial-pnp_detection --inliers=20
@endcode
Results
-------

The following videos show the results of real-time pose estimation using the explained detection
algorithm with the following parameters:
@code{.cpp}
// Robust Matcher parameters

int numKeyPoints = 2000;      // number of detected keypoints
float ratio = 0.70f;          // ratio test
bool fast_match = true;       // fastRobustMatch() or robustMatch()


// RANSAC parameters

int iterationsCount = 500;      // number of RANSAC iterations.
float reprojectionError = 2.0;  // maximum allowed distance to consider it an inlier.
float confidence = 0.95;        // RANSAC successful confidence.


// Kalman Filter parameters

int minInliersKalman = 30;    // Kalman threshold for updating
@endcode
You can watch the real-time pose estimation on the [OpenCV YouTube
channel](http://www.youtube.com/user/opencvdev/videos).

@youtube{XNATklaJlSQ}
@youtube{YLS9bWek78k}
Camera calibration and 3D reconstruction (calib3d module) {#tutorial_table_of_content_calib3d}
==========================================================

Although we get most of our images in a 2D format, they do come from a 3D world. Here you will learn how to find out 3D world information from 2D images.

- @subpage tutorial_camera_calibration_pattern

    *Compatibility:* \> OpenCV 2.0

    *Author:* Laurent Berger

    You will learn how to create a calibration pattern.

- @subpage tutorial_camera_calibration_square_chess

    *Compatibility:* \> OpenCV 2.0

    *Author:* Victor Eruhimov

    You will use some chessboard images to calibrate your camera.

- @subpage tutorial_camera_calibration

    *Compatibility:* \> OpenCV 4.0

    *Author:* Bernát Gábor

    Camera calibration by using either the chessboard, circle or the asymmetrical circle
    pattern. Get the images either from an attached camera, a video file or from an image
    collection.

- @subpage tutorial_real_time_pose

    *Compatibility:* \> OpenCV 2.0

    *Author:* Edgar Riba

    Real time pose estimation of a textured object using ORB features, a FlannBased matcher, a
    PnP approach plus RANSAC and a Linear Kalman Filter to reject possible bad poses.

- @subpage tutorial_interactive_calibration

    *Compatibility:* \> OpenCV 3.1

    *Author:* Vladislav Sovrasov

    Camera calibration by using either the chessboard, chAruco, asymmetrical circle or dual asymmetrical circle
    pattern. The calibration process is continuous, so you can see results after each new pattern shot.
    As an output you get the average reprojection error, intrinsic camera parameters, distortion coefficients and
    confidence intervals for all of the evaluated variables.
Adding (blending) two images using OpenCV {#tutorial_adding_images}
=========================================

@prev_tutorial{tutorial_mat_operations}
@next_tutorial{tutorial_basic_linear_transform}

Goal
----

In this tutorial you will learn:

- what *linear blending* is and why it is useful;
- how to add two images using **addWeighted()**

Theory
------

@note
   The explanation below belongs to the book [Computer Vision: Algorithms and
   Applications](http://szeliski.org/Book/) by Richard Szeliski

From our previous tutorial, we already know a bit about *pixel operators*. An interesting dyadic
(two-input) operator is the *linear blend operator*:

\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]

By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$ this operator can be used to perform a temporal
*cross-dissolve* between two images or videos, as seen in slide shows and film productions (cool,
eh?)
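To make the operator concrete, here is a small self-contained Python sketch (plain integers, no OpenCV) of the linear blend applied to a single pair of pixel values; **addWeighted()** carries out the same per-pixel computation over whole images, plus an optional \f$\gamma\f$ offset and saturation:

```python
def blend_pixel(p0, p1, alpha):
    """g = (1 - alpha) * f0 + alpha * f1, rounded and clamped to the 8-bit range."""
    g = (1.0 - alpha) * p0 + alpha * p1
    return min(255, max(0, int(round(g))))

# cross-dissolve: alpha = 0 yields the first image, alpha = 1 the second
midpoint = blend_pixel(100, 200, 0.5)
```

Sweeping `alpha` over time on every pixel produces the cross-dissolve described above.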

Source Code
-----------

@add_toggle_cpp
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/AddingImages/AddingImages.cpp).
@include cpp/tutorial_code/core/AddingImages/AddingImages.cpp
@end_toggle

@add_toggle_java
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/AddingImages/AddingImages.java).
@include java/tutorial_code/core/AddingImages/AddingImages.java
@end_toggle

@add_toggle_python
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/AddingImages/adding_images.py).
@include python/tutorial_code/core/AddingImages/adding_images.py
@end_toggle

Explanation
-----------

Since we are going to perform:

\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]

We need two source images (\f$f_{0}(x)\f$ and \f$f_{1}(x)\f$). So, we load them in the usual way:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp load
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java load
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py load
@end_toggle

We used the following images: [LinuxLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/LinuxLogo.jpg) and [WindowsLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/WindowsLogo.jpg)

@warning Since we are *adding* *src1* and *src2*, they both have to be of the same size
(width and height) and type.

Now we need to generate the `g(x)` image. For this, the function **addWeighted()** comes quite handy:

@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp blend_images
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java blend_images
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py blend_images
Numpy version of the above line (but the cv function is around 2x faster):
\code{.py}
dst = np.uint8(alpha*(img1)+beta*(img2))
\endcode
@end_toggle

since **addWeighted()** produces:
\f[dst = \alpha \cdot src1 + \beta \cdot src2 + \gamma\f]
In this case, `gamma` is the argument \f$0.0\f$ in the code above.

Create windows, show the images and wait for the user to end the program.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp display
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java display
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py display
@end_toggle

Result
------

![Blended Images](images/Adding_Images_Tutorial_Result_Big.jpg)

Changing the contrast and brightness of an image! {#tutorial_basic_linear_transform}
=================================================

@prev_tutorial{tutorial_adding_images}
@next_tutorial{tutorial_discrete_fourier_transform}

Goal
----

In this tutorial you will learn how to:

- Access pixel values
- Initialize a matrix with zeros
- Learn what @ref cv::saturate_cast does and why it is useful
- Get some cool info about pixel transformations
- Improve the brightness of an image in a practical example

Theory
------

@note
   The explanation below belongs to the book [Computer Vision: Algorithms and
   Applications](http://szeliski.org/Book/) by Richard Szeliski

### Image Processing

- A general image processing operator is a function that takes one or more input images and
  produces an output image.
- Image transforms can be seen as:
  - Point operators (pixel transforms)
  - Neighborhood (area-based) operators

### Pixel Transforms

- In this kind of image processing transform, each output pixel's value depends on only the
  corresponding input pixel value (plus, potentially, some globally collected information or
  parameters).
- Examples of such operators include *brightness and contrast adjustments* as well as color
  correction and transformations.

### Brightness and contrast adjustments

- Two commonly used point processes are *multiplication* and *addition* with a constant:

  \f[g(x) = \alpha f(x) + \beta\f]

- The parameters \f$\alpha > 0\f$ and \f$\beta\f$ are often called the *gain* and *bias* parameters;
  sometimes these parameters are said to control *contrast* and *brightness* respectively.
- You can think of \f$f(x)\f$ as the source image pixels and \f$g(x)\f$ as the output image pixels. Then,
  more conveniently, we can write the expression as:

  \f[g(i,j) = \alpha \cdot f(i,j) + \beta\f]

  where \f$i\f$ and \f$j\f$ indicate that the pixel is located in the *i-th* row and *j-th* column.
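As a quick numeric illustration, here is a hypothetical Python sketch of the per-channel computation, with the saturation that cv::saturate_cast performs in the C++ sample:

```python
def gain_bias_pixel(p, alpha, beta):
    """g = alpha * p + beta, rounded and saturated to [0, 255],
    mimicking cv::saturate_cast<uchar>."""
    g = alpha * p + beta
    return min(255, max(0, int(round(g))))

# a mid-gray pixel under a strong gain and bias overflows and is clipped to 255
clipped = gain_bias_pixel(100, 2.2, 50)
```

Without the clamp, 2.2 * 100 + 50 = 270 would wrap around in an 8-bit image; the saturation keeps the result at 255.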

Code
----

@add_toggle_cpp
- **Downloadable code**: Click
  [here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp)

- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
  @include samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp
@end_toggle

@add_toggle_java
- **Downloadable code**: Click
  [here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java)

- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
  @include samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java
@end_toggle

@add_toggle_python
- **Downloadable code**: Click
  [here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py)

- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
  @include samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py
@end_toggle

Explanation
-----------

- We load an image using @ref cv::imread and save it in a Mat object:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-load
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-load
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-load
@end_toggle

- Now, since we will make some transformations to this image, we need a new Mat object to store
  it. Also, we want this to have the following features:

  - Initial pixel values equal to zero
  - Same size and type as the original image

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-output
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-output
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-output
@end_toggle

  We observe that @ref cv::Mat::zeros returns a Matlab-style zero initializer based on
  *image.size()* and *image.type()*

- We now ask the user to enter the values of \f$\alpha\f$ and \f$\beta\f$:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-parameters
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-parameters
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-parameters
@end_toggle

- Now, to perform the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ we will access each
  pixel in the image. Since we are operating with BGR images, we will have three values per pixel
  (B, G and R), so we will also access them separately. Here is the piece of code:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-operation
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-operation
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-operation
@end_toggle

Notice the following (**C++ code only**):
- To access each pixel in the images we are using this syntax: *image.at\<Vec3b\>(y,x)[c]*
  where *y* is the row, *x* is the column and *c* is B, G or R (0, 1 or 2).
- Since the operation \f$\alpha \cdot p(i,j) + \beta\f$ can give values out of range or not
  integers (if \f$\alpha\f$ is float), we use cv::saturate_cast to make sure the
  values are valid.

- Finally, we create windows and show the images, the usual way.

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-display
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-display
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-display
@end_toggle

@note
Instead of using the **for** loops to access each pixel, we could have simply used this command:

@add_toggle_cpp
@code{.cpp}
image.convertTo(new_image, -1, alpha, beta);
@endcode
@end_toggle

@add_toggle_java
@code{.java}
image.convertTo(newImage, -1, alpha, beta);
@endcode
@end_toggle

@add_toggle_python
@code{.py}
new_image = cv.convertScaleAbs(image, alpha=alpha, beta=beta)
@endcode
@end_toggle

where @ref cv::Mat::convertTo would effectively perform *new_image = alpha\*image + beta*. However,
we wanted to show you how to access each pixel. In any case, both methods give the same result but
convertTo is more optimized and works a lot faster.

Result
------

- Running our code and using \f$\alpha = 2.2\f$ and \f$\beta = 50\f$
  @code{.bash}
  $ ./BasicLinearTransforms lena.jpg
  Basic Linear Transforms
  -------------------------
  * Enter the alpha value [1.0-3.0]: 2.2
  * Enter the beta value [0-100]: 50
  @endcode

- We get this:

  ![](images/Basic_Linear_Transform_Tutorial_Result_big.jpg)

Practical example
-----------------

In this paragraph, we will put into practice what we have learned to correct an underexposed image by adjusting the brightness
and the contrast of the image. We will also see another technique to correct the brightness of an image, called
gamma correction.

### Brightness and contrast adjustments

Increasing (/ decreasing) the \f$\beta\f$ value will add (/ subtract) a constant value to every pixel. Pixel values outside of the [0 ; 255]
range will be saturated (i.e. a pixel value higher (/ lower) than 255 (/ 0) will be clamped to 255 (/ 0)).

![In light gray, histogram of the original image, in dark gray when brightness = 80 in Gimp](images/Basic_Linear_Transform_Tutorial_hist_beta.png)

The histogram represents, for each color level, the number of pixels with that color level. A dark image will have many pixels with
a low color value, and thus the histogram will present a peak in its left part. When adding a constant bias, the histogram is shifted to the
right, as we have added a constant bias to all the pixels.

The \f$\alpha\f$ parameter will modify how the levels spread. If \f$ \alpha < 1 \f$, the color levels will be compressed and the result
will be an image with less contrast.

![In light gray, histogram of the original image, in dark gray when contrast < 0 in Gimp](images/Basic_Linear_Transform_Tutorial_hist_alpha.png)

Note that these histograms have been obtained using the Brightness-Contrast tool in the Gimp software. The brightness tool should be
identical to the \f$\beta\f$ bias parameter, but the contrast tool seems to differ from the \f$\alpha\f$ gain, where the output range
seems to be centered with Gimp (as you can notice in the previous histogram).

It can occur that playing with the \f$\beta\f$ bias will improve the brightness, but at the same time the image will appear with a
slight veil as the contrast will be reduced. The \f$\alpha\f$ gain can be used to diminish this effect, but due to the saturation,
we will lose some details in the original bright regions.

### Gamma correction

[Gamma correction](https://en.wikipedia.org/wiki/Gamma_correction) can be used to correct the brightness of an image by using a
non-linear transformation between the input values and the mapped output values:

\f[O = \left( \frac{I}{255} \right)^{\gamma} \times 255\f]

As this relation is non-linear, the effect will not be the same for all pixels and will depend on their original value.

![Plot for different values of gamma](images/Basic_Linear_Transform_Tutorial_gamma.png)

When \f$ \gamma < 1 \f$, the original dark regions will be brighter and the histogram will be shifted to the right, whereas it will
be the opposite with \f$ \gamma > 1 \f$.

### Correct an underexposed image

The following image has been corrected with: \f$ \alpha = 1.3 \f$ and \f$ \beta = 40 \f$.

![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_linear_transform_correction.jpg)

The overall brightness has been improved, but you can notice that the clouds are now greatly saturated due to the numerical saturation
of the implementation used ([highlight clipping](https://en.wikipedia.org/wiki/Clipping_(photography)) in photography).

The following image has been corrected with: \f$ \gamma = 0.4 \f$.

![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_gamma_correction.jpg)

The gamma correction should tend to add less of a saturation effect, as the mapping is non-linear and no numerical saturation is
possible as in the previous method.

![Left: histogram after alpha, beta correction ; Center: histogram of the original image ; Right: histogram after the gamma correction](images/Basic_Linear_Transform_Tutorial_histogram_compare.png)

The previous figure compares the histograms for the three images (the y-ranges are not the same between the three histograms).
You can notice that most of the pixel values are in the lower part of the histogram for the original image. After the \f$ \alpha \f$,
\f$ \beta \f$ correction, we can observe a big peak at 255 due to the saturation, as well as a shift to the right.
After gamma correction, the histogram is shifted to the right, but the pixels in the dark regions are more shifted
(see the gamma curves [figure](Basic_Linear_Transform_Tutorial_gamma.png)) than those in the bright regions.

In this tutorial, you have seen two simple methods to adjust the contrast and the brightness of an image. **They are basic techniques
and are not intended to be used as a replacement of a raster graphics editor!**

### Code

@add_toggle_cpp
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp).
@end_toggle

@add_toggle_java
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java).
@end_toggle

@add_toggle_python
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py).
@end_toggle

Code for the gamma correction:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp changing-contrast-brightness-gamma-correction
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java changing-contrast-brightness-gamma-correction
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py changing-contrast-brightness-gamma-correction
@end_toggle

A look-up table is used to improve the performance of the computation, as only 256 values need to be calculated once.
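The look-up-table idea can be sketched in a few lines of stdlib-only Python (a hypothetical sketch, not the sample's code): the 256 possible outputs of \f$O = (I/255)^{\gamma} \times 255\f$ are precomputed once, and the correction then reduces to one table lookup per pixel value:

```python
def build_gamma_lut(gamma):
    """Precompute O = (I/255)**gamma * 255 for all 256 possible 8-bit inputs."""
    return [min(255, max(0, int(round((i / 255.0) ** gamma * 255.0))))
            for i in range(256)]

lut = build_gamma_lut(0.4)            # gamma < 1 brightens the dark regions
# applying the correction is a single indexing operation per pixel value
corrected = [lut[p] for p in [0, 64, 128, 255]]
```

Note that the endpoints 0 and 255 map to themselves for any gamma, which is why the transform reshapes the midtones without clipping highlights.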
|
||||
|
||||
### Additional resources

- [Gamma correction in graphics rendering](https://learnopengl.com/#!Advanced-Lighting/Gamma-Correction)
- [Gamma correction and images displayed on CRT monitors](http://www.graphics.cornell.edu/~westin/gamma/gamma.html)
- [Digital exposure techniques](http://www.cambridgeincolour.com/tutorials/digital-exposure-techniques.htm)
Discrete Fourier Transform {#tutorial_discrete_fourier_transform}
==========================

@prev_tutorial{tutorial_basic_linear_transform}
@next_tutorial{tutorial_file_input_output_with_xml_yml}

Goal
----

We'll seek answers for the following questions:

- What is a Fourier transform and why use it?
- How to do it in OpenCV?
- Usage of functions such as: **copyMakeBorder()**, **merge()**, **dft()**, **getOptimalDFTSize()**, **log()** and **normalize()**.

Source code
-----------

@add_toggle_cpp
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp) or
find it in the
`samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp` file of the
OpenCV source code library.
@end_toggle

@add_toggle_java
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java) or
find it in the
`samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java` file of the
OpenCV source code library.
@end_toggle

@add_toggle_python
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py) or
find it in the
`samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py` file of the
OpenCV source code library.
@end_toggle

Here's a sample usage of **dft()**:

@add_toggle_cpp
@include cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp
@end_toggle

@add_toggle_java
@include java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java
@end_toggle

@add_toggle_python
@include python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py
@end_toggle

Explanation
-----------

The Fourier Transform decomposes an image into its sine and cosine components. In other words, it
transforms an image from its spatial domain to its frequency domain. The idea is that any function
may be approximated exactly by an infinite sum of sine and cosine functions, and the Fourier
Transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image is:

\f[F(k,l) = \displaystyle\sum\limits_{i=0}^{N-1}\sum\limits_{j=0}^{N-1} f(i,j)e^{-i2\pi(\frac{ki}{N}+\frac{lj}{N})}\f]\f[e^{ix} = \cos{x} + i\sin {x}\f]

Here f is the image value in the spatial domain and F is its counterpart in the frequency domain.
The result of the transformation is complex numbers. It may be displayed either via a *real* and a
*complex* image, or via a *magnitude* and a *phase* image. Throughout the image processing
algorithms only the *magnitude* image is interesting, as it contains all the information we need
about the image's geometric structure. Nevertheless, if you intend to modify the image in these
forms and then retransform it, you'll need to preserve both of them.

In this sample I'll show how to calculate and display the *magnitude* image of a Fourier Transform.
Digital images are discrete, meaning they may only take values from a given domain. For example, in
a basic grayscale image the values usually lie between zero and 255. Therefore the Fourier
Transform also needs to be of a discrete type, resulting in a Discrete Fourier Transform (*DFT*).
You'll want to use this whenever you need to determine the structure of an image from a geometrical
point of view. Here are the steps to follow (in case of a grayscale input image *I*):

#### Expand the image to an optimal size

The performance of a DFT depends on the image size. It tends to be fastest for image sizes that are
multiples of the numbers two, three and five. Therefore, to achieve maximal performance it is
generally a good idea to pad border values onto the image to obtain a size with such traits. The
**getOptimalDFTSize()** function returns this optimal size, and we can use the **copyMakeBorder()**
function to expand the borders of an image (the appended pixels are initialized with zero):

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp expand
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java expand
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py expand
@end_toggle
#### Make place for both the complex and the real values

The result of a Fourier Transform is complex. This implies that for each image value the result is
two image values (one per component). Moreover, the range of the frequency domain is much larger
than that of its spatial counterpart. Therefore, we usually store these values at least in a
*float* format. We'll convert our input image to this type and expand it with another channel to
hold the complex values:

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp complex_and_real
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java complex_and_real
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py complex_and_real
@end_toggle

#### Make the Discrete Fourier Transform

An in-place calculation is possible (same input as output):

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp dft
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java dft
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py dft
@end_toggle

#### Transform the real and complex values to magnitude

A complex number has a real (*Re*) and a complex (imaginary - *Im*) part. The results of a DFT are
complex numbers. The magnitude of a DFT is:

\f[M = \sqrt{ {Re(DFT(I))}^2 + {Im(DFT(I))}^2}\f]

Translated to OpenCV code:

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp magnitude
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java magnitude
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py magnitude
@end_toggle
#### Switch to a logarithmic scale

It turns out that the dynamic range of the Fourier coefficients is too large to be displayed on
the screen: we have some small and some highly varying values that we can't observe like this. The
high values would all turn out as white points, while the small ones as black. To use the grayscale
values for visualization we can transform our linear scale to a logarithmic one:

\f[M_1 = \log{(1 + M)}\f]

Translated to OpenCV code:

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp log
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java log
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py log
@end_toggle

#### Crop and rearrange

Remember that at the first step we expanded the image? Well, it's time to throw away the newly
introduced values. For visualization purposes we may also rearrange the quadrants of the result, so
that the origin (zero, zero) corresponds to the image center.

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp crop_rearrange
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java crop_rearrange
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py crop_rearrange
@end_toggle

#### Normalize

This is done again for visualization purposes. We now have the magnitudes, however these are still
outside our image display range of zero to one. We normalize our values to this range using the
@ref cv::normalize() function.

@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp normalize
@end_toggle

@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java normalize
@end_toggle

@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py normalize
@end_toggle

Result
------

An application idea would be to determine the geometric orientation present in an image. For
example, let us find out whether a text is horizontal or not. Looking at some text you'll notice
that the text lines sort of form horizontal lines and the letters form sort of vertical lines.
These two main components of a text snippet may also be seen in its Fourier transform. Let us use
[this horizontal](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextN.png) and [this rotated](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextR.png)
image of a text.

In case of the horizontal text:

![In case of normal text](images/result_normal.jpg)

In case of a rotated text:

![In case of rotated text](images/result_rotated.jpg)

You can see that the most influential components of the frequency domain (brightest dots on the
magnitude image) follow the geometric rotation of objects in the image. From this we may calculate
the offset and perform an image rotation to correct eventual misalignments.
File Input and Output using XML and YAML files {#tutorial_file_input_output_with_xml_yml}
==============================================

@prev_tutorial{tutorial_discrete_fourier_transform}
@next_tutorial{tutorial_how_to_use_OpenCV_parallel_for_}

Goal
----

You'll find answers for the following questions:

- How to print and read text entries to a file in OpenCV using YAML or XML files?
- How to do the same for OpenCV data structures?
- How to do this for your own data structures?
- Usage of OpenCV data structures such as @ref cv::FileStorage , @ref cv::FileNode or @ref cv::FileNodeIterator .

Source code
-----------

You can [download this from here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp) or find it in the
`samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp` file of the OpenCV source
code library.

Here's a sample code showing how to achieve everything enumerated in the goal list.

@include cpp/tutorial_code/core/file_input_output/file_input_output.cpp

Explanation
-----------

Here we talk only about XML and YAML file inputs. Your output (and its respective input) file may
have only one of these extensions, and the structure follows from that. There are two kinds of data
structures you may serialize: *mappings* (like the STL map) and *element sequences* (like the STL
vector). The difference between these is that in a map every element has a unique name through
which you may access it, while in a sequence you need to iterate through the elements to query a
specific item.

-#  **XML/YAML File Open and Close.** Before you write any content to such a file you need to open
    it, and at the end close it. The XML/YAML data structure in OpenCV is @ref cv::FileStorage . To
    specify the file this structure binds to on your hard drive, you can use either its constructor
    or the *open()* function:
    @code{.cpp}
    string filename = "I.xml";
    FileStorage fs(filename, FileStorage::WRITE);
    //...
    fs.open(filename, FileStorage::READ);
    @endcode
    Either way, the second argument is a constant specifying the type of operations you'll be able
    to perform on the file: WRITE, READ or APPEND. The extension specified in the file name also
    determines the output format that will be used. The output may even be compressed if you
    specify an extension such as *.xml.gz*.

    The file automatically closes when the @ref cv::FileStorage object is destroyed. However, you
    may explicitly request this by using the *release* function:
    @code{.cpp}
    fs.release(); // explicit close
    @endcode
-#  **Input and Output of text and numbers.** The data structure uses the same \<\< output operator
    as the STL library. For outputting any type of data structure we first need to specify its
    name; we do this by simply printing out the name. For basic types you may follow this with the
    print of the value:
    @code{.cpp}
    fs << "iterationNr" << 100;
    @endcode
    Reading in is a simple addressing (via the [] operator) and casting operation, or a read via
    the \>\> operator:
    @code{.cpp}
    int itNr;
    fs["iterationNr"] >> itNr;
    itNr = (int) fs["iterationNr"];
    @endcode
-#  **Input/Output of OpenCV Data structures.** These behave exactly like the basic C++ types:
    @code{.cpp}
    Mat R = Mat_<uchar >::eye (3, 3),
        T = Mat_<double>::zeros(3, 1);

    fs << "R" << R; // Write cv::Mat
    fs << "T" << T;

    fs["R"] >> R; // Read cv::Mat
    fs["T"] >> T;
    @endcode
-#  **Input/Output of vectors (arrays) and associative maps.** As I mentioned beforehand, we can
    output maps and sequences (array, vector) too. Again, we first print the name of the variable
    and then we have to specify whether our output is a sequence or a map.

    For a sequence, print the "[" character before the first element and the "]" character after
    the last one:
    @code{.cpp}
    fs << "strings" << "["; // text - string sequence
    fs << "image1.jpg" << "Awesomeness" << "baboon.jpg";
    fs << "]"; // close sequence
    @endcode
    For maps the drill is the same, however now we use the "{" and "}" delimiter characters:
    @code{.cpp}
    fs << "Mapping"; // text - mapping
    fs << "{" << "One" << 1;
    fs << "Two" << 2 << "}";
    @endcode
    To read from these we use the @ref cv::FileNode and the @ref cv::FileNodeIterator data
    structures. The [] operator of the @ref cv::FileStorage class returns a @ref cv::FileNode data
    type. If the node is sequential we can use the @ref cv::FileNodeIterator to iterate through the
    items:
    @code{.cpp}
    FileNode n = fs["strings"]; // Read string sequence - Get node
    if (n.type() != FileNode::SEQ)
    {
        cerr << "strings is not a sequence! FAIL" << endl;
        return 1;
    }

    FileNodeIterator it = n.begin(), it_end = n.end(); // Go through the node
    for (; it != it_end; ++it)
        cout << (string)*it << endl;
    @endcode
    For maps you can use the [] operator again to access the given item (or the \>\> operator too):
    @code{.cpp}
    n = fs["Mapping"]; // Read mappings from a sequence
    cout << "Two " << (int)(n["Two"]) << "; ";
    cout << "One " << (int)(n["One"]) << endl << endl;
    @endcode
-#  **Read and write your own data structures.** Suppose you have a data structure such as:
    @code{.cpp}
    class MyData
    {
    public:
        MyData() : A(0), X(0), id() {}
    public: // Data Members
        int A;
        double X;
        string id;
    };
    @endcode
    It's possible to serialize this through the OpenCV I/O XML/YAML interface (just as in case of
    the OpenCV data structures) by adding a read and a write function inside and outside of your
    class. For the inside part:
    @code{.cpp}
    void write(FileStorage& fs) const //Write serialization for this class
    {
        fs << "{" << "A" << A << "X" << X << "id" << id << "}";
    }

    void read(const FileNode& node) //Read serialization for this class
    {
        A = (int)node["A"];
        X = (double)node["X"];
        id = (string)node["id"];
    }
    @endcode
    Then you need to add the following function definitions outside the class:
    @code{.cpp}
    void write(FileStorage& fs, const std::string&, const MyData& x)
    {
        x.write(fs);
    }

    void read(const FileNode& node, MyData& x, const MyData& default_value = MyData())
    {
        if(node.empty())
            x = default_value;
        else
            x.read(node);
    }
    @endcode
    Here you can observe that in the read section we defined what happens if the user tries to read
    a non-existing node. In this case we just return the default initialization value; however, a
    more verbose solution would be to return, for instance, a minus one value for an object ID.

    Once you have added these four functions, use the \<\< operator for write and the \>\> operator
    for read:
    @code{.cpp}
    MyData m(1);
    fs << "MyData" << m; // your own data structures
    fs["MyData"] >> m; // Read your own structure
    @endcode
    Or to try out reading a non-existing node:
    @code{.cpp}
    fs["NonExisting"] >> m; // Do not add a fs << "NonExisting" << m command for this to work
    cout << endl << "NonExisting = " << endl << m << endl;
    @endcode
Result
------

Well, mostly we just print out the defined numbers. On the screen of your console you could see:
@code{.bash}
Write Done.

Reading:
100image1.jpg
Awesomeness
baboon.jpg
Two 2; One 1


R = [1, 0, 0;
  0, 1, 0;
  0, 0, 1]
T = [0; 0; 0]

MyData =
{ id = mydata1234, X = 3.14159, A = 97}

Attempt to read NonExisting (should initialize the data structure with its default).
NonExisting =
{ id = , X = 0, A = 0}

Tip: Open up output.xml with a text editor to see the serialized data.
@endcode
Nevertheless, it's much more interesting what you may see in the output xml file:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<iterationNr>100</iterationNr>
<strings>
  image1.jpg Awesomeness baboon.jpg</strings>
<Mapping>
  <One>1</One>
  <Two>2</Two></Mapping>
<R type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>u</dt>
  <data>
    1 0 0 0 1 0 0 0 1</data></R>
<T type_id="opencv-matrix">
  <rows>3</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 0. 0.</data></T>
<MyData>
  <A>97</A>
  <X>3.1415926535897931e+000</X>
  <id>mydata1234</id></MyData>
</opencv_storage>
@endcode
Or the YAML file:
@code{.yaml}
%YAML:1.0
iterationNr: 100
strings:
   - "image1.jpg"
   - Awesomeness
   - "baboon.jpg"
Mapping:
   One: 1
   Two: 2
R: !!opencv-matrix
   rows: 3
   cols: 3
   dt: u
   data: [ 1, 0, 0, 0, 1, 0, 0, 0, 1 ]
T: !!opencv-matrix
   rows: 3
   cols: 1
   dt: d
   data: [ 0., 0., 0. ]
MyData:
   A: 97
   X: 3.1415926535897931e+000
   id: mydata1234
@endcode
You may observe a runtime instance of this on [YouTube
here](https://www.youtube.com/watch?v=A4yqVnByMMM).

@youtube{A4yqVnByMMM}
How to scan images, lookup tables and time measurement with OpenCV {#tutorial_how_to_scan_images}
==================================================================

@prev_tutorial{tutorial_mat_the_basic_image_container}
@next_tutorial{tutorial_mat_mask_operations}

Goal
----

We'll seek answers for the following questions:

- How to go through each and every pixel of an image?
- How are OpenCV matrix values stored?
- How to measure the performance of our algorithm?
- What are lookup tables and why use them?

Our test case
-------------

Let us consider a simple color reduction method. By using the unsigned char C and C++ type for
matrix item storage, a pixel channel may have up to 256 different values. For a three-channel image
this allows the formation of far too many colors (16 million, to be exact). Working with so many
color shades may heavily affect our algorithm's performance. However, sometimes it is enough to
work with far fewer of them to get the same final result.

In such cases it's common to make a *color space reduction*. This means that we divide the current
color space values by a new input value to end up with fewer colors. For instance, every value
between zero and nine takes the new value zero, every value between ten and nineteen the value ten,
and so on.

When you divide an *uchar* (unsigned char - aka values between zero and 255) value by an *int*
value, integer division is performed and any fraction is rounded down. Taking advantage of this
fact, the upper operation in the *uchar* domain may be expressed as:

\f[I_{new} = (\frac{I_{old}}{10}) * 10\f]

A simple color space reduction algorithm would consist of just passing through every pixel of an
image matrix and applying this formula. It's worth noting that we do a division and a
multiplication operation. These operations are expensive for a system. If possible, it's worth
avoiding them by using cheaper operations such as a few subtractions, additions, or in the best
case a simple assignment. Furthermore, note that we only have a limited number of input values for
the upper operation: in case of the *uchar* system this is exactly 256.

Therefore, for larger images it would be wise to calculate all possible values beforehand and
during the scan just make the assignment by using a lookup table. Lookup tables are simple arrays
(having one or more dimensions) that for a given input value variation hold the final output value.
Their strength is that we do not need to make the calculation; we just need to read the result.
Our test case program (and the code sample below) will do the following: read in an image passed as
a command line argument (it may be either color or grayscale) and apply the reduction with the
given command line integer value. In OpenCV, at the moment there are three major ways of going
through an image pixel by pixel. To make things a little more interesting, we'll scan the image
using each of these methods and print out how long each took.

You can download the full source code [here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/how_to_scan_images/how_to_scan_images.cpp) or look it up in
the samples directory of OpenCV at the cpp tutorial code for the core section. Its basic usage is:
@code{.bash}
how_to_scan_images imageName.jpg intValueToReduce [G]
@endcode
The final argument is optional. If given, the image will be loaded in grayscale format, otherwise
the BGR color space is used. The first thing is to calculate the lookup table.

@snippet how_to_scan_images.cpp dividewith

Here we first use the C++ *stringstream* class to convert the third command line argument from text
to an integer format. Then we use a simple loop and the formula above to calculate the lookup
table. No OpenCV specific stuff here.

Another issue is how we measure time. Well, OpenCV offers two simple functions to achieve this:
cv::getTickCount() and cv::getTickFrequency(). The first returns the number of ticks of your
system's CPU since a certain event (like since you booted your system). The second returns how many
ticks your CPU emits during a second. So, measuring the amount of time elapsed between two
operations is as easy as:
@code{.cpp}
double t = (double)getTickCount();
// do something ...
t = ((double)getTickCount() - t)/getTickFrequency();
cout << "Times passed in seconds: " << t << endl;
@endcode
@anchor tutorial_how_to_scan_images_storing
How is the image matrix stored in memory?
-----------------------------------------

As you could already read in my @ref tutorial_mat_the_basic_image_container tutorial, the size of
the matrix depends on the color system used. More accurately, it depends on the number of channels
used. In case of a grayscale image we have something like:

![](tutorial_how_matrix_stored_1.png)

For multichannel images the columns contain as many sub-columns as the number of channels. For
example, in case of a BGR color system:

![](tutorial_how_matrix_stored_2.png)

Note that the order of the channels is inverse: BGR instead of RGB. Because in many cases the
memory is large enough to store the rows in a successive fashion, the rows may follow one after
another, creating a single long row. Because everything is in a single place, following one after
another, this may help to speed up the scanning process. We can use the cv::Mat::isContinuous()
function to *ask* the matrix if this is the case. Continue on to the next section to find an
example.
|
||||
|
||||
The efficient way
|
||||
-----------------
|
||||
|
||||
When it comes to performance you cannot beat the classic C style operator[] (pointer) access.
|
||||
Therefore, the most efficient method we can recommend for making the assignment is:
|
||||
|
||||
@snippet how_to_scan_images.cpp scan-c
|
||||
|
||||
Here we basically just acquire a pointer to the start of each row and go through it until it ends.
|
||||
In the special case that the matrix is stored in a continuous manner we only need to request the
|
||||
pointer a single time and go all the way to the end. We need to look out for color images: we have
|
||||
three channels so we need to pass through three times more items in each row.
|
||||
|
||||
There's another way of this. The *data* data member of a *Mat* object returns the pointer to the
|
||||
first row, first column. If this pointer is null you have no valid input in that object. Checking
|
||||
this is the simplest method to check if your image loading was a success. In case the storage is
|
||||
continuous we can use this to go through the whole data pointer. In case of a grayscale image this
|
||||
would look like:
|
||||
@code{.cpp}
|
||||
uchar* p = I.data;
|
||||
|
||||
for( unsigned int i = 0; i < ncol*nrows; ++i)
|
||||
*p++ = table[*p];
|
||||
@endcode
|
||||
You would get the same result. However, this code is a lot harder to read later on. It gets even
|
||||
harder if you have some more advanced technique there. Moreover, in practice I've observed you'll
|
||||
get the same performance result (as most of the modern compilers will probably make this small
|
||||
optimization trick automatically for you).
|
||||
|
||||
The iterator (safe) method
|
||||
--------------------------
|
||||
|
||||
In case of the efficient way making sure that you pass through the right amount of *uchar* fields
|
||||
and to skip the gaps that may occur between the rows was your responsibility. The iterator method is
|
||||
considered a safer way as it takes over these tasks from the user. All you need to do is to ask the
|
||||
begin and the end of the image matrix and then just increase the begin iterator until you reach the
|
||||
end. To acquire the value *pointed* by the iterator use the \* operator (add it before it).
|
||||
|
||||
@snippet how_to_scan_images.cpp scan-iterator
|
||||
|
||||
In case of color images we have three uchar items per column. This may be considered a short vector
|
||||
of uchar items, that has been baptized in OpenCV with the *Vec3b* name. To access the n-th sub
|
||||
column we use simple operator[] access. It's important to remember that OpenCV iterators go through
|
||||
the columns and automatically skip to the next row. Therefore in case of color images if you use a
|
||||
simple *uchar* iterator you'll be able to access only the blue channel values.
|
||||
|

On-the-fly address calculation with reference returning
-------------------------------------------------------

The final method isn't recommended for scanning. It was made to acquire or modify somewhat random
elements in the image. Its basic usage is to specify the row and column number of the item you want
to access. During our earlier scanning methods you could already notice that it is important through
what type we are looking at the image. It's no different here, as you need to manually specify what
type to use for the automatic lookup. You can observe this for grayscale images in the
following source code (the usage of the cv::Mat::at() function):

@snippet how_to_scan_images.cpp scan-random

The function takes your input type and coordinates and calculates the address of the
queried item. Then it returns a reference to that. This may be constant when you *get* the value and
non-constant when you *set* the value. As a safety step, **in debug mode only**, there is a check
performed that your input coordinates are valid and do exist. If this isn't the case you'll get a
nice output message on the standard error stream. Compared to the efficient way, in
release mode the only difference is that for every element of the image you'll get a
new row pointer, from which we use the C operator[] to acquire the column element.
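What `at()` does can be sketched in plain C++: compute the element address from the coordinates and return a reference to it, with a bounds check. This is a minimal sketch, not the OpenCV implementation (OpenCV only checks in debug builds; the `GrayImage` type here is hypothetical and always checks to keep things short).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Minimal single-channel "matrix" illustrating the idea of cv::Mat::at<T>(row, col):
// calculate the element address from the coordinates and return a reference to it.
struct GrayImage {
    int rows, cols;
    std::vector<std::uint8_t> data;          // row-major storage

    GrayImage(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0) {}

    std::uint8_t& at(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols)   // validity check
            throw std::out_of_range("GrayImage::at: index out of range");
        return data[static_cast<std::size_t>(row) * cols + col]; // address calculation
    }
};
```

The returned reference serves both reads and writes, which mirrors the get/set behaviour described above.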

If you need to do multiple lookups using this method for an image, it may be troublesome and time
consuming to enter the type and the at keyword for each of the accesses. To solve this problem
OpenCV has a cv::Mat_ data type. It's the same as Mat with the extra requirement that at definition
you need to specify the data type through which to look at the data matrix; however, in return you can
use the operator() for fast access of items. To make things even better, this is easily convertible
from and to the usual cv::Mat data type. A sample usage of this you can see in the
color image case of the function above. Nevertheless, it's important to note that the same operation
(with the same runtime speed) could have been done with the cv::Mat::at function. It's just a
less-to-write trick for the lazy programmer.
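The convenience cv::Mat_ adds can be sketched as baking the element type into the class via a template parameter, so that element access needs no per-call type and can use operator(). A hypothetical sketch, not the real OpenCV class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the cv::Mat_ idea: the element type T is fixed at definition,
// and operator() replaces the verbose img.at<T>(row, col) spelling.
template <typename T>
struct Image_ {
    int rows, cols;
    std::vector<T> data;

    Image_(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c) {}

    T& operator()(int row, int col) {
        return data[static_cast<std::size_t>(row) * cols + col];
    }
};
```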

The Core Function
-----------------

This is a bonus method of achieving lookup table modification in an image. In image
processing it's quite common that you want to modify all values of a given image to some other value.
OpenCV provides a function for modifying image values without the need to write the scanning logic
of the image. We use the cv::LUT() function of the core module. First we build a Mat type of the
lookup table:

@snippet how_to_scan_images.cpp table-init

Finally call the function (I is our input image and J the output one):

@snippet how_to_scan_images.cpp table-use
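The lookup-table idea can be sketched without OpenCV: precompute the transform once for all 256 possible uchar values, then replace each pixel by a table lookup. The table below mirrors the tutorial's color-space reduction (value → divideWith * (value / divideWith)); the helper names are hypothetical, not OpenCV API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a 256-entry table implementing the tutorial's color-space reduction.
inline void buildReduceTable(std::uint8_t (&table)[256], int divideWith) {
    for (int i = 0; i < 256; ++i)
        table[i] = static_cast<std::uint8_t>(divideWith * (i / divideWith));
}

// Standalone sketch of what cv::LUT does: every pixel becomes a table lookup,
// so the per-pixel transform is never recomputed.
inline std::vector<std::uint8_t> applyLUT(const std::vector<std::uint8_t>& img,
                                          const std::uint8_t (&table)[256]) {
    std::vector<std::uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        out[i] = table[img[i]];
    return out;
}
```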

Performance Difference
----------------------

For the best result compile the program and run it yourself. To make the differences more
clear, I've used a quite large (2560 x 1600) image. The performance presented here is for
color images. For a more accurate value I've averaged the value I got from calling the function
a hundred times.

Method          | Time
--------------- | ----------------------
Efficient Way   | 79.4717 milliseconds
Iterator        | 83.7201 milliseconds
On-The-Fly RA   | 93.7878 milliseconds
LUT function    | 32.5759 milliseconds

We can conclude a couple of things. If possible, use the already made functions of OpenCV (instead
of reinventing them). The fastest method turns out to be the LUT function. This is because the OpenCV
library is multi-thread enabled via Intel Threading Building Blocks. However, if you need to write a
simple image scan, prefer the pointer method. The iterator is a safer bet, however quite slower.
Using the on-the-fly reference access method for a full image scan is the most costly in debug mode.
In release mode it may or may not beat the iterator approach; however, it surely sacrifices the
safety trait of iterators for this.

Finally, you may watch a sample run of the program on the [video posted](https://www.youtube.com/watch?v=fB3AN5fjgwc) on our YouTube channel.

@youtube{fB3AN5fjgwc}
@@ -0,0 +1,190 @@
How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_}
==================================================================

@prev_tutorial{tutorial_file_input_output_with_xml_yml}

Goal
----

The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily
parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set
exploiting almost all the CPU power available.
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
If you want more information about multithreading, you will have to refer to a reference book or course, as this tutorial is intended
to remain simple.

Precondition
----

The first precondition is to have OpenCV built with a parallel framework.
In OpenCV 3.2, the following parallel frameworks are available, in that order:
1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled)
3. OpenMP (integrated to compiler, should be explicitly enabled)
4. APPLE GCD (system wide, used automatically (APPLE only))
5. Windows RT concurrency (system wide, used automatically (Windows RT only))
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
7. Pthreads (if available)

As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries
are third party libraries and have to be explicitly built and enabled in CMake (e.g. TBB, C=), others are
automatically available with the platform (e.g. APPLE GCD), but chances are that you will be able to
have access to a parallel framework, either directly or by enabling the option in CMake and rebuilding the library.

The second (weak) precondition is more related to the task you want to achieve, as not all computations
are suitable / can be adapted to be run in a parallel way. To remain simple, tasks that can be split
into multiple elementary operations with no memory dependency (no possible race condition) are easily
parallelizable. Computer vision processing is often easily parallelizable, as most of the time the processing of
one pixel does not depend on the state of other pixels.

Simple example: drawing a Mandelbrot set
----

We will use the example of drawing a Mandelbrot set to show how, starting from a regular sequential code, you can easily adapt
the code to parallelize the computation.

Theory
-----------

The Mandelbrot set was named in tribute to the mathematician Benoit Mandelbrot by the mathematician
Adrien Douady. It has become famous outside the mathematics field, as the image representation is an example of a
class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (even more, a
Mandelbrot set is self-similar, as the whole shape can be repeatedly seen at different scales). For a more in-depth
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set).
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).

> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration
> of the quadratic map
> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f]
> remains bounded.
> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying
> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets.
> This can also be represented as
> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f]

Pseudocode
-----------

A simple algorithm to generate a representation of the Mandelbrot set is called the
["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm).
For each pixel in the rendered image, we test, using the recurrence relation, whether the complex number is bounded or not
under a maximum number of iterations. Pixels that do not belong to the Mandelbrot set will escape quickly, whereas
we assume that a pixel is in the set if it has not escaped after a fixed maximum number of iterations. A high number of iterations will
produce a more detailed image, but the computation time will increase accordingly. We use the number of iterations
needed to "escape" to depict the pixel value in the image.

```
For each pixel (Px, Py) on the screen, do:
{
  x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1))
  y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
  x = 0.0
  y = 0.0
  iteration = 0
  max_iteration = 1000
  while (x*x + y*y < 2*2 AND iteration < max_iteration) {
    xtemp = x*x - y*y + x0
    y = 2*x*y + y0
    x = xtemp
    iteration = iteration + 1
  }
  color = palette[iteration]
  plot(Px, Py, color)
}
```

To relate the pseudocode to the theory, we have:
* \f$ z = x + iy \f$
* \f$ z^2 = x^2 + i2xy - y^2 \f$
* \f$ c = x_0 + iy_0 \f$
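The pseudocode above translates directly to C++. A minimal sketch of the escape-time test for one point (the function name is mine, not from the tutorial sample):

```cpp
#include <cassert>

// Escape-time test for one point c = (x0, y0): iterate z_{n+1} = z_n^2 + c
// and return the iteration at which |z| exceeds 2 (i.e. |z|^2 >= 4), or
// maxIter if the orbit stayed bounded (the point is then assumed in the set).
inline int mandelbrotEscape(double x0, double y0, int maxIter = 1000) {
    double x = 0.0, y = 0.0;
    int iteration = 0;
    while (x * x + y * y < 4.0 && iteration < maxIter) {
        double xtemp = x * x - y * y + x0;   // real part of z^2 + c
        y = 2.0 * x * y + y0;                // imaginary part of z^2 + c
        x = xtemp;
        ++iteration;
    }
    return iteration;
}
```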



On this figure, we recall that the real part of a complex number is on the x-axis and the imaginary part is on the y-axis.
You can see that the whole shape is repeatedly visible if we zoom in at particular locations.

Implementation
-----------

Escape time algorithm implementation
--------------------------

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm

Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a
complex number. This function performs the test to check if the pixel is in the set or not and returns the "escaped" iteration.

Sequential Mandelbrot implementation
--------------------------

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential

In this implementation, we sequentially iterate over the pixels in the rendered image to perform the test to check if the
pixel is likely to belong to the Mandelbrot set or not.

Another thing to do is to transform the pixel coordinates into the Mandelbrot set space with:

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation

Finally, to assign the grayscale value to the pixels, we use the following rule:
* a pixel is black if it reaches the maximum number of iterations (the pixel is assumed to be in the Mandelbrot set),
* otherwise we assign a grayscale value depending on the escaped iteration, scaled to fit the grayscale range.

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value

Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost
the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his
[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)):
\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$



The green curve corresponds to a simple linear scale transformation, the blue one to a square root scale transformation,
and you can observe how the lowest values are boosted when looking at the slope at these positions.

Parallel Mandelbrot implementation
--------------------------

When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the
computation, we can perform multiple pixel calculations in parallel, by exploiting the multi-core architecture of modern
processors. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework.

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel

The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the
`virtual void operator ()(const cv::Range& range) const` method.

The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread.
This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index coordinate
to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the mat image to be able to modify
the image in-place.

The parallel execution is called with:

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call

Here, the range represents the total number of operations to be executed, i.e. the total number of pixels in the image.
To set the number of threads, you can use @ref cv::setNumThreads. You can also specify the number of splits using the
nstripes parameter in @ref cv::parallel_for_. For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)`
or setting `nstripes=2` should give the same result, as by default it will use all the available processor threads but will
split the workload over only two threads.

@note
The C++ 11 standard allows us to simplify the parallel implementation by getting rid of the `ParallelMandelbrot` class and replacing it with a lambda expression:

@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call-cxx11
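The range-splitting idea behind `parallel_for_` can be sketched with plain `std::thread`: split `[0, total)` into contiguous sub-ranges and hand each to a worker. This is a hypothetical sketch of the splitting concept, not OpenCV's implementation (which also handles nested parallelism, stripes, and backend selection).

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Split [0, total) into contiguous chunks and run `body(begin, end)` on each
// in its own thread. `body` plays the role of ParallelLoopBody::operator().
template <typename Body>
void parallelFor(std::size_t total, const Body& body,
                 unsigned numThreads = std::thread::hardware_concurrency()) {
    if (numThreads == 0) numThreads = 1;
    std::vector<std::thread> workers;
    std::size_t chunk = (total + numThreads - 1) / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = begin + chunk < total ? begin + chunk : total;
        if (begin >= end) break;
        workers.emplace_back([=] { body(begin, end); });  // disjoint sub-range per thread
    }
    for (auto& w : workers) w.join();
}
```

Because the sub-ranges are disjoint and each pixel is computed independently, no synchronization is needed inside the body, which is exactly the "no memory dependency" precondition stated earlier.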

Results
-----------

You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4 cores / 8 threads
CPU, you can expect a speed-up of around 6.9X. There are many factors explaining why we do not achieve a speed-up of almost 8X.
The main reasons are mostly due to:
* the overhead to create and manage the threads,
* background processes running in parallel,
* the difference between 4 hardware cores with 2 logical threads each and 8 hardware cores.

The resulting image produced by the tutorial code (you can modify the code to use more iterations, assign a pixel color
depending on the escaped iteration, and use a color palette to get more aesthetic images):


BIN
Lib/opencv/sources/doc/tutorials/core/images/howToScanImages.jpg
Normal file
BIN
Lib/opencv/sources/doc/tutorials/core/images/interopOpenCV1.png
Normal file
@@ -0,0 +1,194 @@
Mask operations on matrices {#tutorial_mat_mask_operations}
===========================

@prev_tutorial{tutorial_how_to_scan_images}
@next_tutorial{tutorial_mat_operations}

Mask operations on matrices are quite simple. The idea is that we recalculate each pixel's value in
an image according to a mask matrix (also known as a kernel). This mask holds values that will adjust
how much influence neighboring pixels (and the current pixel) have on the new pixel value. From a
mathematical point of view we make a weighted average, with our specified values.

Our test case
-------------

Let's consider the issue of an image contrast enhancement method. Basically we want to apply for
every pixel of the image the following formula:

\f[I(i,j) = 5*I(i,j) - [ I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)]\f]\f[\iff I(i,j)*M, \text{where }
M = \bordermatrix{ _i\backslash ^j  & -1 &  0 & +1 \cr
                     -1 &  0 & -1 &  0 \cr
                      0 & -1 &  5 & -1 \cr
                     +1 &  0 & -1 &  0 \cr
                 }\f]

The first notation is by using a formula, while the second is a compacted version of the first by
using a mask. You use the mask by putting the center of the mask matrix (in the upper case noted by
the zero-zero index) on the pixel you want to calculate, and summing up the pixel values multiplied with
the overlapped matrix values. It's the same thing, however in case of large matrices the latter
notation is a lot easier to look over.
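The mask application described above can be written out by hand in plain C++ (no OpenCV), which makes the weighted-average interpretation concrete. A minimal sketch under these assumptions: single-channel row-major image, saturation to [0,255], and zeroed borders where the kernel does not fit; the `sharpen` name is mine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hand-rolled application of the contrast-enhancement mask above:
// I(i,j) = 5*I(i,j) - (I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)).
inline std::vector<std::uint8_t> sharpen(const std::vector<std::uint8_t>& img,
                                         int rows, int cols) {
    std::vector<std::uint8_t> out(img.size(), 0);   // borders stay 0
    auto at = [&](int r, int c) { return static_cast<int>(img[r * cols + c]); };
    for (int r = 1; r < rows - 1; ++r)
        for (int c = 1; c < cols - 1; ++c) {
            int v = 5 * at(r, c) - at(r - 1, c) - at(r + 1, c) - at(r, c - 1) - at(r, c + 1);
            v = v < 0 ? 0 : (v > 255 ? 255 : v);    // clamp, like cv::saturate_cast
            out[r * cols + c] = static_cast<std::uint8_t>(v);
        }
    return out;
}
```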

Code
----

@add_toggle_cpp
You can download this source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp) or look in the
OpenCV source code libraries sample directory at
`samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp`.
@include samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp
@end_toggle

@add_toggle_java
You can download this source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java) or look in the
OpenCV source code libraries sample directory at
`samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java`.
@include samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java
@end_toggle

@add_toggle_python
You can download this source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py) or look in the
OpenCV source code libraries sample directory at
`samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py`.
@include samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py
@end_toggle

The Basic Method
----------------

Now let us see how we can make this happen by using the basic pixel access method or by using the
**filter2D()** function.

Here's a function that will do this:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method

At first we make sure that the input image's data is in unsigned char format. For this we use the
@ref cv::CV_Assert function, which throws an error when the expression inside it is false.
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp 8_bit
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method

At first we make sure that the input image's data is in unsigned 8 bit format.
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java 8_bit
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method

At first we make sure that the input image's data is in unsigned 8 bit format.
@code{.py}
my_image = cv.cvtColor(my_image, cv.CV_8U)
@endcode

@end_toggle

We create an output image with the same size and the same type as our input. As you can see in the
@ref tutorial_how_to_scan_images_storing "storing" section, depending on the number of channels we may have one or more
subcolumns.

@add_toggle_cpp
We will iterate through them via pointers, so the total number of elements depends on
this number.
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp create_channels
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java create_channels
@end_toggle

@add_toggle_python
@code{.py}
height, width, n_channels = my_image.shape
result = np.zeros(my_image.shape, my_image.dtype)
@endcode
@end_toggle

@add_toggle_cpp
We'll use the plain C [] operator to access pixels. Because we need to access multiple rows at the
same time, we'll acquire the pointers for each of them (a previous, a current and a next line). We
need another pointer to where we're going to save the calculation. Then we simply access the right
items with the [] operator. To move the output pointer ahead, we simply increment it (by one
byte) after each operation:
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method_loop

On the borders of the image the upper notation results in nonexistent pixel locations (like
(-1,-1)). In these points our formula is undefined. A simple solution is to not apply the kernel
at these points and, for example, set the pixels on the borders to zeros:

@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp borders
@end_toggle

@add_toggle_java
We need to access multiple rows and columns, which can be done by adding or subtracting 1 to the current center (i,j).
Then we apply the sum and put the new value in the Result matrix.
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method_loop

On the borders of the image the upper notation results in nonexistent pixel locations (like (-1,-1)).
In these points our formula is undefined. A simple solution is to not apply the kernel
at these points and, for example, set the pixels on the borders to zeros:

@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java borders
@end_toggle

@add_toggle_python
We need to access multiple rows and columns, which can be done by adding or subtracting 1 to the current center (i,j).
Then we apply the sum and put the new value in the Result matrix.
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method_loop
@end_toggle

The filter2D function
---------------------

Applying such filters is so common in image processing that OpenCV has a function that
will take care of applying the mask (also called a kernel in some places). For this you first need
to define an object that holds the mask:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp kern
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java kern
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py kern
@end_toggle

Then call the **filter2D()** function specifying the input, the output image and the kernel to
use:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp filter2D
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java filter2D
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py filter2D
@end_toggle

The function even has a fifth optional argument to specify the center of the kernel, a sixth one
for adding an optional value to the filtered pixels before storing them in K, and a seventh one
for determining what to do in the regions where the operation is undefined (the borders).

This function is shorter, less verbose and, because there are some optimizations, it is usually faster
than the *hand-coded method*. For example, in my test the second one took only 13
milliseconds while the first took around 31 milliseconds. Quite some difference.

For example:



@add_toggle_cpp
Check out an instance of running the program on our [YouTube
channel](http://www.youtube.com/watch?v=7PF1tAU9se4).
@youtube{7PF1tAU9se4}
@end_toggle

264
Lib/opencv/sources/doc/tutorials/core/mat_operations.markdown
Normal file
@@ -0,0 +1,264 @@
Operations with images {#tutorial_mat_operations}
======================

@prev_tutorial{tutorial_mat_mask_operations}
@next_tutorial{tutorial_adding_images}

Input/Output
------------

### Images

Load an image from a file:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file
@end_toggle

If you read a jpg file, a 3 channel image is created by default. If you need a grayscale image, use:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file in grayscale
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file in grayscale
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file in grayscale
@end_toggle

@note The format of the file is determined by its content (the first few bytes).

To save an image to a file:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Save image
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Save image
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Save image
@end_toggle

@note The format of the file is determined by its extension.

@note Use cv::imdecode and cv::imencode to read and write an image from/to memory rather than a file.

Basic operations with images
----------------------------

### Accessing pixel intensity values

In order to get a pixel intensity value, you have to know the type of the image and the number of
channels. Here is an example for a single channel grayscale image (type 8UC1) and pixel coordinates
x and y:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 1
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 1
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 1
@end_toggle

C++ version only:
intensity.val[0] contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV
images are represented by the same structure as matrices, we use the same convention for both
cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or
x-coordinate) follows it. Alternatively, you can use the following notation (**C++ only**):

@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 2

Now let us consider a 3 channel image with BGR color ordering (the default format returned by
imread):

**C++ code**
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 3

**Python code**
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 3
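Behind this access pattern, a 3-channel BGR image is stored interleaved and row-major, so channel c of pixel (y, x) lives at offset (y*cols + x)*3 + c. A standalone sketch of that addressing, assuming the interleaved layout imread produces (the `BgrImage` type is hypothetical, not OpenCV):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Interleaved, row-major BGR storage: B,G,R, B,G,R, ...
struct BgrImage {
    int rows, cols;
    std::vector<std::uint8_t> data;

    BgrImage(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c * 3, 0) {}

    // Like img.at<Vec3b>(y, x): fetch the three channels of one pixel.
    std::array<std::uint8_t, 3> pixel(int y, int x) const {
        std::size_t base = (static_cast<std::size_t>(y) * cols + x) * 3;
        return {data[base], data[base + 1], data[base + 2]};
    }
};
```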

You can use the same method for floating-point images (for example, you can get such an image by
running Sobel on a 3 channel image) (**C++ only**):

@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 4

The same method can be used to change pixel intensities:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 5
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 5
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 5
@end_toggle

There are functions in OpenCV, especially from the calib3d module, such as cv::projectPoints, that take an
array of 2D or 3D points in the form of a Mat. The matrix should contain exactly one column, each row
corresponds to a point, and the matrix type should be 32FC2 or 32FC3 correspondingly. Such a matrix can be
easily constructed from `std::vector` (**C++ only**):

@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Mat from points vector

One can access a point in this matrix using the same method, `Mat::at` (**C++ only**):

@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Point access

### Memory management and reference counting

Mat is a structure that keeps matrix/image characteristics (rows and columns number, data type etc.)
and a pointer to the data. So nothing prevents us from having several instances of Mat corresponding to
the same data. A Mat keeps a reference count that tells if data has to be deallocated when a
particular instance of Mat is destroyed. Here is an example of creating two matrices without copying
data (**C++ only**):

@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 1

As a result, we get a 32FC1 matrix with 3 columns instead of a 32FC3 matrix with 1 column. `pointsMat`
uses the data of `points` and will not deallocate the memory when destroyed. In this particular
instance, however, the developer has to make sure that the lifetime of `points` is longer than that of
`pointsMat`. If we need to copy the data, this is done using, for example, cv::Mat::copyTo or cv::Mat::clone:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 2
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 2
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 2
@end_toggle

An empty output Mat can be supplied to each function.
Each implementation calls Mat::create for the destination matrix.
This method allocates data for a matrix if it is empty.
If it is not empty and has the correct size and type, the method does nothing.
If, however, the size or type is different from the input arguments, the data is deallocated (and lost) and new data is allocated.
For example:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 3
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 3
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 3
@end_toggle
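The allocation decision described above can be sketched as plain logic. This is not the real API, just a minimal model of what `Mat::create` decides (a dict stands in for the Mat header):

```python
def create(mat, rows, cols, dtype):
    """Sketch of cv::Mat::create's decision logic (not the real API):
    allocate when empty, reuse when size and type already match,
    otherwise discard the old data and allocate fresh storage."""
    if mat is not None and mat["shape"] == (rows, cols) and mat["dtype"] == dtype:
        return mat  # correct size and type already: nothing to do
    # empty, or mismatched size/type: (re)allocate, old data is lost
    return {"shape": (rows, cols), "dtype": dtype, "data": bytearray(rows * cols)}

m = create(None, 4, 4, "8U")     # empty input: allocates
same = create(m, 4, 4, "8U")     # matching size/type: the same object comes back
other = create(m, 8, 8, "8U")    # size changed: fresh storage
```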

### Primitive operations

There are a number of convenient operators defined on matrices. For example, here is how we can make
a black image from an existing greyscale image `img`:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Set image to black
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Set image to black
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Set image to black
@end_toggle
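Since NumPy arrays back images in the Python bindings, the same operation is an in-place fill; a minimal sketch (assumes NumPy is available):

```python
import numpy as np

img = np.full((4, 4), 200, np.uint8)  # stand-in for an existing greyscale image
img[:] = 0                            # counterpart of `img = Scalar(0)` in C++
```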

Selecting a region of interest:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Select ROI
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Select ROI
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Select ROI
@end_toggle

Conversion from color to greyscale:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp BGR to Gray
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java BGR to Gray
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py BGR to Gray
@end_toggle
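cv::cvtColor with `COLOR_BGR2GRAY` computes the standard luma weighted sum, Y = 0.299 R + 0.587 G + 0.114 B. A NumPy sketch of the same arithmetic (illustrative only; assumes NumPy is available):

```python
import numpy as np

def bgr_to_gray(img):
    # Same weighted sum cv::cvtColor uses for COLOR_BGR2GRAY;
    # note the channels are ordered B, G, R.
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    return (0.299 * r + 0.587 * g + 0.114 * b).round().astype(np.uint8)

pixel = np.array([[[255, 0, 0]]], np.uint8)  # pure blue in BGR order
print(bgr_to_gray(pixel))  # [[29]]: blue contributes least to luminance
```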

Change image type from 8UC1 to 32FC1:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Convert to CV_32F
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Convert to CV_32F
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Convert to CV_32F
@end_toggle
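In NumPy terms the conversion is a dtype cast, optionally with a scale factor so the values land in a convenient range; a small sketch (assumes NumPy is available):

```python
import numpy as np

img8 = np.array([[0, 128, 255]], np.uint8)
# Counterpart of src.convertTo(dst, CV_32F): same values, wider type.
img32 = img8.astype(np.float32)
# A scale factor is often applied so values land in [0, 1]:
img32_scaled = img8.astype(np.float32) / 255.0
```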

### Visualizing images

It is very useful to see intermediate results of your algorithm during the development process. OpenCV
provides a convenient way of visualizing images. An 8U image can be shown using:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 1
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 1
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 1
@end_toggle

A call to waitKey() starts a message passing cycle that waits for a key stroke in the "image"
window. A 32F image needs to be converted to 8U type. For example:

@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 2
@end_toggle

@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 2
@end_toggle

@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 2
@end_toggle
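The 32F-to-8U conversion for display amounts to scaling, saturating, and narrowing the type, as `img.convertTo(dst, CV_8U, 255)` does for data nominally in [0, 1]. A NumPy sketch of the same idea (assumes NumPy is available):

```python
import numpy as np

f = np.array([[0.0, 0.5, 1.0, 1.2]], np.float32)  # 32F data, nominally [0, 1]
# Scale to [0, 255], saturate out-of-range values, then narrow the type.
disp = np.clip(f * 255.0, 0, 255).round().astype(np.uint8)
# 0.5 maps to 128; the out-of-range 1.2 saturates at 255.
```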

@note Here cv::namedWindow is not necessary since it is immediately followed by cv::imshow.
Nevertheless, it can be used to change the window properties or when using cv::createTrackbar.
Mat - The Basic Image Container {#tutorial_mat_the_basic_image_container}
===============================

@next_tutorial{tutorial_how_to_scan_images}

Goal
----

We have multiple ways to acquire digital images from the real world: digital cameras, scanners,
computed tomography, and magnetic resonance imaging to name a few. In every case what we (humans)
see are images. However, when transferring this to our digital devices what we record are numerical
values for each of the points of the image.



For example, in the above image you can see that the mirror of the car is nothing more than a matrix
containing all the intensity values of the pixel points. How we get and store the pixel values may
vary according to our needs, but in the end all images inside the computer world may be reduced to
numerical matrices and other information describing the matrix itself. *OpenCV* is a computer vision
library whose main focus is to process and manipulate this information. Therefore, the first thing
you need to be familiar with is how OpenCV stores and handles images.

Mat
---

OpenCV has been around since 2001. In those days the library was built around a *C* interface and to
store the image in memory they used a C structure called *IplImage*. This is the one you'll see
in most of the older tutorials and educational materials. The problem with this is that it brings to
the table all the drawbacks of the C language. The biggest issue is the manual memory management. It
builds on the assumption that the user is responsible for taking care of memory allocation and
deallocation. While this is not a problem with smaller programs, once your code base grows it will
be more of a struggle to handle all this rather than focusing on solving your development goal.

Luckily C++ came around and introduced the concept of classes, making life easier for the user through
automatic memory management (more or less). The good news is that C++ is fully compatible with C, so
no compatibility issues can arise from making the change. Therefore, OpenCV 2.0 introduced a new C++
interface which offered a new way of doing things, meaning you do not need to fiddle with memory
management, making your code concise (less to write, to achieve more). The main downside of the C++
interface is that many embedded development systems at the moment support only C. Therefore, unless
you are targeting embedded platforms, there's no point in using the *old* methods (unless you're a
masochist programmer and you're asking for trouble).

The first thing you need to know about *Mat* is that you no longer need to manually allocate its
memory and release it as soon as you do not need it. While doing this is still a possibility, most
of the OpenCV functions will allocate their output data automatically. As a nice bonus, if you pass
an already existing *Mat* object, which has already allocated the required space for the matrix,
this will be reused. In other words, we use at all times only as much memory as we need to perform
the task.

*Mat* is basically a class with two data parts: the matrix header (containing information such as
the size of the matrix, the method used for storing, at which address the matrix is stored, and so
on) and a pointer to the matrix containing the pixel values (taking any dimensionality depending on
the method chosen for storing). The matrix header size is constant; however, the size of the matrix
itself may vary from image to image and usually is larger by orders of magnitude.

OpenCV is an image processing library. It contains a large collection of image processing functions.
To solve a computational challenge, most of the time you will end up using multiple functions of the
library. Because of this, passing images to functions is a common practice. We should not forget
that we are talking about image processing algorithms, which tend to be quite computationally heavy.
The last thing we want to do is further decrease the speed of your program by making unnecessary
copies of potentially *large* images.

To tackle this issue OpenCV uses a reference counting system. The idea is that each *Mat* object has
its own header, however a matrix may be shared between two *Mat* objects by having their matrix
pointers point to the same address. Moreover, the copy operators **will only copy the headers** and
the pointer to the large matrix, not the data itself.

@code{.cpp}
Mat A, C;                          // creates just the header parts
A = imread(argv[1], IMREAD_COLOR); // here we'll know the method used (allocate matrix)

Mat B(A);                          // Use the copy constructor

C = A;                             // Assignment operator
@endcode

All the above objects, in the end, point to the same single data matrix and making a modification
using any of them will affect all the other ones as well. In practice the different objects just
provide different access methods to the same underlying data. Nevertheless, their header parts are
different. The really interesting part is that you can create headers which refer to only a subsection
of the full data. For example, to create a region of interest (*ROI*) in an image you just create
a new header with the new boundaries:
@code{.cpp}
Mat D (A, Rect(10, 10, 100, 100) ); // using a rectangle
Mat E = A(Range::all(), Range(1,3)); // using row and column boundaries
@endcode
Now you may ask: if the matrix itself may belong to multiple *Mat* objects, who takes responsibility
for cleaning it up when it's no longer needed? The short answer is: the last object that used it.
This is handled by using a reference counting mechanism. Whenever somebody copies a header of a
*Mat* object, a counter is increased for the matrix. Whenever a header is cleaned, this counter
is decreased. When the counter reaches zero the matrix is freed. Sometimes you will want to copy
the matrix itself too, so OpenCV provides the @ref cv::Mat::clone() and @ref cv::Mat::copyTo() functions.
@code{.cpp}
Mat F = A.clone();
Mat G;
A.copyTo(G);
@endcode
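The same sharing-versus-cloning behavior can be demonstrated with NumPy arrays, which back Mat in the Python bindings and follow the same view/copy semantics (a sketch; assumes NumPy is available):

```python
import numpy as np

A = np.zeros((4, 4), np.uint8)
C = A                 # like `C = A` on a Mat: a second header, same data
D = A[1:3, 1:3]       # like Mat D(A, Rect(...)): a view into A's data
F = A.copy()          # like A.clone(): independent storage

D[:] = 7              # writing through the ROI view...
assert A[1, 1] == 7   # ...is visible through A (shared data)
assert C[1, 1] == 7   # ...and through C
assert F[1, 1] == 0   # the clone keeps its own copy
```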
Now modifying *F* or *G* will not affect the matrix pointed to by *A*'s header. What you need to
remember from all this is that:

- Output image allocation for OpenCV functions is automatic (unless specified otherwise).
- You do not need to think about memory management with OpenCV's C++ interface.
- The assignment operator and the copy constructor only copy the header.
- The underlying matrix of an image may be copied using the @ref cv::Mat::clone() and @ref cv::Mat::copyTo()
  functions.

Storing methods
---------------

This is about how you store the pixel values. You can select the color space and the data type used.
The color space refers to how we combine color components in order to code a given color. The
simplest one is grayscale, where the colors at our disposal are black and white. The combination
of these allows us to create many shades of gray.

For *colorful* ways we have a lot more methods to choose from. Each of them breaks color down into three
or four basic components and we can use the combination of these to create the others. The most
popular one is RGB, mainly because this is also how our eye builds up colors. Its base colors are
red, green and blue. To code the transparency of a color, sometimes a fourth element, alpha (A), is
added.

There are, however, many other color systems, each with their own advantages:

- RGB is the most common as our eyes use something similar, however keep in mind that OpenCV's standard display
  system composes colors using the BGR color space (the red and blue channels are swapped).
- HSV and HLS decompose colors into their hue, saturation and value/luminance components,
  which is a more natural way for us to describe colors. You might, for example, dismiss the last
  component, making your algorithm less sensitive to the light conditions of the input image.
- YCrCb is used by the popular JPEG image format.
- CIE L\*a\*b\* is a perceptually uniform color space, which comes in handy if you need to measure
  the *distance* of a given color to another color.

Each of the building components has its own valid domain. This leads to the data type used. How
we store a component defines the control we have over its domain. The smallest data type possible is
*char*, which means one byte or 8 bits. This may be unsigned (so it can store values from 0 to 255) or
signed (values from -128 to +127). Although with three components this already gives 16
million possible colors to represent (as in the case of RGB), we may acquire even finer control by
using the float (4 byte = 32 bit) or double (8 byte = 64 bit) data types for each component.
Nevertheless, remember that increasing the size of a component also increases the size of the whole
picture in memory.

Creating a Mat object explicitly
--------------------------------

In the @ref tutorial_load_save_image tutorial you have already learned how to write a matrix to an image
file by using the @ref cv::imwrite() function. However, for debugging purposes it's much more
convenient to see the actual values. You can do this using the \<\< operator of *Mat*. Be aware that
this only works for two dimensional matrices.

Although *Mat* works really well as an image container, it is also a general matrix class.
Therefore, it is possible to create and manipulate multidimensional matrices. You can create a Mat
object in multiple ways:

- @ref cv::Mat::Mat Constructor

    @snippet mat_the_basic_image_container.cpp constructor

    

    For two dimensional and multichannel images we first define their size: row and column count wise.

    Then we need to specify the data type to use for storing the elements and the number of channels
    per matrix point. To do this we have multiple definitions constructed according to the following
    convention:
    @code
    CV_[The number of bits per item][Signed or Unsigned][Type Prefix]C[The channel number]
    @endcode
    For instance, *CV_8UC3* means we use unsigned char types that are 8 bit long and each pixel has
    three of these to form the three channels. There are types predefined for up to four channels. The
    @ref cv::Scalar is a four-element short vector. Specify it and you can initialize all matrix
    points with a custom value. If you need more you can create the type with the macro above, setting
    the channel number in parentheses as you can see below.

- Use C/C++ arrays and initialize via constructor

    @snippet mat_the_basic_image_container.cpp init

    The example above shows how to create a matrix with more than two dimensions. Specify its
    dimension, then pass a pointer containing the size for each dimension and the rest remains the
    same.

- @ref cv::Mat::create function:

    @snippet mat_the_basic_image_container.cpp create

    

    You cannot initialize the matrix values with this construction. It will only reallocate its matrix
    data memory if the new size does not fit into the old one.

- MATLAB style initializers: @ref cv::Mat::zeros , @ref cv::Mat::ones , @ref cv::Mat::eye . Specify the size and
  data type to use:

    @snippet mat_the_basic_image_container.cpp matlab

    

- For small matrices you may use comma separated initializers or initializer lists (C++11 support is required in the latter case):

    @snippet mat_the_basic_image_container.cpp comma

    @snippet mat_the_basic_image_container.cpp list

    

- Create a new header for an existing *Mat* object and @ref cv::Mat::clone or @ref cv::Mat::copyTo it.

    @snippet mat_the_basic_image_container.cpp clone

    

@note
You can fill out a matrix with random values using the @ref cv::randu() function. You need to
give a lower and upper limit for the random values:
@snippet mat_the_basic_image_container.cpp random
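The `CV_[bits][sign]C[channels]` convention from the constructor section above maps to a single packed integer in OpenCV's headers: the low 3 bits hold the depth code and the remaining bits hold the channel count minus one. A stdlib-only sketch of that packing (mirroring what the `CV_MAKETYPE` macro does):

```python
# Depth codes as defined in OpenCV headers
DEPTH = {"8U": 0, "8S": 1, "16U": 2, "16S": 3, "32S": 4, "32F": 5, "64F": 6}

def make_type(depth, channels):
    # CV_MAKETYPE packs depth and channel count into one integer:
    # the low 3 bits hold the depth, the rest hold (channels - 1).
    return DEPTH[depth] + ((channels - 1) << 3)

print(make_type("8U", 3))   # CV_8UC3 == 16
print(make_type("32F", 1))  # CV_32FC1 == 5
```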

Output formatting
-----------------

In the above examples you could see the default formatting option. OpenCV, however, allows you to
format your matrix output:

- Default
    @snippet mat_the_basic_image_container.cpp out-default
    

- Python
    @snippet mat_the_basic_image_container.cpp out-python
    

- Comma separated values (CSV)
    @snippet mat_the_basic_image_container.cpp out-csv
    

- Numpy
    @snippet mat_the_basic_image_container.cpp out-numpy
    

- C
    @snippet mat_the_basic_image_container.cpp out-c
    

Output of other common items
----------------------------

OpenCV offers support for output of other common OpenCV data structures too via the \<\< operator:

- 2D Point
    @snippet mat_the_basic_image_container.cpp out-point2
    

- 3D Point
    @snippet mat_the_basic_image_container.cpp out-point3
    

- std::vector via cv::Mat
    @snippet mat_the_basic_image_container.cpp out-vector
    

- std::vector of points
    @snippet mat_the_basic_image_container.cpp out-vector-points
    

Most of the samples here have been included in a small console application. You can download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/mat_the_basic_image_container/mat_the_basic_image_container.cpp)
or find it in the core section of the cpp samples.

You can also find a quick video demonstration of this on
[YouTube](https://www.youtube.com/watch?v=1tibU7vGWpk).

@youtube{1tibU7vGWpk}
The Core Functionality (core module) {#tutorial_table_of_content_core}
=====================================

Here you will learn about the basic building blocks of the library. A must read for
understanding how to manipulate images on a pixel level.

- @subpage tutorial_mat_the_basic_image_container

  *Compatibility:* \> OpenCV 2.0

  *Author:* Bernát Gábor

  You will learn how to store images in memory and how to print out their content to the
  console.

- @subpage tutorial_how_to_scan_images

  *Compatibility:* \> OpenCV 2.0

  *Author:* Bernát Gábor

  You'll find out how to scan images (go through each of the image pixels) with OpenCV.
  Bonus: time measurement with OpenCV.

- @subpage tutorial_mat_mask_operations

  *Languages:* C++, Java, Python

  *Compatibility:* \> OpenCV 2.0

  *Author:* Bernát Gábor

  You'll find out how to scan images with neighbor access and use the @ref cv::filter2D
  function to apply kernel filters on images.

- @subpage tutorial_mat_operations

  *Languages:* C++, Java, Python

  *Compatibility:* \> OpenCV 2.0

  Reading/writing images from file, accessing pixels, primitive operations, visualizing images.

- @subpage tutorial_adding_images

  *Languages:* C++, Java, Python

  *Compatibility:* \> OpenCV 2.0

  *Author:* Ana Huamán

  We will learn how to blend two images!

- @subpage tutorial_basic_linear_transform

  *Languages:* C++, Java, Python

  *Compatibility:* \> OpenCV 2.0

  *Author:* Ana Huamán

  We will learn how to change our image appearance!

- @subpage tutorial_discrete_fourier_transform

  *Languages:* C++, Java, Python

  *Compatibility:* \> OpenCV 2.0

  *Author:* Bernát Gábor

  You will see how and why to use the Discrete Fourier transformation with OpenCV.

- @subpage tutorial_file_input_output_with_xml_yml

  *Compatibility:* \> OpenCV 2.0

  *Author:* Bernát Gábor

  You will see how to use the @ref cv::FileStorage data structure of OpenCV to write and read
  data to XML or YAML file format.

- @subpage tutorial_how_to_use_OpenCV_parallel_for_

  *Compatibility:* \>= OpenCV 2.4.3

  You will see how to use the OpenCV parallel_for_ to easily parallelize your code.
# How to run deep networks on Android device {#tutorial_dnn_android}

## Introduction
In this tutorial you'll learn how to run deep learning networks on an Android device
using the OpenCV deep learning module.

This tutorial was written for the following versions of the corresponding software:
- Android Studio 2.3.3
- OpenCV 3.3.0+

## Requirements

- Download and install Android Studio from https://developer.android.com/studio.

- Get the latest pre-built OpenCV for Android release from https://github.com/opencv/opencv/releases and unpack it (for example, `opencv-4.2.0-android-sdk.zip`).

- Download the MobileNet object detection model from https://github.com/chuanqi305/MobileNet-SSD. We need the configuration file `MobileNetSSD_deploy.prototxt` and the weights `MobileNetSSD_deploy.caffemodel`.

## Create an empty Android Studio project
- Open Android Studio. Start a new project. Let's call it `opencv_mobilenet`.
  

- Keep the default target settings.
  

- Use the "Empty Activity" template. Name the activity `MainActivity` with a
  corresponding layout `activity_main`.
  

  

- Wait until the project is created. Go to `Run->Edit Configurations`.
  Choose `USB Device` as the target device for runs.
  
  Plug in your device and run the project. It should be installed and launched
  successfully before we go further.
  @note Read @ref tutorial_android_dev_intro in case of problems.

  

## Add OpenCV dependency

- Go to `File->New->Import module` and provide a path to `unpacked_OpenCV_package/sdk/java`. The module name is detected automatically.
  Disable all features that Android Studio will suggest on the next window.
  

  

- Open two files:

  1. `AndroidStudioProjects/opencv_mobilenet/app/build.gradle`

  2. `AndroidStudioProjects/opencv_mobilenet/openCVLibrary330/build.gradle`

  Copy both `compileSdkVersion` and `buildToolsVersion` from the first file to
  the second one.

  `compileSdkVersion 14` -> `compileSdkVersion 26`

  `buildToolsVersion "25.0.0"` -> `buildToolsVersion "26.0.1"`

- Build the project. There should be no errors at this point.

- Go to `File->Project Structure`. Add the OpenCV module dependency.
  

  

- Install an appropriate OpenCV manager from `unpacked_OpenCV_package/apk`
  on the target device once.
  @code
  adb install OpenCV_3.3.0_Manager_3.30_armeabi-v7a.apk
  @endcode

- Congratulations! We're now ready to make a sample using OpenCV.

## Make a sample
Our sample takes pictures from the camera, forwards them into a deep network and
receives a set of rectangles, class identifiers and confidence values in the `[0, 1]`
range.
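For orientation, the SSD `DetectionOutput` layer produces a blob of shape `(1, 1, N, 7)`, one row per detection. A NumPy sketch of decoding such output (the detection values here are made up for illustration; assumes NumPy is available):

```python
import numpy as np

# Hypothetical detections in the SSD DetectionOutput layout: each row is
# [image_id, class_id, confidence, left, top, right, bottom],
# with coordinates normalized to [0, 1].
detections = np.array([[[[0, 15, 0.92, 0.10, 0.20, 0.55, 0.80],
                         [0,  7, 0.30, 0.05, 0.05, 0.20, 0.25]]]],
                      np.float32)

frame_w, frame_h = 640, 480
boxes = []
for det in detections[0, 0]:
    class_id, conf = int(det[1]), float(det[2])
    if conf < 0.5:                      # drop weak detections
        continue
    # scale normalized coordinates back to pixel positions
    left, top = int(det[3] * frame_w), int(det[4] * frame_h)
    right, bottom = int(det[5] * frame_w), int(det[6] * frame_h)
    boxes.append((class_id, conf, left, top, right, bottom))

print(boxes)  # only the confident detection survives the threshold
```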

- First of all, we need to add a widget which displays processed
  frames. Modify `app/src/main/res/layout/activity_main.xml`:
  @include android/mobilenet-objdetect/res/layout/activity_main.xml

- Put the downloaded `MobileNetSSD_deploy.prototxt` and `MobileNetSSD_deploy.caffemodel`
  into the `app/build/intermediates/assets/debug` folder.

- Modify `/app/src/main/AndroidManifest.xml` to enable full-screen mode, set up
  the correct screen orientation and allow use of the camera.
  @include android/mobilenet-objdetect/AndroidManifest.xml

- Replace the content of `app/src/main/java/org/opencv/samples/opencv_mobilenet/MainActivity.java`:
  @include android/mobilenet-objdetect/src/org/opencv/samples/opencv_mobilenet/MainActivity.java

- Launch the application and have fun!
  
# Custom deep learning layers support {#tutorial_dnn_custom_layers}

## Introduction
Deep learning is a fast growing area. New approaches to building neural networks
usually introduce new types of layers. They could be modifications of existing
ones or implementations of novel research ideas.

OpenCV lets you import and run networks from different deep learning
frameworks. A number of the most popular layers are implemented. However, you may face
the problem that your network cannot be imported with OpenCV because of unimplemented layers.

The first solution is to create a feature request at https://github.com/opencv/opencv/issues,
mentioning details such as the source of the model and the type of the new layer. The new layer could
be implemented if the OpenCV community shares this need.

The second way is to define a **custom layer** so OpenCV's deep learning engine
will know how to use it. This tutorial shows how to customize the import of deep
learning models.

## Define a custom layer in C++
A deep learning layer is a building block of a network's pipeline.
It has connections to **input blobs** and produces results in **output blobs**.
It may have trained **weights** and **hyper-parameters**.
Layers' names, types, weights and hyper-parameters are stored in files generated by
native frameworks during training. If OpenCV encounters an unknown layer type, it throws an
exception while trying to read the model:

```
Unspecified error: Can't create layer "layer_name" of type "MyType" in function getLayerInstance
```

To import the model correctly you have to derive a class from cv::dnn::Layer with
the following methods:

@snippet dnn/custom_layers.hpp A custom layer interface

And register it before the import:

@snippet dnn/custom_layers.hpp Register a custom layer

@note `MyType` is the type of the unimplemented layer from the thrown exception.

Let's see what all the methods do:

- Constructor

  @snippet dnn/custom_layers.hpp MyLayer::MyLayer

  Retrieves hyper-parameters from cv::dnn::LayerParams. If your layer has trainable
  weights, they will already be stored in the layer's member cv::dnn::Layer::blobs.

- A static method `create`

  @snippet dnn/custom_layers.hpp MyLayer::create

  This method should create an instance of your layer and return a cv::Ptr to it.

- Output blobs' shape computation

  @snippet dnn/custom_layers.hpp MyLayer::getMemoryShapes

  Returns the layer's output shapes depending on the input shapes. You may request extra
  memory using `internals`.

- Run a layer

  @snippet dnn/custom_layers.hpp MyLayer::forward

  Implement the layer's logic here. Compute outputs for the given inputs.

  @note OpenCV manages the memory allocated for layers. In most cases the same memory
  can be reused between layers. So your `forward` implementation should not rely on
  the second invocation of `forward` having the same data at `outputs` and `internals`.

- Optional `finalize` method

  @snippet dnn/custom_layers.hpp MyLayer::finalize

The chain of methods is the following: the OpenCV deep learning engine calls the `create`
method once, then it calls `getMemoryShapes` for every created layer, then you
can make preparations that depend on the known input dimensions at cv::dnn::Layer::finalize.
After the network is initialized, only the `forward` method is called for every network input.

@note Varying input blob sizes such as height, width or batch size makes OpenCV
reallocate all the internal memory. That leads to efficiency gaps. Try to initialize
and deploy models using a fixed batch size and image dimensions.

## Example: custom layer from Caffe
Let's create a custom layer `Interp` from https://github.com/cdmh/deeplab-public.
It's just a simple resize that takes an input blob of size `N x C x Hi x Wi` and returns
an output blob of size `N x C x Ho x Wo` where `N` is a batch size, `C` is a number of channels,
`Hi x Wi` and `Ho x Wo` are input and output `height x width` correspondingly.
This layer has no trainable weights but it has hyper-parameters to specify the output size.

For example,
~~~~~~~~~~~~~
layer {
  name: "output"
  type: "Interp"
  bottom: "input"
  top: "output"
  interp_param {
    height: 9
    width: 8
  }
}
~~~~~~~~~~~~~
|
||||
|
||||
This way our implementation can look like:
|
||||
|
||||
@snippet dnn/custom_layers.hpp InterpLayer
|
||||
|
||||
Next we need to register a new layer type and try to import the model.
|
||||
|
||||
@snippet dnn/custom_layers.hpp Register InterpLayer
|
||||
|
||||
## Example: custom layer from TensorFlow
This is an example of how to import a network with a [tf.image.resize_bilinear](https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_bilinear)
operation. This is also a resize, but with an implementation different from OpenCV's or `Interp` above.

Let's create a single-layer network:
~~~~~~~~~~~~~{.py}
inp = tf.placeholder(tf.float32, [2, 3, 4, 5], 'input')
resized = tf.image.resize_bilinear(inp, size=[9, 8], name='resize_bilinear')
~~~~~~~~~~~~~
OpenCV sees TensorFlow's graph in the following way:

```
node {
  name: "input"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "resize_bilinear/size"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_INT32
        tensor_shape {
          dim {
            size: 2
          }
        }
        tensor_content: "\t\000\000\000\010\000\000\000"
      }
    }
  }
}
node {
  name: "resize_bilinear"
  op: "ResizeBilinear"
  input: "input:0"
  input: "resize_bilinear/size"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "align_corners"
    value {
      b: false
    }
  }
}
library {
}
```

Custom layer import from TensorFlow is designed to put all of a layer's `attr` into
cv::dnn::LayerParams but input `Const` blobs into cv::dnn::Layer::blobs.
In our case the resize's output shape will be stored in the layer's `blobs[0]`.

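For instance, the `tensor_content: "\t\000\000\000\010\000\000\000"` bytes in the graph dump above are just two little-endian 32-bit integers. A short Python sketch (standalone, no OpenCV or TensorFlow needed) shows how they decode into the target size `[9, 8]`:

```python
import struct

# Raw bytes of the Const node's tensor_content attribute:
# "\t\000\000\000\010\000\000\000" -> \t is 9, \010 (octal) is 8
tensor_content = b"\t\x00\x00\x00\x08\x00\x00\x00"

# DT_INT32 with tensor_shape dim size 2 -> two little-endian int32 values
height, width = struct.unpack("<2i", tensor_content)
print(height, width)  # 9 8
```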
@snippet dnn/custom_layers.hpp ResizeBilinearLayer

Next we register the layer and try to import the model.

@snippet dnn/custom_layers.hpp Register ResizeBilinearLayer

## Define a custom layer in Python
The following example shows how to customize OpenCV's layers in Python.

Let's consider the [Holistically-Nested Edge Detection](https://arxiv.org/abs/1504.06375)
deep learning model. It was trained with one and only one difference compared to
the current version of the [Caffe framework](http://caffe.berkeleyvision.org/): `Crop`
layers, which receive two input blobs and crop the first one to match the spatial dimensions
of the second one, used to crop from the center. Nowadays Caffe's layer does it
from the top-left corner. So using the latest version of Caffe or OpenCV you'll
get shifted results with filled borders.

Next we're going to replace OpenCV's `Crop` layer, which makes a top-left crop, with
a centric one.

- Create a class with `getMemoryShapes` and `forward` methods

  @snippet dnn/edge_detection.py CropLayer

@note Both methods should return lists.

- Register the new layer.

  @snippet dnn/edge_detection.py Register

That's it! We've replaced an implemented OpenCV layer with a custom one.
You may find the full script in the [source code](https://github.com/opencv/opencv/tree/master/samples/dnn/edge_detection.py).

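The only real difference between the two behaviors is where the crop window starts. A tiny Python sketch makes this explicit; `center_crop_offsets` is a hypothetical helper, not part of OpenCV's API:

```python
def center_crop_offsets(input_shape, target_shape):
    """Offsets for cropping an (H, W) input down to target_shape from the center."""
    return tuple((i - t) // 2 for i, t in zip(input_shape, target_shape))

# Top-left cropping (current Caffe behaviour) always starts at (0, 0);
# center cropping (what the HED model was trained with) starts here instead:
offsets = center_crop_offsets((64, 48), (56, 40))
print(offsets)  # (4, 4)
```

Replacing the `(0, 0)` start with these offsets in `getMemoryShapes`/`forward` is exactly what the custom `CropLayer` above does.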
Load Caffe framework models {#tutorial_dnn_googlenet}
===========================

Introduction
------------

In this tutorial you will learn how to use the opencv_dnn module for image classification with
the GoogLeNet trained network from the [Caffe model zoo](http://caffe.berkeleyvision.org/model_zoo.html).

We will demonstrate the results of this example on the following picture.


Source Code
-----------

We will be using snippets from the example application that can be downloaded [here](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.cpp).

@include dnn/classification.cpp

Explanation
-----------

-# Firstly, download the GoogLeNet model files:
   [bvlc_googlenet.prototxt ](https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/bvlc_googlenet.prototxt) and
   [bvlc_googlenet.caffemodel](http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel)

   You also need a file with the names of the [ILSVRC2012](http://image-net.org/challenges/LSVRC/2012/browse-synsets) classes:
   [classification_classes_ILSVRC2012.txt](https://github.com/opencv/opencv/blob/master/samples/data/dnn/classification_classes_ILSVRC2012.txt).

   Put these files into the working directory of this program example.

-# Read and initialize the network using paths to the .prototxt and .caffemodel files
   @snippet dnn/classification.cpp Read and initialize network

   You can skip the `framework` argument if one of the files `model` or `config` has the
   extension `.caffemodel` or `.prototxt`.
   This way the function cv::dnn::readNet can automatically detect the model's format.

-# Read the input image and convert it to a blob acceptable by GoogLeNet
   @snippet dnn/classification.cpp Open a video file or an image file or a camera stream

   cv::VideoCapture can load both images and videos.

   @snippet dnn/classification.cpp Create a 4D blob from a frame
   We convert the image to a 4-dimensional blob (a so-called batch) with `1x3x224x224` shape
   after applying the necessary pre-processing, like resizing and mean subtraction
   `(-104, -117, -123)` for the blue, green and red channels correspondingly, using the cv::dnn::blobFromImage function.

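The pre-processing performed by cv::dnn::blobFromImage amounts to a resize, a per-channel mean subtraction and an HWC-to-NCHW reorder. A pure-Python sketch of the arithmetic for one pixel (the pixel values are made up; only the mean triple comes from the tutorial):

```python
# One BGR pixel and the GoogLeNet channel means (B, G, R)
pixel = (130, 120, 110)  # illustrative B, G, R values
mean = (104, 117, 123)

# blobFromImage subtracts the mean per channel...
centered = [p - m for p, m in zip(pixel, mean)]
print(centered)  # [26, 3, -13]

# ...and packs an H x W x C image into an N x C x H x W blob:
h, w = 224, 224
blob_shape = (1, 3, h, w)
print(blob_shape)  # (1, 3, 224, 224)
```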
-# Pass the blob to the network
   @snippet dnn/classification.cpp Set input blob

-# Make a forward pass
   @snippet dnn/classification.cpp Make forward pass
   During the forward pass the output of each network layer is computed, but in this example we need the output from the last layer only.

-# Determine the best class
   @snippet dnn/classification.cpp Get a class with a highest score
   We put the output of the network, which contains probabilities for each of the 1000 ILSVRC2012 image classes, into the `prob` blob,
   and find the index of the element with the maximal value in it. This index corresponds to the class of the image.

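Finding the best class is just an argmax over the probability vector. A minimal Python sketch (the probabilities are made up for a toy 5-class network):

```python
# Hypothetical output of the final softmax layer for a 5-class toy network
prob = [0.01, 0.02, 0.90, 0.05, 0.02]

# Index of the highest score = predicted class id, its value = confidence
class_id = max(range(len(prob)), key=lambda i: prob[i])
confidence = prob[class_id]
print(class_id, confidence)  # 2 0.9
```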
-# Run the example from the command line
   @code
   ./example_dnn_classification --model=bvlc_googlenet.caffemodel --config=bvlc_googlenet.prototxt --width=224 --height=224 --classes=classification_classes_ILSVRC2012.txt --input=space_shuttle.jpg --mean="104 117 123"
   @endcode
   For our image we get a prediction of the class `space shuttle` with more than 99% confidence.
# How to enable the Halide backend for improved efficiency {#tutorial_dnn_halide}

## Introduction
This tutorial describes how to run your models in the OpenCV deep learning module
using the Halide language backend. Halide is an open-source project that lets us
write image processing algorithms in a well-readable format, schedule computations
according to a specific device, and evaluate them with quite good efficiency.

The official website of the Halide project: http://halide-lang.org/.

An up-to-date efficiency comparison: https://github.com/opencv/opencv/wiki/DNN-Efficiency

## Requirements
### LLVM compiler

@note LLVM compilation might take a long time.

- Download the LLVM source code from http://releases.llvm.org/4.0.0/llvm-4.0.0.src.tar.xz.
Unpack it. Let **llvm_root** be the root directory of the source code.

- Create the directory **llvm_root**/tools/clang

- Download Clang with the same version as LLVM. In our case it will be from
http://releases.llvm.org/4.0.0/cfe-4.0.0.src.tar.xz. Unpack it into
**llvm_root**/tools/clang. Note that it should be the root for the Clang source code.

- Build LLVM on Linux
@code
cd llvm_root
mkdir build && cd build
cmake -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j4
@endcode

- Build LLVM on Windows (Developer Command Prompt)
@code
mkdir \\path-to-llvm-build\\ && cd \\path-to-llvm-build\\
cmake.exe -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=\\path-to-llvm-install\\ -G "Visual Studio 14 Win64" \\path-to-llvm-src\\
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\INSTALL.vcxproj
@endcode

@note `\\path-to-llvm-build\\` and `\\path-to-llvm-install\\` are different directories.

### Halide language

- Download the source code from the GitHub repository, https://github.com/halide/Halide,
or using git. The root directory will be **halide_root**.
@code
git clone https://github.com/halide/Halide.git
@endcode

- Build Halide on Linux
@code
cd halide_root
mkdir build && cd build
cmake -DLLVM_DIR=llvm_root/build/lib/cmake/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF ..
make -j4
@endcode

- Build Halide on Windows (Developer Command Prompt)
@code
cd halide_root
mkdir build && cd build
cmake.exe -DLLVM_DIR=\\path-to-llvm-install\\lib\\cmake\\llvm -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 14 Win64" ..
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\ALL_BUILD.vcxproj
@endcode

## Build OpenCV with the Halide backend
When you build OpenCV add the following configuration flags:

- `WITH_HALIDE` - enable Halide linkage

- `HALIDE_ROOT_DIR` - path to the Halide build directory

## Set Halide as a preferable backend
@code
net.setPreferableBackend(DNN_BACKEND_HALIDE);
@endcode

# How to schedule your network for the Halide backend {#tutorial_dnn_halide_scheduling}

## Introduction
Halide code is the same for every device we use. But to achieve satisfying
efficiency we should schedule computations properly. In this tutorial we describe
the ways to schedule your networks using the Halide backend in the OpenCV deep learning module.

For a better understanding of Halide scheduling you might want to read the tutorials at http://halide-lang.org/tutorials.

If it's your first meeting with Halide in OpenCV, we recommend starting from @ref tutorial_dnn_halide.

## Configuration files
You can schedule the computations of a Halide pipeline by writing textual configuration files.
This means that you can easily vectorize, parallelize and manage the loop order of
layer computations. Pass the path to a file with scheduling directives for a specific
device into ```cv::dnn::Net::setHalideScheduler``` before the first ```cv::dnn::Net::forward``` call.

Scheduling configuration files are represented as YAML files where each node is a
scheduled function or a scheduling directive.
@code
relu1:
  reorder: [x, c, y]
  split: { y: 2, c: 8 }
  parallel: [yo, co]
  unroll: yi
  vectorize: { x: 4 }
conv1_constant_exterior:
  compute_at: { relu1: yi }
@endcode

By convention, we use the variables `n` for the batch dimension, `c` for channels,
`y` for rows and `x` for columns. Variables produced by a split use names
with the same prefix but `o` and `i` suffixes for the outer and inner variables
correspondingly. For example, for a variable `x` in range `[0, 10)` the directive
`split: { x: 2 }` gives new ones: `xo` in range `[0, 5)` and `xi` in range `[0, 2)`.
The variable name `x` is no longer available in the same scheduling node.

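The index arithmetic behind a split is simple: every original index `x` maps to an outer/inner pair with `x = xo * factor + xi`. A small Python sketch of the `split: { x: 2 }` example above (this sketch assumes the factor divides the extent; Halide itself also handles the non-dividing case):

```python
def split(extent, factor):
    """Enumerate (xo, xi, x) triples produced by splitting range [0, extent)."""
    assert extent % factor == 0, "sketch assumes the factor divides the extent"
    return [(xo, xi, xo * factor + xi)
            for xo in range(extent // factor)
            for xi in range(factor)]

triples = split(10, 2)
print(triples[0], triples[-1])  # (0, 0, 0) (4, 1, 9)

# Every original index in [0, 10) is visited exactly once:
visited = sorted(x for _, _, x in triples)
print(visited == list(range(10)))  # True
```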
You can find scheduling examples at [opencv_extra/testdata/dnn](https://github.com/opencv/opencv_extra/tree/master/testdata/dnn)
and use them to schedule your networks.

## Layer fusion
Thanks to layer fusion we can schedule only the top layers of fused sets,
because for every output value we use the fused formula.
For example, if you have three layers Convolution + Scale + ReLU one after another,
@code
conv(x, y, c, n) = sum(...) + bias(c);
scale(x, y, c, n) = conv(x, y, c, n) * weights(c);
relu(x, y, c, n) = max(scale(x, y, c, n), 0);
@endcode

the fused function is something like
@code
relu(x, y, c, n) = max((sum(...) + bias(c)) * weights(c), 0);
@endcode

So only the function called `relu` requires scheduling.

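We can check numerically that the fused formula is equivalent to chaining the three layers. A tiny Python sketch with made-up `conv_sum`, `bias` and `weight` values for one output position:

```python
# Made-up per-output values for a single (x, y, c, n) position
conv_sum, bias, weight = -2.0, 0.5, 3.0

# Layer by layer: Convolution -> Scale -> ReLU
conv = conv_sum + bias
scale = conv * weight
relu_chained = max(scale, 0.0)

# Single fused formula
relu_fused = max((conv_sum + bias) * weight, 0.0)

print(relu_chained == relu_fused)  # True
```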
## Scheduling patterns
Sometimes networks are built using a blocked structure, which means some layers are
identical or quite similar. If you want to apply the same scheduling to
different layers up to the tiling or vectorization factors, define scheduling
patterns in a section `patterns` at the beginning of the scheduling file.
Also, your patterns may use some parametric variables.
@code
# At the beginning of the file
patterns:
  fully_connected:
    split: { c: c_split }
    fuse: { src: [x, y, co], dst: block }
    parallel: block
    vectorize: { ci: c_split }
# Somewhere below
fc8:
  pattern: fully_connected
  params: { c_split: 8 }
@endcode

## Automatic scheduling
You can let DNN schedule the layers automatically: just skip the call to ```cv::dnn::Net::setHalideScheduler```. Sometimes it might be even more efficient than manual scheduling.
But if specific layers require being scheduled manually, you can
mix both manual and automatic scheduling: write a scheduling file
and skip the layers that you want to be scheduled automatically.
# How to run deep networks in browser {#tutorial_dnn_javascript}

## Introduction
This tutorial will show us how to run deep learning models using OpenCV.js right
in a browser. The tutorial refers to a sample pipeline of face detection and face
recognition models.

## Face detection
The face detection network gets a BGR image as input and produces a set of bounding boxes
that might contain faces. All we need is to select the boxes with a strong
confidence.

## Face recognition
The network is called OpenFace (project: https://github.com/cmusatyalab/openface).
The face recognition model receives an RGB face image of size `96x96`. It then returns
a `128`-dimensional unit vector that represents the input face as a point on a
multidimensional unit sphere. So the difference between two faces is the angle between the two
output vectors.

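Because the embeddings are unit vectors, the angle between two faces comes straight from the dot product: `angle = acos(a · b)`. A small Python sketch with toy 3-dimensional "embeddings" standing in for the 128-dimensional ones:

```python
import math

# Toy unit vectors standing in for 128-dimensional face embeddings
a = (1.0, 0.0, 0.0)
b = (0.0, 1.0, 0.0)

# For unit vectors the dot product is the cosine of the angle between them
dot = sum(x * y for x, y in zip(a, b))
angle = math.acos(dot)

print(round(math.degrees(angle)))  # 90: orthogonal embeddings = very different faces
```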
## Sample
The whole sample is an HTML page with JavaScript code that uses OpenCV.js functionality.
You may see an insertion of this page below. Press the `Start` button to begin the demo.
Press `Add a person` to name a person that is recognized as an unknown one.
Next we'll discuss the main parts of the code.

@htmlinclude js_face_recognition.html

-# Run the face detection network to detect faces on the input image.
@snippet dnn/js_face_recognition.html Run face detection model
You may play with input blob sizes to balance detection quality and efficiency.
The bigger the input blob, the smaller the faces that may be detected.

-# Run the face recognition network to receive a `128`-dimensional unit feature vector for an input face image.
@snippet dnn/js_face_recognition.html Get 128 floating points feature vector

-# Perform the recognition.
@snippet dnn/js_face_recognition.html Recognize
Match a new feature vector with the registered ones. Return the name of the best-matched person.

-# The main loop.
@snippet dnn/js_face_recognition.html Define frames processing
The main loop of our application receives frames from a camera and runs recognition
on every detected face in the frame. We start this function once, when OpenCV.js has been
initialized and the deep learning models have been downloaded.
YOLO DNNs {#tutorial_dnn_yolo}
===============================

Introduction
------------

In this text you will learn how to use the opencv_dnn module with yolo_object_detection (a sample of using the OpenCV dnn module in real time with device capture, video and images).

We will demonstrate the results of this example on the following picture.


Examples
--------

VIDEO DEMO:
@youtube{NHtRlndE2cg}

Source Code
-----------

Use the universal sample for object detection models written
[in C++](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp) and
[in Python](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py).

Usage examples
--------------

Execute with a webcam:

@code{.bash}
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --rgb
@endcode

Execute with an image or video file:

@code{.bash}
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --input=[PATH-TO-IMAGE-OR-VIDEO-FILE] --rgb
@endcode

Email questions and suggestions to Alessandro de Oliveira Faria (cabelo@opensuse.org) or the OpenCV Team.
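The `--scale=0.00392` value in the commands above is simply `1/255`: the network expects 8-bit pixel intensities normalized to `[0, 1]`. A one-line check in Python:

```python
# 8-bit pixels are scaled by 1/255 to bring them into [0, 1]
scale = 1 / 255
print(round(scale, 5))  # 0.00392
```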
Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
=====================================

- @subpage tutorial_dnn_googlenet

    *Compatibility:* \> OpenCV 3.3

    *Author:* Vitaliy Lyudvichenko

    In this tutorial you will learn how to use the opencv_dnn module for image classification with the GoogLeNet trained network from the Caffe model zoo.

- @subpage tutorial_dnn_halide

    *Compatibility:* \> OpenCV 3.3

    *Author:* Dmitry Kurtaev

    This tutorial describes how to run your models in the OpenCV deep learning module using the Halide language backend.

- @subpage tutorial_dnn_halide_scheduling

    *Compatibility:* \> OpenCV 3.3

    *Author:* Dmitry Kurtaev

    In this tutorial we describe the ways to schedule your networks using the Halide backend in the OpenCV deep learning module.

- @subpage tutorial_dnn_android

    *Compatibility:* \> OpenCV 3.3

    *Author:* Dmitry Kurtaev

    This tutorial will show you how to run a deep learning model using OpenCV on an Android device.

- @subpage tutorial_dnn_yolo

    *Compatibility:* \> OpenCV 3.3.1

    *Author:* Alessandro de Oliveira Faria

    In this tutorial you will learn how to use the opencv_dnn module using yolo_object_detection with device capture, a video file or an image.

- @subpage tutorial_dnn_javascript

    *Compatibility:* \> OpenCV 3.3.1

    *Author:* Dmitry Kurtaev

    In this tutorial we'll run deep learning models in a browser using OpenCV.js.

- @subpage tutorial_dnn_custom_layers

    *Compatibility:* \> OpenCV 3.4.1

    *Author:* Dmitry Kurtaev

    How to define custom layers to import networks.

AKAZE local features matching {#tutorial_akaze_matching}
=============================

Introduction
------------

In this tutorial we will learn how to use AKAZE @cite ANB13 local features to detect and match keypoints on
two images.
We will find keypoints on a pair of images with a given homography matrix, match them, and count the
number of inliers (i.e. matches that fit in the given homography).

You can find an expanded version of this example here:
<https://github.com/pablofdezalc/test_kaze_akaze_opencv>

Data
----

We are going to use images 1 and 3 from the *Graffiti* sequence of the [Oxford dataset](http://www.robots.ox.ac.uk/~vgg/data/data-aff.html).


The homography is given by a 3 by 3 matrix:
@code{.none}
7.6285898e-01  -2.9922929e-01   2.2567123e+02
3.3443473e-01   1.0143901e+00  -7.6999973e+01
3.4663091e-04  -1.4364524e-05   1.0000000e+00
@endcode
You can find the images (*graf1.png*, *graf3.png*) and homography (*H1to3p.xml*) in
*opencv/samples/data/*.

### Source Code

@add_toggle_cpp
-   **Downloadable code**: Click
    [here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/features2D/AKAZE_match.cpp)

-   **Code at glance:**
    @include samples/cpp/tutorial_code/features2D/AKAZE_match.cpp
@end_toggle

@add_toggle_java
-   **Downloadable code**: Click
    [here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java)

-   **Code at glance:**
    @include samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java
@end_toggle

@add_toggle_python
-   **Downloadable code**: Click
    [here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py)

-   **Code at glance:**
    @include samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py
@end_toggle

### Explanation

-   **Load images and homography**

@add_toggle_cpp
    @snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp load
@end_toggle

@add_toggle_java
    @snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java load
@end_toggle

@add_toggle_python
    @snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py load
@end_toggle

    We are loading grayscale images here. The homography is stored in the XML created with FileStorage.

-   **Detect keypoints and compute descriptors using AKAZE**

@add_toggle_cpp
    @snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp AKAZE
@end_toggle

@add_toggle_java
    @snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java AKAZE
@end_toggle

@add_toggle_python
    @snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py AKAZE
@end_toggle

    We create AKAZE and use it to detect and compute keypoints and descriptors. Since we don't need the *mask*
    parameter, *noArray()* is used.

-   **Use brute-force matcher to find 2-nn matches**

@add_toggle_cpp
    @snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp 2-nn matching
@end_toggle

@add_toggle_java
    @snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java 2-nn matching
@end_toggle

@add_toggle_python
    @snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py 2-nn matching
@end_toggle

    We use the Hamming distance, because AKAZE uses a binary descriptor by default.

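The Hamming distance between two binary descriptors is just the number of differing bits, i.e. the popcount of their XOR. A quick Python sketch on two toy 8-bit "descriptors" (real AKAZE descriptors are much longer):

```python
# Two toy binary descriptors
d1 = 0b10110100
d2 = 0b10011100

# Hamming distance = number of bit positions where the descriptors differ
hamming = bin(d1 ^ d2).count("1")
print(hamming)  # 2
```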
-   **Use 2-nn matches and the ratio criterion to find correct keypoint matches**

@add_toggle_cpp
    @snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp ratio test filtering
@end_toggle

@add_toggle_java
    @snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java ratio test filtering
@end_toggle

@add_toggle_python
    @snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py ratio test filtering
@end_toggle

    If the closest match distance is significantly lower than the second closest one, then the match is correct (the match is not ambiguous).

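The ratio test itself is a one-line filter. A minimal Python sketch with made-up distance pairs and an illustrative ratio of 0.8:

```python
# (distance to best match, distance to second-best match) for each keypoint
nn_distances = [(10, 50), (30, 33), (12, 40), (25, 26)]
nn_match_ratio = 0.8  # illustrative threshold

# Keep a match only if it is clearly better than the runner-up
good = [pair for pair in nn_distances if pair[0] < nn_match_ratio * pair[1]]
print(good)  # [(10, 50), (12, 40)]
```

The two rejected pairs have nearly equal best and second-best distances, which is exactly the ambiguity the test is designed to discard.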
-   **Check if our matches fit in the homography model**

@add_toggle_cpp
    @snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp homography check
@end_toggle

@add_toggle_java
    @snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java homography check
@end_toggle

@add_toggle_python
    @snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py homography check
@end_toggle

    If the distance from the first keypoint's projection to the second keypoint is less than the threshold,
    then it fits the homography model.

    We create a new set of matches for the inliers, because it is required by the drawing function.

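Projecting a keypoint through the homography is a matrix-vector product in homogeneous coordinates followed by division by the last component. A self-contained Python sketch; the homography, keypoints and threshold here are illustrative, not the tutorial's actual values:

```python
import math

def project(h, x, y):
    """Apply a 3x3 homography to point (x, y) via homogeneous coordinates."""
    xs = h[0][0] * x + h[0][1] * y + h[0][2]
    ys = h[1][0] * x + h[1][1] * y + h[1][2]
    w  = h[2][0] * x + h[2][1] * y + h[2][2]
    return xs / w, ys / w

# Illustrative homography: pure translation by (5, -3)
H = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]
inlier_threshold = 2.5  # pixels, illustrative

kp1, kp2 = (10.0, 20.0), (15.5, 16.8)
px, py = project(H, *kp1)
dist = math.hypot(px - kp2[0], py - kp2[1])
print(dist < inlier_threshold)  # True: this match is an inlier
```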
-   **Output results**

@add_toggle_cpp
    @snippet samples/cpp/tutorial_code/features2D/AKAZE_match.cpp draw final matches
@end_toggle

@add_toggle_java
    @snippet samples/java/tutorial_code/features2D/akaze_matching/AKAZEMatchDemo.java draw final matches
@end_toggle

@add_toggle_python
    @snippet samples/python/tutorial_code/features2D/akaze_matching/AKAZE_match.py draw final matches
@end_toggle

    Here we save the resulting image and print some statistics.

Results
-------

### Found matches


Depending on your OpenCV version, you should get results coherent with:

@code{.none}
 Keypoints 1:   2943
 Keypoints 2:   3511
 Matches:       447
 Inliers:       308
 Inlier Ratio:  0.689038
@endcode

AKAZE and ORB planar tracking {#tutorial_akaze_tracking}
=============================

Introduction
------------

In this tutorial we will compare *AKAZE* and *ORB* local features, using them to find matches between
video frames and track object movements.

The algorithm is as follows:

-   Detect and describe keypoints on the first frame, manually set the object boundaries
-   For every next frame:
    -#  Detect and describe keypoints
    -#  Match them using a bruteforce matcher
    -#  Estimate the homography transformation using RANSAC
    -#  Filter inliers from all the matches
    -#  Apply the homography transformation to the bounding box to find the object
    -#  Draw the bounding box and inliers, compute the inlier ratio as an evaluation metric


Data
----

To do the tracking we need a video and the object position on the first frame.

You can download our example video and data from
[here](https://docs.google.com/file/d/0B72G7D4snftJandBb0taLVJHMFk).

To run the code you have to specify the input (camera id or video file). Then, select a bounding box with the mouse and press any key to start tracking.
@code{.none}
./planar_tracking blais.mp4
@endcode

Source Code
-----------

@include cpp/tutorial_code/features2D/AKAZE_tracking/planar_tracking.cpp

Explanation
-----------

### Tracker class

This class implements the algorithm described above using the given feature detector and descriptor
matcher.

-   **Setting up the first frame**
    @code{.cpp}
    void Tracker::setFirstFrame(const Mat frame, vector<Point2f> bb, string title, Stats& stats)
    {
        first_frame = frame.clone();
        (*detector)(first_frame, noArray(), first_kp, first_desc);
        stats.keypoints = (int)first_kp.size();
        drawBoundingBox(first_frame, bb);
        putText(first_frame, title, Point(0, 60), FONT_HERSHEY_PLAIN, 5, Scalar::all(0), 4);
        object_bb = bb;
    }
    @endcode
    We compute and store keypoints and descriptors from the first frame and prepare it for the
    output.

    We need to save the number of detected keypoints to make sure both detectors locate roughly the same
    number of those.

- **Processing frames**
|
||||
|
||||
-# Locate keypoints and compute descriptors
|
||||
@code{.cpp}
|
||||
(*detector)(frame, noArray(), kp, desc);
|
||||
@endcode
|
||||
|
||||
To find matches between frames we have to locate the keypoints first.
|
||||
|
||||
In this tutorial detectors are set up to find about 1000 keypoints on each frame.
|
||||
|
||||
-# Use 2-nn matcher to find correspondences
|
||||
@code{.cpp}
|
||||
matcher->knnMatch(first_desc, desc, matches, 2);
|
||||
for(unsigned i = 0; i < matches.size(); i++) {
|
||||
if(matches[i][0].distance < nn_match_ratio * matches[i][1].distance) {
|
||||
matched1.push_back(first_kp[matches[i][0].queryIdx]);
|
||||
matched2.push_back( kp[matches[i][0].trainIdx]);
|
||||
}
|
||||
}
|
||||
@endcode
|
||||
If the closest match is *nn_match_ratio* closer than the second closest one, then it's a
|
||||
match.

    -#  Use *RANSAC* to estimate the homography transformation
        @code{.cpp}
        homography = findHomography(Points(matched1), Points(matched2),
                                    RANSAC, ransac_thresh, inlier_mask);
        @endcode
        If there are at least 4 matches we can use random sample consensus to estimate the image
        transformation.

    -#  Save the inliers
        @code{.cpp}
        for(unsigned i = 0; i < matched1.size(); i++) {
            if(inlier_mask.at<uchar>(i)) {
                int new_i = static_cast<int>(inliers1.size());
                inliers1.push_back(matched1[i]);
                inliers2.push_back(matched2[i]);
                inlier_matches.push_back(DMatch(new_i, new_i, 0));
            }
        }
        @endcode
        Since *findHomography* computes the inliers we only have to save the chosen points and
        matches.

    -#  Project the object bounding box
        @code{.cpp}
        perspectiveTransform(object_bb, new_bb, homography);
        @endcode
        If there is a reasonable number of inliers we can use the estimated transformation to
        locate the object.

Results
-------

You can watch the resulting [video on youtube](http://www.youtube.com/watch?v=LWY-w8AGGhE).

*AKAZE* statistics:
@code{.none}
Matches      626
Inliers      410
Inlier ratio 0.58
Keypoints    1117
@endcode

*ORB* statistics:
@code{.none}
Matches      504
Inliers      319
Inlier ratio 0.56
Keypoints    1112
@endcode
@@ -0,0 +1,52 @@

Detection of planar objects {#tutorial_detection_of_planar_objects}
===========================

The goal of this tutorial is to learn how to use *features2d* and *calib3d* modules for detecting
known planar objects in scenes.

*Test data*: use images in your data folder, for instance, box.png and box_in_scene.png.

-   Create a new console project. Read the two input images:

        Mat img1 = imread(argv[1], IMREAD_GRAYSCALE);
        Mat img2 = imread(argv[2], IMREAD_GRAYSCALE);

-   Detect keypoints in both images and compute descriptors for each of the keypoints:

        // detecting keypoints
        Ptr<Feature2D> surf = SURF::create();
        vector<KeyPoint> keypoints1;
        Mat descriptors1;
        surf->detectAndCompute(img1, Mat(), keypoints1, descriptors1);

        ... // do the same for the second image

-   Now, find the closest matches between descriptors from the first image to the second:

        // matching descriptors
        BFMatcher matcher(NORM_L2);
        vector<DMatch> matches;
        matcher.match(descriptors1, descriptors2, matches);

-   Visualize the results:

        // drawing the results
        namedWindow("matches", 1);
        Mat img_matches;
        drawMatches(img1, keypoints1, img2, keypoints2, matches, img_matches);
        imshow("matches", img_matches);
        waitKey(0);

-   Find the homography transformation between the two sets of points:

        vector<Point2f> points1, points2;
        // fill the arrays with the points
        ....
        Mat H = findHomography(Mat(points1), Mat(points2), RANSAC, ransacReprojThreshold);

-   Create a set of inlier matches and draw them. Use the perspectiveTransform function to map
    points with the homography:

        Mat points1Projected;
        perspectiveTransform(Mat(points1), points1Projected, H);

-   Use drawMatches for drawing inliers.
@@ -0,0 +1,51 @@

Feature Description {#tutorial_feature_description}
===================

Goal
----

In this tutorial you will learn how to:

-   Use the @ref cv::DescriptorExtractor interface in order to find the feature vector
    corresponding to the keypoints. Specifically:
    -   Use cv::xfeatures2d::SURF and its function cv::xfeatures2d::SURF::compute to perform the
        required calculations.
    -   Use a @ref cv::DescriptorMatcher to match the feature vectors.
    -   Use the function @ref cv::drawMatches to draw the detected matches.

\warning You need the <a href="https://github.com/opencv/opencv_contrib">OpenCV contrib modules</a> to be able to use the SURF features
(alternatives are ORB, KAZE, ... features).

Theory
------

Code
----

@add_toggle_cpp
This tutorial code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/features2D/feature_description/SURF_matching_Demo.cpp)
@include samples/cpp/tutorial_code/features2D/feature_description/SURF_matching_Demo.cpp
@end_toggle

@add_toggle_java
This tutorial code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/features2D/feature_description/SURFMatchingDemo.java)
@include samples/java/tutorial_code/features2D/feature_description/SURFMatchingDemo.java
@end_toggle

@add_toggle_python
This tutorial code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/features2D/feature_description/SURF_matching_Demo.py)
@include samples/python/tutorial_code/features2D/feature_description/SURF_matching_Demo.py
@end_toggle

Explanation
-----------

Result
------

Here is the result after applying the BruteForce matcher between the two original images:

