Object search, detection and recognition using OpenCV

In this tutorial, we will discuss an important concept in object detection called "selective search". We will also share OpenCV code in C++ and Python.

Object detection and object recognition

An object recognition algorithm identifies which objects are present in an image. It takes the whole image as input and outputs the class label and class probability of the objects present in the image. For example, the class label could be "dog" and the associated class probability could be 97%.

On the other hand, an object detection algorithm not only tells you which objects are present in the image, but also outputs bounding boxes (x, y, width, height) to indicate their locations.

At the core of every object detection algorithm is an object recognition algorithm. Suppose we train an object recognition model that identifies dogs in image patches. The model will tell whether an image contains a dog, but it does not tell where the object is located.

To localize the object, we have to select sub-regions (patches) of the image and then apply the object recognition algorithm to these patches. The location of the object is given by the location of the patch for which the recognition algorithm returns a high class probability.


The most straightforward way to generate smaller regions (patches) is the sliding window approach. However, the sliding window approach has several limitations. These limitations are overcome by a class of algorithms called "region proposal" algorithms. Selective search is one of the most popular region proposal algorithms.

Sliding window algorithm

In the sliding window approach, we slide a box or window over the image to select a patch and classify each image patch covered by the window using an object recognition model. It is an exhaustive search for objects over the entire image. Not only do we need to search all possible locations in the image, we also have to search at different scales. This is because object recognition models are typically trained at a specific scale (or range of scales). This results in classifying tens of thousands of image patches.
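To make the scale of this search concrete, here is a minimal Python sketch of the sliding window enumeration. The window size, step, and scale values are illustrative choices, not parameters from this article:

```python
def sliding_windows(img_w, img_h, win_w, win_h, step=8, scales=(1.0, 1.5, 2.0)):
    """Yield candidate patches (x, y, w, h) at several scales.

    Assumed values: an 8-pixel step and three scales, chosen only
    to illustrate how quickly the patch count grows.
    """
    for s in scales:
        w, h = int(win_w * s), int(win_h * s)
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)

# Even a small 320x240 image with a single 64x64 base window and
# three scales yields well over a thousand patches to classify.
patches = list(sliding_windows(320, 240, 64, 64))
print(len(patches))
```

Adding more aspect ratios multiplies this count again, which is exactly the cost the region proposal algorithms below avoid.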

The problem does not end here. The sliding window approach works well for objects with a fixed aspect ratio, such as faces or pedestrians. But an image is a 2D projection of a 3D object, and object attributes such as aspect ratio and shape vary significantly with the angle at which the image was captured. Searching over multiple aspect ratios makes the sliding window approach computationally very expensive.

Region proposal algorithms

The problems we have discussed so far can be solved using region proposal algorithms. These methods take an image as input and output bounding boxes corresponding to all the patches in the image that are most likely to contain objects. These region proposals can be noisy, overlapping, and may not contain the object perfectly, but among them there will be a proposal very close to the actual object in the image. We can then classify these proposals using an object recognition model. The region proposals with high probability scores are the locations of the objects.


Region proposal algorithms use segmentation to identify prospective objects in an image. In segmentation, we group adjacent regions that are similar to each other based on some criteria such as color and texture. Unlike the sliding window approach, where we look for objects at all pixel locations and all scales, region proposal algorithms work by grouping pixels into a smaller number of segments. So the number of proposals generated is many times smaller than with the sliding window approach. This reduces the number of image patches we have to classify. The generated region proposals come in different scales and aspect ratios.

An important property of region proposal methods is their very high recall. This is just a fancy way of saying that the regions containing the objects we are looking for must be in our list of region proposals. To accomplish this, our list of region proposals may end up including many regions that do not contain any object. In other words, it is fine for the region proposal algorithm to produce a large number of false positives as long as it catches all the true positives. Most of these false positives will be rejected by the object recognition algorithm. The detection time goes up when we have more false positives, and the accuracy is affected slightly. But having a high recall is still a good idea, because the alternative of missing the regions containing the actual objects severely hurts the detection rate.
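To make "recall" concrete: a proposal is usually counted as covering a ground-truth box when their intersection-over-union (IoU) exceeds a threshold. The 0.5 threshold and the toy boxes below are illustrative assumptions, not values from the article:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def recall(ground_truth, proposals, thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    hits = sum(1 for gt in ground_truth
               if any(iou(gt, p) >= thresh for p in proposals))
    return hits / len(ground_truth)

gt = [(10, 10, 50, 50), (100, 100, 40, 40)]
props = [(12, 8, 48, 52), (200, 200, 30, 30)]
print(recall(gt, props))  # 0.5: only the first ground-truth box is matched
```

A proposal algorithm may emit hundreds of boxes like `props`; only recall over the ground truth matters at this stage, since false positives are filtered later by the recognizer.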

Several region proposal methods have been proposed, such as:

  • Objectness
  • Constrained Parametric Min-Cuts for Automatic Object Segmentation
  • Category Independent Object Proposals
  • Randomized Prim's
  • Selective search

Among all these region proposal methods, selective search is the most commonly used because of its speed and very high recall.

Selective search object recognition

What is selective search?

Selective search is a region proposal algorithm used in object detection. It is designed to be fast with a very high recall. It is based on computing hierarchical groupings of similar regions according to color, texture, size and shape compatibility.

Selective search starts by over-segmenting the image based on the intensity of the pixels, using the graph-based segmentation method of Felzenszwalb and Huttenlocher. The output of the algorithm is shown below. The image on the right contains segmented regions represented in solid colors.


Can we use these segmented parts as region proposals? The answer is no, for two reasons:

  • Most of the actual objects in the original image contain 2 or more segmented parts
  • Region proposals for occluded objects, such as a plate covered by a cup or a cup filled with coffee, cannot be generated using this method

If we try to address the first problem by further merging the adjacent regions similar to each other, we end up with one segmented region covering two objects.

Perfect segmentation is not our goal here. We just want to predict many region proposals such that some of them have very high overlap with the actual objects.

Selective search uses the oversegments from Felzenszwalb and Huttenlocher's method as initial seeds. An oversegmented image looks like this.


The selective search algorithm takes these oversegments as initial input and performs the following steps:

1. Add all bounding boxes corresponding to the segmented parts to the list of region proposals
2. Group adjacent segments based on similarity
3. Go to step 1

At each iteration, larger segments are formed and added to the list of region proposals. Hence we create region proposals from smaller segments to larger segments in a bottom-up approach. This is what we mean by computing "hierarchical" segmentations using Felzenszwalb and Huttenlocher's algorithm.


This figure shows the initial, intermediate and final steps of the hierarchical segmentation process.
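The grouping loop above can be sketched in Python. This is a toy illustration, not OpenCV's implementation: regions are represented only by their bounding boxes, the similarity function is a placeholder, and every pair of regions is compared (the real algorithm compares only adjacent segments, using the similarity measures described below):

```python
def union_box(a, b):
    """Bounding box enclosing boxes a and b, each given as (x, y, w, h)."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def hierarchical_grouping(regions, similarity):
    """Greedy bottom-up grouping: repeatedly merge the most similar pair.

    regions: list of (x, y, w, h) seed boxes, e.g. from an oversegmentation
    similarity: function mapping two boxes to a score (higher = more similar)
    Returns every box produced along the way -- the region proposal list.
    """
    proposals = list(regions)              # step 1: seed boxes are proposals
    while len(regions) > 1:
        # step 2: pick the most similar pair of regions
        i, j = max(((i, j) for i in range(len(regions))
                    for j in range(i + 1, len(regions))),
                   key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        merged = union_box(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)             # step 3: repeat with merged region
        proposals.append(merged)
    return proposals

seeds = [(0, 0, 10, 10), (12, 0, 10, 10), (0, 40, 10, 10)]
sim = lambda a, b: -abs(a[0] - b[0]) - abs(a[1] - b[1])  # toy: closer = more similar
out = hierarchical_grouping(seeds, sim)
print(out[-1])  # the final merge covers all seeds: (0, 0, 22, 50)
```

Note how every intermediate merge also lands in the proposal list, which is why the output spans all scales from the seeds up to the whole image.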

Similarity

Let's explore in depth how to calculate the similarity between two regions.

Selective search uses four similarity measures based on color, texture, size and shape compatibility.

Color similarity

A color histogram of 25 bins is calculated for each channel of the image, and the histograms of all channels are concatenated to obtain a 25 × 3 = 75-dimensional color descriptor.

The color similarity of two regions is based on histogram intersection and can be calculated as:

s_color(r_i, r_j) = Σ_{k=1}^{n} min(c_i^k, c_j^k)

where c_i^k is the histogram value for the k-th bin of the color descriptor of region r_i.
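A sketch of the 75-dimensional color descriptor and histogram intersection in NumPy (the random test patch is just for illustration):

```python
import numpy as np

def color_descriptor(region, bins=25):
    """Concatenated per-channel histograms: 25 bins x 3 channels = 75 dims.

    region: H x W x 3 uint8 array of pixels belonging to one region.
    """
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    d = np.concatenate(hists).astype(float)
    return d / d.sum()  # L1-normalize so descriptors are comparable

def color_similarity(d_i, d_j):
    """Histogram intersection: sum of element-wise minima."""
    return float(np.minimum(d_i, d_j).sum())

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, (20, 20, 3), dtype=np.uint8)
d = color_descriptor(patch)
print(d.shape)                            # (75,)
print(round(color_similarity(d, d), 6))   # identical regions -> 1.0
```

With normalized histograms, the intersection is 1 for identical regions and shrinks toward 0 as the color distributions diverge.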

Texture similarity

Texture features are calculated by extracting Gaussian derivatives in 8 orientations for each channel of the image. For each orientation and each color channel, a 10-bin histogram is computed, giving a 10 × 8 × 3 = 240-dimensional feature descriptor.

Histogram intersection is used to calculate the texture similarity of the two regions as well.
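A rough stand-in for this descriptor in NumPy. Note the simplification: instead of true Gaussian derivative filters, this sketch quantizes plain image gradients into 8 orientation sectors, so it only approximates the descriptor described above while keeping the same 10 × 8 × 3 = 240 shape:

```python
import numpy as np

def texture_descriptor(region, orientations=8, bins=10):
    """Simplified texture descriptor (gradient orientations, not true
    Gaussian derivatives): per channel, histogram gradient magnitudes
    within each of 8 orientation sectors into 10 bins."""
    feats = []
    for c in range(region.shape[2]):
        gy, gx = np.gradient(region[..., c].astype(float))
        ang = np.arctan2(gy, gx)                     # -pi .. pi
        mag = np.hypot(gx, gy)
        sector = ((ang + np.pi) / (2 * np.pi) * orientations).astype(int)
        sector = np.clip(sector, 0, orientations - 1)
        for o in range(orientations):
            h, _ = np.histogram(mag[sector == o], bins=bins,
                                range=(0, mag.max() + 1e-9))
            feats.append(h)
    d = np.concatenate(feats).astype(float)
    return d / max(d.sum(), 1.0)  # L1-normalize for histogram intersection

rng = np.random.default_rng(1)
region = rng.integers(0, 256, (20, 20, 3), dtype=np.uint8)
print(texture_descriptor(region).shape)  # (240,) = 10 x 8 x 3
```

The same `color_similarity`-style intersection of two such normalized descriptors gives the texture similarity.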

Size similarity

Size similarity encourages smaller regions to merge early. It ensures that region proposals at all scales are formed at every part of the image. If this similarity measure were not taken into account, a single region would keep gobbling up all of its smaller adjacent regions one by one, and region proposals at multiple scales would be generated at this location only. Size similarity is defined as:

s_size(r_i, r_j) = 1 - (size(r_i) + size(r_j)) / size(im)

where size(im) is the size of the image in pixels.
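The size measure is simple enough to state directly in Python (the region and image sizes below are toy numbers for illustration):

```python
def size_similarity(size_i, size_j, size_im):
    """Close to 1 when both regions are small relative to the image,
    so small neighbors merge early; close to 0 when the pair already
    covers most of the image."""
    return 1.0 - (size_i + size_j) / size_im

# two small regions in a 320x240 image merge with high priority
print(size_similarity(400, 600, 320 * 240))  # about 0.987
```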

Shape compatibility

Shape compatibility measures how well two regions r_i and r_j fit into each other. If r_i fits into r_j, we want to merge them to fill gaps; if they do not even touch each other, they should not be merged.

Shape compatibility is defined as:

s_fill(r_i, r_j) = 1 - (size(BB_ij) - size(r_i) - size(r_j)) / size(im)

where size(BB_ij) is the size of the bounding box around r_i and r_j.

Final similarity

The final similarity between two regions is defined as a linear combination of the four similarities above:

s(r_i, r_j) = a_1·s_color(r_i, r_j) + a_2·s_texture(r_i, r_j) + a_3·s_size(r_i, r_j) + a_4·s_fill(r_i, r_j)

where r_i and r_j are two regions or segments in the image, and each a_i ∈ {0, 1} denotes whether the corresponding similarity measure is used.
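A sketch of the fill measure and the final combination in Python, with toy numbers; the binary weights a_i simply switch each measure on or off:

```python
def fill_similarity(size_i, size_j, size_bb, size_im):
    """Close to 1 when the two regions together nearly fill their joint
    bounding box (they fit into each other), lower when the bounding
    box contains a lot of empty space."""
    return 1.0 - (size_bb - size_i - size_j) / size_im

def final_similarity(s_color, s_texture, s_size, s_fill, a=(1, 1, 1, 1)):
    """Linear combination with binary weights a_i in {0, 1}."""
    return (a[0] * s_color + a[1] * s_texture
            + a[2] * s_size + a[3] * s_fill)

# two regions that exactly tile their joint bounding box: fill similarity is 1
print(fill_similarity(400, 600, 1000, 320 * 240))                # 1.0
# disabling texture (a_2 = 0) simply drops that term from the sum
print(round(final_similarity(0.8, 0.5, 0.9, 1.0, a=(1, 0, 1, 1)), 10))  # 2.7
```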


Results

The selective search implementation in OpenCV gives thousands of region proposals arranged in decreasing order of objectness. For clarity, we are sharing results with 200-250 boxes drawn over the image. In general, 1000-1200 proposals are good enough to get all the correct region proposals.

Let's see how we can use selective search based segmentation in OpenCV.

Selective search: C++

The following code is a C++ tutorial for selective search using OpenCV. Please read through the comments to understand the code.

#include "opencv2/ximgproc/segmentation.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/core.hpp"
#include "opencv2/imgproc.hpp"
#include <iostream>
#include <ctime>
using namespace cv;
using namespace cv::ximgproc::segmentation;
static void help() {
    std::cout << std::endl <<
        "Usage:" << std::endl <<
        "./ssearch input_image (f|q)" << std::endl <<
        "f=fast, q=quality" << std::endl <<
        "Use l to display less rects, m to display more rects, q to quit" << std::endl;
}

int main(int argc, char** argv) {
    // If image path and f/q is not passed as command
    // line arguments, quit and display help message
    if (argc < 3) {
        help();
        return -1;
    }
    // speed-up using multithreads
    setUseOptimized(true);
    setNumThreads(4);
    // read image
    Mat im = imread(argv[1]);
    // resize image
    int newHeight = 200;
    int newWidth = im.cols*newHeight/im.rows;
    resize(im, im, Size(newWidth, newHeight));
    // create Selective Search Segmentation Object using default parameters
    Ptr<SelectiveSearchSegmentation> ss = createSelectiveSearchSegmentation();
    // set input image on which we will run segmentation
    ss->setBaseImage(im);
    // Switch to fast but low recall Selective Search method
    if (argv[2][0] == 'f') {
        ss->switchToSelectiveSearchFast();
    }
    // Switch to high recall but slow Selective Search method
    else if (argv[2][0] == 'q') {
        ss->switchToSelectiveSearchQuality();
    }
    // if argument is neither f nor q print help message
    else {
        help();
        return -2;
    }
    // run selective search segmentation on input image
    std::vector<Rect> rects;
    ss->process(rects);
    std::cout << "Total Number of Region Proposals: " << rects.size() << std::endl;
    // number of region proposals to show
    int numShowRects = 100;
    // increment to increase/decrease total number
    // of region proposals to be shown
    int increment = 50;
    while (1) {
        // create a copy of original image
        Mat imOut = im.clone();
        // iterate over all the region proposals
        for (int i = 0; i < (int)rects.size(); i++) {
            // draw rectangles for the region proposals up to numShowRects
            if (i < numShowRects) {
                rectangle(imOut, rects[i], Scalar(0, 255, 0));
            }
            else {
                break;
            }
        }
        // show output
        imshow("Output", imOut);
        // record key press
        int k = waitKey();
        // m is pressed
        if (k == 109) {
            // increase total number of rectangles to show by increment
            numShowRects += increment;
        }
        // l is pressed
        else if (k == 108 && numShowRects > increment) {
            // decrease total number of rectangles to show by increment
            numShowRects -= increment;
        }
        // q is pressed
        else if (k == 113) {
            break;
        }
    }
    return 0;
}

Selective search: Python

The following code is a Python tutorial for selective search using OpenCV 3.3. Note the bug alert for OpenCV 3.2 mentioned after the code block. Please read through the comments to understand the code.

#!/usr/bin/env python
'''
Usage: ./ssearch.py input_image (f|q)
f=fast, q=quality
Use "l" to display less rects, 'm' to display more rects, "q" to quit.
'''
import sys
import cv2
if __name__ == '__main__':
    # If image path and f/q is not passed as command
    # line arguments, quit and display help message
    if len(sys.argv) < 3:
        print(__doc__)
        sys.exit(1)
    # speed-up using multithreads
    cv2.setUseOptimized(True)
    cv2.setNumThreads(4)
    # read image
    im = cv2.imread(sys.argv[1])
    # resize image
    newHeight = 200
    newWidth = int(im.shape[1]*200/im.shape[0])
    im = cv2.resize(im, (newWidth, newHeight))
    # create Selective Search Segmentation Object using default parameters
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    # set input image on which we will run segmentation
    ss.setBaseImage(im)
    # Switch to fast but low recall Selective Search method
    if (sys.argv[2] == 'f'):
        ss.switchToSelectiveSearchFast()
    # Switch to high recall but slow Selective Search method
    elif (sys.argv[2] == 'q'):
        ss.switchToSelectiveSearchQuality()
    # if argument is neither f nor q print help message
    else:
        print(__doc__)
        sys.exit(1)
    # run selective search segmentation on input image
    rects = ss.process()
    print('Total Number of Region Proposals: {}'.format(len(rects)))
    # number of region proposals to show
    numShowRects = 100
    # increment to increase/decrease total number
    # of region proposals to be shown
    increment = 50
    while True:
        # create a copy of original image
        imOut = im.copy()
        # iterate over all the region proposals
        for i, rect in enumerate(rects):
            # draw rectangle for region proposal till numShowRects
            if (i < numShowRects):
                x, y, w, h = rect
                cv2.rectangle(imOut, (x, y), (x+w, y+h), (0, 255, 0), 1, cv2.LINE_AA)
            else:
                break
        # show output
        cv2.imshow("Output", imOut)
        # record key press
        k = cv2.waitKey(0) & 0xFF
        # m is pressed
        if k == 109:
            # increase total number of rectangles to show by increment
            numShowRects += increment
        # l is pressed
        elif k == 108 and numShowRects > increment:
            # decrease total number of rectangles to show by increment
            numShowRects -= increment
        # q is pressed
        elif k == 113:
            break
    # close image show window
    cv2.destroyAllWindows()

Tags: Algorithm Machine Learning Computer Vision

Posted by master82 on Sun, 15 May 2022 11:40:45 +0300