OpenCV uses forEach for parallel pixel access

Parallel pixel access using forEach in OpenCV

In this tutorial, we will compare the performance of the forEach method of the Mat class with other ways of accessing and transforming pixel values in OpenCV. We will show that forEach is much faster than naively using the at method, and also faster than pointer arithmetic.

OpenCV has some hidden gems that are not very well known. One of them is the forEach method of the Mat class, which uses all the cores on the machine to apply a function to every pixel.

Let's first define a function, complicatedThreshold. It takes a three-channel pixel value (8 bits per channel) and applies a complicated threshold to it.

// Define a pixel
typedef Point3_<uint8_t> Pixel;

// A complicated threshold is defined so
// a non-trivial amount of computation
// is done at each pixel.
void complicatedThreshold(Pixel &pixel)
{
  if (pow(double(pixel.x)/10,2.5) > 100)
  {
    pixel.x = 255;
    pixel.y = 255;
    pixel.z = 255;
  }
  else
  {
    pixel.x = 0;
    pixel.y = 0;
    pixel.z = 0;
  }
}

This function is computationally much heavier than a simple threshold. This way we test not only the pixel access time, but also how well forEach uses all the cores when the computation at each pixel is expensive.
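To see what the threshold actually does, it helps to work out where it flips. The condition pow(x/10, 2.5) > 100 is equivalent to x > 10 * 100^(1/2.5), which is about 63.1, so a pixel turns white when its first channel is 64 or more. The sketch below checks this in plain C++ without OpenCV; passesThreshold and firstPassingValue are illustrative names, not part of the tutorial code.

```cpp
#include <cmath>
#include <cstdint>

// Same test as complicatedThreshold, applied to a single channel value.
// (Hypothetical helper, for illustration only.)
bool passesThreshold(std::uint8_t value)
{
    return std::pow(double(value) / 10, 2.5) > 100;
}

// Smallest 8-bit value that passes the test, or -1 if none does.
int firstPassingValue()
{
    for (int v = 0; v <= 255; v++)
    {
        if (passesThreshold(static_cast<std::uint8_t>(v)))
            return v;
    }
    return -1;
}
```

In other words, the result is the same as a plain threshold at 64 on the first channel; the pow call is there only to make each pixel expensive to process.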

Next, we will apply this function to every pixel of an image in four different ways and compare their performance.

Method 1: naive pixel access using the at method

The Mat class has a convenient at method for accessing the pixel at position (row, column) of an image. The following code uses at to access each pixel and apply the complicated threshold to it.

// Naive pixel access
// Loop over all rows
for (int r = 0; r < image.rows; r++)
{
  // Loop over all columns
  for (int c = 0; c < image.cols; c++)
  {
    // Obtain pixel at (r, c)
    Pixel pixel = image.at<Pixel>(r, c);
    // Apply complicatedThreshold
    complicatedThreshold(pixel);
    // Put result back
    image.at<Pixel>(r, c) = pixel;
  }
}

This method is considered inefficient because the location of the pixel in memory is recalculated every time at is called, which involves multiplication. The fact that the pixels lie in a contiguous block of memory is not exploited.
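The cost in question is an address computation. For a 2-D matrix, the element at (r, c) sits at byte offset r * step + c * elemSize from the start of the data, where step is the number of bytes in one row. The sketch below is a simplified model of that arithmetic in plain C++, not OpenCV's actual implementation; byteOffset and nextInRow are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element (r, c), recomputed from scratch on every call:
// one multiplication by the row stride and one by the element size.
std::size_t byteOffset(int r, int c, std::size_t step, std::size_t elemSize)
{
    assert(r >= 0 && c >= 0);
    return static_cast<std::size_t>(r) * step
         + static_cast<std::size_t>(c) * elemSize;
}

// Moving to the next element of the same row needs only an addition.
std::size_t nextInRow(std::size_t offset, std::size_t elemSize)
{
    return offset + elemSize;
}
```

For a 9000-pixel-wide image with 3 bytes per pixel, step is 27000, so byteOffset(2, 3, 27000, 3) is 54009, and nextInRow on that value gives the offset of (2, 4).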

Method 2: pixel access using pointer arithmetic

In OpenCV, all pixels in a row are stored in one contiguous block of memory. If a Mat object is created using create, all of its pixels are stored in one contiguous block. Since we are reading an image from disk and imread uses the create method, we can simply traverse all pixels using pointer arithmetic, which requires no multiplication.

The code is shown below.

// Using pointer arithmetic

// Get pointer to first pixel
Pixel* pixel = image1.ptr<Pixel>(0,0);

// Mat objects created using the create method are stored
// in one continuous memory block.
const Pixel* endPixel = pixel + image1.cols * image1.rows;
// Loop over all pixels
for (; pixel != endPixel; pixel++)
{
  complicatedThreshold(*pixel);
}
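The flat loop above relies on the rows being packed back to back. That is not always true: a Mat that views a region of a larger image typically has gaps between its rows, which isContinuous() reports. The following sketch models such a layout in plain C++ (no OpenCV) and walks it row by row, which works whether or not the rows are padded. Rgb, forEachRowWise, countWhite, and the stride parameter are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t x, y, z; };

// Apply op to a rows x cols window inside a buffer whose rows start
// `stride` pixels apart (stride >= cols). With stride == cols the data is
// continuous and this visits the same pixels as one flat loop.
template <typename Op>
void forEachRowWise(Rgb* data, int rows, int cols, std::size_t stride, Op op)
{
    assert(cols >= 0 && stride >= static_cast<std::size_t>(cols));
    for (int r = 0; r < rows; r++)
    {
        Rgb* pixel = data + static_cast<std::size_t>(r) * stride; // start of row r
        const Rgb* rowEnd = pixel + cols;
        for (; pixel != rowEnd; pixel++)
            op(*pixel);
    }
}

// Count the pixels in the whole buffer whose first channel is 255.
int countWhite(const std::vector<Rgb>& buffer)
{
    int n = 0;
    for (const Rgb& p : buffer)
        if (p.x == 255) n++;
    return n;
}
```

In OpenCV terms, the start of each row would come from image1.ptr<Pixel>(r) rather than from a hand-computed stride.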

Method 3: use forEach

The forEach method of the Mat class accepts a functor (function object). Its signature is

template<typename _Tp, typename Functor> void cv::Mat::forEach(const Functor& operation)

The easiest way to understand this signature is through the following example. We define a function object (Operator) to use with forEach.

// Parallel execution with function object.
struct Operator
{
  void operator ()(Pixel &pixel, const int * position) const
  {
    // Perform the complicated threshold operation
    complicatedThreshold(pixel);
  }
};

Calling forEach is simple and requires only one line of code:

// Call forEach
image2.forEach<Pixel>(Operator());
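The second argument of the functor, position, is not used in this tutorial. It points to the indices of the pixel being processed; for a 2-D image these are the row and the column. To make the calling convention concrete, here is a sequential stand-in written in plain C++. It only mimics the signature that forEach expects and does none of the parallel work; Rgb, forEachSequential, and MarkDiagonal are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t x, y, z; };

// Sequential stand-in: calls op(pixel, position) for every pixel of a
// continuous rows x cols buffer, with position = {row, column}.
template <typename Op>
void forEachSequential(std::vector<Rgb>& image, int rows, int cols, const Op& op)
{
    assert(rows >= 0 && cols >= 0);
    assert(image.size() == static_cast<std::size_t>(rows) * cols);
    for (int r = 0; r < rows; r++)
    {
        for (int c = 0; c < cols; c++)
        {
            const int position[2] = { r, c };
            op(image[static_cast<std::size_t>(r) * cols + c], position);
        }
    }
}

// Functor with the same call signature as Operator, but one that actually
// reads position: it paints the main diagonal white.
struct MarkDiagonal
{
    void operator()(Rgb& pixel, const int* position) const
    {
        if (position[0] == position[1])
        {
            pixel.x = 255;
            pixel.y = 255;
            pixel.z = 255;
        }
    }
};
```

Since the real forEach may run the functor on several threads at once, a functor like this should only modify the pixel it was handed.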

Method 4: use forEach with a C++11 lambda

image3.forEach<Pixel>
(
  [](Pixel &pixel, const int * position) -> void
  {
    complicatedThreshold(pixel);
  }
);

Comparing the performance of forEach

The complicated threshold function was applied five times in a row to all pixels of a large 9000 x 6750 image. The experiment was run on a 2.5 GHz Intel Core i7 processor with four cores. The timings are listed below. Note that forEach is about five times faster than either naive pixel access or pointer arithmetic.

Method                    Time (milliseconds)
Naive pixel access        6656
Pointer arithmetic        6575
forEach                   1221
forEach (C++11 lambda)    1272
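The "five times" figure follows directly from these timings. As a quick check, this plain C++ sketch divides the reported numbers; speedup is an illustrative helper, and the constants are copied from the timings above.

```cpp
// Ratio of a baseline time to an improved time (both in the same units).
double speedup(double baselineMs, double improvedMs)
{
    return baselineMs / improvedMs;
}

// Timings reported above, in milliseconds.
const double kNaiveMs = 6656;
const double kPointerMs = 6575;
const double kForEachMs = 1221;
const double kForEachLambdaMs = 1272;
```

Here speedup(kNaiveMs, kForEachMs) is about 5.45 and speedup(kPointerMs, kForEachMs) about 5.38, while speedup(kNaiveMs, kPointerMs) is only about 1.01, so pointer arithmetic gains roughly 1% over the naive loop.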

I have written code in OpenCV for more than ten years. Whenever I had to write optimized code to access pixels, I used pointer arithmetic instead of the naive at method. However, while writing this blog post, I was surprised to find that even for large images there seems to be no difference between the two methods.

Full code:

// Include OpenCV header
#include <opencv2/opencv.hpp>

// Standard headers used below
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <iostream>

// Use cv and std namespaces
using namespace cv;
using namespace std;

// Define a pixel
typedef Point3_<uint8_t> Pixel;

// tic is called to start timer
void tic(double &t)
{
    t = (double)getTickCount();
}

// toc is called to end timer; it returns the elapsed time in seconds
double toc(double &t)
{
    return ((double)getTickCount() - t) / getTickFrequency();
}

// A complicated threshold, so that a non-trivial amount
// of computation is done at each pixel.
void complicatedThreshold(Pixel &pixel)
{
    if (pow(double(pixel.x) / 10, 2.5) > 100)
    {
        pixel.x = 255;
        pixel.y = 255;
        pixel.z = 255;
    }
    else
    {
        pixel.x = 0;
        pixel.y = 0;
        pixel.z = 0;
    }
}

// Parallel execution with function object.
struct Operator
{
    void operator ()(Pixel &pixel, const int * position) const
    {
        // Perform the complicated threshold operation
        complicatedThreshold(pixel);
    }
};

int main(int argc, char** argv)
{
    // Read image
    Mat image = imread("butterfly.jpg");

    // Stop early if the image could not be read
    if (image.empty())
    {
        cerr << "Could not read butterfly.jpg" << endl;
        return EXIT_FAILURE;
    }

    // Scale image 30x
    resize(image, image, Size(), 30, 30);

    // Print image size
    cout << "Image size " << image.size() << endl;

    // Number of trials
    int numTrials = 5;

    // Print number of trials
    cout << "Number of trials : " << numTrials << endl;

    // Make three copies, one for each of the other methods
    Mat image1 = image.clone();
    Mat image2 = image.clone();
    Mat image3 = image.clone();

    // Start timer
    double t;
    tic(t);

    for (int n = 0; n < numTrials; n++)
    {
        // Naive pixel access
        // Loop over all rows
        for (int r = 0; r < image.rows; r++)
        {
            // Loop over all columns
            for (int c = 0; c < image.cols; c++)
            {
                // Obtain pixel at (r, c)
                Pixel pixel = image.at<Pixel>(r, c);
                // Apply complicatedThreshold
                complicatedThreshold(pixel);
                // Put result back
                image.at<Pixel>(r, c) = pixel;
            }
        }
    }

    cout << "Naive way: " << toc(t) << endl;

    // Start timer
    tic(t);

    // image1 is guaranteed to be continuous, but
    // if you are curious uncomment the line below
    // cout << "Image 1 is continuous : " << image1.isContinuous() << endl;

    for (int n = 0; n < numTrials; n++)
    {
        // Get pointer to first pixel
        Pixel* pixel = image1.ptr<Pixel>(0, 0);

        // Mat objects created using the create method are stored
        // in one continuous memory block.
        const Pixel* endPixel = pixel + image1.cols * image1.rows;

        // Loop over all pixels
        for (; pixel != endPixel; pixel++)
        {
            complicatedThreshold(*pixel);
        }
    }
    cout << "Pointer Arithmetic " << toc(t) << endl;

    // Start timer
    tic(t);

    for (int n = 0; n < numTrials; n++)
    {
        // Parallel execution using a function object.
        image2.forEach<Pixel>(Operator());
    }
    cout << "forEach : " << toc(t) << endl;

#if __cplusplus >= 201103L || (__cplusplus < 200000 && __cplusplus > 199711L)
    // Start timer
    tic(t);

    for (int n = 0; n < numTrials; n++)
    {
        // Parallel execution using C++11 lambda.
        image3.forEach<Pixel>
        (
            [](Pixel &pixel, const int * position) -> void
            {
                complicatedThreshold(pixel);
            }
        );
    }
    cout << "forEach C++11 : " << toc(t) << endl;

#endif

    return EXIT_SUCCESS;
}

 

Tags: OpenCV

Posted by gnize on Wed, 11 May 2022 15:12:08 +0300