# How I sped up my calculations more than 150 times

My original document: https://www.yuque.com/lart/blog/lwgt38

This article continues from "How did I speed up computing time by 25.6 times".

The previous article mentioned that F-measure can be accelerated further with a cumulative histogram, but E-measure had not yet been changed. On the night of writing the last article, after reorganizing my thinking, I seemed to see how to use the cumulative histogram to speed things up once more.

## Speed constraints

Although the idea of "decoupling" efficiently optimizes the calculation of the indicator under each threshold, the overall for loop still takes a lot of time. Since the calculations under different thresholds are actually independent of each other, computing them all at once would improve the speed further. Here we return to the brilliant cumulative-histogram strategy used when calculating F-measure.

After the previous decoupling, the key variables actually obtained are `fg_fg_numel` and `fg_bg_numel`:

```python
fg_fg_numel = np.count_nonzero(binarized_pred & gt)
fg_bg_numel = np.count_nonzero(binarized_pred & ~gt)
```

Starting from these two variables: with a cumulative histogram, the number of foreground pixels (value 1) under each threshold `>=` can be obtained all at once, and the calculation is essentially the same thing as `np.count_nonzero`. So we can make an intuitive replacement:

```python
"""
Naming rules for variables inside the function:
pred attribute (foreground fg, background bg)_gt attribute (foreground fg, background bg)_variable meaning
If only pred or gt is involved, the other attribute position is replaced with `_`
"""
fg_fg_hist, _ = np.histogram(pred[gt], bins=bins)
fg_bg_hist, _ = np.histogram(pred[~gt], bins=bins)
fg_fg_numel_w_thrs = np.cumsum(np.flip(fg_fg_hist), axis=0)
fg_bg_numel_w_thrs = np.cumsum(np.flip(fg_bg_hist), axis=0)
```

In this way we get a series of `fg_fg_numel` and `fg_bg_numel` values corresponding to the different thresholds. Note the setting of `bins` used to divide the intervals: since the last histogram interval includes its right endpoint by default, a more reasonable division is `bins = np.linspace(0, 256, 257)`, so that the last interval is [255, 256], which contains the maximum value without merging it into the count for 254.
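This point about the bin edges can be checked with a short sketch (the array `pred` and the name `numel_w_thrs` here are just illustrative, assuming a uint8 prediction map):

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.integers(0, 256, size=1000, dtype=np.uint8)

bins = np.linspace(0, 256, 257)          # 256 unit intervals; the last is [255, 256]
hist, _ = np.histogram(pred, bins=bins)
numel_w_thrs = np.cumsum(np.flip(hist))  # index k = number of pixels >= 255 - k

for k in (0, 100, 255):
    assert numel_w_thrs[k] == np.count_nonzero(pred >= 255 - k)

# the value 255 stays in its own bin instead of being merged into a closed
# last interval [254, 255], which is what edges like np.linspace(0, 255, 256) give
assert hist[255] == np.count_nonzero(pred == 255)
```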

For convenience in later calculations, the pred foreground statistics `fg___numel_w_thrs` and background statistics `bg___numel_w_thrs` are written out directly for easy use:

```python
fg___numel_w_thrs = fg_fg_numel_w_thrs + fg_bg_numel_w_thrs
bg___numel_w_thrs = self.gt_size - fg___numel_w_thrs
```

The following steps are basically the same as before; thanks to numpy's broadcasting mechanism, not much needs to change. The method below is used to extract the parts shared by both code paths:

```python
def generate_parts_numel_combinations(self, fg_fg_numel, fg_bg_numel, pred_fg_numel, pred_bg_numel):
    bg_fg_numel = self.gt_fg_numel - fg_fg_numel
    bg_bg_numel = pred_bg_numel - bg_fg_numel

    parts_numel = [fg_fg_numel, fg_bg_numel, bg_fg_numel, bg_bg_numel]

    mean_pred_value = pred_fg_numel / self.gt_size
    mean_gt_value = self.gt_fg_numel / self.gt_size

    demeaned_pred_fg_value = 1 - mean_pred_value
    demeaned_pred_bg_value = 0 - mean_pred_value
    demeaned_gt_fg_value = 1 - mean_gt_value
    demeaned_gt_bg_value = 0 - mean_gt_value

    combinations = [
        (demeaned_pred_fg_value, demeaned_gt_fg_value),
        (demeaned_pred_fg_value, demeaned_gt_bg_value),
        (demeaned_pred_bg_value, demeaned_gt_fg_value),
        (demeaned_pred_bg_value, demeaned_gt_bg_value)
    ]
    return parts_numel, combinations
```
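A quick way to see why this method needs no changes for the multi-threshold case: the same arithmetic broadcasts over per-threshold count arrays of shape `(256,)` exactly as it does over scalars. The standalone adaptation below (with `self` replaced by explicit `gt_fg_numel`/`gt_size` arguments and toy counts, so it is a sketch rather than the author's exact method) also confirms that the four parts partition the image at every threshold:

```python
import numpy as np

def generate_parts_numel_combinations(fg_fg_numel, fg_bg_numel,
                                      pred_fg_numel, pred_bg_numel,
                                      gt_fg_numel, gt_size):
    # identical code works for scalars and for (256,) arrays via broadcasting
    bg_fg_numel = gt_fg_numel - fg_fg_numel
    bg_bg_numel = pred_bg_numel - bg_fg_numel
    parts_numel = [fg_fg_numel, fg_bg_numel, bg_fg_numel, bg_bg_numel]

    mean_pred_value = pred_fg_numel / gt_size
    mean_gt_value = gt_fg_numel / gt_size
    combinations = [
        (1 - mean_pred_value, 1 - mean_gt_value),
        (1 - mean_pred_value, 0 - mean_gt_value),
        (0 - mean_pred_value, 1 - mean_gt_value),
        (0 - mean_pred_value, 0 - mean_gt_value),
    ]
    return parts_numel, combinations

gt_size, gt_fg_numel = 10000, 4000
fg_fg = np.arange(256) * 10        # toy per-threshold counts
fg_bg = np.arange(256) * 5
pred_fg = fg_fg + fg_bg
pred_bg = gt_size - pred_fg
parts, combos = generate_parts_numel_combinations(fg_fg, fg_bg, pred_fg, pred_bg,
                                                  gt_fg_numel, gt_size)
# fg_fg + fg_bg + bg_fg + bg_bg covers the whole image at every threshold
assert np.array_equal(sum(parts), np.full(256, gt_size))
```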

The part that calculates `enhanced_matrix_sum` then follows naturally:

```python
parts_numel_w_thrs, combinations = self.generate_parts_numel_combinations(
    fg_fg_numel=fg_fg_numel_w_thrs, fg_bg_numel=fg_bg_numel_w_thrs,
    pred_fg_numel=fg___numel_w_thrs, pred_bg_numel=bg___numel_w_thrs,
)

# A list could be used to collect the individual results_parts, but the list
# would then have to be converted into a numpy array for summation. It is
# better to allocate the space once and write into it directly.
results_parts = np.empty(shape=(4, 256), dtype=np.float64)
for i, (part_numel, combination) in enumerate(zip(parts_numel_w_thrs, combinations)):
    align_matrix_value = 2 * (combination[0] * combination[1]) / \
        (combination[0] ** 2 + combination[1] ** 2 + _EPS)
    enhanced_matrix_value = (align_matrix_value + 1) ** 2 / 4
    results_parts[i] = enhanced_matrix_value * part_numel
enhanced_matrix_sum = results_parts.sum(axis=0)
```
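The design choice in the comment can be demonstrated in isolation: collecting into a list and summing after conversion yields the same values as writing into preallocated storage, it just costs an extra array construction (a toy sketch with made-up stand-in data):

```python
import numpy as np

values = [np.full(256, float(i)) for i in range(4)]  # stand-ins for the four parts

# variant 1: collect in a list, then convert and sum
list_sum = np.array(values).sum(axis=0)

# variant 2: allocate once and write each row directly, as in the loop above
results_parts = np.empty(shape=(4, 256), dtype=np.float64)
for i, v in enumerate(values):
    results_parts[i] = v

assert np.array_equal(results_parts.sum(axis=0), list_sum)
```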

## Putting it all together

The main logic is complete. The next step is to integrate this code with the original code, that is, with the two existing methods `cal_em_with_threshold` and `cal_enhanced_matrix`.

```python
def cal_em_with_threshold(self, pred: np.ndarray, gt: np.ndarray, threshold: float) -> float:
    binarized_pred = pred >= threshold

    if self.gt_fg_numel == 0:
        binarized_pred_bg_numel = np.count_nonzero(~binarized_pred)
        enhanced_matrix_sum = binarized_pred_bg_numel
    elif self.gt_fg_numel == self.gt_size:
        binarized_pred_fg_numel = np.count_nonzero(binarized_pred)
        enhanced_matrix_sum = binarized_pred_fg_numel
    else:
        enhanced_matrix_sum = self.cal_enhanced_matrix(binarized_pred, gt)
    em = enhanced_matrix_sum / (self.gt_size - 1 + _EPS)
    return em
```

Combined with the per-threshold foreground and background statistics computed earlier, the code above can actually be simplified using the existing results, namely the first two branches of the `if`. In addition, the threshold division no longer needs explicit handling, because it is already done in the cumulative histogram. Therefore, for the dynamic-threshold case this code can be folded into the calculation process of `cal_enhanced_matrix`, giving the final integrated method directly:

```python
def cal_em_with_cumsumhistogram(self, pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """
    Naming rules for variables inside the function:
    pred attribute (foreground fg, background bg)_gt attribute (foreground fg, background bg)_variable meaning
    If only pred or gt is involved, the other attribute position is replaced with `_`
    """
    pred = (pred * 255).astype(np.uint8)
    bins = np.linspace(0, 256, 257)
    fg_fg_hist, _ = np.histogram(pred[gt], bins=bins)
    fg_bg_hist, _ = np.histogram(pred[~gt], bins=bins)
    fg_fg_numel_w_thrs = np.cumsum(np.flip(fg_fg_hist), axis=0)
    fg_bg_numel_w_thrs = np.cumsum(np.flip(fg_bg_hist), axis=0)

    fg___numel_w_thrs = fg_fg_numel_w_thrs + fg_bg_numel_w_thrs
    bg___numel_w_thrs = self.gt_size - fg___numel_w_thrs

    if self.gt_fg_numel == 0:
        enhanced_matrix_sum = bg___numel_w_thrs
    elif self.gt_fg_numel == self.gt_size:
        enhanced_matrix_sum = fg___numel_w_thrs
    else:
        parts_numel_w_thrs, combinations = self.generate_parts_numel_combinations(
            fg_fg_numel=fg_fg_numel_w_thrs, fg_bg_numel=fg_bg_numel_w_thrs,
            pred_fg_numel=fg___numel_w_thrs, pred_bg_numel=bg___numel_w_thrs,
        )

        results_parts = np.empty(shape=(4, 256), dtype=np.float64)
        for i, (part_numel, combination) in enumerate(zip(parts_numel_w_thrs, combinations)):
            align_matrix_value = 2 * (combination[0] * combination[1]) / \
                (combination[0] ** 2 + combination[1] ** 2 + _EPS)
            enhanced_matrix_value = (align_matrix_value + 1) ** 2 / 4
            results_parts[i] = enhanced_matrix_value * part_numel
        enhanced_matrix_sum = results_parts.sum(axis=0)

    em = enhanced_matrix_sum / (self.gt_size - 1 + _EPS)
    return em
```

Alternatively, for reuse, `cal_em_with_threshold` (this method needs to be retained, because another E-measure calculation case still needs it) can be refactored:

```python
def cal_em_with_threshold(self, pred: np.ndarray, gt: np.ndarray, threshold: float) -> float:
    """
    Naming rules for variables inside the function:
    pred attribute (foreground fg, background bg)_gt attribute (foreground fg, background bg)_variable meaning
    If only pred or gt is involved, the other attribute position is replaced with `_`
    """
    binarized_pred = pred >= threshold
    fg_fg_numel = np.count_nonzero(binarized_pred & gt)
    fg_bg_numel = np.count_nonzero(binarized_pred & ~gt)

    fg___numel = fg_fg_numel + fg_bg_numel
    bg___numel = self.gt_size - fg___numel

    if self.gt_fg_numel == 0:
        enhanced_matrix_sum = bg___numel
    elif self.gt_fg_numel == self.gt_size:
        enhanced_matrix_sum = fg___numel
    else:
        parts_numel, combinations = self.generate_parts_numel_combinations(
            fg_fg_numel=fg_fg_numel, fg_bg_numel=fg_bg_numel,
            pred_fg_numel=fg___numel, pred_bg_numel=bg___numel,
        )

        results_parts = []
        for i, (part_numel, combination) in enumerate(zip(parts_numel, combinations)):
            align_matrix_value = 2 * (combination[0] * combination[1]) / \
                (combination[0] ** 2 + combination[1] ** 2 + _EPS)
            enhanced_matrix_value = (align_matrix_value + 1) ** 2 / 4
            results_parts.append(enhanced_matrix_value * part_numel)
        enhanced_matrix_sum = sum(results_parts)

    em = enhanced_matrix_sum / (self.gt_size - 1 + _EPS)
    return em
```
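To convince ourselves that the two paths agree, here is a minimal self-contained sketch. It is a simplification of the methods above, not the author's exact class: `self` is removed, `_EPS` is assumed to be machine epsilon, the prediction is already quantized to uint8, and only the generic branch (a mixed ground truth) is covered. Because of the flip, index `255 - t` of the cumulative-histogram curve corresponds to threshold `t`:

```python
import numpy as np

_EPS = np.spacing(1)  # assumption: machine epsilon, as in many metric implementations

def parts_and_combinations(fg_fg, fg_bg, pred_fg, pred_bg, gt_fg_numel, gt_size):
    bg_fg = gt_fg_numel - fg_fg
    bg_bg = pred_bg - bg_fg
    parts = [fg_fg, fg_bg, bg_fg, bg_bg]
    mean_pred, mean_gt = pred_fg / gt_size, gt_fg_numel / gt_size
    combos = [(1 - mean_pred, 1 - mean_gt), (1 - mean_pred, 0 - mean_gt),
              (0 - mean_pred, 1 - mean_gt), (0 - mean_pred, 0 - mean_gt)]
    return parts, combos

def em_curve(pred_u8, gt):
    """E-measure for all 256 thresholds at once via the cumulative histogram."""
    gt_size, gt_fg_numel = gt.size, np.count_nonzero(gt)
    bins = np.linspace(0, 256, 257)
    fg_fg = np.cumsum(np.flip(np.histogram(pred_u8[gt], bins=bins)[0]))
    fg_bg = np.cumsum(np.flip(np.histogram(pred_u8[~gt], bins=bins)[0]))
    pred_fg = fg_fg + fg_bg
    parts, combos = parts_and_combinations(fg_fg, fg_bg, pred_fg,
                                           gt_size - pred_fg, gt_fg_numel, gt_size)
    total = np.zeros(256)
    for part, (p, g) in zip(parts, combos):
        align = 2 * p * g / (p ** 2 + g ** 2 + _EPS)
        total += (align + 1) ** 2 / 4 * part
    return total / (gt_size - 1 + _EPS)

def em_at_threshold(pred_u8, gt, thr):
    """E-measure for a single integer threshold, via explicit binarization."""
    gt_size, gt_fg_numel = gt.size, np.count_nonzero(gt)
    bin_pred = pred_u8 >= thr
    fg_fg = np.count_nonzero(bin_pred & gt)
    fg_bg = np.count_nonzero(bin_pred & ~gt)
    pred_fg = fg_fg + fg_bg
    parts, combos = parts_and_combinations(fg_fg, fg_bg, pred_fg,
                                           gt_size - pred_fg, gt_fg_numel, gt_size)
    total = sum((2 * p * g / (p ** 2 + g ** 2 + _EPS) + 1) ** 2 / 4 * part
                for part, (p, g) in zip(parts, combos))
    return total / (gt_size - 1 + _EPS)

rng = np.random.default_rng(0)
pred = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
gt = rng.random((64, 64)) > 0.5
curve = em_curve(pred, gt)
for thr in (1, 128, 255):
    assert np.isclose(curve[255 - thr], em_at_threshold(pred, gt, thr))
```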

## Efficiency comparison

845 local grayscale prediction images and binary mask ground-truth data were used for the comparison. The overall time comparison is as follows:

| method | Total time (s) | Speedup |
| --- | --- | --- |
| `base` | 539.2173762321472 | x1 |
| `best` | 19.94518733024597 | x27.0 (539.22/19.95) |
| `cumsumhistogram` | 3.2935903072357178 | x163.8 (539.22/3.29) |

Although the absolute times depend on the hardware, the relative speedup is clear.

The test code can be found on my GitHub: https://github.com/lartpang/CodeForArticle/tree/main/sod_metrics

Tags: Python numpy

Posted by cap2cap10 on Thu, 05 May 2022 06:54:22 +0300