# How I Sped Up My Calculations by More Than 150 Times

My original document: https://www.yuque.com/lart/blog/lwgt38

This article continues from "How I Sped Up Computing Time by 25.6 Times".

The previous article mentioned that the F-measure calculation can be further accelerated with a cumulative histogram, but the E-measure had not yet been changed. On the night I finished the last article, after reorganizing my thoughts, I seemed to see how to use the cumulative histogram to speed things up once more.

## Speed constraints

Although the idea of "decoupling" efficiently optimizes the calculation of the metric under each individual threshold, the overall `for` loop over thresholds still takes a lot of time. Since the calculations under different thresholds are actually independent of each other, computing them all at once would improve the speed further. Here we return to the brilliant cumulative-histogram strategy used when calculating the F-measure.

After the earlier decoupling, the key variables actually obtained are `fg_fg_numel` and `fg_bg_numel`:

```python
fg_fg_numel = np.count_nonzero(binarized_pred & gt)
fg_bg_numel = np.count_nonzero(binarized_pred & ~gt)
```

Starting from these two variables: with a cumulative histogram, the number of foreground pixels (value 1) at each `>=` threshold can be obtained all at once, and the computation is essentially the same thing as `np.count_nonzero`. So we can make an intuitive replacement:

""" Naming rules for function internal variables: pred attribute(prospect fg,background bg)_gt attribute(prospect fg,background bg)_Variable meaning If only pred perhaps gt，Then another corresponding attribute location is used`_`replace """ fg_fg_hist, _ = np.histogram(pred[gt], bins=bins) fg_bg_hist, _ = np.histogram(pred[~gt], bins=bins) fg_fg_numel_w_thrs = np.cumsum(np.flip(fg_fg_hist), axis=0) fg_bg_numel_w_thrs = np.cumsum(np.flip(fg_bg_hist), axis=0)

In this way we obtain a whole series of `fg_fg_numel` and `fg_bg_numel` values, one pair per threshold. Note the setting of `bins` used to divide the intervals: since `np.histogram` includes the right endpoint in its last interval, a more reasonable division is `bins = np.linspace(0, 256, 257)`, so that the last interval is `[255, 256]`. This interval contains the maximum value 255 without lumping it into the same bin as 254.
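The mapping between bin index and threshold deserves a quick check: after `np.flip` and `np.cumsum`, index `i` holds the count of pixels with value `>= 255 - i`. A minimal sketch (variable names are illustrative, not from the article's code):

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.integers(0, 256, size=1000)   # grayscale values in [0, 255]

bins = np.linspace(0, 256, 257)          # last interval [255, 256] keeps 255 separate from 254
hist, _ = np.histogram(pred, bins=bins)
ge_counts = np.cumsum(np.flip(hist))     # ge_counts[i] == number of pixels with value >= 255 - i

# The flipped cumulative sum reproduces the per-threshold count_nonzero exactly.
for t in (0, 1, 128, 254, 255):
    assert ge_counts[255 - t] == np.count_nonzero(pred >= t)
```

So one histogram plus one cumulative sum replaces 256 separate `np.count_nonzero` passes over the image.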

For convenience later, the pred foreground statistic `fg___numel_w_thrs` and background statistic `bg___numel_w_thrs` that will be used repeatedly are written out directly:

```python
fg___numel_w_thrs = fg_fg_numel_w_thrs + fg_bg_numel_w_thrs
bg___numel_w_thrs = self.gt_size - fg___numel_w_thrs
```

The following steps are basically the same as before; thanks to numpy's broadcasting mechanism, little needs to change. The method below is extracted because it is used by several parts of the code:

```python
def generate_parts_numel_combinations(self, fg_fg_numel, fg_bg_numel, pred_fg_numel, pred_bg_numel):
    bg_fg_numel = self.gt_fg_numel - fg_fg_numel
    bg_bg_numel = pred_bg_numel - bg_fg_numel

    parts_numel = [fg_fg_numel, fg_bg_numel, bg_fg_numel, bg_bg_numel]

    mean_pred_value = pred_fg_numel / self.gt_size
    mean_gt_value = self.gt_fg_numel / self.gt_size

    demeaned_pred_fg_value = 1 - mean_pred_value
    demeaned_pred_bg_value = 0 - mean_pred_value
    demeaned_gt_fg_value = 1 - mean_gt_value
    demeaned_gt_bg_value = 0 - mean_gt_value

    combinations = [
        (demeaned_pred_fg_value, demeaned_gt_fg_value),
        (demeaned_pred_fg_value, demeaned_gt_bg_value),
        (demeaned_pred_bg_value, demeaned_gt_fg_value),
        (demeaned_pred_bg_value, demeaned_gt_bg_value),
    ]
    return parts_numel, combinations
```
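The method only counts `fg_fg_numel` and `fg_bg_numel` directly; the other two blocks follow by arithmetic. A small self-contained sketch (toy arrays, illustrative names) of why that works, since the four blocks partition the image:

```python
import numpy as np

rng = np.random.default_rng(1)
pred = rng.random((8, 8)) >= 0.5   # a binarized prediction
gt = rng.random((8, 8)) >= 0.5     # a binary ground truth

# Only these two counts need a pixel scan:
fg_fg = np.count_nonzero(pred & gt)
fg_bg = np.count_nonzero(pred & ~gt)

# The remaining two blocks are derived from the totals, with no extra scan:
bg_fg = np.count_nonzero(gt) - fg_fg           # gt_fg_numel - fg_fg_numel
bg_bg = np.count_nonzero(~pred) - bg_fg        # pred_bg_numel - bg_fg_numel

# The derived counts match the directly counted ones, and the four blocks
# together cover every pixel exactly once.
assert bg_fg == np.count_nonzero(~pred & gt)
assert bg_bg == np.count_nonzero(~pred & ~gt)
assert fg_fg + fg_bg + bg_fg + bg_bg == gt.size
```

This is what lets the histogram version get away with only two cumulative histograms per image.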

It then feels natural to write the part that computes `enhanced_matrix_sum`:

```python
parts_numel_w_thrs, combinations = self.generate_parts_numel_combinations(
    fg_fg_numel=fg_fg_numel_w_thrs,
    fg_bg_numel=fg_bg_numel_w_thrs,
    pred_fg_numel=fg___numel_w_thrs,
    pred_bg_numel=bg___numel_w_thrs,
)

# Although a list could collect the individual result parts, it would then have to
# be converted into a numpy array for summation. It is better to allocate the
# space once and fill it directly.
results_parts = np.empty(shape=(4, 256), dtype=np.float64)
for i, (part_numel, combination) in enumerate(zip(parts_numel_w_thrs, combinations)):
    align_matrix_value = 2 * (combination[0] * combination[1]) / \
        (combination[0] ** 2 + combination[1] ** 2 + _EPS)
    enhanced_matrix_value = (align_matrix_value + 1) ** 2 / 4
    results_parts[i] = enhanced_matrix_value * part_numel
enhanced_matrix_sum = results_parts.sum(axis=0)
```

## Putting it all together

The main logic is now complete. The next step is to integrate this code with the original code, that is, with the two original methods `cal_em_with_threshold` and `cal_enhanced_matrix`.

In the original code https://github.com/lartpang/CodeForArticle/blob/7a922c720702c727d7a28fd17f3db66e0b9ba135/sod_metrics/metrics/metric_best.py#L46-L58

```python
def cal_em_with_threshold(self, pred: np.ndarray, gt: np.ndarray, threshold: float) -> float:
    binarized_pred = pred >= threshold

    if self.gt_fg_numel == 0:
        binarized_pred_bg_numel = np.count_nonzero(~binarized_pred)
        enhanced_matrix_sum = binarized_pred_bg_numel
    elif self.gt_fg_numel == self.gt_size:
        binarized_pred_fg_numel = np.count_nonzero(binarized_pred)
        enhanced_matrix_sum = binarized_pred_fg_numel
    else:
        enhanced_matrix_sum = self.cal_enhanced_matrix(binarized_pred, gt)
    em = enhanced_matrix_sum / (self.gt_size - 1 + _EPS)
    return em
```

Combined with the per-threshold foreground and background statistics computed earlier, the code above can be simplified by reusing those existing results, namely in the first two branches of the `if`. Moreover, the threshold splitting no longer needs explicit handling, because it is already done by the cumulative histogram. Therefore, for the dynamic-threshold case, this code can be folded into the calculation of `cal_enhanced_matrix`, giving the final integrated method directly:

```python
def cal_em_with_cumsumhistogram(self, pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """
    Naming convention for variables inside the function:
    {pred attribute (foreground fg, background bg)}_{gt attribute (foreground fg, background bg)}_{variable meaning}
    If only pred or only gt is involved, the other attribute slot is filled with `_`.
    """
    pred = (pred * 255).astype(np.uint8)
    bins = np.linspace(0, 256, 257)
    fg_fg_hist, _ = np.histogram(pred[gt], bins=bins)
    fg_bg_hist, _ = np.histogram(pred[~gt], bins=bins)
    fg_fg_numel_w_thrs = np.cumsum(np.flip(fg_fg_hist), axis=0)
    fg_bg_numel_w_thrs = np.cumsum(np.flip(fg_bg_hist), axis=0)

    fg___numel_w_thrs = fg_fg_numel_w_thrs + fg_bg_numel_w_thrs
    bg___numel_w_thrs = self.gt_size - fg___numel_w_thrs

    if self.gt_fg_numel == 0:
        enhanced_matrix_sum = bg___numel_w_thrs
    elif self.gt_fg_numel == self.gt_size:
        enhanced_matrix_sum = fg___numel_w_thrs
    else:
        parts_numel_w_thrs, combinations = self.generate_parts_numel_combinations(
            fg_fg_numel=fg_fg_numel_w_thrs,
            fg_bg_numel=fg_bg_numel_w_thrs,
            pred_fg_numel=fg___numel_w_thrs,
            pred_bg_numel=bg___numel_w_thrs,
        )

        results_parts = np.empty(shape=(4, 256), dtype=np.float64)
        for i, (part_numel, combination) in enumerate(zip(parts_numel_w_thrs, combinations)):
            align_matrix_value = 2 * (combination[0] * combination[1]) / \
                (combination[0] ** 2 + combination[1] ** 2 + _EPS)
            enhanced_matrix_value = (align_matrix_value + 1) ** 2 / 4
            results_parts[i] = enhanced_matrix_value * part_numel
        enhanced_matrix_sum = results_parts.sum(axis=0)

    em = enhanced_matrix_sum / (self.gt_size - 1 + _EPS)
    return em
```

Alternatively, for reuse, `cal_em_with_threshold` (this method needs to be kept, because another E-measure calculation case requires it) can be refactored:

```python
def cal_em_with_threshold(self, pred: np.ndarray, gt: np.ndarray, threshold: float) -> float:
    """
    Naming convention for variables inside the function:
    {pred attribute (foreground fg, background bg)}_{gt attribute (foreground fg, background bg)}_{variable meaning}
    If only pred or only gt is involved, the other attribute slot is filled with `_`.
    """
    binarized_pred = pred >= threshold

    fg_fg_numel = np.count_nonzero(binarized_pred & gt)
    fg_bg_numel = np.count_nonzero(binarized_pred & ~gt)

    fg___numel = fg_fg_numel + fg_bg_numel
    bg___numel = self.gt_size - fg___numel

    if self.gt_fg_numel == 0:
        enhanced_matrix_sum = bg___numel
    elif self.gt_fg_numel == self.gt_size:
        enhanced_matrix_sum = fg___numel
    else:
        parts_numel, combinations = self.generate_parts_numel_combinations(
            fg_fg_numel=fg_fg_numel,
            fg_bg_numel=fg_bg_numel,
            pred_fg_numel=fg___numel,
            pred_bg_numel=bg___numel,
        )

        results_parts = []
        for i, (part_numel, combination) in enumerate(zip(parts_numel, combinations)):
            align_matrix_value = 2 * (combination[0] * combination[1]) / \
                (combination[0] ** 2 + combination[1] ** 2 + _EPS)
            enhanced_matrix_value = (align_matrix_value + 1) ** 2 / 4
            results_parts.append(enhanced_matrix_value * part_numel)
        enhanced_matrix_sum = sum(results_parts)

    em = enhanced_matrix_sum / (self.gt_size - 1 + _EPS)
    return em
```
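To see that the vectorized path agrees with the per-threshold path, here is a self-contained sketch of both. The names `em_sum_from_parts`, `em_with_threshold`, and `em_with_cumsumhistogram` are simplified stand-ins for the article's methods: they omit the degenerate all-foreground/all-background branches, assume `pred` is already uint8, and assume `_EPS = np.spacing(1)` (the exact epsilon in the original repo may differ):

```python
import numpy as np

_EPS = np.spacing(1)  # assumed epsilon, mirroring the article's _EPS

def em_sum_from_parts(fg_fg, fg_bg, gt_fg_numel, gt_size):
    """Shared math: works for scalar counts or length-256 arrays via broadcasting."""
    bg_fg = gt_fg_numel - fg_fg
    bg_bg = (gt_size - fg_fg - fg_bg) - bg_fg
    mean_pred = (fg_fg + fg_bg) / gt_size
    mean_gt = gt_fg_numel / gt_size
    parts = [fg_fg, fg_bg, bg_fg, bg_bg]
    combos = [(1 - mean_pred, 1 - mean_gt), (1 - mean_pred, 0 - mean_gt),
              (0 - mean_pred, 1 - mean_gt), (0 - mean_pred, 0 - mean_gt)]
    total = 0.0
    for numel, (dp, dg) in zip(parts, combos):
        align = 2 * dp * dg / (dp ** 2 + dg ** 2 + _EPS)
        total = total + (align + 1) ** 2 / 4 * numel
    return total

def em_with_threshold(pred_u8, gt, threshold):
    # One scalar threshold, counted directly.
    binarized = pred_u8 >= threshold
    fg_fg = np.count_nonzero(binarized & gt)
    fg_bg = np.count_nonzero(binarized & ~gt)
    s = em_sum_from_parts(fg_fg, fg_bg, np.count_nonzero(gt), gt.size)
    return s / (gt.size - 1 + _EPS)

def em_with_cumsumhistogram(pred_u8, gt):
    # All 256 thresholds at once; index i corresponds to threshold 255 - i.
    bins = np.linspace(0, 256, 257)
    fg_fg_hist, _ = np.histogram(pred_u8[gt], bins=bins)
    fg_bg_hist, _ = np.histogram(pred_u8[~gt], bins=bins)
    fg_fg = np.cumsum(np.flip(fg_fg_hist))
    fg_bg = np.cumsum(np.flip(fg_bg_hist))
    s = em_sum_from_parts(fg_fg, fg_bg, np.count_nonzero(gt), gt.size)
    return s / (gt.size - 1 + _EPS)

rng = np.random.default_rng(0)
pred = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
gt = rng.random((64, 64)) >= 0.5   # mixed fg/bg, so the degenerate branches are not needed

ems = em_with_cumsumhistogram(pred, gt)
for t in (0, 1, 127, 255):
    assert np.isclose(ems[255 - t], em_with_threshold(pred, gt, t))
```

Both paths run the same `em_sum_from_parts` arithmetic; the only difference is whether the counts are scalars or length-256 arrays, which is exactly what numpy broadcasting absorbs.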

## Efficiency comparison

845 local grayscale prediction images and binary ground-truth masks were used for testing. The overall time comparison is as follows:

method | total time (s) | speedup |
---|---|---|
`base` | 539.2173762321472 | x1 |
`best` | 19.94518733024597 | x27.0 (539.22/19.95) |
`cumsumhistogram` | 3.2935903072357178 | x163.8 (539.22/3.29) |

Although the absolute times depend on the hardware, the relative speedup is obvious.
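As a rough, self-contained illustration of where the speedup comes from, the snippet below times the per-threshold counting loop against the histogram-plus-cumsum version on synthetic data (names, sizes, and repeat counts are illustrative; absolute times will differ across machines):

```python
import timeit
import numpy as np

rng = np.random.default_rng(2)
pred = rng.integers(0, 256, size=(320, 320)).astype(np.uint8)
gt = rng.random((320, 320)) >= 0.5

def counts_loop():
    # 256 full passes over the image, one per threshold.
    return np.array([np.count_nonzero((pred >= t) & gt) for t in range(256)])

def counts_hist():
    # One histogram plus one cumulative sum; index i <-> threshold 255 - i.
    hist, _ = np.histogram(pred[gt], bins=np.linspace(0, 256, 257))
    return np.cumsum(np.flip(hist))

# Same numbers, just in reversed threshold order.
assert np.array_equal(np.flip(counts_loop()), counts_hist())

t_loop = timeit.timeit(counts_loop, number=10)
t_hist = timeit.timeit(counts_hist, number=10)
print(f"loop: {t_loop:.4f}s  histogram+cumsum: {t_hist:.4f}s")
```

The loop's cost scales with the number of thresholds, while the histogram version touches each pixel only once, which is the core of the overall x163.8 result above.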

The test code is available on my GitHub: https://github.com/lartpang/CodeForArticle/tree/main/sod_metrics