Solving the small object problem in object detection

To improve a model's performance on small objects, we recommend the following techniques:

  • Increase image capture resolution
  • Increase the model's input resolution
  • Tile your images
  • Generate more data through augmentation
  • Auto-learn model anchors
  • Filter out extraneous classes

Why is the small object problem hard?

The small object problem plagues object detection models worldwide. Check the COCO evaluation results for the recent models YOLOv3, EfficientDet, and YOLOv4:

AP_S, AP_M, and AP_L for state-of-the-art models

In EfficientDet, for example, AP on small objects is only 12%, compared with 51% on large objects. That is more than a fourfold difference!

So why is it so difficult to detect small objects?

It all comes down to the model. Object detection models form features by aggregating pixels in convolutional layers.

Feature aggregation for object detection in PP-YOLO

At the end of the network, a prediction is made, and the loss function sums over pixels based on the difference between the prediction and the ground truth.

The loss function in YOLO

If the ground truth box is small, the training signal is small too. In addition, small objects are the most likely to have labeling errors, where the object is simply left unannotated. Both empirically and theoretically, small objects are hard.

Increase image capture resolution

Very small objects may contain only a few pixels within their bounding box, so increasing image resolution increases the richness of the features the detector can form from that small box. Therefore, where possible, we recommend capturing images at as high a resolution as you can.

Increase the model's input resolution

Once we have higher-resolution images, we can scale up the model's input resolution. Warning: this makes the model more expensive to run, so training takes longer and inference is slower once deployed. We may need to run experiments to find the right trade-off between speed and performance.
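To get a feel for this trade-off, the sketch below estimates how many pixels a small object spans after the image is resized to the model's input size. The function name and the example numbers are illustrative assumptions, not values taken from the tutorials.

```python
def object_size_at_input(object_px: float, image_px: float, input_px: float) -> float:
    """Approximate width (in pixels) of an object after an image that is
    image_px wide has been resized to input_px wide."""
    return object_px * input_px / image_px


# Hypothetical example: a 30-pixel-wide object in a 1920-pixel-wide image.
for input_px in (416, 640, 1280):
    size = object_size_at_input(30, 1920, input_px)
    print(f"input {input_px}: object spans about {size:.1f} px")
```

At an input width of 416, the hypothetical 30-pixel object shrinks to about 6.5 pixels, which leaves the detector very little to aggregate.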

In our tutorial on training YOLOv4, we can adjust the input resolution by changing the image size (the width and height entries) in the configuration file. The lines below are an excerpt from that file:

saturation = 1.5
exposure = 1.5
hue = .1
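Darknet-style configuration files set the input size through width and height entries in the [net] section. As a minimal sketch, assuming such a file has already been read into a string, the hypothetical helper below rewrites both entries. The helper name and the example configuration text are assumptions made for illustration.

```python
import re


def set_input_resolution(cfg_text: str, size: int) -> str:
    """Return cfg_text with its width= and height= entries set to size.

    Darknet expects the network input size to be a multiple of 32,
    so other values are rejected here.
    """
    if size <= 0 or size % 32 != 0:
        raise ValueError("size must be a positive multiple of 32")
    for key in ("width", "height"):
        # Match whole lines such as "width=416" or "width = 416".
        cfg_text = re.sub(
            rf"^([ \t]*{key}[ \t]*=[ \t]*)\d+[ \t]*$",
            rf"\g<1>{size}",
            cfg_text,
            flags=re.MULTILINE,
        )
    return cfg_text


# Hypothetical configuration excerpt.
example_cfg = "[net]\nwidth=416\nheight=416\nchannels=3\n"
print(set_input_resolution(example_cfg, 608))
```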

In our tutorial on how to train YOLOv5, we can likewise adjust the input resolution by changing the image size parameter (--img) in the training command:

!python train.py --img {YOUR RESOLUTION SIZE HERE} --batch 16 --epochs 10 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache

Note: results improve only up to the maximum resolution of the training data.

Tile your images

Another effective tactic for detecting small objects is to tile images as a preprocessing step. Tiling effectively zooms the detector in on small objects while letting us keep the small input resolution needed for fast inference.

Tiling images as a preprocessing step in Roboflow

If we use tiling during training, it is important to remember that we also need to tile images at inference time.
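As a rough sketch of what tiling involves outside of any particular tool, the code below computes crop boxes that cover an image and shifts a detection made on a tile back into full-image coordinates. The function names, the overlap parameter, and the example sizes are assumptions for illustration. This is not how Roboflow implements tiling.

```python
def tile_boxes(width, height, tile_size, overlap=0):
    """Return (x0, y0, x1, y1) crop boxes that together cover the image.

    Tiles on the right and bottom edges are clipped to the image, so they
    may be smaller than tile_size.
    """
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_size")
    stride = tile_size - overlap
    boxes = []
    for y0 in range(0, height, stride):
        for x0 in range(0, width, stride):
            boxes.append((x0, y0, min(x0 + tile_size, width), min(y0 + tile_size, height)))
    return boxes


def to_full_image(detection, tile_origin):
    """Shift a detection (x0, y0, x1, y1) from tile to full-image coordinates."""
    ox, oy = tile_origin
    x0, y0, x1, y1 = detection
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


# Hypothetical 1500 x 1000 image split into 500 x 500 tiles.
tiles = tile_boxes(1500, 1000, tile_size=500)
print(len(tiles))  # 3 columns x 2 rows = 6 tiles
print(to_full_image((10, 20, 30, 40), tile_origin=(500, 500)))
```

A non-zero overlap reduces the chance that an object is cut in half at a tile border, at the cost of more tiles to process and possible duplicate detections to merge.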

Generate more data through augmentation

Data augmentation generates new images from our base dataset, which is very useful for preventing the model from overfitting to the training set.

Augmentations that are especially useful for small object detection include random crop, random rotation, and mosaic augmentation.
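To illustrate the first of these, here is a minimal sketch of a random crop that keeps bounding boxes consistent with the cropped window. The function name and the box format, (x0, y0, x1, y1) in pixels, are assumptions. A real pipeline would usually also discard boxes that are left with too little visible area, which this sketch does not do.

```python
import random


def random_crop_with_boxes(width, height, boxes, crop_w, crop_h, rng=random):
    """Pick a random crop window and return it with the boxes that survive.

    Each box is clipped to the crop window and shifted into crop
    coordinates; boxes that fall entirely outside the window are dropped.
    """
    if crop_w > width or crop_h > height:
        raise ValueError("crop must fit inside the image")
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    kept = []
    for x0, y0, x1, y1 in boxes:
        cx0, cy0 = max(x0, left), max(y0, top)
        cx1, cy1 = min(x1, left + crop_w), min(y1, top + crop_h)
        if cx1 > cx0 and cy1 > cy0:
            kept.append((cx0 - left, cy0 - top, cx1 - left, cy1 - top))
    return (left, top, left + crop_w, top + crop_h), kept


# Hypothetical 640 x 480 image with two labeled objects.
window, kept = random_crop_with_boxes(
    640, 480, [(100, 100, 120, 115), (600, 400, 630, 440)], 320, 240,
    rng=random.Random(0),
)
print(window, kept)
```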

Auto-learn model anchors

Anchor boxes are prototypical bounding boxes that the model learns to predict relative to. Anchor boxes can be preset, and the presets are sometimes suboptimal for our training data, so it is good to tune them to the task at hand. The YOLOv5 model architecture does this for us automatically based on our custom data. All we have to do is start training.

Analyzing anchors... anchors/target = 4.66, Best Possible Recall (BPR) = 0.9675. Attempting to generate improved anchors, please wait...
WARNING: Extremely small objects found. 35 of 1664 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 1664 points...
thr=0.25: 0.9477 best possible recall, 4.95 anchors past thr
n=9, img_size=416, metric_all=0.317/0.665-mean/best, past_thr=0.465-mean: 18,24,  65,37,  35,68,  46,135,  152,54,  99,109,  66,218,  220,128,  169,228
Evolving anchors with Genetic Algorithm: fitness = 0.6825: 100%|██████████| 1000/1000 [00:00<00:00, 1081.71it/s]
thr=0.25: 0.9627 best possible recall, 5.32 anchors past thr
n=9, img_size=416, metric_all=0.338/0.688-mean/best, past_thr=0.476-mean: 13,20,  41,32,  26,55,  46,72,  122,57,  86,102,  58,152,  161,120,  165,20
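The idea behind this log can be sketched in a few lines: cluster the labeled box sizes to propose anchors, then measure what fraction of boxes has at least one reasonably matching anchor. The code below is a simplified illustration, not YOLOv5's implementation. It uses plain k-means with Euclidean distance and skips the genetic evolution step, and the matching rule is an assumption chosen to mirror the thr=0.25 value shown in the log. All names and example boxes are hypothetical.

```python
import random


def best_possible_recall(boxes, anchors, thr=0.25):
    """Fraction of (w, h) boxes matched by at least one anchor.

    Assumed rule: a box matches an anchor when, in both width and height,
    the smaller of the two sizes is more than thr times the larger one.
    """
    def match(box, anchor):
        return min(min(b / a, a / b) for b, a in zip(box, anchor))

    hits = sum(1 for box in boxes if max(match(box, a) for a in anchors) > thr)
    return hits / len(boxes)


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Plain k-means on (width, height) pairs, sorted by area."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            def dist(i):
                return (w - anchors[i][0]) ** 2 + (h - anchors[i][1]) ** 2
            clusters[min(range(k), key=dist)].append((w, h))
        # Move each anchor to its cluster mean; keep it if the cluster is empty.
        anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical label sizes in pixels: three small and three medium objects.
boxes = [(4, 5), (5, 4), (6, 6), (40, 50), (50, 40), (45, 45)]
preset = [(100, 100)]
learned = kmeans_anchors(boxes, k=2)
print(best_possible_recall(boxes, preset))
print(learned, best_possible_recall(boxes, learned))
```

With the single large preset anchor, none of the three small boxes is matched, which is the kind of mismatch that learning anchors from the data is meant to remove.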

Filter out extraneous classes

Class management is an important technique for improving dataset quality. If one class significantly overlaps with another, we should filter it out of the dataset. We may also decide that a small object in the dataset is not worth detecting, in which case we may want to remove it. We can quickly identify all of these issues with the Advanced Dataset Health Check in Roboflow Pro.

Class omission and class renaming are both possible through Roboflow's ontology management tools.
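For readers managing annotations by hand, the same two operations can be sketched as below. The annotation format (a list of dictionaries with a "label" key), the function name, and the class names are assumptions for illustration and are unrelated to Roboflow's tooling.

```python
def apply_ontology(annotations, drop=(), rename=None):
    """Return a new annotation list with some classes dropped or renamed.

    drop lists original class names to remove; rename maps original class
    names to the names they should be merged into.
    """
    rename = rename or {}
    result = []
    for ann in annotations:
        if ann["label"] in drop:
            continue  # class judged not worth detecting
        # Copy the annotation so the input list is left unchanged.
        result.append({**ann, "label": rename.get(ann["label"], ann["label"])})
    return result


# Hypothetical annotations: "automobile" overlaps with "car".
annotations = [
    {"label": "car", "bbox": (10, 10, 50, 40)},
    {"label": "automobile", "bbox": (60, 15, 110, 45)},
    {"label": "speck", "bbox": (5, 5, 7, 7)},
]
cleaned = apply_ontology(annotations, drop={"speck"}, rename={"automobile": "car"})
print([ann["label"] for ann in cleaned])  # ['car', 'car']
```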


Detecting small objects correctly is indeed a challenge. In this article, we discuss some strategies to improve the small object detector, namely:

  • Increase image capture resolution
  • Increase the model's input resolution
  • Tile your images
  • Generate more data through augmentation
  • Auto-learn model anchors
  • Filter out extraneous classes

Posted by Lisa23 on Sun, 22 May 2022 13:30:29 +0300