

Polygons have traditionally been used for training image segmentation models, but they can also improve the training of object detection models (which predict bounding boxes). Object detection models are typically much faster and more widely supported than segmentation models, so they remain the best and most popular choice for solving many problems.

At first blush, this might seem confusing: why would you spend the extra time and effort annotating your images with polygons if your model is only able to predict boxes?

The answer is that polygons give your model more information about each object to learn from, which helps it make better predictions.

Improved Augmentations with Polygon Annotations

Any augmentation that changes the size, shape, or orientation of an object will benefit from the additional information that polygons provide about the shapes and locations of objects. This is best illustrated through a few examples of traditional augmentations performed on bounding boxes (top) vs polygons (bottom).

Rotating a bounding box (red, top) vs rotating a polygon (green, bottom).

As you can see, the rotated polygon (green, bottom) retains a tight fit around the pencil, while the rotated bounding box (red, top) pulls white background pixels above and below the pencil into the transformed bounding box.

With rectangular annotations, the fit degrades each time you augment an image because there is no information about which portions of the box represent the subject of interest and which enclose background pixels. There is no way for a bounding-box rotation to reliably keep a tight fit through spatial augmentation, because the augmentation code doesn't know where the edges of the object of interest lie within the box. For example, imagine that instead of a single pencil, the top-left bounding box contained two pencils forming an X shape: the rotated box would need to be much larger to fully contain the rotated X, and if your augmentation code instead optimized for the single-pencil case in the image above, it would throw out useful pencil pixels in the two-pencil case.

For example, if you rotate a bounding-box-annotated image 33 degrees and then rotate it -33 degrees, you end up with the same image, but a much worse-fitting box:

A bounding box rotated 33 degrees (center, red), then -33 degrees (right, yellow).

This means that if you augment an image multiple times, the fit only gets worse and worse. The more times you transform a bounding box's shape, the looser it gets, and the more your model's performance will degrade. Feeding loose bounding boxes into your model incentivizes it to learn to predict less tightly-fitting boxes and will degrade its performance.

Polygons alleviate this problem because the annotations retain a tight fit after each transformation, only collapsing to a box when fed into the model during training.

Cropping/translating a bbox (top, red) vs cropping/translating a polygon (bottom, green).

Here, we can see that when cropping the photo, the bounding box (top) is unable to constrain itself to the size of the remaining parts of the basketball player, leaving significant background pixels in the box above the head and below the legs, whereas the polygon annotation (bottom) remains tight after cropping.
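The rotation round-trip described above is easy to reproduce numerically. Below is a minimal Python sketch (the helper names `rotate`, `to_bbox`, and `area` are hypothetical, not from any annotation library) that rotates a long, thin pencil-like box by 33 degrees and back. The bounding-box pipeline must collapse back to an axis-aligned box after every step, so the box grows each time, while the polygon pipeline keeps the true corners and only boxes at the end:

```python
import math

def rotate(points, degrees):
    """Rotate (x, y) points about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def to_bbox(points):
    """Axis-aligned bounding box of a point set, as its 4 corners."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]

def area(box):
    (x0, y0), _, (x1, y1), _ = box
    return (x1 - x0) * (y1 - y0)

# A long, thin box like the pencil: 100 x 10.
pencil_box = [(0, 0), (100, 0), (100, 10), (0, 10)]

# Bounding-box pipeline: each augmentation collapses back to an
# axis-aligned box, baking background pixels into the annotation.
step1 = to_bbox(rotate(pencil_box, 33))
step2 = to_bbox(rotate(step1, -33))

# Polygon pipeline: keep the true corners, only box at the end.
final_poly_box = to_bbox(rotate(rotate(pencil_box, 33), -33))

print(area(pencil_box))      # -> 1000
print(area(step2))           # many times larger than 1000
print(area(final_poly_box))  # ~1000: tight fit preserved
```

The key line is `to_bbox(rotate(...))` in the box pipeline: re-boxing after each rotation is exactly the information loss that makes the annotation looser with every augmentation.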

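The cropping comparison can be sketched the same way. The plus-shaped polygon below is a hypothetical stand-in for the basketball player's silhouette (an object that does not fill its bounding box). Clamping the original box to the crop window keeps the full image height, while clipping the polygon (one Sutherland-Hodgman pass against the crop edge) and then boxing it stays tight:

```python
def clip_halfplane(poly, x_max):
    """Clip a polygon to the half-plane x <= x_max (one Sutherland-Hodgman pass)."""
    out = []
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        inside1, inside2 = x1 <= x_max, x2 <= x_max
        if inside1:
            out.append((x1, y1))
        if inside1 != inside2:  # edge crosses the crop boundary
            t = (x_max - x1) / (x2 - x1)
            out.append((x_max, y1 + t * (y2 - y1)))
    return out

def bbox(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# A plus-shaped object: its arms only span the middle of each side.
plus = [(40, 0), (60, 0), (60, 40), (100, 40), (100, 60), (60, 60),
        (60, 100), (40, 100), (40, 60), (0, 60), (0, 40), (40, 40)]

crop_x = 30  # crop keeps only pixels with x <= 30

# Box pipeline: all we can do is clamp the original bbox to the crop.
x0, y0, x1, y1 = bbox(plus)
clamped = (x0, y0, min(x1, crop_x), y1)     # (0, 0, 30, 100) -- loose

# Polygon pipeline: clip the polygon first, then box it.
tight = bbox(clip_halfplane(plus, crop_x))  # (0, 40, 30, 60) -- tight
```

The clamped box keeps the full 0-100 vertical extent even though only the left arm (y between 40 and 60) survives the crop, mirroring the background pixels left above the head and below the legs in the player example.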