01. Predict with pre-trained SSD models

This article shows how to run inference with pre-trained SSD models in only a few lines of code.

First, let’s import the necessary libraries:

from gluoncv import model_zoo, data, utils
from matplotlib import pyplot as plt

Load a pretrained model

Let’s get an SSD model trained on the Pascal VOC dataset with 512x512 images, using ResNet-50 V1 as the base model. Specifying pretrained=True automatically downloads the model from the model zoo if necessary. For more pretrained models, please refer to the Model Zoo.

net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)

Out:

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/block.py:1512: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
        data: None
  input_sym_arg_type = in_param.infer_type()[0]
Downloading /root/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...

100%|##########| 132723/132723 [00:02<00:00, 59593.56KB/s]
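
The model name passed to get_model packs several choices into one string. As a rough illustration, the sketch below splits the name used above into its parts. The helper parse_ssd_name is hypothetical and not part of GluonCV, and the field layout (algorithm, input size, base network, dataset) is an assumption inferred from this one name and the description above; other model zoo names may not follow it.

```python
def parse_ssd_name(name):
    """Split a model name such as 'ssd_512_resnet50_v1_voc' into its parts.

    Hypothetical helper for illustration only; the field layout is assumed
    from the name used in this tutorial.
    """
    parts = name.split('_')
    if len(parts) < 4:
        raise ValueError('unexpected model name: %r' % name)
    return {
        'algorithm': parts[0],                  # detection method
        'input_size': int(parts[1]),            # training image size in pixels
        'base_network': '_'.join(parts[2:-1]),  # backbone, may contain '_'
        'dataset': parts[-1],                   # dataset used for training
    }


info = parse_ssd_name('ssd_512_resnet50_v1_voc')
print(info)
```

Reading the name this way makes it easier to see what changes when a different entry from the Model Zoo is substituted.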

Pre-process an image

Next we download an image and pre-process it with the preset data transforms. Here we resize the short edge of the image to 512 px, but you can feed in an image of any size.
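
To see what resizing the short edge means for the final image size, here is a minimal sketch of the arithmetic. The helper short_edge_size is hypothetical and is not the library's implementation; it assumes the aspect ratio is kept and, as a further assumption, that the long edge is capped at a max_size of 1024 px.

```python
def short_edge_size(width, height, short=512, max_size=1024):
    """Return the (width, height) after scaling the short edge to `short`.

    Illustrative only: keeps the aspect ratio and, as an assumption of this
    sketch, caps the long edge at `max_size`.
    """
    scale = short / min(width, height)
    if max(width, height) * scale > max_size:
        scale = max_size / max(width, height)
    return round(width * scale), round(height * scale)


print(short_edge_size(640, 480))    # landscape image
print(short_edge_size(480, 640))    # portrait image
print(short_edge_size(2000, 500))   # very wide image, long edge is capped
```

Because only the short edge is fixed, the other dimension of the pre-processed image depends on the aspect ratio of the input.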

You can provide a list of image file names, such as [im_fname1, im_fname2, ...], to gluoncv.data.transforms.presets.ssd.load_test() if you want to load multiple images together.

This function returns two results. The first is an NDArray with shape (batch_size, RGB_channels, height, width), which can be fed into the model directly. The second contains the images in numpy format for easy plotting. Since we only loaded a single image, the first dimension of x is 1.

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/detection/street_small.jpg?raw=true',
                          path='street_small.jpg')
x, img = data.transforms.presets.ssd.load_test(im_fname, short=512)
print('Shape of pre-processed image:', x.shape)

Out:

Downloading street_small.jpg from https://github.com/dmlc/web-data/blob/master/gluoncv/detection/street_small.jpg?raw=true...

117KB [00:00, 36875.08KB/s]
Shape of pre-processed image: (1, 3, 512, 512)
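
To make the (batch_size, RGB_channels, height, width) layout concrete, here is a minimal pure-Python sketch of the kind of conversion a test-time transform performs: scale pixel values to [0, 1], normalize each channel, and reorder from height x width x channel to channel-first with a leading batch dimension. The helpers to_batch_chw and shape are hypothetical, and the mean and std values are the commonly used ImageNet statistics, assumed here rather than taken from this tutorial.

```python
# Commonly used ImageNet channel statistics (an assumption of this sketch).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def to_batch_chw(image):
    """Convert an H x W x 3 nested list of 0-255 values to 1 x 3 x H x W."""
    height, width = len(image), len(image[0])
    channels = []
    for c in range(3):
        plane = [[(image[y][x][c] / 255.0 - MEAN[c]) / STD[c]
                  for x in range(width)]
                 for y in range(height)]
        channels.append(plane)
    return [channels]  # wrap in a list to add the batch dimension


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A tiny image of height 2 and width 4, filled with mid-gray pixels.
tiny = [[[128, 128, 128] for _ in range(4)] for _ in range(2)]
batch = to_batch_chw(tiny)
print(shape(batch))
```

The image kept for plotting stays in height x width x channel order with its original 0-255 values, which is why the function returns it separately.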

Inference and display

The forward function returns the predicted class IDs, the confidence scores, and the detected bounding boxes. Their shapes are (batch_size, num_bboxes, 1), (batch_size, num_bboxes, 1), and (batch_size, num_bboxes, 4), respectively.

We can use gluoncv.utils.viz.plot_bbox() to visualize the results. We slice the results for the first image and feed them into plot_bbox:

class_IDs, scores, bounding_boxes = net(x)

ax = utils.viz.plot_bbox(img, bounding_boxes[0], scores[0],
                         class_IDs[0], class_names=net.classes)
plt.show()
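
A common next step is to keep only the confident detections as data rather than as a plot. The sketch below uses plain nested lists that mimic the per-image slices class_IDs[0], scores[0], and bounding_boxes[0]. The numbers are invented for illustration, the helper keep_confident is hypothetical, and the 0.5 cutoff and the use of -1 to mark empty slots are assumptions of this sketch.

```python
# The 20 Pascal VOC categories; with a loaded model, net.classes plays this role.
VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
               'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
               'tvmonitor']


def keep_confident(class_ids, scores, boxes, class_names, thresh=0.5):
    """Return (name, score, box) for each detection scoring at least `thresh`.

    Inputs follow the per-image shapes (num_bboxes, 1), (num_bboxes, 1) and
    (num_bboxes, 4).
    """
    kept = []
    for (cid,), (score,), box in zip(class_ids, scores, boxes):
        if cid < 0 or score < thresh:
            continue  # skip empty slots (assumed to be -1) and weak detections
        kept.append((class_names[int(cid)], score, list(box)))
    return kept


# Invented values standing in for class_IDs[0], scores[0], bounding_boxes[0].
ids = [[14.0], [6.0], [1.0], [-1.0]]
confs = [[0.98], [0.72], [0.31], [-1.0]]
bbs = [[210.0, 80.0, 300.0, 400.0],
       [20.0, 150.0, 190.0, 260.0],
       [305.0, 200.0, 390.0, 330.0],
       [-1.0, -1.0, -1.0, -1.0]]

detections = keep_confident(ids, confs, bbs, VOC_CLASSES)
for name, score, box in detections:
    print(name, score, box)
```

With real model outputs, the NDArrays would first need converting to NumPy or Python values, for example with asnumpy().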
