02. Predict with pre-trained Faster RCNN models

This article shows how to run inference with a pre-trained Faster RCNN model.

First, let’s import the necessary libraries:

from matplotlib import pyplot as plt
import gluoncv
from gluoncv import model_zoo, data, utils

Load a pretrained model

Let’s get a Faster RCNN model trained on the Pascal VOC dataset with a ResNet-50 backbone. Specifying pretrained=True automatically downloads the model from the model zoo if necessary. For more pretrained models, please refer to the Model Zoo.

The returned model is a HybridBlock gluoncv.model_zoo.FasterRCNN with a default context of cpu(0).

net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)

Out:

Downloading /root/.mxnet/models/faster_rcnn_resnet50_v1b_voc-447328d8.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/faster_rcnn_resnet50_v1b_voc-447328d8.zip...


Pre-process an image

Next we download an image and pre-process it with the preset data transforms. By default, the image is resized so that its short edge is 600 px, but you can feed in an image of any size.

To load multiple images together, pass a list of image file names, such as [im_fname1, im_fname2, ...], to gluoncv.data.transforms.presets.rcnn.load_test().

This function returns two results. The first is an NDArray with shape (batch_size, RGB_channels, height, width) that can be fed into the model directly. The second contains the images in numpy format, which makes them easy to plot. Since we only loaded a single image, the first dimension of x is 1.

Note that orig_img is the resized image (short edge 600 px), not the image at its original size.
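Because the returned image is resized, any box predicted on it is expressed in resized-image coordinates. The following is a minimal sketch of that geometry in plain Python, assuming the short edge is scaled to 600 px and the aspect ratio is preserved. The helper names (short_edge_scale, resized_shape, to_original_coords) are hypothetical and not part of GluonCV, and the library may additionally cap the long edge or round sizes differently.

```python
def short_edge_scale(height, width, short=600):
    """Return the factor that scales the shorter image side to `short` pixels."""
    return short / min(height, width)


def resized_shape(height, width, short=600):
    """Approximate (height, width) after a short-edge resize that keeps the aspect ratio."""
    scale = short_edge_scale(height, width, short)
    return round(height * scale), round(width * scale)


def to_original_coords(bbox, height, width, short=600):
    """Map an (xmin, ymin, xmax, ymax) box from the resized image back to the original."""
    scale = short_edge_scale(height, width, short)
    return tuple(coord / scale for coord in bbox)


# Example: a hypothetical 480 x 640 (height x width) photo, scale factor 1.25.
print(resized_shape(480, 640))  # (600, 800)
print(to_original_coords((125.0, 250.0, 500.0, 600.0), 480, 640))  # (100.0, 200.0, 400.0, 480.0)
```

Dividing by the scale factor is only needed if you want to draw the boxes on the original file; plotting on orig_img needs no conversion.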

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/detection/biking.jpg?raw=true',
                          path='biking.jpg')
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)

Out:

Downloading biking.jpg from https://github.com/dmlc/web-data/blob/master/gluoncv/detection/biking.jpg?raw=true...


Inference and display

The Faster RCNN model returns predicted class IDs, confidence scores, and bounding box coordinates. Their shapes are (batch_size, num_bboxes, 1), (batch_size, num_bboxes, 1), and (batch_size, num_bboxes, 4), respectively.
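To make these shapes concrete, here is a small sketch that walks through one image's slice of the outputs and keeps only confident detections. Plain nested lists with toy values stand in for the real arrays (which could be converted with asnumpy() first). The function name filter_detections and the 0.5 threshold are illustrative assumptions, not part of the GluonCV API.

```python
def filter_detections(class_ids, scores, bboxes, thresh=0.5):
    """Keep detections of one image whose score is at least `thresh`.

    Inputs mirror one image's slice of the model output:
    class_ids and scores have shape (num_bboxes, 1), bboxes has shape (num_bboxes, 4).
    """
    kept = []
    for (cid,), (score,), box in zip(class_ids, scores, bboxes):
        if score >= thresh:
            kept.append((int(cid), score, tuple(box)))
    return kept


# Toy values standing in for the first image's predictions (batch index 0).
class_ids = [[14.0], [1.0], [6.0]]
scores = [[0.98], [0.75], [0.12]]
bboxes = [[10.0, 20.0, 110.0, 220.0],
          [50.0, 60.0, 150.0, 160.0],
          [0.0, 0.0, 30.0, 30.0]]

for cid, score, box in filter_detections(class_ids, scores, bboxes):
    print(cid, score, box)
```

Each kept entry pairs an integer class ID with its score and its (xmin, ymin, xmax, ymax) box, so the class ID can be used as an index into the model's list of class names.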

We can use gluoncv.utils.viz.plot_bbox() to visualize the results. We slice the results for the first image and feed them into plot_bbox:
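The steps just described might look like the sketch below. It assumes net, x, and orig_img come from the earlier steps, and that the class names are available as net.classes. The wrapper functions first_image_results and show_detections are hypothetical names introduced here for illustration; only net(x) and utils.viz.plot_bbox() are GluonCV calls.

```python
def first_image_results(net, x):
    """Run the detector and slice out the predictions for the first image."""
    box_ids, scores, bboxes = net(x)
    return bboxes[0], scores[0], box_ids[0]


def show_detections(net, x, orig_img):
    """Plot the first image's detections on top of the resized image."""
    # Imported locally so first_image_results can be used without plotting.
    from matplotlib import pyplot as plt
    from gluoncv import utils

    bboxes, scores, box_ids = first_image_results(net, x)
    ax = utils.viz.plot_bbox(orig_img, bboxes, scores, box_ids,
                             class_names=net.classes)
    plt.show()
    return ax
```

Calling show_detections(net, x, orig_img) after the pre-processing step should open a figure with the labeled boxes drawn on the image.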

[Figure: demo faster rcnn]

