1. Predict with pre-trained Simple Pose Estimation models

This tutorial shows how to run pre-trained Simple Pose models with only a few lines of code.

First, let’s import the necessary libraries:

from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord

Load a pretrained model

Let’s get a Simple Pose model trained on the MS COCO dataset with input images of size 256x192. We pick the one that uses ResNet-18 V1b as the base model. Specifying pretrained=True downloads the model weights from the model zoo automatically if they are not already cached locally. For more pretrained models, please refer to Model Zoo.

Note that Simple Pose takes a top-down strategy: it estimates the pose of each person inside a bounding box produced by an object detection model, so we load a detector as well.

detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)

# Note that we can reset the classes of the detector to only include
# the person class, so that the NMS process is faster.
detector.reset_class(["person"], reuse_weights=['person'])

Out:

Downloading /root/.mxnet/models/yolo3_mobilenet1.0_coco-66dbbae6.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/yolo3_mobilenet1.0_coco-66dbbae6.zip...

Downloading /root/.mxnet/models/simple_pose_resnet18_v1b-f63d42ac.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/simple_pose_resnet18_v1b-f63d42ac.zip...

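To illustrate why restricting the detector to a single class can speed up NMS, here is a minimal sketch of a generic greedy non-maximum suppression pass in NumPy. It is not the detector's internal implementation; the function names and the IoU threshold of 0.45 are assumptions chosen for illustration. Assuming suppression is applied per class, keeping only the person class means only one such pass is needed.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array, all as (x_min, y_min, x_max, y_max)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_thresh=0.45):
    """Return indices of boxes kept after greedy NMS (hypothetical helper)."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[overlaps <= iou_thresh]
    return keep


# Made-up boxes: the first two overlap heavily, the third is far away.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores)
print(kept)
```

The second box is suppressed because its IoU with the first exceeds the threshold, while the distant third box survives.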
Pre-process an image for the detector and run inference

Next we download an image and pre-process it with preset data transforms. Here we resize the short edge of the image to 512 px, but you can feed an arbitrarily sized image.

This function returns two results. The first is an NDArray with shape (batch_size, RGB_channels, height, width), which can be fed into the model directly. The second contains the image in numpy format, so that it is easy to plot. Since we only loaded a single image, the first dimension of x is 1.

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/pose/soccer.png?raw=true',
                          path='soccer.png')
x, img = data.transforms.presets.ssd.load_test(im_fname, short=512)
print('Shape of pre-processed image:', x.shape)

class_IDs, scores, bounding_boxs = detector(x)

Out:

Downloading soccer.png from https://github.com/dmlc/web-data/blob/master/gluoncv/pose/soccer.png?raw=true...

Shape of pre-processed image: (1, 3, 512, 605)
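The printed shape can be related to the two pre-processing steps described above: scaling the short edge to 512 px and rearranging the image into a (batch, channel, height, width) tensor. The sketch below reproduces only that shape logic in NumPy. The helper names are hypothetical, the 1024x1210 input size is a made-up example (not necessarily the real size of soccer.png), and any further steps the preset applies, such as normalization, are omitted.

```python
import numpy as np


def short_edge_size(height, width, short=512):
    """Return (new_height, new_width) so that the shorter side equals `short`."""
    scale = short / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


def to_batch_chw(image_hwc):
    """Convert an HxWx3 uint8 image into a float32 array of shape (1, 3, H, W)."""
    chw = np.transpose(image_hwc.astype(np.float32) / 255.0, (2, 0, 1))
    return chw[np.newaxis, ...]


# Hypothetical original size; the aspect ratio is preserved by the resize.
new_h, new_w = short_edge_size(1024, 1210, short=512)
dummy = np.zeros((new_h, new_w, 3), dtype=np.uint8)  # stand-in for the resized image
batch = to_batch_chw(dummy)
print(batch.shape)
```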

Process tensor from detector to keypoint network

Next we process the output from the detector.

A Simple Pose network expects an input of size 256x192 with the person centered. We crop the bounding-box area for each person, resize it to 256x192, and finally normalize it.

In order to make sure the bounding box has included the entire person, we usually slightly upscale the box size.

pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
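As a rough illustration of the box upscaling mentioned above, the following sketch enlarges a box around its center and clips it to the image. The helper name and the scale factor of 1.25 are assumptions for illustration, not necessarily what detector_to_simple_pose does internally.

```python
def upscale_box(box, img_width, img_height, scale=1.25):
    """Enlarge (x_min, y_min, x_max, y_max) around its center, clipped to the image."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    half_w = (x_max - x_min) * scale / 2
    half_h = (y_max - y_min) * scale / 2
    return (
        max(0.0, center_x - half_w),
        max(0.0, center_y - half_h),
        min(float(img_width), center_x + half_w),
        min(float(img_height), center_y + half_h),
    )


# A made-up detection in a 605x512 image.
enlarged = upscale_box((100, 100, 200, 300), img_width=605, img_height=512)
print(enlarged)
```

Boxes near the image border are clipped, so the enlarged box never extends outside the image.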

Predict with a Simple Pose network

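Now we can run the prediction.

A Simple Pose network predicts one heatmap per joint (i.e. keypoint). After inference, we search for the highest value in each heatmap and map its location back to coordinates in the original image.

predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)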
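The heatmap decoding described above can be sketched in plain NumPy. This is a simplified illustration, not the library's heatmap_to_coord: the function name, the 64x48 heatmap size, and the choice to use the center of the winning heatmap cell are all assumptions.

```python
import numpy as np


def decode_heatmaps(heatmaps, box):
    """Map per-joint heatmap peaks to image coordinates.

    heatmaps: array of shape (num_joints, H, W)
    box: (x_min, y_min, x_max, y_max) of the person crop in image pixels
    Returns (coords of shape (num_joints, 2), confidence of shape (num_joints,)).
    """
    num_joints, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    peak_index = flat.argmax(axis=1)       # flattened position of each peak
    confidence = flat.max(axis=1)          # peak value used as confidence
    ys, xs = np.divmod(peak_index, hm_w)   # back to (row, column)
    x_min, y_min, x_max, y_max = box
    # Scale heatmap cell centers to the box, then shift into image coordinates.
    coords = np.stack(
        [
            x_min + (xs + 0.5) / hm_w * (x_max - x_min),
            y_min + (ys + 0.5) / hm_h * (y_max - y_min),
        ],
        axis=1,
    )
    return coords, confidence


# Two synthetic joints on an assumed 64x48 heatmap, inside a 192x256 box.
heatmaps = np.zeros((2, 64, 48))
heatmaps[0, 10, 20] = 0.9
heatmaps[1, 63, 0] = 0.5
coords, conf = decode_heatmaps(heatmaps, box=(100, 50, 292, 306))
print(coords, conf)
```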

Display the pose estimation results

We can use gluoncv.utils.viz.plot_keypoints() to visualize the results.

ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                              class_IDs, bounding_boxs, scores,
                              box_thresh=0.5, keypoint_thresh=0.2)
plt.show()
Figure: demo simple pose
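The two thresholds passed to the plotting call can be thought of as filters: box_thresh on the detection score of each person and keypoint_thresh on the confidence of each joint. The sketch below shows that idea with a boolean mask. It is an assumption about the general behavior, not a reproduction of plot_keypoints; the helper name and the use of strict comparisons are illustrative.

```python
import numpy as np


def drawable_joints(scores, confidence, box_thresh=0.5, keypoint_thresh=0.2):
    """Return a (num_people, num_joints) mask of joints that pass both thresholds."""
    scores = np.asarray(scores, dtype=float).reshape(-1)
    confidence = np.asarray(confidence, dtype=float)
    keep_person = scores > box_thresh          # person-level filter
    keep_joint = confidence > keypoint_thresh  # joint-level filter
    return keep_joint & keep_person[:, np.newaxis]


# Made-up values: the second person has a low detection score.
mask = drawable_joints(scores=[0.9, 0.3], confidence=[[0.8, 0.1], [0.9, 0.9]])
print(mask)
```

Raising keypoint_thresh hides uncertain joints, while raising box_thresh hides whole people whose detections are weak.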