{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 05. Deep dive into SSD training: 3 tips to boost performance\n\nIn the previous tutorial `sphx_glr_build_examples_detection_train_ssd_voc.py`,\nwe briefly went through the basic APIs that help building the training pipeline of SSD.\n\nIn this article, we will dive deep into the details and introduce tricks that\nimportant for reproducing state-of-the-art performance.\nThese are the hidden pitfalls that are usually missing in papers and tech reports.\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loss normalization: use batch-wise norm instead of sample-wise norm\nThe training objective mentioned in paper is a weighted summation of localization\nloss(loc) and the confidence loss(conf).\n\n\\begin{align}L(x, c, l, g) = \\frac{1}{N} (L_{conf}(x, c) + \\alpha L_{loc}(x, l, g))\\end{align}\n\nBut the question is, what is the proper way to calculate ``N``? 
Should we sum up\n``N`` across the entire batch, or use per-sample ``N`` instead?\n\nTo illustrate this, let's generate some dummy data:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import mxnet as mx\nx = mx.random.uniform(shape=(2, 3, 300, 300)) # use batch-size 2\n# suppose image 1 has a single object\nid1 = mx.nd.array([1])\nbbox1 = mx.nd.array([[10, 20, 80, 90]]) # xmin, ymin, xmax, ymax\n# suppose image 2 has 4 objects\nid2 = mx.nd.array([1, 3, 5, 7])\nbbox2 = mx.nd.array([[10, 10, 30, 30], [40, 40, 60, 60], [50, 50, 90, 90], [100, 110, 120, 140]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, combine them into a batch by padding with -1 as sentinel values:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "gt_ids = mx.nd.ones(shape=(2, 4)) * -1\ngt_ids[0, :1] = id1\ngt_ids[1, :4] = id2\nprint('class_ids:', gt_ids)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "gt_boxes = mx.nd.ones(shape=(2, 4, 4)) * -1\ngt_boxes[0, :1, :] = bbox1\ngt_boxes[1, :, :] = bbox2\nprint('bounding boxes:', gt_boxes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use a VGG16-atrous 300x300 SSD model in this example. 
For demo purposes, we\ndon't use any pretrained weights here.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from gluoncv import model_zoo\nnet = model_zoo.get_model('ssd_300_vgg16_atrous_voc', pretrained_base=False, pretrained=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some preparation before training:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from mxnet import gluon\nnet.initialize()\nconf_loss = gluon.loss.SoftmaxCrossEntropyLoss()\nloc_loss = gluon.loss.HuberLoss()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simulate the training steps by manually computing the losses.\nYou can always use ``gluoncv.loss.SSDMultiBoxLoss``, which implements this computation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from mxnet import autograd\nfrom gluoncv.model_zoo.ssd.target import SSDTargetGenerator\ntarget_generator = SSDTargetGenerator()\nwith autograd.record():\n # 1. forward pass\n cls_preds, box_preds, anchors = net(x)\n # 2. 
generate training targets\n cls_targets, box_targets, box_masks = target_generator(\n anchors, cls_preds, gt_boxes, gt_ids)\n num_positive = (cls_targets > 0).sum().asscalar()\n cls_mask = (cls_targets >= 0).expand_dims(axis=-1) # negative targets should be ignored in loss\n # 3. compute losses; there are two options: batch-wise or sample-wise norm\n # 3.1 batch-wise normalization: divide the loss by the total number of positive targets in the batch\n batch_conf_loss = conf_loss(cls_preds, cls_targets, cls_mask) / num_positive\n batch_loc_loss = loc_loss(box_preds, box_targets, box_masks) / num_positive\n # 3.2 sample-wise normalization: divide by the number of positive targets in each sample (image)\n sample_num_positive = (cls_targets > 0).sum(axis=0, exclude=True)\n sample_conf_loss = conf_loss(cls_preds, cls_targets, cls_mask) / sample_num_positive\n sample_loc_loss = loc_loss(box_preds, box_targets, box_masks) / sample_num_positive\n # Since ``conf_loss`` and ``loc_loss`` compute a mean loss, we want\n # to rescale it back to a per-image loss.\n rescale_conf = cls_preds.size / cls_preds.shape[0]\n rescale_loc = box_preds.size / box_preds.shape[0]\n # then call backward and step to update the weights, etc.\n # L = conf_loss + loc_loss * alpha\n # L.backward()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The norms are different, but the sample-wise norms sum up to the batch-wise norm:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print('batch-wise num_positive:', num_positive)\nprint('sample-wise num_positive:', sample_num_positive)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
The per-image ``num_positive`` values are no longer 1 and 4, because multiple anchor\nboxes can be matched to a single object.
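To see concretely how the two normalization schemes differ, here is a minimal NumPy sketch. It is independent of MXNet/GluonCV, and the target matrix below is made-up for illustration (not produced by ``SSDTargetGenerator``); it just contrasts dividing per-image losses by the batch-wide positive count versus each image's own positive count:

```python
import numpy as np

# Hypothetical per-anchor class targets for a batch of 2 images:
# a value > 0 marks a positive (matched) anchor, 0 is background.
cls_targets = np.array([
    [1, 0, 0, 1, 0, 0],   # image 1: 2 positive anchors
    [1, 1, 0, 1, 1, 0],   # image 2: 4 positive anchors
])
# Dummy loss of 1 per anchor, so each image's unnormalized loss is 6
per_anchor_loss = np.ones_like(cls_targets, dtype=float)

# Batch-wise norm: every image's loss is divided by the batch total
num_positive = int((cls_targets > 0).sum())            # 2 + 4 = 6
batch_loss = per_anchor_loss.sum(axis=1) / num_positive

# Sample-wise norm: each image's loss is divided by its own count
sample_num_positive = (cls_targets > 0).sum(axis=1)    # [2, 4]
sample_loss = per_anchor_loss.sum(axis=1) / sample_num_positive

print('batch-wise  :', batch_loss)   # 6/6 each -> [1.0, 1.0]
print('sample-wise :', sample_loss)  # 6/2 and 6/4 -> [3.0, 1.5]
```

With batch-wise norm, an image with a single object and an image crowded with objects contribute to the gradient on the same scale, which is what the reference SSD training recipes rely on; sample-wise norm would up-weight sparse images.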