.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build/examples_action_recognition/demo_i3d_kinetics400.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_build_examples_action_recognition_demo_i3d_kinetics400.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_examples_action_recognition_demo_i3d_kinetics400.py:

3. Getting Started with Pre-trained I3D Models on Kinetcis400
================================================================

`Kinetics400 <https://deepmind.com/research/open-source/kinetics>`_  is an action recognition dataset
of realistic action videos, collected from YouTube. With 306,245 short trimmed videos
from 400 action categories, it is one of the largest and most widely used dataset in the research
community for benchmarking state-of-the-art video action recognition models.

`I3D <https://arxiv.org/abs/1705.07750>`_ (Inflated 3D Networks) is a widely adopted 3D video
classification network. It uses 3D convolution to learn spatiotemporal information directly from videos.
I3D is proposed to improve `C3D <https://arxiv.org/abs/1412.0767>`_ (Convolutional 3D Networks) by inflating from 2D models.
We can not only reuse the 2D models' architecture (e.g., ResNet, Inception), but also bootstrap
the model weights from 2D pretrained models. In this manner, training 3D networks for video
classification is feasible and getting much better results.

In this tutorial, we will demonstrate how to load a pre-trained I3D model from :ref:`gluoncv-model-zoo`
and classify a video clip from the Internet or your local disk into one of the 400 action classes.

Step by Step
------------

We will try out a pre-trained I3D model on a single video clip.

First, please follow the `installation guide <../../index.html#installation>`__
to install ``MXNet`` and ``GluonCV`` if you haven't done so yet.

.. GENERATED FROM PYTHON SOURCE LINES 27-37

.. code-block:: default


    import matplotlib.pyplot as plt
    import numpy as np
    import mxnet as mx
    from mxnet import gluon, nd, image
    from mxnet.gluon.data.vision import transforms
    from gluoncv.data.transforms import video
    from gluoncv import utils
    from gluoncv.model_zoo import get_model


.. GENERATED FROM PYTHON SOURCE LINES 38-39

Then, we download the video and extract a 32-frame clip from it.

.. GENERATED FROM PYTHON SOURCE LINES 39-50

.. code-block:: default


    from gluoncv.utils.filesystem import try_import_decord
    decord = try_import_decord()

    url = 'https://github.com/bryanyzhu/tiny-ucf101/raw/master/abseiling_k400.mp4'
    video_fname = utils.download(url)
    vr = decord.VideoReader(video_fname)
    frame_id_list = range(0, 64, 2)
    video_data = vr.get_batch(frame_id_list).asnumpy()
    clip_input = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]


.. GENERATED FROM PYTHON SOURCE LINES 51-56

Now we define transformations for the video clip.
This transformation function does three things:
center crop the image to 224x224 in size,
transpose it to ``num_channels*num_frames*height*width``,
and normalize with mean and standard deviation calculated across all ImageNet images.

.. GENERATED FROM PYTHON SOURCE LINES 56-64

.. code-block:: default


    transform_fn = video.VideoGroupValTransform(size=224, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    clip_input = transform_fn(clip_input)
    clip_input = np.stack(clip_input, axis=0)
    clip_input = clip_input.reshape((-1,) + (32, 3, 224, 224))
    clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))
    print('Video data is downloaded and preprocessed.')


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video data is downloaded and preprocessed.


.. GENERATED FROM PYTHON SOURCE LINES 65-66

Next, we load a pre-trained I3D model.

.. GENERATED FROM PYTHON SOURCE LINES 66-71

.. code-block:: default


    model_name = 'i3d_inceptionv1_kinetics400'
    net = get_model(model_name, nclass=400, pretrained=True)
    print('%s model is successfully loaded.' % model_name)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Downloading /root/.mxnet/models/i3d_inceptionv1_kinetics400-81e0be10.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/i3d_inceptionv1_kinetics400-81e0be10.zip...
      0%|          | 0/51277 [00:00<?, ?KB/s]      0%|          | 100/51277 [00:00<01:02, 820.07KB/s]      1%|          | 508/51277 [00:00<00:21, 2341.81KB/s]      4%|4         | 2188/51277 [00:00<00:06, 7633.88KB/s]     16%|#5        | 8084/51277 [00:00<00:01, 25659.59KB/s]     28%|##8       | 14495/51277 [00:00<00:00, 38505.38KB/s]     40%|####      | 20665/51277 [00:00<00:00, 45780.50KB/s]     56%|#####6    | 28810/51277 [00:00<00:00, 57007.25KB/s]     69%|######9   | 35557/51277 [00:00<00:00, 60251.00KB/s]     86%|########6 | 44111/51277 [00:00<00:00, 67160.94KB/s]    51278KB [00:01, 48589.43KB/s]                           
    i3d_inceptionv1_kinetics400 model is successfully loaded.


.. GENERATED FROM PYTHON SOURCE LINES 72-75

Note that if you want to use InceptionV3 series model (i.e., i3d_inceptionv3_kinetics400),
please resize the image to have both dimensions larger than 299 (e.g., 340x450) and change input size from 224 to 299
in the transform function. Finally, we prepare the video clip and feed it to the model.

.. GENERATED FROM PYTHON SOURCE LINES 75-86

.. code-block:: default


    pred = net(nd.array(clip_input))

    classes = net.classes
    topK = 5
    ind = nd.topk(pred, k=topK)[0].astype('int')
    print('The input video clip is classified to be')
    for i in range(topK):
        print('\t[%s], with probability %.3f.'%
              (classes[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar()))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    The input video clip is classified to be
            [abseiling], with probability 0.991.
            [rock_climbing], with probability 0.009.
            [ice_climbing], with probability 0.000.
            [paragliding], with probability 0.000.
            [skydiving], with probability 0.000.


.. GENERATED FROM PYTHON SOURCE LINES 87-89

We can see that our pre-trained model predicts this video clip
to be ``abseiling`` action with high confidence.

.. GENERATED FROM PYTHON SOURCE LINES 92-97

Next Step
---------

If you would like to dive deeper into training I3D models on ``Kinetics400``,
feel free to read the next `tutorial on Kinetics400 <dive_deep_i3d_kinetics400.html>`__.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.471 seconds)


.. _sphx_glr_download_build_examples_action_recognition_demo_i3d_kinetics400.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: demo_i3d_kinetics400.py <demo_i3d_kinetics400.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: demo_i3d_kinetics400.ipynb <demo_i3d_kinetics400.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_