.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build/examples_action_recognition/finetune_custom.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_build_examples_action_recognition_finetune_custom.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_examples_action_recognition_finetune_custom.py:


7. Fine-tuning SOTA video models on your own dataset
====================================================

This is a video action recognition tutorial using the Gluon CV toolkit, presented as a step-by-step example.
Readers should have basic knowledge of deep learning and be familiar with the Gluon API.
New users may first go through `A 60-minute Gluon Crash Course `_.
You can `Start Training Now`_ or `Dive into Deep`_.

Fine-tuning is an important way to obtain good video models on your own data when you don't have a large annotated dataset
or the computing resources to train a model from scratch for your use case.
In this tutorial, we provide a simple unified solution.
The only thing you need to prepare is a text file containing the information of your videos (e.g., the paths to your videos); we take care of the rest.
You can start fine-tuning from many popular pre-trained models (e.g., I3D, I3D-nonlocal, SlowFast) using a single command line.

Start Training Now
~~~~~~~~~~~~~~~~~~

.. note::

    Feel free to skip the tutorial: the training script is complete, self-contained, and ready to launch.

    :download:`Download Full Python Script: train_recognizer.py<../../../scripts/action-recognition/train_recognizer.py>`

    For more training command options, please run ``python train_recognizer.py -h``

    Please check out the `model_zoo <../model_zoo/index.html#action_recognition>`_ for the training commands used to reproduce the pretrained models.

First, let's import the necessary libraries into Python.

.. GENERATED FROM PYTHON SOURCE LINES 32-50

.. code-block:: default


    from __future__ import division

    import argparse, time, logging, os, sys, math

    import numpy as np
    import mxnet as mx
    import gluoncv as gcv
    from mxnet import gluon, nd, init, context
    from mxnet import autograd as ag
    from mxnet.gluon import nn
    from mxnet.gluon.data.vision import transforms

    from gluoncv.data.transforms import video
    from gluoncv.data import VideoClsCustom
    from gluoncv.model_zoo import get_model
    from gluoncv.utils import makedirs, LRSequential, LRScheduler, split_and_load, TrainingHistory

.. GENERATED FROM PYTHON SOURCE LINES 51-101

Custom DataLoader
------------------

We provide a general dataloader for you to use on your own dataset.
Your data can be stored in any hierarchy, either in video format or already decoded to frames.
The only thing you need to prepare is a text file, ``train.txt``.

If your data is stored in image format (already decoded to frames), your ``train.txt`` should look like:

::

    video_001 200 0
    video_002 300 0
    video_003 100 1
    video_004 400 2
    ......
    video_100 200 10

There are three items in each line, separated by spaces.
The first item is the path to your training video, e.g., ``video_001``. It should be a folder containing the frames of ``video_001.mp4``.
The second item is the number of frames in the video, e.g., 200.
The third item is the label of the video, e.g., 0.

If your data is stored in video format, your ``train.txt`` should look like:

::

    video_001.mp4 200 0
    video_002.mp4 300 0
    video_003.mp4 100 1
    video_004.mp4 400 2
    ......
    video_100.mp4 200 10

Similarly, there are three items in each line, separated by spaces.
The first item is the path to your training video, e.g., ``video_001.mp4``.
The second item is the number of frames in the video, but you can put any number here
because our video loader will recompute the number of frames automatically during training.
The third item is the label of that video, e.g., 0.
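If you prefer to generate ``train.txt`` programmatically, a small script along the following lines works for the video-format case.
This is a minimal sketch assuming a hypothetical ``~/my_videos/<class_name>/xxx.mp4`` layout; adapt the paths and label assignment to your own data.

.. code-block:: python

    import os

    # Hypothetical layout: ~/my_videos/<class_name>/<video>.mp4,
    # where each class folder is mapped to one integer label.
    root = os.path.expanduser('~/my_videos')
    classes = sorted(os.listdir(root))

    with open('train.txt', 'w') as f:
        for label, cls in enumerate(classes):
            for fname in sorted(os.listdir(os.path.join(root, cls))):
                # The second field (frame count) can be any number for videos,
                # since the loader recomputes it during training.
                f.write('%s 0 %d\n' % (os.path.join(root, cls, fname), label))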
Once you prepare the ``train.txt``, you are good to go.
Just use our general dataloader `VideoClsCustom `_ to load your data.

In this tutorial, we will use the UCF101 dataset as an example.
For your own dataset, you can just replace the values of ``root`` and ``setting`` with your data directory and your prepared text file.
Let's first define some basics.

.. GENERATED FROM PYTHON SOURCE LINES 101-119

.. code-block:: default


    num_gpus = 1
    ctx = [mx.gpu(i) for i in range(num_gpus)]
    transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8], mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    per_device_batch_size = 5
    num_workers = 0
    batch_size = per_device_batch_size * num_gpus

    train_dataset = VideoClsCustom(root=os.path.expanduser('~/.mxnet/datasets/ucf101/rawframes'),
                                   setting=os.path.expanduser('~/.mxnet/datasets/ucf101/ucfTrainTestlist/ucf101_train_split_1_rawframes.txt'),
                                   train=True,
                                   new_length=32,
                                   transform=transform_train)
    print('Load %d training samples.' % len(train_dataset))
    train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                       shuffle=True, num_workers=num_workers)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Load 9537 training samples.
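Before building the network, it can help to sanity-check the loader by pulling a single batch and printing its shape.
This is a quick sketch; the exact dimensions depend on your transform and ``new_length`` settings.

.. code-block:: python

    # Fetch one batch to verify that the clip and label tensors look sensible.
    # The training loop later reshapes the clip tensor by merging its leading
    # dimensions before the forward pass.
    for batch in train_data:
        clips, labels = batch[0], batch[1]
        print('clip batch shape:', clips.shape, 'label batch shape:', labels.shape)
        break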
.. GENERATED FROM PYTHON SOURCE LINES 120-128

Custom Network
--------------

You can always define your own network architecture. Here, we want to show how to fine-tune on a pre-trained model.
Since I3D is a very popular network, we will use an I3D model with a ResNet50 backbone trained on the Kinetics400 dataset (i.e., ``i3d_resnet50_v1_kinetics400``) as an example.

For simple fine-tuning, people usually just replace the last classification (dense) layer with one matching the number of classes in their dataset, without changing anything else.
In GluonCV, you can get your customized model with one line of code.

.. GENERATED FROM PYTHON SOURCE LINES 128-132

.. code-block:: default


    net = get_model(name='i3d_resnet50_v1_custom', nclass=101)
    net.collect_params().reset_ctx(ctx)
    print(net)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    conv14_weight is done with shape: (64, 3, 5, 7, 7)
    batchnorm5_gamma is done with shape: (64,)
    batchnorm5_beta is done with shape: (64,)
    batchnorm5_running_mean is done with shape: (64,)
    batchnorm5_running_var is done with shape: (64,)
    layer1_0_conv0_weight is done with shape: (64, 64, 3, 1, 1)
    ...
    [similar "is done" messages for the remaining convolution and batch-norm
    parameters of layer1 through layer4 are omitted here for brevity]
    ...
    dense2_weight is skipped with shape: (101, 2048)
    dense2_bias is skipped with shape: (101,)
    Downloading /root/.mxnet/models/i3d_resnet50_v1_kinetics400-568a722e.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/i3d_resnet50_v1_kinetics400-568a722e.zip...
    I3D_ResNetV1(
      (first_stage): HybridSequential(
        (0): Conv3D(3 -> 64, kernel_size=(5, 7, 7), stride=(2, 2, 2), padding=(2, 3, 3), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
        (2): Activation(relu)
        (3): MaxPool3D(size=(1, 3, 3), stride=(2, 2, 2), padding=(0, 1, 1), ceil_mode=False, global_pool=False, pool_type=max, layout=NCDHW)
      )
      (pool2): MaxPool3D(size=(2, 1, 1), stride=(2, 1, 1), padding=(0, 0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCDHW)
      (res_layers): HybridSequential(
        ...
        [the four stages of Bottleneck blocks are omitted here for brevity]
        ...
      )
      (st_avg): GlobalAvgPool3D(size=(1, 1, 1), stride=(1, 1, 1), padding=(0, 0, 0), ceil_mode=True, global_pool=True, pool_type=avg, layout=NCDHW)
      (head): HybridSequential(
        (0): Dropout(p = 0.8, axes=())
        (1): Dense(2048 -> 101, linear)
      )
      (fc): Dense(2048 -> 101, linear)
    )

Note that all pretrained weights are loaded ("is done") except the final classification layer ("is skipped"), which is re-initialized to match the 101 classes of UCF101.

.. GENERATED FROM PYTHON SOURCE LINES 133-138

We also provide other customized network architectures for you to use on your own dataset.
You can simply change the ``dataset`` part of any pretrained model name to ``custom``,
e.g., ``slowfast_4x16_resnet50_kinetics400`` to ``slowfast_4x16_resnet50_custom``.
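For example, assuming you want to try a SlowFast network on the same 101-class data, the recipe is identical; here is a sketch following the naming convention above.

.. code-block:: python

    # Swap the dataset suffix in the model name for 'custom' and set nclass.
    net_slowfast = get_model(name='slowfast_4x16_resnet50_custom', nclass=101)
    net_slowfast.collect_params().reset_ctx(ctx)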
Once you have the dataloader and network for your own dataset, the rest is the same as in previous tutorials:
define the optimizer, loss and metric, and kick off the training.

.. GENERATED FROM PYTHON SOURCE LINES 141-143

Optimizer, Loss and Metric
--------------------------

.. GENERATED FROM PYTHON SOURCE LINES 143-157

.. code-block:: default


    # Learning rate decay factor
    lr_decay = 0.1
    # Epochs where learning rate decays
    lr_decay_epoch = [40, 80, 100]

    # Stochastic gradient descent
    optimizer = 'sgd'
    # Set parameters
    optimizer_params = {'learning_rate': 0.001, 'wd': 0.0001, 'momentum': 0.9}

    # Define our trainer for net
    trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)

.. GENERATED FROM PYTHON SOURCE LINES 158-161

In order to optimize our model, we need a loss function.
For classification tasks, we usually use softmax cross entropy as the loss function.

.. GENERATED FROM PYTHON SOURCE LINES 161-164

.. code-block:: default


    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

.. GENERATED FROM PYTHON SOURCE LINES 165-168

For simplicity, we use accuracy as the metric to monitor our training process.
We also record the metric values and will print them at the end of training.

.. GENERATED FROM PYTHON SOURCE LINES 168-172

.. code-block:: default


    train_metric = mx.metric.Accuracy()
    train_history = TrainingHistory(['training-acc'])

.. GENERATED FROM PYTHON SOURCE LINES 173-182

Training
--------

After all the preparations, we can finally start training!
The full training script follows.

.. note::

    In order to finish the tutorial quickly, we only fine-tune for 3 epochs, with 100 iterations per epoch, on UCF101.
    In your experiments, you can set the hyper-parameters depending on your dataset.

.. GENERATED FROM PYTHON SOURCE LINES 182-235

.. code-block:: default


    epochs = 0  # set this to 3 to run the quick fine-tuning described in the note above
    lr_decay_count = 0

    for epoch in range(epochs):
        tic = time.time()
        train_metric.reset()
        train_loss = 0

        # Learning rate decay
        if epoch == lr_decay_epoch[lr_decay_count]:
            trainer.set_learning_rate(trainer.learning_rate*lr_decay)
            lr_decay_count += 1

        # Loop through each batch of training data
        for i, batch in enumerate(train_data):
            # Extract data and label
            data = split_and_load(batch[0], ctx_list=ctx, batch_axis=0)
            label = split_and_load(batch[1], ctx_list=ctx, batch_axis=0)

            # AutoGrad
            with ag.record():
                output = []
                for _, X in enumerate(data):
                    X = X.reshape((-1,) + X.shape[2:])
                    pred = net(X)
                    output.append(pred)
                loss = [loss_fn(yhat, y) for yhat, y in zip(output, label)]

            # Backpropagation
            for l in loss:
                l.backward()

            # Optimize
            trainer.step(batch_size)

            # Update metrics
            train_loss += sum([l.mean().asscalar() for l in loss])
            train_metric.update(label, output)

            if i == 100:
                break

        name, acc = train_metric.get()

        # Update history and print metrics
        train_history.update([acc])
        print('[Epoch %d] train=%f loss=%f time: %f' %
              (epoch, acc, train_loss / (i+1), time.time()-tic))

    # We can plot the metric scores with:

    train_history.plot()

.. image-sg:: /build/examples_action_recognition/images/sphx_glr_finetune_custom_001.png
   :alt: finetune custom
   :srcset: /build/examples_action_recognition/images/sphx_glr_finetune_custom_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 236-240

We can see that the training accuracy increases quickly.
In fact, if you look back at tutorial 4 (Dive Deep into Training I3D Models on Kinetics400) and compare the training curves,
you will see that fine-tuning achieves much better results in much less time.
Try fine-tuning other SOTA video models on your own dataset and see how it goes.
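Once fine-tuning finishes, you will usually want to keep the weights.
Here is a minimal sketch using Gluon's standard serialization; the file name is just an example.

.. code-block:: python

    # Save the fine-tuned weights; restore them later with
    # net.load_parameters('i3d_resnet50_v1_custom_ucf101.params', ctx=ctx)
    net.save_parameters('i3d_resnet50_v1_custom_ucf101.params')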
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  5.860 seconds)


.. _sphx_glr_download_build_examples_action_recognition_finetune_custom.py:


.. only:: html

  .. container:: sphx-glr-footer
     :class: sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: finetune_custom.py <finetune_custom.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: finetune_custom.ipynb <finetune_custom.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_