.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build/examples_classification/transfer_learning_minc.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_build_examples_classification_transfer_learning_minc.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_examples_classification_transfer_learning_minc.py:


4. Transfer Learning with Your Own Image Dataset
=======================================================

Dataset size is a big factor in the performance of deep learning models.
``ImageNet`` has over one million labeled images, but we rarely have that much
labeled data in other domains. Training deep learning models on small datasets
may lead to severe overfitting.

Transfer learning is a technique that addresses this problem. The idea is
simple: instead of starting from scratch, we start training from a model that
has been pre-trained on another dataset. As Isaac Newton said, "If I have seen
further it is by standing on the shoulders of Giants".

In this tutorial, we will explain the basics of transfer learning, and apply it
to the ``MINC-2500`` dataset.

Data Preparation
----------------

`MINC <http://opensurfaces.cs.cornell.edu/publications/minc/>`__ is short for
Materials in Context Database, provided by Cornell. ``MINC-2500`` is a resized
subset of ``MINC`` with 23 classes and 2500 images in each class. It is well
labeled and of moderate size, which makes it a perfect example for us.

|image-minc|

To start, first download ``MINC-2500`` from
`here <http://opensurfaces.cs.cornell.edu/publications/minc/>`__. Suppose we
have the data downloaded to ``~/data/`` and extracted to ``~/data/minc-2500``.

After extraction, it occupies around 2.6GB of disk space with the following
structure:

::

    minc-2500
    ├── README.txt
    ├── categories.txt
    ├── images
    └── labels

The ``images`` folder has 23 sub-folders for the 23 classes, and the ``labels``
folder contains five different splits for training, validation, and test.

We have written a script to prepare the data for you:

:download:`Download prepare_minc.py<../../../scripts/classification/finetune/prepare_minc.py>`

Run it with

::

    python prepare_minc.py --data ~/data/minc-2500 --split 1

Now we have the following structure:

::

    minc-2500
    ├── categories.txt
    ├── images
    ├── labels
    ├── README.txt
    ├── test
    ├── train
    └── val

In order to go through this tutorial within a reasonable amount of time, we
have prepared a small subset of the ``MINC-2500`` dataset, but you should
substitute it with the original dataset for your experiments. We can download
and extract it with:

.. GENERATED FROM PYTHON SOURCE LINES 79-88

.. code-block:: default


    import zipfile, os
    from gluoncv.utils import download

    file_url = 'https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/classification/minc-2500-tiny.zip'
    zip_file = download(file_url, path='./')
    with zipfile.ZipFile(zip_file, 'r') as zin:
        zin.extractall(os.path.expanduser('./'))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Downloading ./minc-2500-tiny.zip from https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/classification/minc-2500-tiny.zip...


Hyperparameters
---------------

First, let's import all other necessary libraries.

.. code-block:: default


    import mxnet as mx
    import numpy as np
    import os, time

    from mxnet import gluon, image, init, nd
    from mxnet import autograd as ag
    from mxnet.gluon import nn
    from mxnet.gluon.data.vision import transforms
    from gluoncv.model_zoo import get_model

We set the hyperparameters as follows:

.. code-block:: default


    classes = 23

    epochs = 5
    lr = 0.001
    per_device_batch_size = 1
    momentum = 0.9
    wd = 0.0001

    lr_factor = 0.75
    lr_steps = [10, 20, 30, np.inf]

    num_gpus = 1
    num_workers = 8
    ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
    batch_size = per_device_batch_size * max(num_gpus, 1)

.. GENERATED FROM PYTHON SOURCE LINES 125-144

Things to keep in mind:

1. ``epochs = 5`` is just for this tutorial with the tiny dataset. Please
   change it to a larger number for your experiments, for instance 40.
2. ``per_device_batch_size`` is also set to a small number. In your experiments
   you can try a larger number, like 64.
3. Remember to tune ``num_gpus`` and ``num_workers`` according to your machine.
4. A pre-trained model is already close to a good solution, so we can start
   with a small ``lr``.

Data Augmentation
-----------------

In transfer learning, data augmentation can also help. We use the following
augmentation in training:

1. Randomly crop the image and resize it to 224x224
2. Randomly flip the image horizontally
3. Randomly jitter its color and add noise
4. Transpose the data from ``height*width*num_channels`` to
   ``num_channels*height*width``, and map values from [0, 255] to [0, 1]
5. Normalize with the mean and standard deviation from the ImageNet dataset.

.. GENERATED FROM PYTHON SOURCE LINES 144-164

.. code-block:: default


    jitter_param = 0.4
    lighting_param = 0.1

    transform_train = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomFlipLeftRight(),
        transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param,
                                     saturation=jitter_param),
        transforms.RandomLighting(lighting_param),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    transform_test = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

.. GENERATED FROM PYTHON SOURCE LINES 165-166

With the data augmentation functions, we can define our data loaders:

.. GENERATED FROM PYTHON SOURCE LINES 166-184

.. code-block:: default


    path = './minc-2500-tiny'
    train_path = os.path.join(path, 'train')
    val_path = os.path.join(path, 'val')
    test_path = os.path.join(path, 'test')

    train_data = gluon.data.DataLoader(
        gluon.data.vision.ImageFolderDataset(train_path).transform_first(transform_train),
        batch_size=batch_size, shuffle=True, num_workers=num_workers)

    val_data = gluon.data.DataLoader(
        gluon.data.vision.ImageFolderDataset(val_path).transform_first(transform_test),
        batch_size=batch_size, shuffle=False, num_workers=num_workers)

    test_data = gluon.data.DataLoader(
        gluon.data.vision.ImageFolderDataset(test_path).transform_first(transform_test),
        batch_size=batch_size, shuffle=False, num_workers=num_workers)

.. GENERATED FROM PYTHON SOURCE LINES 185-194

Note that only ``train_data`` uses ``transform_train``, while ``val_data`` and
``test_data`` use ``transform_test`` to produce deterministic results for
evaluation.

Model and Trainer
-----------------

We use a pre-trained ``ResNet50_v2`` model, which has balanced accuracy and
computation cost.

.. GENERATED FROM PYTHON SOURCE LINES 195-209

.. code-block:: default


    model_name = 'ResNet50_v2'
    finetune_net = get_model(model_name, pretrained=True)

    # Replace the ImageNet output layer with a new one for our 23 classes.
    with finetune_net.name_scope():
        finetune_net.output = nn.Dense(classes)
    finetune_net.output.initialize(init.Xavier(), ctx=ctx)
    finetune_net.collect_params().reset_ctx(ctx)
    finetune_net.hybridize()

    trainer = gluon.Trainer(finetune_net.collect_params(), 'sgd', {
        'learning_rate': lr, 'momentum': momentum, 'wd': wd})
    metric = mx.metric.Accuracy()
    L = gluon.loss.SoftmaxCrossEntropyLoss()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Downloading /root/.mxnet/models/resnet50_v2-ecdde353.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet50_v2-ecdde353.zip...


Here's an illustration of the pre-trained model and our newly defined model:

|image-model|

Specifically, we define the new model by:

1. loading the pre-trained model;
2. re-defining the output layer for the new task;
3. training the network.

This is called "fine-tuning": we take a base network trained on another dataset
and adapt it to our current dataset.
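We also need an evaluation function for validation and testing. The helper
below is a minimal sketch, written to be consistent with how it is called in
the training loop that follows: it splits each batch across the available
contexts the same way the loop does, runs a forward pass, and returns
``metric.get()``, i.e. a (name, accuracy) pair.

.. code-block:: default


    def test(net, val_data, ctx):
        metric = mx.metric.Accuracy()
        for i, batch in enumerate(val_data):
            # Split the batch across contexts, as in the training loop.
            data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0, even_split=False)
            label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0, even_split=False)
            # Forward pass only; no gradients are recorded during evaluation.
            outputs = [net(X) for X in data]
            metric.update(label, outputs)
        return metric.get()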
Training Loop
-------------

Following is the main training loop. It is the same as the training loop in the
`CIFAR10 <dive_deep_cifar10.html>`__ and ImageNet tutorials.

.. note::

    Once again, in order to go through the tutorial faster, we are training on
    a small subset of the original ``MINC-2500`` dataset, and for only 5
    epochs. Training on the full dataset with 40 epochs is expected to reach an
    accuracy of around 80% on the test data.

.. GENERATED FROM PYTHON SOURCE LINES 249-287

.. code-block:: default


    lr_counter = 0
    num_batch = len(train_data)

    for epoch in range(epochs):
        # Decay the learning rate at the scheduled epochs.
        if epoch == lr_steps[lr_counter]:
            trainer.set_learning_rate(trainer.learning_rate * lr_factor)
            lr_counter += 1

        tic = time.time()
        train_loss = 0
        metric.reset()

        for i, batch in enumerate(train_data):
            # Split each batch across the available contexts.
            data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0, even_split=False)
            label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0, even_split=False)
            with ag.record():
                outputs = [finetune_net(X) for X in data]
                loss = [L(yhat, y) for yhat, y in zip(outputs, label)]
            for l in loss:
                l.backward()

            trainer.step(batch_size)
            train_loss += sum([l.mean().asscalar() for l in loss]) / len(loss)
            metric.update(label, outputs)

        _, train_acc = metric.get()
        train_loss /= num_batch

        _, val_acc = test(finetune_net, val_data, ctx)

        print('[Epoch %d] Train-acc: %.3f, loss: %.3f | Val-acc: %.3f | time: %.1f' %
              (epoch, train_acc, train_loss, val_acc, time.time() - tic))

    _, test_acc = test(finetune_net, test_data, ctx)
    print('[Finished] Test-acc: %.3f' % (test_acc))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [Epoch 0] Train-acc: 0.026, loss: 4.044 | Val-acc: 0.065 | time: 4.6
    [Epoch 1] Train-acc: 0.017, loss: 4.177 | Val-acc: 0.022 | time: 3.0
    [Epoch 2] Train-acc: 0.035, loss: 4.017 | Val-acc: 0.043 | time: 3.0
    [Epoch 3] Train-acc: 0.009, loss: 3.971 | Val-acc: 0.022 | time: 3.0
    [Epoch 4] Train-acc: 0.009, loss: 3.643 | Val-acc: 0.043 | time: 3.0
    [Finished] Test-acc: 0.087
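With training finished, we can sanity-check the fine-tuned network on a single
image. The snippet below is a minimal sketch, not part of the generated script:
it assumes that labels follow the sorted folder order that
``ImageFolderDataset`` uses, and simply picks the first image of the first
class in the validation folder; substitute any image you like.

.. code-block:: default


    # Class names in the label order used by ImageFolderDataset (sorted folders).
    class_names = sorted(os.listdir(val_path))

    # Pick an arbitrary validation image: the first file of the first class.
    sample_dir = os.path.join(val_path, class_names[0])
    img_path = os.path.join(sample_dir, sorted(os.listdir(sample_dir))[0])

    # Load the image, apply the deterministic evaluation transform,
    # add a batch axis, and move the tensor to the first compute context.
    img = image.imread(img_path)
    x = transform_test(img).expand_dims(axis=0).as_in_context(ctx[0])

    # Forward pass and top-1 prediction.
    prob = nd.softmax(finetune_net(x))[0]
    idx = int(prob.argmax(axis=0).asscalar())
    print('Predicted: %s (probability %.3f)' % (class_names[idx], prob[idx].asscalar()))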
.. GENERATED FROM PYTHON SOURCE LINES 288-302

Next
----

Now that you have learned to harness the power of transfer learning, you can
read `this tutorial <dive_deep_imagenet.html>`__ to learn more about training a
model on ImageNet.

The idea of transfer learning is the basis of
`object detection <../examples_detection/index.html>`_ and
`semantic segmentation <../examples_segmentation/index.html>`_,
the next two chapters of our tutorial.

.. |image-minc| image:: https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/datasets/MINC-2500.png
.. |image-model| image:: https://zh.gluon.ai/_images/fine-tuning.svg


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 21.406 seconds)


.. _sphx_glr_download_build_examples_classification_transfer_learning_minc.py:


.. only:: html

  .. container:: sphx-glr-footer
     :class: sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: transfer_learning_minc.py <transfer_learning_minc.py>`


    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: transfer_learning_minc.ipynb <transfer_learning_minc.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_