01. Load web datasets with GluonCV Auto Module

This tutorial introduces the basic dataset preprocessing steps for downloading and loading arbitrary custom datasets, as long as they follow one of the supported data formats.

The current version supports loading datasets for:

- Image Classification (with CSV lists and raw images, or folder-separated raw images)
- Object Detection (in Pascal VOC format or with COCO JSON annotations)

Stay tuned for new applications and formats. We also look forward to contributions that bring new formats to GluonCV!

That’s enough introduction. Let’s take a look at how web datasets can be loaded into the recommended formats supported by the GluonCV auto module.

Image Classification

Managing the labels of an image classification dataset is pretty simple. In this example we show a few ways to organize them.

First, labels can be inferred automatically from a nested folder structure like:

root/car/0001.jpg
root/car/xxxa.jpg
root/car/yyyb.jpg
root/bus/123.png
root/bus/023.jpg
root/bus/wwww.jpg

or, with additional train/val/test splits, like:

root/train/car/0001.jpg
root/train/car/xxxa.jpg
root/train/bus/123.png
root/train/bus/023.jpg
root/test/car/yyyb.jpg
root/test/bus/wwww.jpg

where root is the root folder, and the car and bus categories are organized in their own sub-directories.
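The folder-to-label idea above can be sketched with the standard library alone. This is a minimal illustration, not GluonCV's implementation: the names infer_labels and IMAGE_EXTS are hypothetical, and assigning label indices by sorted class name is an assumption.

```python
import tempfile
from pathlib import Path

# Hypothetical default; mirrors the extensions used later in this tutorial.
IMAGE_EXTS = ('.jpg', '.jpeg', '.png')


def infer_labels(root, exts=IMAGE_EXTS):
    """Return (samples, classes) for a root/<class>/<image> layout.

    samples is a list of (image_path, label_index) pairs. classes is the
    sorted list of sub-directory names, so a label index is the position
    of the class name in that list (an assumption of this sketch).
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(classes):
        for path in sorted((root / name).iterdir()):
            # Skip files that are not images, e.g. stray text files.
            if path.suffix.lower() in exts:
                samples.append((str(path), index))
    return samples, classes


# Build a throwaway folder tree similar to the listing above.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ('car/0001.jpg', 'car/xxxa.jpg', 'bus/123.png', 'bus/notes.txt'):
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    samples, classes = infer_labels(tmp)

labels = [label for _, label in samples]
print(classes)
print(labels)
```

Here bus/notes.txt is ignored because its extension is not in the allowed list, which is the same role the exts argument plays in the loader call shown later.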

from gluoncv.auto.tasks import ImageClassification

We can use ImageClassification.Dataset to load a dataset from a folder. Here root can be a local path or a URL. If it is a URL, the archive file is downloaded and extracted automatically to ~/.gluoncv by default. To change the default behavior, you may edit ~/.gluoncv/config.yml.

train, val, test = ImageClassification.Dataset.from_folders(
    'https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip',
    train='train', val='val', test='test', exts=('.jpg', '.jpeg', '.png'))

Out:

Downloading /root/.gluoncv/archive/shopee-iet.zip from https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip...

100%|##########| 40895/40895 [00:02<00:00, 20447.05KB/s]
data/
├── test/
└── train/

train split

print('train', train)

Out:

train                                                  image  label
  /root/.gluoncv/datasets/shopee-iet/data/train/...      0
  /root/.gluoncv/datasets/shopee-iet/data/train/...      0
  /root/.gluoncv/datasets/shopee-iet/data/train/...      0
  /root/.gluoncv/datasets/shopee-iet/data/train/...      0
  /root/.gluoncv/datasets/shopee-iet/data/train/...      0
..                                                 ...    ...
/root/.gluoncv/datasets/shopee-iet/data/train/...      3
/root/.gluoncv/datasets/shopee-iet/data/train/...      3
/root/.gluoncv/datasets/shopee-iet/data/train/...      3
/root/.gluoncv/datasets/shopee-iet/data/train/...      3
/root/.gluoncv/datasets/shopee-iet/data/train/...      3

[800 rows x 2 columns]

test split

print('test', test)

Out:

test                                                 image  label
 /root/.gluoncv/datasets/shopee-iet/data/test/B...      0
 /root/.gluoncv/datasets/shopee-iet/data/test/B...      0
 /root/.gluoncv/datasets/shopee-iet/data/test/B...      0
 /root/.gluoncv/datasets/shopee-iet/data/test/B...      0
 /root/.gluoncv/datasets/shopee-iet/data/test/B...      0
..                                                ...    ...
/root/.gluoncv/datasets/shopee-iet/data/test/w...      3
/root/.gluoncv/datasets/shopee-iet/data/test/w...      3
/root/.gluoncv/datasets/shopee-iet/data/test/w...      3
/root/.gluoncv/datasets/shopee-iet/data/test/w...      3
/root/.gluoncv/datasets/shopee-iet/data/test/w...      3

[80 rows x 2 columns]

You may notice that the dataset is a pandas DataFrame, which is handy. It is also fine for a split to be empty; in this case, the validation split is empty.

print('validation', val)

Out:

validation Empty ImageClassificationDataset
Columns: [image, label]
Index: []

You may split the train set into train and val sets for training and validation:

train, val, _ = train.random_split(val_size=0.1, test_size=0)
print(len(train), len(val))

Out:

721 79
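To make the split sizes easier to reason about, here is a minimal sketch of a fractional random split. The 721/79 sizes printed above, rather than an exact 720/80, would be consistent with each row being assigned to a split independently at random; that is an inference from the output, not a confirmed detail of the library. The function below is a hypothetical stand-in written with the standard library, and its seed parameter is an addition for reproducibility.

```python
import random


def random_split(items, val_size=0.1, test_size=0.1, seed=None):
    """Assign each item independently to train, val or test.

    val_size and test_size are probabilities, so the resulting split
    sizes are only approximately proportional to them.
    """
    train_size = 1.0 - val_size - test_size
    if train_size <= 0:
        raise ValueError('val_size + test_size must be smaller than 1')
    rng = random.Random(seed)
    # Draw one bucket index (0=train, 1=val, 2=test) per item.
    picks = rng.choices((0, 1, 2),
                        weights=(train_size, val_size, test_size),
                        k=len(items))
    buckets = ([], [], [])
    for item, pick in zip(items, picks):
        buckets[pick].append(item)
    return buckets


train_ids, val_ids, test_ids = random_split(
    list(range(800)), val_size=0.1, test_size=0, seed=0)
print(len(train_ids), len(val_ids), len(test_ids))
```

With test_size=0 the third bucket always stays empty, and the train and val sizes vary slightly from run to run unless a seed is fixed.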

In some cases, you may get a raw folder without splits. In that case, use from_folder instead:

dataset = ImageClassification.Dataset.from_folder('https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet.tgz')

Out:

Downloading /root/.gluoncv/archive/oxford-iiit-pet.tgz from https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet.tgz...

100%|##########| 792683/792683 [00:12<00:00, 65670.02KB/s]
oxford-iiit-pet/
├── annotations/
└── images/

print(dataset)

Out:

                                                  image  label
   /root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
   /root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
   /root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
   /root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
   /root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
...                                                 ...    ...
/root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
/root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
/root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
/root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1
/root/.gluoncv/datasets/oxford-iiit-pet/oxford...      1

[7390 rows x 2 columns]

Visualize Image Classification Dataset

You may plot sample images with show_images, like:

train.show_images(nsample=16, ncol=4, shuffle=True, fontsize=64)

[Figure: 4x4 grid of 16 randomly sampled training images, each titled with its class name and label index: BabyPants: 0, BabyShirt: 1, womencasualshoes: 2, womenchiffontop: 3]

Object Detection

The labels for object detection are a bit more complicated than those for image classification: additional information such as bounding box coordinates has to be stored in certain formats.

In GluonCV we support loading from common Pascal VOC and COCO formats.

The key difference between the VOC and COCO formats is how annotations are stored.

For VOC, raw images and annotations are stored in separate directories, and annotations are usually stored per image; e.g., JPEGImages/0001.jpeg and Annotations/0001.xml form a valid image-label pair.

In contrast, the COCO format stores all labels of a split in a single annotation file; e.g., all training annotations are stored in instances_train2017.json, and all validation annotations are stored in instances_val2017.json.
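The difference between the two layouts can be illustrated with a small, self-contained sketch that reads one hand-written annotation in each style and converts both to the same list of boxes. The sample values and the helper names boxes_from_voc and boxes_from_coco are made up for illustration. The sketch relies on two well-known conventions: VOC stores corner coordinates (xmin, ymin, xmax, ymax) per object, while a COCO bbox is [x, y, width, height].

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# One VOC-style XML file describes exactly one image (made-up values).
VOC_XML = """<annotation>
  <filename>0001.jpeg</filename>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>motorbike</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

# One COCO-style JSON file describes every image of a split (made-up values).
COCO_JSON = json.dumps({
    "images": [{"id": 1, "file_name": "0001.jpeg", "width": 500, "height": 375}],
    "categories": [{"id": 4, "name": "motorbike"}],
    "annotations": [{"image_id": 1, "category_id": 4, "bbox": [48, 240, 147, 131]}],
})


def boxes_from_voc(xml_text):
    """Map the single image in a VOC XML document to its list of boxes."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        bnd = obj.find('bndbox')
        box = {'class': obj.findtext('name')}
        for key in ('xmin', 'ymin', 'xmax', 'ymax'):
            box[key] = float(bnd.findtext(key))
        boxes.append(box)
    return {root.findtext('filename'): boxes}


def boxes_from_coco(json_text):
    """Map every image in a COCO JSON document to its list of boxes."""
    data = json.loads(json_text)
    names = {c['id']: c['name'] for c in data['categories']}
    files = {i['id']: i['file_name'] for i in data['images']}
    result = defaultdict(list)
    for ann in data['annotations']:
        x, y, w, h = ann['bbox']  # top-left corner plus width and height
        result[files[ann['image_id']]].append({
            'class': names[ann['category_id']],
            'xmin': float(x), 'ymin': float(y),
            'xmax': float(x + w), 'ymax': float(y + h),
        })
    return dict(result)


voc_boxes = boxes_from_voc(VOC_XML)
coco_boxes = boxes_from_coco(COCO_JSON)
print(voc_boxes == coco_boxes)
```

Both helpers end up with one list of boxes per image, which is why the choice of on-disk format matters little once the dataset is loaded.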

Other than identifying the format of the desired dataset, there is not much difference in loading the dataset into GluonCV:

from gluoncv.auto.tasks import ObjectDetection

A subset of Pascal VOC

dataset = ObjectDetection.Dataset.from_voc('https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip')

Out:

Downloading /root/.gluoncv/archive/tiny_motorbike.zip from https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip...

21273KB [00:01, 17295.73KB/s]
tiny_motorbike/
├── Annotations/
├── ImageSets/
└── JPEGImages/

The dataset is once again a pandas DataFrame

print(dataset)

Out:

                                                 image  ...                         image_attr
  /root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 375.0}
  /root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 375.0}
  /root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 333.0}
  /root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 375.0}
  /root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 333.0, 'height': 500.0}
..                                                 ...  ...                                ...
/root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 333.0}
/root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 375.0}
/root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 375.0}
/root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 375.0}
/root/.gluoncv/datasets/tiny_motorbike/tiny_mo...  ...  {'width': 500.0, 'height': 331.0}

[220 rows x 3 columns]

The dataset supports random split as well

train, val, test = dataset.random_split(val_size=0.1, test_size=0.1)
print('train', len(train), 'val', len(val), 'test', len(test))

Out:

train 170 val 23 test 27

For object detection, the rois column is a list of bounding boxes, each stored as a dict, and image_attr holds optional attributes that can accelerate some image pre-processing functions. For example:

print(train.loc[0])

Out:

image         /root/.gluoncv/datasets/tiny_motorbike/tiny_mo...
rois          [{'class': 'bicycle', 'xmin': 0.316, 'ymin': 0...
image_attr                    {'width': 500.0, 'height': 375.0}
Name: 0, dtype: object
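As a rough sketch of how such a row might be consumed, the snippet below converts boxes to pixel coordinates using image_attr. It assumes the coordinates are fractions of the image width and height, which the value 0.316 in the output suggests but does not prove. The row contents are made-up sample values, the xmax and ymax keys are assumed because the printed output is truncated, and to_pixels is a hypothetical helper.

```python
def to_pixels(roi, image_attr):
    """Scale one normalized box to pixel units.

    Assumes xmin/xmax are fractions of the width and ymin/ymax are
    fractions of the height (an assumption of this sketch).
    """
    width, height = image_attr['width'], image_attr['height']
    return {
        'class': roi['class'],
        'xmin': roi['xmin'] * width,
        'ymin': roi['ymin'] * height,
        'xmax': roi['xmax'] * width,
        'ymax': roi['ymax'] * height,
    }


# Made-up row with the same shape as the printed record above.
row = {
    'rois': [{'class': 'bicycle', 'xmin': 0.25, 'ymin': 0.5,
              'xmax': 0.75, 'ymax': 1.0}],
    'image_attr': {'width': 500.0, 'height': 375.0},
}

pixel_rois = [to_pixels(roi, row['image_attr']) for roi in row['rois']]
print(pixel_rois)
```

Keeping the image size next to the boxes means this kind of conversion does not require opening the image file, which is one way the optional attributes could speed up pre-processing.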

Visualize Object Detection Dataset

You may plot sample images together with their bounding boxes with show_images, like:

train.show_images(nsample=16, ncol=4, shuffle=True, fontsize=64)

[Figure: 4x4 grid of 16 randomly sampled training images with bounding boxes, each titled with its image index, e.g. Image(54), Image(26), Image(130)]

Next step

Now that you have access to arbitrary datasets, e.g., Kaggle competition datasets, you can start training by looking at these tutorials:

- 02. Train Image Classification with Auto Estimator
- 03. Train classifier or detector with HPO using GluonCV Auto task

You may also check out the d8 dataset (http://preview.d2l.ai/d8/main/) with built-in datasets. D8 datasets are fully compatible with gluoncv.auto: you can directly plug in datasets loaded from d8 and train with fit functions.

Total running time of the script: ( 0 minutes 46.529 seconds)
