gluoncv.data.transforms
This file includes various transformations that are critical to vision tasks.
Bounding Box Transforms
Crop bounding boxes according to slice area. 

Flip bounding boxes according to image flipping directions. 

Resize bounding boxes according to image resize operation. 

Translate bounding boxes by offsets. 
Crop an image randomly with bounding box constraints. 
Image Transforms
Resize image with OpenCV. 

Resizes longer edge to size. 

Resize the shorter edge to size, while capping the result at the maximum size. 

Apply random PCA lighting noise to input image. 

Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas. 

Randomly flip image horizontally and vertically with the given probabilities. 

Resize the image to fit in the given area while keeping aspect ratio. 

Crop 10 regions from an array. 
Instance Segmentation Mask Transforms

Flip polygons according to image flipping directions. 

Resize polygons according to image resize operation. 

Convert a list of polygons to a full-size binary mask. 

Fill mask to full image size. 
Preset Transforms
We include presets for reproducing SOTA performance described in different papers. This is a complementary section and the APIs are subject to change.
Single Shot Multibox Object Detector
A util function to load all images, transform them to tensor by applying normalizations. 

A util function to transform all images to tensors as network input by applying normalizations. 

Default SSD training transform which includes tons of image augmentations. 

Default SSD validation transform. 
Faster RCNN
A util function to load all images, transform them to tensor by applying normalizations. 

A util function to transform all images to tensors as network input by applying normalizations. 

Default FasterRCNN training transform. 

Default FasterRCNN validation transform. 
Mask RCNN
A util function to load all images, transform them to tensor by applying normalizations. 

A util function to transform all images to tensors as network input by applying normalizations. 

Default Mask RCNN training transform. 

Default Mask RCNN validation transform. 
YOLO
A util function to load all images, transform them to tensor by applying normalizations. 

A util function to transform all images to tensors as network input by applying normalizations. 

Default YOLO training transform which includes tons of image augmentations. 

Default YOLO validation transform. 
API Reference
Bounding boxes transformation functions.

gluoncv.data.transforms.bbox.affine_transform(pt, t)
Apply affine transform to a bounding box given transform matrix t.
 Parameters
pt (numpy.ndarray) – Bounding box with shape (1, 2).
t (numpy.ndarray) – Transformation matrix with shape (2, 3).
 Returns
New bounding box with shape (1, 2).
 Return type
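As a rough illustration of the shapes listed above, applying a (2, 3) affine matrix to a 2-D coordinate amounts to multiplying it by the homogeneous vector [x, y, 1]. The sketch below is plain Python, not the library code; `affine_transform_point` is a hypothetical name.

```python
def affine_transform_point(pt, t):
    """Apply a 2x3 affine matrix t to a 2-D point pt = (x, y)."""
    x, y = pt
    # Treat the point as the homogeneous vector [x, y, 1] and multiply by t.
    new_x = t[0][0] * x + t[0][1] * y + t[0][2]
    new_y = t[1][0] * x + t[1][1] * y + t[1][2]
    return (new_x, new_y)


# Scale by 2 and shift by (10, -5).
t = [[2.0, 0.0, 10.0],
     [0.0, 2.0, -5.0]]
moved = affine_transform_point((3.0, 4.0), t)
```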

gluoncv.data.transforms.bbox.crop(bbox, crop_box=None, allow_outside_center=True)
Crop bounding boxes according to slice area.
This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.
 Parameters
bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.
crop_box (tuple) – Tuple of length 4. \((x_{min}, y_{min}, width, height)\)
allow_outside_center (bool) – If False, remove bounding boxes which have centers outside cropping area.
 Returns
Cropped bounding boxes with shape (M, 4+) where M <= N.
 Return type
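The cropping behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions (boxes are clipped to the crop area, shifted to the crop origin, and dropped when nothing remains); `crop_boxes` is a hypothetical helper, and the library's exact boundary handling may differ.

```python
def crop_boxes(boxes, crop_box, allow_outside_center=True):
    """Clip boxes (xmin, ymin, xmax, ymax, *extras) to crop_box (xmin, ymin, width, height)."""
    cx, cy, cw, ch = crop_box
    left, top, right, bottom = cx, cy, cx + cw, cy + ch
    kept = []
    for box in boxes:
        xmin, ymin, xmax, ymax = box[:4]
        if not allow_outside_center:
            center_x, center_y = (xmin + xmax) / 2, (ymin + ymax) / 2
            if not (left <= center_x < right and top <= center_y < bottom):
                continue  # center falls outside the crop area
        # Clip to the crop area, then shift so the crop origin becomes (0, 0).
        nx0, ny0 = max(xmin, left) - left, max(ymin, top) - top
        nx1, ny1 = min(xmax, right) - left, min(ymax, bottom) - top
        if nx0 < nx1 and ny0 < ny1:
            # Extra attributes (e.g. class id) are carried over unchanged.
            kept.append((nx0, ny0, nx1, ny1) + tuple(box[4:]))
    return kept


boxes = [(10, 10, 50, 50, 1), (80, 80, 120, 120, 2), (200, 200, 220, 220, 3)]
cropped = crop_boxes(boxes, crop_box=(0, 0, 100, 100))
strict = crop_boxes(boxes, crop_box=(0, 0, 100, 100), allow_outside_center=False)
```

With the default setting the second box is clipped and kept; with `allow_outside_center=False` it is dropped because its center lies on the crop boundary.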

gluoncv.data.transforms.bbox.flip(bbox, size, flip_x=False, flip_y=False)
Flip bounding boxes according to image flipping directions.
 Parameters
bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.
size (tuple) – Tuple of length 2: (width, height).
flip_x (bool) – Whether to flip horizontally.
flip_y (bool) – Whether to flip vertically.
 Returns
Flipped bounding boxes with original shape.
 Return type
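A minimal sketch of the coordinate arithmetic behind box flipping, assuming pixel coordinates measured from the top-left corner. `flip_boxes` is a hypothetical plain-Python stand-in, not the library function.

```python
def flip_boxes(boxes, size, flip_x=False, flip_y=False):
    """Mirror boxes (xmin, ymin, xmax, ymax, *extras) inside an image of size (width, height)."""
    width, height = size
    out = []
    for box in boxes:
        xmin, ymin, xmax, ymax = box[:4]
        if flip_x:
            # The old right edge becomes the new left edge.
            xmin, xmax = width - xmax, width - xmin
        if flip_y:
            ymin, ymax = height - ymax, height - ymin
        out.append((xmin, ymin, xmax, ymax) + tuple(box[4:]))
    return out


flipped = flip_boxes([(10, 20, 40, 60, 7)], size=(100, 80), flip_x=True)
```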

gluoncv.data.transforms.bbox.get_affine_transform(center, scale, rot, output_size, shift=array([0.0, 0.0], dtype=float32), inv=0)
Get affine transform matrix given center, scale and rotation.
 Parameters
 Returns
Affine matrix.
 Return type

gluoncv.data.transforms.bbox.resize(bbox, in_size, out_size)
Resize bounding boxes according to image resize operation.
 Parameters
bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.
in_size (tuple) – Tuple of length 2: (width, height) for input.
out_size (tuple) – Tuple of length 2: (width, height) for output.
 Returns
Resized bounding boxes with original shape.
 Return type
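Resizing boxes alongside an image comes down to scaling x coordinates by the width ratio and y coordinates by the height ratio. The following is an illustrative plain-Python sketch with a hypothetical `resize_boxes` helper.

```python
def resize_boxes(boxes, in_size, out_size):
    """Scale boxes (xmin, ymin, xmax, ymax, *extras) from in_size to out_size, both (width, height)."""
    x_scale = out_size[0] / in_size[0]
    y_scale = out_size[1] / in_size[1]
    return [
        (b[0] * x_scale, b[1] * y_scale, b[2] * x_scale, b[3] * y_scale) + tuple(b[4:])
        for b in boxes
    ]


resized = resize_boxes([(10, 20, 40, 60, 7)], in_size=(100, 80), out_size=(200, 40))
```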

gluoncv.data.transforms.bbox.translate(bbox, x_offset=0, y_offset=0)
Translate bounding boxes by offsets.
 Parameters
bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.
 Returns
Translated bounding boxes with original shape.
 Return type
Additional image transforms.

class gluoncv.data.transforms.block.RandomCrop(size, pad=None, interpolation=2)
Randomly crop src with size (width, height). Padding is optional. Upsample result if src is smaller than size.
 Parameters
size (int or tuple of (W, H)) – Size of the final output.
pad (int or tuple) – If int, the size of the zero-padding. If tuple, the number of values padded to the edges of each axis: ((before_1, after_1), … (before_N, after_N)) gives unique pad widths for each axis; ((before, after),) yields the same before and after pad for each axis; (pad,) or int is a shortcut for before = after = pad width for all axes.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
 Inputs:
data: input tensor with (Hi x Wi x C) shape.
 Outputs:
out: output tensor with (size[0] x size[1] x C) or (size x size x C) shape.

class gluoncv.data.transforms.block.RandomErasing(probability=0.5, s_min=0.02, s_max=0.4, ratio=0.3, mean=(125.31, 122.96, 113.86))
Randomly erase an area of src, sized between s_min and s_max, with the given probability. ratio controls the ratio between width and height. mean is the value filled into the erased area.
 Parameters
 Inputs:
data: input tensor with (Hi x Wi x C) shape.
 Outputs:
out: output tensor with (Hi x Wi x C) shape.
Extended image transformations to mxnet.image.

gluoncv.data.transforms.image.imresize(src, w, h, interp=1)
Resize image with OpenCV.
This is a duplicate of mxnet.image.imresize for name space consistency.
 Parameters
 Returns
out – The output of this function.
 Return type
NDArray or list of NDArrays
Examples
>>> import mxnet as mx
>>> from gluoncv import data as gdata
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> print(img.shape)
(300, 300, 3)
>>> img = gdata.transforms.image.imresize(img, 200, 200)
>>> print(img.shape)
(200, 200, 3)

gluoncv.data.transforms.image.random_expand(src, max_ratio=4, fill=0, keep_ratio=True)
Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas.
 Parameters
src (mxnet.nd.NDArray) – The original image with HWC format.
max_ratio (int or float) – Maximum ratio of the output image in both directions (vertical and horizontal).
fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, RGB channels will be padded with a single value. Otherwise fill must have the same length as the image channels, which results in padding with per-channel values.
keep_ratio (bool) – If True, will keep output image the same aspect ratio as input.
 Returns
mxnet.nd.NDArray – Augmented image.
tuple – Tuple of (offset_x, offset_y, new_width, new_height)
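The geometry of the expansion (canvas size and where the original image lands on it) can be sketched independently of any image library. This is an assumption-laden illustration: `sample_expand_geometry` is a hypothetical helper, and the sampling distribution and rounding are guesses consistent with the parameters documented above.

```python
import random


def sample_expand_geometry(width, height, max_ratio=4, keep_ratio=True, rng=random):
    """Return (offset_x, offset_y, new_width, new_height) for a randomly enlarged canvas."""
    ratio_x = rng.uniform(1, max_ratio)
    # With keep_ratio the canvas keeps the aspect ratio of the input image.
    ratio_y = ratio_x if keep_ratio else rng.uniform(1, max_ratio)
    new_width, new_height = int(width * ratio_x), int(height * ratio_y)
    # Place the original image at a random position that still fits on the canvas.
    offset_x = rng.randint(0, new_width - width)
    offset_y = rng.randint(0, new_height - height)
    return offset_x, offset_y, new_width, new_height


geometry = sample_expand_geometry(300, 200, max_ratio=4, rng=random.Random(0))
```

The returned offsets are the values one would then pass on to a box translation so that annotations follow the image onto the larger canvas.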

gluoncv.data.transforms.image.random_flip(src, px=0, py=0, copy=False)
Randomly flip image horizontally and vertically with the given probabilities.
 Parameters
 Returns
mxnet.nd.NDArray – Augmented image.
tuple – Tuple of (flip_x, flip_y), records of whether flips are applied.

gluoncv.data.transforms.image.random_pca_lighting(src, alphastd, eigval=None, eigvec=None)
Apply random PCA lighting noise to input image.
 Parameters
src (mxnet.nd.NDArray) – Input image with HWC format.
alphastd (float) – Noise level [0, 1) for image with range [0, 255].
eigval (list of floats) – Eigenvalues, defaults to [55.46, 4.794, 1.148].
eigvec (nested lists of floats) – Eigenvectors with shape (3, 3), defaults to [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]].
 Returns
Augmented image.
 Return type
mxnet.nd.NDArray
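For readers unfamiliar with PCA lighting noise, the commonly used AlexNet-style formulation adds a per-channel offset built from the eigenvectors, the eigenvalues, and three Gaussian coefficients drawn with standard deviation alphastd. The sketch below assumes that formulation; the helper names and the toy eigenvalues are illustrative, not library defaults.

```python
import random


def sample_alpha(alphastd, rng=random):
    """Draw one Gaussian coefficient per principal component."""
    return [rng.gauss(0, alphastd) for _ in range(3)]


def pca_lighting_offset(alpha, eigval, eigvec):
    """Per-channel offset: sum over components j of eigvec[c][j] * alpha[j] * eigval[j]."""
    return [
        sum(eigvec[c][j] * alpha[j] * eigval[j] for j in range(3))
        for c in range(3)
    ]


# Toy values chosen for readability (identity eigenvectors), not the documented defaults.
eigval = [2.0, 1.0, 0.5]
eigvec = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
offset = pca_lighting_offset([0.1, -0.2, 0.4], eigval, eigvec)
random_offset = pca_lighting_offset(sample_alpha(0.1, random.Random(0)), eigval, eigvec)
```

The same three offsets are added to every pixel of the image, one offset per colour channel.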

gluoncv.data.transforms.image.resize_contain(src, size, fill=0)
Resize the image to fit in the given area while keeping aspect ratio.
If both the height and the width in size are larger than the height and the width of input image, the image is placed on the center with an appropriate padding to match size. Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving aspect ratio.
 Parameters
src (mxnet.nd.NDArray) – The original image with HWC format.
size (tuple) – Tuple of length 2 as (width, height).
fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, RGB channels will be padded with a single value. Otherwise fill must have the same length as the image channels, which results in padding with per-channel values.
 Returns
mxnet.nd.NDArray – Augmented image.
tuple – Tuple of (offset_x, offset_y, scaled_x, scaled_y)
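The two cases described above (pad only, or scale down then pad) can be sketched as pure geometry. Assumptions to note: `contain_geometry` is a hypothetical helper, scaled_x and scaled_y are interpreted here as the scaled width and height in pixels, and the image is centered using integer division.

```python
def contain_geometry(src_size, size):
    """Return (offset_x, offset_y, scaled_x, scaled_y) for fitting src_size into size.

    Both arguments are (width, height) tuples.
    """
    src_w, src_h = src_size
    out_w, out_h = size
    scale = min(out_w / src_w, out_h / src_h)
    if scale >= 1:
        # The image already fits: keep its size and only pad around it.
        scaled_w, scaled_h = src_w, src_h
    else:
        # Shrink uniformly so the larger overshoot just fits the canvas.
        scaled_w, scaled_h = int(src_w * scale), int(src_h * scale)
    offset_x = (out_w - scaled_w) // 2
    offset_y = (out_h - scaled_h) // 2
    return offset_x, offset_y, scaled_w, scaled_h


padded_only = contain_geometry((100, 50), (200, 200))
scaled_down = contain_geometry((400, 200), (200, 200))
```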

gluoncv.data.transforms.image.resize_long(src, size, interp=2)
Resizes longer edge to size.
Note: resize_long uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_long to work.
Resizes the original image by setting the longer edge to size and setting the shorter edge accordingly. This will ensure the new image will fit into the size specified. Resizing function is called from OpenCV.
 Parameters
src (NDArray) – The original image.
size (int) – The length to be set for the longer edge.
interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values:
0: Nearest Neighbors Interpolation.
1: Bilinear interpolation.
2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method. (used by default).
3: Bicubic interpolation over 4x4 pixel neighborhood.
4: Lanczos interpolation over 8x8 pixel neighborhood.
9: Cubic for enlarge, area for shrink, bilinear for others.
10: Random select from interpolation method mentioned above.
Note: When shrinking an image, it will generally look best with Area-based interpolation, whereas, when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK). More details can be found in the documentation of OpenCV, please refer to http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
 Returns
An ‘NDArray’ containing the resized image.
 Return type
NDArray
Example
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> size = 640
>>> new_image = mx.img.resize_long(image, size)
>>> new_image
<NDArray 386x640x3 @cpu(0)>

gluoncv.data.transforms.image.resize_short_within(src, short, max_size, mult_base=1, interp=2)
Resizes the shorter edge to size while making sure the result is capped at the maximum size.
Note: resize_short_within uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_short_within to work.
Resizes the original image by setting the shorter edge to size and setting the longer edge accordingly. Also this function will ensure the new image will not exceed max_size even at the longer side. Resizing function is called from OpenCV.
 Parameters
src (NDArray) – The original image.
short (int) – Resize shorter side to short.
max_size (int) – Make sure the longer side of the new image does not exceed max_size.
mult_base (int, default is 1) – Width and height are rounded to multiples of mult_base.
interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values:
0: Nearest Neighbors Interpolation.
1: Bilinear interpolation.
2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method. (used by default).
3: Bicubic interpolation over 4x4 pixel neighborhood.
4: Lanczos interpolation over 8x8 pixel neighborhood.
9: Cubic for enlarge, area for shrink, bilinear for others.
10: Random select from interpolation method mentioned above.
Note: When shrinking an image, it will generally look best with Area-based interpolation, whereas, when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK). More details can be found in the documentation of OpenCV, please refer to http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
 Returns
An ‘NDArray’ containing the resized image.
 Return type
NDArray
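The interaction between short, max_size and mult_base is easier to follow as arithmetic on the image shape alone. The sketch below is an assumption about the rounding rule, not a copy of the library source; `short_within_size` is a hypothetical helper. It does reproduce the three output shapes shown in this function's documented example for a 2321x3482 image.

```python
def short_within_size(height, width, short, max_size, mult_base=1):
    """Return the (new_height, new_width) chosen for an image of the given shape."""
    size_min, size_max = min(height, width), max(height, width)
    # First try to bring the shorter side to `short`.
    scale = short / size_min
    if round(scale * size_max / mult_base) * mult_base > max_size:
        # The longer side would overshoot: fit it to the largest allowed multiple instead.
        scale = (max_size // mult_base) * mult_base / size_max
    new_w = int(round(width * scale / mult_base) * mult_base)
    new_h = int(round(height * scale / mult_base) * mult_base)
    return new_h, new_w


capped = short_within_size(2321, 3482, short=800, max_size=1000)
exact = short_within_size(2321, 3482, short=800, max_size=1200)
aligned = short_within_size(2321, 3482, short=800, max_size=1200, mult_base=32)
```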
Example
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1000)
>>> new_image
<NDArray 667x1000x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200)
>>> new_image
<NDArray 800x1200x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200, mult_base=32)
>>> new_image
<NDArray 800x1184x3 @cpu(0)>

gluoncv.data.transforms.image.ten_crop(src, size)
Crop 10 regions from an array. This is performed the same way as: http://chainercv.readthedocs.io/en/stable/reference/transforms.html#tencrop
This method crops 10 regions. All regions will be in shape size. These regions consist of 1 center crop and 4 corner crops and horizontal flips of them. The crops are ordered in this order:
* center crop
* top-left crop
* bottom-left crop
* top-right crop
* bottom-right crop
* center crop (flipped horizontally)
* top-left crop (flipped horizontally)
* bottom-left crop (flipped horizontally)
* top-right crop (flipped horizontally)
* bottom-right crop (flipped horizontally)
 Parameters
src (mxnet.nd.NDArray) – Input image.
size (tuple) – Tuple of length 2, as (width, height) of the cropped areas.
 Returns
The cropped images with shape (10, size[1], size[0], C)
 Return type
mxnet.nd.NDArray
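The crop order listed for this function can be made concrete by computing the crop rectangles without touching pixel data. This is a sketch with a hypothetical `ten_crop_boxes` helper; the integer division used to center the first crop is an assumption.

```python
def ten_crop_boxes(image_size, size):
    """Return ten (box, flipped) pairs; box is (xmin, ymin, xmax, ymax).

    image_size and size are both (width, height).
    """
    img_w, img_h = image_size
    crop_w, crop_h = size
    origins = [
        ((img_w - crop_w) // 2, (img_h - crop_h) // 2),  # center
        (0, 0),                                          # top-left
        (0, img_h - crop_h),                             # bottom-left
        (img_w - crop_w, 0),                             # top-right
        (img_w - crop_w, img_h - crop_h),                # bottom-right
    ]
    boxes = [(x, y, x + crop_w, y + crop_h) for x, y in origins]
    # The last five entries reuse the same regions, flagged for horizontal flipping.
    return [(b, False) for b in boxes] + [(b, True) for b in boxes]


plan = ten_crop_boxes((256, 256), (224, 224))
```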
Experimental bounding box transformations.

gluoncv.data.transforms.experimental.bbox.bbox_crop(bbox, crop_box=None, allow_outside_center=True)
Crop bounding boxes according to slice area.
This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.
 Parameters
bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.
crop_box (tuple) – Tuple of length 4. \((x_{min}, y_{min}, width, height)\)
allow_outside_center (bool) – If False, remove bounding boxes which have centers outside cropping area.
 Returns
Cropped bounding boxes with shape (M, 4+) where M <= N.
 Return type

gluoncv.data.transforms.experimental.bbox.bbox_iou(bbox_a, bbox_b, offset=0)
Calculate Intersection-Over-Union (IOU) of two bounding boxes.
 Parameters
bbox_a (numpy.ndarray) – An ndarray with shape \((N, 4)\).
bbox_b (numpy.ndarray) – An ndarray with shape \((M, 4)\).
offset (float or int, default is 0) – The offset controls whether the width (or height) is computed as (right - left + offset). Note that the offset must be 0 for normalized bboxes, whose ranges are in [0, 1].
 Returns
An ndarray with shape \((N, M)\) indicating the IOU between each pair of bounding boxes in bbox_a and bbox_b.
 Return type
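A plain-Python sketch of the pairwise computation, including the role of offset in the width and height terms. `pairwise_iou` is a hypothetical stand-in that returns nested lists rather than an ndarray; the handling of boxes that merely touch is an assumption.

```python
def pairwise_iou(boxes_a, boxes_b, offset=0):
    """Return an N x M nested list of IOU values for corner-format boxes."""
    def area(b):
        return (b[2] - b[0] + offset) * (b[3] - b[1] + offset)

    result = []
    for a in boxes_a:
        row = []
        for b in boxes_b:
            left, top = max(a[0], b[0]), max(a[1], b[1])
            right, bottom = min(a[2], b[2]), min(a[3], b[3])
            if left < right and top < bottom:
                inter = (right - left + offset) * (bottom - top + offset)
            else:
                inter = 0.0  # the boxes do not overlap
            row.append(inter / (area(a) + area(b) - inter))
        result.append(row)
    return result


ious = pairwise_iou([(0, 0, 10, 10)], [(5, 5, 15, 15), (20, 20, 30, 30)])
ious_px = pairwise_iou([(0, 0, 10, 10)], [(5, 5, 15, 15)], offset=1)
```

With offset=0 the first pair overlaps in a 5x5 region out of a union of 175; with offset=1 each extent is counted inclusively, so the same pair gives 36/206.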

gluoncv.data.transforms.experimental.bbox.random_crop_with_constraints(bbox, size, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50)
Crop an image randomly with bounding box constraints.
This data augmentation is used in training of Single Shot Multibox Detector [1]. More details can be found in the data augmentation section of the original paper.
[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
 Parameters
bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.
size (tuple) – Tuple of length 2 of image shape as (width, height).
min_scale (float) – The minimum ratio between a cropped region and the original image. The default value is 0.3.
max_scale (float) – The maximum ratio between a cropped region and the original image. The default value is 1.
max_aspect_ratio (float) – The maximum aspect ratio of cropped region. The default value is 2.
constraints (iterable of tuples) – An iterable of constraints. Each constraint should be in (min_iou, max_iou) format. Setting min_iou or max_iou to None means no constraint. If this argument defaults to None, ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1)) will be used.
max_trial (int) – Maximum number of trials for each constraint before exit no matter what.
 Returns
numpy.ndarray – Cropped bounding boxes with shape (M, 4+) where M <= N.
tuple – Tuple of length 4 as (x_offset, y_offset, new_width, new_height).
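How the constraints and max_trial interact is easiest to see in a small sampling loop. The sketch below is a simplified, standard-library reading of the SSD-style procedure, not the library implementation: `sample_crop` and `iou` are hypothetical helpers, the whole image is kept as a fallback candidate, and one candidate is picked at random at the end. It only chooses the crop window; clipping the boxes to that window is a separate step.

```python
import math
import random


def iou(a, b):
    """IOU of two corner-format boxes (xmin, ymin, xmax, ymax)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def sample_crop(boxes, size, min_scale=0.3, max_scale=1, max_aspect_ratio=2,
                constraints=((0.1, None), (0.3, None), (0.5, None),
                             (0.7, None), (0.9, None), (None, 1)),
                max_trial=50, rng=random):
    """Pick a crop (x, y, w, h) for a non-empty list of boxes in an image of size (width, height)."""
    width, height = size
    candidates = [(0, 0, width, height)]  # assumption: the full image is always allowed
    for min_iou, max_iou in constraints:
        low = -math.inf if min_iou is None else min_iou
        high = math.inf if max_iou is None else max_iou
        for _ in range(max_trial):
            scale = rng.uniform(min_scale, max_scale)
            # Bound the aspect ratio so the crop never exceeds the image.
            ratio = rng.uniform(max(1 / max_aspect_ratio, scale * scale),
                                min(max_aspect_ratio, 1 / (scale * scale)))
            crop_w = int(width * scale * math.sqrt(ratio))
            crop_h = int(height * scale / math.sqrt(ratio))
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            overlaps = [iou(b, (x, y, x + crop_w, y + crop_h)) for b in boxes]
            if low <= min(overlaps) and max(overlaps) <= high:
                candidates.append((x, y, crop_w, crop_h))
                break  # this constraint is satisfied; move to the next one
    return rng.choice(candidates)


crop = sample_crop([(50, 50, 150, 150)], size=(300, 200), rng=random.Random(0))
```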
Experimental image transformations.

gluoncv.data.transforms.experimental.image.np_random_color_distort(image, data_rng=None, eig_val=None, eig_vec=None, var=0.4, alphastd=0.1)
Numpy version of random color jitter.
 Parameters
image (numpy.ndarray) – original image.
data_rng (numpy.random.rng) – Numpy random number generator.
eig_val (numpy.ndarray) – Eigen values.
eig_vec (numpy.ndarray) – Eigen vectors.
var (float) – Variance for the color jitters.
alphastd (float) – Jitter for the brightness.
 Returns
The jittered image
 Return type

gluoncv.data.transforms.experimental.image.random_color_distort(src, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18)
Randomly distort image color space. Note that the input image should be in the original range [0, 255].
 Parameters
src (mxnet.nd.NDArray) – Input image as HWC format.
brightness_delta (int) – Maximum brightness delta. Defaults to 32.
contrast_low (float) – Lowest contrast. Defaults to 0.5.
contrast_high (float) – Highest contrast. Defaults to 1.5.
saturation_low (float) – Lowest saturation. Defaults to 0.5.
saturation_high (float) – Highest saturation. Defaults to 1.5.
hue_delta (int) – Maximum hue delta. Defaults to 18.
 Returns
Distorted image in HWC format.
 Return type
mxnet.nd.NDArray
Transforms described in https://arxiv.org/abs/1512.02325.

class gluoncv.data.transforms.presets.ssd.SSDDALIPipeline(num_workers, device_id, batch_size, data_shape, anchors, dataset_reader, seed=-1)
DALI Pipeline with SSD training transform.
 Parameters
device_id (int) – DALI pipeline arg - Device id.
num_workers – DALI pipeline arg - Number of CPU workers.
batch_size – Batch size.
data_shape (int) – Height and width length. (height==width in SSD)
anchors (float list) – Normalized [ltrb] anchors generated from SSD networks. The list length should be N*4, since it is a list of N anchors that each have 4 float elements.
dataset_reader (float) – Partial pipeline object, whose __call__ function has to return an (images, bboxes, labels) DALI EdgeReference tuple.
seed (int) – Random seed. Default value is -1, which corresponds to no seed.

class gluoncv.data.transforms.presets.ssd.SSDDefaultTrainTransform(width, height, anchors=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), iou_thresh=0.5, box_norm=(0.1, 0.1, 0.2, 0.2), **kwargs)
Default SSD training transform which includes tons of image augmentations.
 Parameters
width (int) – Image width.
height (int) – Image height.
anchors (mxnet.nd.NDArray, optional) – Anchors generated from SSD networks, the shape must be (1, N, 4). Since anchors are shared across the entire batch, the first dimension is 1. N is the number of anchors for each image.
Hint: If anchors is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.
mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].
iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.
box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std value to be divided from encoded values.
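To show what "Std value to be divided from encoded values" refers to, here is a sketch of the usual SSD center-form box encoding, with the four encoded values divided by box_norm. This is an assumption about the encoding (zero means, corner-format inputs) and `encode_box` is a hypothetical helper, not the target generator used by the transform.

```python
import math


def encode_box(gt, anchor, box_norm=(0.1, 0.1, 0.2, 0.2)):
    """Encode a corner-format ground-truth box relative to a corner-format anchor."""
    def to_center(b):
        w, h = b[2] - b[0], b[3] - b[1]
        return b[0] + w / 2, b[1] + h / 2, w, h

    gx, gy, gw, gh = to_center(gt)
    ax, ay, aw, ah = to_center(anchor)
    # Center offsets are relative to the anchor size; sizes are encoded as log-ratios.
    raw = ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))
    # Dividing by box_norm rescales the targets to a comparable range.
    return tuple(r / s for r, s in zip(raw, box_norm))


target = encode_box(gt=(0.2, 0.2, 0.6, 0.6), anchor=(0.1, 0.1, 0.5, 0.5))
```

Here the ground truth has the same size as the anchor and is shifted by a quarter of its width, so the center terms come out near 2.5 and the size terms near 0.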

class gluoncv.data.transforms.presets.ssd.SSDDefaultValTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
Default SSD validation transform.

gluoncv.data.transforms.presets.ssd.load_test(filenames, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
A util function to load all images and transform them to tensors by applying normalizations. This function supports 1 filename or an iterable of filenames.
 Parameters
filenames (str or list of str) – Image filename(s) to be loaded.
short (int) – Resize image short side to this short and keep aspect ratio.
max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our SSD implementation.
mean (iterable of float) – Mean pixel values.
std (iterable of float) – Standard deviations of pixel values.
 Returns
A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
 Return type
(mxnet.NDArray, numpy.ndarray) or list of such tuple

gluoncv.data.transforms.presets.ssd.transform_test(imgs, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
A util function to transform all images to tensors as network input by applying normalizations. This function supports 1 NDArray or an iterable of NDArrays.
 Parameters
imgs (NDArray or iterable of NDArray) – Image(s) to be transformed.
short (int) – Resize image short side to this short and keep aspect ratio.
max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our SSD implementation.
mean (iterable of float) – Mean pixel values.
std (iterable of float) – Standard deviations of pixel values.
 Returns
A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
 Return type
(mxnet.NDArray, numpy.ndarray) or list of such tuple
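The normalization step mentioned above can be sketched on nested lists. It assumes the common convention that 8-bit pixels are first scaled to [0, 1], which matches the magnitude of the default mean and std, and that the layout changes from HWC to CHW. `to_normalized_chw` is a hypothetical helper; the resize step is left out.

```python
def to_normalized_chw(image_hwc, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Convert an H x W x 3 image with values in [0, 255] to a normalized 3 x H x W nested list."""
    height, width = len(image_hwc), len(image_hwc[0])
    return [
        [
            # Scale to [0, 1], subtract the channel mean, divide by the channel std.
            [(image_hwc[y][x][c] / 255.0 - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]


image = [[(255, 0, 128), (0, 0, 0)]]  # a 1 x 2 RGB image
tensor = to_normalized_chw(image)
```

The presets return the network input with an additional leading batch dimension, giving the (1, 3, H, W) shape documented above.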
Transforms for RCNN series.

class gluoncv.data.transforms.presets.rcnn.FasterRCNNDefaultTrainTransform(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, flip_p=0.5, ashape=128, multi_stage=False, **kwargs)
Default FasterRCNN training transform.
 Parameters
short (int/tuple, default is 600) – Resize image shorter side to short. Resize the shorter side of the image randomly within the given range, if it is a tuple.
max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
net (mxnet.gluon.HybridBlock, optional) – The Faster RCNN network.
Hint: If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.
mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].
box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.
num_sample (int, default is 256) – Number of samples for RPN targets.
pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.
neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
flip_p (float, default is 0.5) – Probability to flip horizontally, by default is 0.5 for random horizontal flip. You may set it to 0 to disable random flip or 1 to force flip.
ashape (int, default is 128) – Defines shape of pre-generated anchors for target generation.
multi_stage (boolean, default is False) – Whether the network outputs multi-stage features.
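The roles of num_sample, pos_iou_thresh, neg_iou_thresh and pos_ratio can be illustrated with a small labelling routine. This is a simplified sketch under stated assumptions: `label_anchors` is a hypothetical helper that takes each anchor's best IOU with any ground-truth box, and it omits refinements a real RPN target generator may apply (such as forcing the best anchor per ground-truth box to be positive).

```python
import random


def label_anchors(max_ious, num_sample=256, pos_iou_thresh=0.7,
                  neg_iou_thresh=0.3, pos_ratio=0.5, rng=random):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored."""
    pos = [i for i, v in enumerate(max_ious) if v > pos_iou_thresh]
    neg = [i for i, v in enumerate(max_ious) if v < neg_iou_thresh]
    # Keep at most pos_ratio * num_sample positives.
    max_pos = int(pos_ratio * num_sample)
    if len(pos) > max_pos:
        pos = rng.sample(pos, max_pos)
    # Fill the rest of the sample budget with negatives.
    max_neg = num_sample - len(pos)
    if len(neg) > max_neg:
        neg = rng.sample(neg, max_neg)
    labels = [-1] * len(max_ious)  # anchors between the thresholds stay ignored
    for i in pos:
        labels[i] = 1
    for i in neg:
        labels[i] = 0
    return labels


labels = label_anchors([0.9, 0.75, 0.5, 0.2, 0.1, 0.05], num_sample=4,
                       rng=random.Random(0))
```

With a budget of four samples, both high-IOU anchors are kept as positives and two of the three low-IOU anchors are drawn as negatives; the rest are ignored.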

class gluoncv.data.transforms.presets.rcnn.FasterRCNNDefaultValTransform(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
Default FasterRCNN validation transform.
 Parameters
short (int, default is 600) – Resize image shorter side to short.
max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

class gluoncv.data.transforms.presets.rcnn.MaskRCNNDefaultTrainTransform(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, ashape=128, multi_stage=False, **kwargs)
Default Mask RCNN training transform.
 Parameters
short (int/tuple, default is 600) – Resize image shorter side to short. Resize the shorter side of the image randomly within the given range, if it is a tuple.
max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
net (mxnet.gluon.HybridBlock, optional) – The Mask RCNN network.
Hint: If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.
mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].
box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.
num_sample (int, default is 256) – Number of samples for RPN targets.
pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.
neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
ashape (int, default is 128) – Defines shape of pre-generated anchors for target generation.
multi_stage (boolean, default is False) – Whether the network outputs multi-stage features.

class gluoncv.data.transforms.presets.rcnn.MaskRCNNDefaultValTransform(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
Default Mask RCNN validation transform.
 Parameters
short (int, default is 600) – Resize image shorter side to short.
max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

gluoncv.data.transforms.presets.rcnn.load_test(filenames, short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
A util function to load all images and transform them to tensors by applying normalizations. This function supports 1 filename or a list of filenames.
 Parameters
filenames (str or list of str) – Image filename(s) to be loaded.
short (int, optional, default is 600) – Resize image short side to this short and keep aspect ratio.
max_size (int, optional, default is 1000) – Maximum longer side length to fit image. This is to limit the input image shape, avoid processing too large image.
mean (iterable of float) – Mean pixel values.
std (iterable of float) – Standard deviations of pixel values.
 Returns
A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
 Return type
(mxnet.NDArray, numpy.ndarray) or list of such tuple

gluoncv.data.transforms.presets.rcnn.transform_test(imgs, short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
A util function to transform all images to tensors as network input by applying normalizations. This function supports 1 NDArray or an iterable of NDArrays.
 Parameters
imgs (NDArray or iterable of NDArray) – Image(s) to be transformed.
short (int, optional, default is 600) – Resize image short side to this short and keep aspect ratio.
max_size (int, optional, default is 1000) – Maximum longer side length to fit image. This is to limit the input image shape, avoid processing too large image.
mean (iterable of float) – Mean pixel values.
std (iterable of float) – Standard deviations of pixel values.
 Returns
A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
 Return type
(mxnet.NDArray, numpy.ndarray) or list of such tuple
Transforms for YOLO series.

class gluoncv.data.transforms.presets.yolo.YOLO3DefaultTrainTransform(width, height, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), mixup=False, **kwargs)
Default YOLO training transform which includes tons of image augmentations.
 Parameters
width (int) – Image width.
height (int) – Image height.
net (mxnet.gluon.HybridBlock, optional) – The yolo network.
Hint: If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.
mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].
iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.
box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std value to be divided from encoded values.

class gluoncv.data.transforms.presets.yolo.YOLO3DefaultValTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
Default YOLO validation transform.

gluoncv.data.transforms.presets.yolo.load_test(filenames, short=416, max_size=1024, stride=1, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
A util function to load all images and transform them to tensors by applying normalizations. This function supports 1 filename or a list of filenames.
 Parameters
filenames (str or list of str) – Image filename(s) to be loaded.
short (int, default=416) – Resize image short side to this short and keep aspect ratio. Note that yolo network
max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our YOLO implementation.
stride (int, optional, default is 1) – The stride constraint due to precise alignment of bounding box prediction module. Image’s width and height must be multiples of stride. Use stride = 1 to relax this constraint.
mean (iterable of float) – Mean pixel values.
std (iterable of float) – Standard deviations of pixel values.
 Returns
A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
 Return type
(mxnet.NDArray, numpy.ndarray) or list of such tuple

gluoncv.data.transforms.presets.yolo.transform_test(imgs, short=416, max_size=1024, stride=1, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
A util function to transform all images to tensors as network input by applying normalizations. This function supports 1 NDArray or an iterable of NDArrays.
 Parameters
imgs (NDArray or iterable of NDArray) – Image(s) to be transformed.
short (int, default=416) – Resize image short side to this short and keep aspect ratio. Note that yolo network
max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our YOLO implementation.
stride (int, optional, default is 1) – The stride constraint due to precise alignment of bounding box prediction module. Image’s width and height must be multiples of stride. Use stride = 1 to relax this constraint.
mean (iterable of float) – Mean pixel values.
std (iterable of float) – Standard deviations of pixel values.
 Returns
A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
 Return type
(mxnet.NDArray, numpy.ndarray) or list of such tuple