The first step in PyTorch: the use of transforms

In my previous article, The first step of PyTorch: (1) Use of the Dataset class, both torchvision.datasets and our custom Dataset subclasses accepted a transforms parameter. I didn't explain it in detail there because it is a big topic in its own right, so I wrote this article to cover it.
transforms provides image preprocessing operations that crop, flip, translate, apply affine transformations and more to the indexed images, so that we get the preprocessing we want.

The torchvision.transforms module provided by PyTorch is dedicated to image preprocessing. This article introduces and tests these preprocessing methods, grouped by the type of operation.

Notes

  • transforms.Compose() can combine multiple transform operations
  • Convertible images include PIL Images, Tensor Images, and batches of Tensor Images. A Tensor Image is a tensor of shape (C, H, W); a batch of Tensor Images is a tensor of shape (B, C, H, W)
  • To set the random seed, use torch.manual_seed(n)
  • Each transform method described in this article is a class. There are two ways to use them: either instantiate a single transform class and pass the image to it, or instantiate a transforms.Compose() object and pass the image to that instance. The difference is that a single transform instance performs only one operation on the image at a time, whereas transforms.Compose() accepts multiple transform classes, so one Compose instance can apply several operations in one call:
import torchvision.transforms as transforms
from PIL import Image

pic = Image.open('...')  # read the image
#---------Method 1---------- One processing method at a time
transform = transforms.CenterCrop(720)  # Center crop
picProcessed1 = transform(pic)
transform = transforms.RandomHorizontalFlip(p=0.5)  # Random horizontal flip
picProcessed2 = transform(picProcessed1)

#---------Method 2------------One step
transform = transforms.Compose([
    transforms.CenterCrop(720),
    transforms.RandomHorizontalFlip(p=0.5),
])
picProcessed = transform(pic)

0. Overview of transforms methods

1 Crop

1.1 Center Crop: transforms.CenterCrop()
1.2 Random Crop: transforms.RandomCrop()
1.3 Random aspect ratio crop: transforms.RandomResizedCrop()
1.4 Crop the four corners and the center: transforms.FiveCrop()
1.5 Crop the four corners and the center, then flip: transforms.TenCrop()

2 Flip and Rotation

2.1 Flip horizontally with probability p: transforms.RandomHorizontalFlip(p=0.5)
2.2 Flip vertically with probability p: transforms.RandomVerticalFlip(p=0.5)
2.3 Random rotation: transforms.RandomRotation()

3 Image Transformation

3.1 resize: transforms.Resize
3.2 Normalization: transforms.Normalize
3.3 Convert to tensor and normalize to [0-1]: transforms.ToTensor
3.4 Padding: transforms.Pad
3.5 Modify brightness, contrast and saturation: transforms.ColorJitter
3.6 Convert grayscale image: transforms.Grayscale
3.7 Linear transformation: transforms.LinearTransformation()
3.8 Affine transformation: transforms.RandomAffine
3.9 Random perspective: transforms.RandomPerspective
3.10 Convert to grayscale image with probability p: transforms.RandomGrayscale
3.11 Convert data to PILImage: transforms.ToPILImage
3.12 transforms.Lambda: Apply a user-defined lambda as a transform.

This article uses the following code as the test harness, applying each method in turn.
In addition, unless otherwise specified, the data type of the processed tensor or image remains unchanged.

import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torchvision as tv

# define the transformation
transform = transforms.Compose([
    transforms.CenterCrop(720),  # an example; replace with the transform under test
    #========================================
    # add the transformation(s) to test here
    #========================================
])

# read the picture as a (C, H, W) uint8 tensor
picTensor = tv.io.read_image("testpic.jpg")
print("Original image shape:", picTensor.shape)
print("Original image dtype:", picTensor.dtype)

# apply the transformation
picTransformed = transform(picTensor)
print("Converted image shape:", picTransformed.shape)
print("Converted image dtype:", picTransformed.dtype)

# show the original and the transformed image
picNumpy = picTensor.permute(1, 2, 0).numpy()
plt.imshow(picNumpy)
plt.show()
picNumpy = picTransformed.permute(1, 2, 0).numpy()
plt.imshow(picNumpy)
plt.show()

The sample image used in this article has shape torch.Size([3, 2100, 3174]).

1 Crop

1.1 Center Crop: transforms.CenterCrop(size)

size can be an int, list or tuple. The transform crops the picture,
keeping only the part of the given size centered on the image's center point. The code is as follows:

transforms.CenterCrop(size=1080)
output:
CenterCrop(size=(1080, 1080))
Original image shape: torch.Size([3, 2100, 3174])
Original image dtype: torch.uint8

Converted image shape: torch.Size([3, 1080, 1080])
Converted image dtype: torch.uint8


1.2 Random Crop: transforms.RandomCrop()

transforms.RandomCrop(size=1600, padding=300,
                      pad_if_needed=False, fill=0, padding_mode='constant')
Here padding=300 first pads 300 pixels on every side and then crops.
It is equivalent to adding a border around the picture first and then choosing a random crop position; the crop may or may not include part of the padded border.
output:
RandomCrop(size=(1600, 1600), padding=300)
Original image shape: torch.Size([3, 2100, 3174])
Original image dtype: torch.uint8

Converted image shape: torch.Size([3, 1600, 1600])
Converted image dtype: torch.uint8


1.3 Random aspect ratio crop: transforms.RandomResizedCrop()

transforms.RandomResizedCrop(size=900, scale=(0.03, 1.8),
                             ratio=(0.9999, 1.0), interpolation=2)
Crop with scaling. scale gives the range of the crop's area relative to the area of the original image,
and ratio gives the range of the crop's aspect ratio.
A random region is chosen according to scale and ratio and cropped from the original image,
and the cropped region is then resized to the requested size.
This gives us a crop combined with a scaling and aspect-ratio transformation.
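
As a quick sanity check, here is a minimal sketch of the parameters above, using a random dummy tensor in place of the test picture and leaving interpolation at its default; it assumes a torchvision version that accepts uint8 tensor images, as the test harness above does. Whatever region is drawn, the output always has the requested size:

import torch
import torchvision.transforms as transforms

# dummy uint8 image tensor standing in for the test picture (C, H, W)
pic = torch.randint(0, 256, (3, 2100, 3174), dtype=torch.uint8)

transform = transforms.RandomResizedCrop(size=900, scale=(0.03, 1.8),
                                         ratio=(0.9999, 1.0))
picProcessed = transform(pic)
print(picProcessed.shape)  # torch.Size([3, 900, 900]) -- always the requested size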



1.4 Crop the four corners and the center: transforms.FiveCrop(size)

Returns a tuple (top-left crop, top-right crop, bottom-left crop, bottom-right crop, center crop).
The tuple contains 5 tensors, each one a crop of the original image
taken at the top-left, top-right, bottom-left, bottom-right or center, with the crop size given by the size parameter.
 Code:
transforms.FiveCrop(1000)
Output (taking the top-left crop as an example):
FiveCrop(size=(1000, 1000))
Original image shape: torch.Size([3, 2100, 3174])
Original image dtype: torch.uint8

Converted image shape: torch.Size([3, 1000, 1000])
Converted image dtype: torch.uint8



1.5 Crop the four corners and the center, then flip: transforms.TenCrop()

Similar to FiveCrop(), except that it returns a 10-tuple whose last five elements are horizontal flips of the first five (vertical flips if vertical_flip=True).
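
Because FiveCrop/TenCrop return a tuple rather than a single image, they don't drop straight into the usual Compose → imshow flow. One common pattern, sketched here with a dummy tensor, is to stack the crops into a single batch tensor with a Lambda:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 2100, 3174), dtype=torch.uint8)  # dummy test image

transform = transforms.Compose([
    transforms.TenCrop(1000),                             # returns a tuple of 10 crops
    transforms.Lambda(lambda crops: torch.stack(crops)),  # stack into a (10, C, H, W) batch
])
picProcessed = transform(pic)
print(picProcessed.shape)  # torch.Size([10, 3, 1000, 1000])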

2 Flip and Rotation


2.1 Flip horizontally with probability p: transforms.RandomHorizontalFlip(p=0.5)
This plain horizontal flip does not always flip: it flips with probability p=0.5. Set p=1 to make it always flip. (The same applies below.)
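
A tiny sketch with a dummy tensor (nothing article-specific) showing that p=1 forces the flip, which then matches flipping the width dimension by hand:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)  # tiny dummy image

flip = transforms.RandomHorizontalFlip(p=1.0)   # p=1: always flip
print(torch.equal(flip(pic), pic.flip(-1)))     # True -- same as flipping the width axis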


2.2 Flip vertically with probability p: transforms.RandomVerticalFlip(p=0.5)


2.3 Random rotation: transforms.RandomRotation(degrees=90, resample=False, expand=False, center=None)

An ordinary image rotation; a few things to note here.
If we want to rotate the image without changing its size, set expand to False:
the parts that fall outside the original border are cut off automatically.
If it is set to True, the image border is expanded automatically so the whole rotated image is kept.
degrees is not a fixed rotation angle:
an angle is chosen at random from [-degrees, +degrees].
The value is given in degrees, not radians.
center is the center of rotation.
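
A small sketch with a dummy tensor illustrating the effect of expand; the exact expanded size depends on the randomly drawn angle:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 200, 300), dtype=torch.uint8)  # dummy image

keep_size = transforms.RandomRotation(degrees=90, expand=False)
grow = transforms.RandomRotation(degrees=90, expand=True)

print(keep_size(pic).shape)  # torch.Size([3, 200, 300]) -- size kept, corners may be cut off
print(grow(pic).shape)       # usually larger than the input; the whole rotated image is kept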


3 Image Transformation

3.1 resize: transforms.Resize(size=900)

If size is a single int, the short side is scaled to that size while the aspect ratio is preserved;
the long side is scaled according to the aspect ratio.
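
For an image with the same shape as the test picture (3 × 2100 × 3174), this behaves roughly as sketched below with a dummy tensor; the long side is rounded to the nearest pixel:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 2100, 3174), dtype=torch.uint8)  # dummy, same shape as the test image

resized = transforms.Resize(size=900)(pic)
print(resized.shape)  # torch.Size([3, 900, 1360]) -- short side 900, aspect ratio preserved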


3.2 Normalization: transforms.Normalize(mean=[], std=[])

mean and std are sequences with one entry per channel (three for RGB), giving the per-channel mean and standard deviation.
 Note: the input image at this point should be a float tensor with values in [0, 1], not a uint8 tensor with values in [0, 255].
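
A minimal sketch of the usual pattern: bring the image into float [0, 1] first, then normalize per channel. The mean/std values here are the commonly used ImageNet statistics, chosen only as an example:

import torch
import torchvision.transforms as transforms

# Normalize expects a float image, so scale the uint8 tensor to [0, 1] first
pic = torch.randint(0, 256, (3, 100, 100), dtype=torch.uint8).float() / 255.0

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
picProcessed = normalize(pic)
print(picProcessed.mean(dim=(1, 2)))  # per-channel values are now (x - mean) / std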


3.3 Convert to tensor and normalize to [0-1]: transforms.ToTensor

The input to this transform is mainly a numpy.ndarray or a PIL.Image picture.
This method converts images read by other libraries into PyTorch's unified picture Tensor form, that is:
1. shape [C, H, W]
2. dtype torch.float, with values scaled to [0, 1]
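
A small sketch using a dummy (H, W, C) uint8 array, the layout most image-reading libraries return:

import numpy as np
import torchvision.transforms as transforms

arr = np.random.randint(0, 256, size=(2100, 3174, 3), dtype=np.uint8)  # dummy (H, W, C) image

t = transforms.ToTensor()(arr)
print(t.shape)  # torch.Size([3, 2100, 3174]) -- channels first
print(t.dtype)  # torch.float32, values scaled into [0, 1]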


3.4 Padding: transforms.Pad(padding, fill=0, padding_mode='constant')

In padding_mode='constant' mode:
padding specifies how many pixels to pad on which sides. There are three cases:
1. A single int: every side of the image is padded with that many pixels of value fill.
2. A tuple or list of length 2:
list[0] is the padding on the left and right, and list[1] is the padding on the top and bottom, filled with value fill.
3. A tuple or list of length 4:
list[0] is the padding on the left,
list[1] is the padding on the top,
list[2] is the padding on the right,
list[3] is the padding on the bottom.
padding_mode can also be:
edge: pads with the last value at the edge of the image
reflect: pads with a reflection of the image without repeating the edge value.
For example, padding [1, 2, 3, 4] with 2 elements on both sides
in reflect mode results in [3, 2, 1, 2, 3, 4, 3, 2]
fill does not need to be set in these two modes.
Note: the documentation says fill can also be a 3-tuple,
so the three RGB channels can be given different fill values to get a colored border,
but I haven't tried it.
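
A minimal sketch with a dummy tensor of the 4-tuple case, which is the one that trips people up; the order is (left, top, right, bottom):

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 100, 200), dtype=torch.uint8)  # dummy image, H=100, W=200

pad = transforms.Pad(padding=(10, 20, 30, 40), fill=0, padding_mode='constant')
picProcessed = pad(pic)
print(picProcessed.shape)  # torch.Size([3, 160, 240]) -- H: 100+20+40, W: 200+10+30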


3.5 Modify Brightness, Contrast and Saturation

transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Randomly modifies brightness, contrast, saturation and hue.
For brightness, the factor is chosen from [max(0, 1-brightness), 1+brightness]; contrast and saturation work the same way.
brightness, contrast and saturation can take any non-negative value.
hue is special: it must lie in [0, 0.5], and the hue factor is then chosen from [-hue, hue].
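
A quick sketch with a dummy tensor, just confirming that ColorJitter changes pixel values but (assuming a torchvision version that accepts uint8 tensor images) leaves shape and dtype untouched:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 100, 100), dtype=torch.uint8)  # dummy image

jitter = transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                saturation=0.4, hue=0.1)
picProcessed = jitter(pic)
print(picProcessed.shape, picProcessed.dtype)  # same shape, still torch.uint8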


3.6 Convert grayscale image: transforms.Grayscale(num_output_channels=1)

Simply converts to grayscale; the number of output channels can be 1 or 3.
 If 3 output channels are selected, then R = G = B.
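
A tiny sketch with a dummy tensor showing the R = G = B behaviour with three output channels:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 50, 50), dtype=torch.uint8)  # dummy RGB image

gray3 = transforms.Grayscale(num_output_channels=3)(pic)
print(gray3.shape)                      # torch.Size([3, 50, 50]) -- still 3 channels
print(torch.equal(gray3[0], gray3[1]))  # True: the channels are identical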


3.7 Linear transformation: transforms.LinearTransformation()
This one is rather involved; I won't cover it here, see the documentation for details.

3.8 Affine transformation: transforms.RandomAffine

transforms.RandomAffine(degrees=10, translate=None, scale=None, shear=(20, 30), resample=0, fillcolor=0)
The shear parameter of this function applies a shear transformation and is the important part here;
translate controls translation,
degrees controls rotation.
 This is a comprehensive function that can do many things; the other arguments can be left at their defaults so that only one of them is used.
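
A short sketch with a dummy tensor using only the shear part and leaving the rest at defaults, as described above; note that degrees is still required, so it is set to 0 here to disable rotation:

import torch
import torchvision.transforms as transforms

pic = torch.randint(0, 256, (3, 200, 200), dtype=torch.uint8)  # dummy image

# only shear is used; degrees=0 disables rotation, translate/scale stay at their defaults
affine = transforms.RandomAffine(degrees=0, shear=(20, 30))
print(affine(pic).shape)  # torch.Size([3, 200, 200]) -- the image size is preserved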


3.9 Random Perspective: transforms.RandomPerspective()

transforms.RandomPerspective(distortion_scale=0.5, p=0.5, interpolation=2, fill=0)
Performs a perspective transformation with probability p, with the degree of distortion controlled by distortion_scale.


3.10 Convert to grayscale image with probability p: transforms.RandomGrayscale
It does not always convert to grayscale; the conversion happens with a certain probability p.

3.11 Convert data to PILImage: transforms.ToPILImage
Converts the data back to a PIL Image; I don't need it here, so I won't go into it.

3.12 transforms.Lambda: Apply a user-defined lambda as a transform.
I won't cover this one either; the TenCrop sketch above already uses it in passing.
