11. Recurrent neural networks: RNN & LSTM
11.1 time series representation
[seq_len, feature_len]: [sequence length, feature length / dimension / representation]
Text data:
1. One-hot encoding:
Each word is a vector with a 1 at the word's index and 0 everywhere else
Disadvantage: the vectors are sparse and high-dimensional
2. Word vectors: [words, word_vec]
Batch:
[word num, b, word vec]
[b, word num, word vec]
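A minimal sketch of the representations above, assuming a toy vocabulary of 5 words and hand-picked word indices (both hypothetical). It one-hot encodes a short sentence and converts between the two batch layouts.

```python
import torch
import torch.nn.functional as F

vocab_size = 5                      # hypothetical toy vocabulary
word_ids = torch.tensor([0, 3, 1])  # a 3-word sentence given as word indices

# One-hot: each row has a single 1 at the word's index
one_hot = F.one_hot(word_ids, num_classes=vocab_size).float()
print(one_hot.shape)                # [seq_len, feature_len]

# Build a batch of 2 sentences, then switch between the two layouts
batch_first = torch.stack([one_hot, one_hot])   # [b, word num, word vec]
seq_first = batch_first.permute(1, 0, 2)        # [word num, b, word vec]
print(batch_first.shape, seq_first.shape)
```

Which layout a PyTorch recurrent layer expects depends on its batch_first argument; the default is the sequence-first layout.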
Encoding methods:
word2vec vs GloVe
from torchnlp.word_to_vector import GloVe

vectors = GloVe()
vectors['hello']
# -1.7494
# 0.6242
# ...
# -0.6202
# 20.928
# [torch.FloatTensor of size 100]
11.2 RNN
features:
1. Weight sharing
2. Persistent memory unit (a hidden state that stores context information)
PyTorch implementation
nn.RNN:
__init__
(input_size, hidden_size, num_layers)
input_size: the dimension of the input word vector
hidden_size: the dimension of the hidden state (the memory)
num_layers: the default is 1
forward
out, ht = forward(x, h0)
x: [seq len, b, word vec]
h0/ht: [num layers, b, h dim]
out: [seq len, b, h dim]
Single-layer RNN
Note: the shape of out does not change with the number of layers; ht holds the memory state at the last time step
Two-layer RNN shape verification
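The shapes listed above can be checked with a short sketch. The sizes (sequence length 10, batch 3, word vector 100, hidden size 20) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

seq_len, b, word_vec, h_dim = 10, 3, 100, 20  # hypothetical sizes

x = torch.randn(seq_len, b, word_vec)

# Single-layer RNN
rnn1 = nn.RNN(input_size=word_vec, hidden_size=h_dim, num_layers=1)
out1, h1 = rnn1(x, torch.zeros(1, b, h_dim))
print(out1.shape, h1.shape)  # torch.Size([10, 3, 20]) torch.Size([1, 3, 20])

# Two-layer RNN: out keeps its shape, ht gains one entry per layer
rnn2 = nn.RNN(input_size=word_vec, hidden_size=h_dim, num_layers=2)
out2, h2 = rnn2(x, torch.zeros(2, b, h_dim))
print(out2.shape, h2.shape)  # torch.Size([10, 3, 20]) torch.Size([2, 3, 20])
```

The last time step of out matches the last layer of ht, since out collects the top layer's hidden state at every time step.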
nn.RNNCell:
__init__: (input_size, hidden_size), with the same meaning as in nn.RNN; there is no num_layers argument
forward:
ht = rnncell(xt, ht_1)
xt: [b, word vec]
ht_1/ht: [b, h dim]
out = torch.stack([h1, h2, ..., ht])
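A minimal sketch of how nn.RNNCell is unrolled by hand, one time step at a time, with the per-step states stacked into out. The sizes are hypothetical.

```python
import torch
import torch.nn as nn

seq_len, b, word_vec, h_dim = 10, 3, 100, 20  # hypothetical sizes

cell = nn.RNNCell(input_size=word_vec, hidden_size=h_dim)
x = torch.randn(seq_len, b, word_vec)
ht = torch.zeros(b, h_dim)          # h0: [b, h dim]

hs = []
for xt in x:                        # xt: [b, word vec]
    ht = cell(xt, ht)               # ht: [b, h dim]
    hs.append(ht)

out = torch.stack(hs)               # [seq len, b, h dim]
print(out.shape)
```

Writing the loop yourself is more verbose than nn.RNN, but it lets you change what happens between time steps.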
11.3 time series prediction practice
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from matplotlib import pyplot as plt

num_time_steps = 50
input_size = 1
hidden_size = 16
output_size = 1
lr = 0.01


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,  # [b, seq, feature]
        )
        for p in self.rnn.parameters():
            nn.init.normal_(p, mean=0.0, std=0.001)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden_prev):  # (self, x, h0)
        out, hidden_prev = self.rnn(x, hidden_prev)
        # [1, seq, h] => [seq, h]
        out = out.view(-1, hidden_size)
        out = self.linear(out)  # [seq, h] => [seq, 1]
        out = out.unsqueeze(dim=0)  # => [1, seq, 1]
        return out, hidden_prev


# Train
model = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr)

hidden_prev = torch.zeros(1, 1, hidden_size)  # h0

for iter in range(6000):
    start = np.random.randint(3, size=1)[0]  # random start in {0, 1, 2}
    time_steps = np.linspace(start, start + 10, num_time_steps)
    data = np.sin(time_steps)
    data = data.reshape(num_time_steps, 1)
    # x is the sine curve, y is the same curve shifted by one step
    x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)
    y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)

    output, hidden_prev = model(x, hidden_prev)
    hidden_prev = hidden_prev.detach()

    loss = criterion(output, y)
    model.zero_grad()
    loss.backward()
    # for p in model.parameters():
    #     print(p.grad.norm())
    #     torch.nn.utils.clip_grad_norm_(p, 10)
    optimizer.step()

    if iter % 100 == 0:
        print("Iteration: {} loss {} ".format(iter, loss.item()))

# Test: feed each prediction back in as the next input
start = np.random.randint(3, size=1)[0]
time_steps = np.linspace(start, start + 10, num_time_steps)
data = np.sin(time_steps)
data = data.reshape(num_time_steps, 1)
x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)
y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)

predictions = []
input = x[:, 0, :]
for _ in range(x.shape[1]):
    input = input.view(1, 1, 1)
    (pred, hidden_prev) = model(input, hidden_prev)
    input = pred
    predictions.append(pred.detach().numpy().ravel()[0])

x = x.data.numpy().ravel()
y = y.data.numpy()
plt.scatter(time_steps[:-1], x.ravel(), s=90)
plt.plot(time_steps[:-1], x.ravel())

plt.scatter(time_steps[1:], predictions)
plt.show()
Output:
Iteration: 0 loss 0.5240068435668945
Iteration: 100 loss 0.004781486000865698
Iteration: 200 loss 0.0025698889512568712
Iteration: 300 loss 0.0021712062880396843
Iteration: 400 loss 0.003106305142864585
Iteration: 500 loss 0.006951724644750357
Iteration: 600 loss 0.00876646488904953
Iteration: 700 loss 0.0003261358942836523
Iteration: 800 loss 0.001015920890495181
Iteration: 900 loss 0.003062265692278743
Iteration: 1000 loss 0.0043131341226398945
Iteration: 1100 loss 0.00014511161134578288
Iteration: 1200 loss 0.0009089858504012227
Iteration: 1300 loss 0.0009695018525235355
Iteration: 1400 loss 0.001020518015138805
Iteration: 1500 loss 0.0009882590966299176
Iteration: 1600 loss 0.0004311317461542785
Iteration: 1700 loss 0.0012930548982694745
Iteration: 1800 loss 0.0005156291299499571
Iteration: 1900 loss 0.001561652636155486
Iteration: 2000 loss 0.0007380764000117779
Iteration: 2100 loss 0.0012094884878024459
Iteration: 2200 loss 0.00036121331504546106
Iteration: 2300 loss 0.000719703733921051
Iteration: 2400 loss 9.609026892576367e-05
Iteration: 2500 loss 0.0009065204649232328
Iteration: 2600 loss 0.001319637056440115
Iteration: 2700 loss 0.0006897666025906801
Iteration: 2800 loss 0.00015256847837008536
Iteration: 2900 loss 0.00026130853802897036
Iteration: 3000 loss 8.58261773828417e-05
Iteration: 3100 loss 0.0008577909902669489
Iteration: 3200 loss 0.00030213454738259315
Iteration: 3300 loss 0.00019907366367988288
Iteration: 3400 loss 0.0004966565757058561
Iteration: 3500 loss 0.0009238572674803436
Iteration: 3600 loss 0.00027086795307695866
Iteration: 3700 loss 0.0005339682684279978
Iteration: 3800 loss 0.00024678310728631914
Iteration: 3900 loss 0.0007551009184680879
Iteration: 4000 loss 0.00019022168999072164
Iteration: 4100 loss 0.00015904009342193604
Iteration: 4200 loss 0.00032481923699378967
Iteration: 4300 loss 2.64111440628767e-05
Iteration: 4400 loss 0.0003545751387719065
Iteration: 4500 loss 0.00014119122351985425
Iteration: 4600 loss 0.000840750231873244
Iteration: 4700 loss 0.00024481775471940637
Iteration: 4800 loss 0.00013383818441070616
Iteration: 4900 loss 0.0003125799121335149
Iteration: 5000 loss 0.00022037429152987897
Iteration: 5100 loss 0.00046490851673297584
Iteration: 5200 loss 0.00030112380045466125
Iteration: 5300 loss 0.00048140008584596217
Iteration: 5400 loss 6.63367536617443e-05
Iteration: 5500 loss 0.0008455420611426234
Iteration: 5600 loss 0.00020897392823826522
Iteration: 5700 loss 0.0001983413239941001
Iteration: 5800 loss 0.00013303171726875007
Iteration: 5900 loss 0.0002572516386862844
11.4 vanishing and exploding gradients
1. Terminology
Vanishing gradient: by the chain rule, the gradient that reaches an early layer is a product of per-layer factors. When these factors are smaller than 1, the product shrinks with every layer and eventually becomes effectively 0.
Exploding gradient: by the same chain rule, when the per-layer factors are larger than 1 the product grows with every layer and eventually becomes too large for training to remain stable.
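A small numerical sketch of the two effects, assuming every layer contributes the same factor (0.9 or 1.1) and a hypothetical depth of 50. Real networks have varying factors, so this only illustrates the trend.

```python
depth = 50  # hypothetical number of layers / time steps

vanishing = 0.9 ** depth   # every factor slightly below 1
exploding = 1.1 ** depth   # every factor slightly above 1

print(f"0.9^{depth} = {vanishing:.2e}")
print(f"1.1^{depth} = {exploding:.2e}")
```

Even factors this close to 1 give a product of about 0.005 in one case and above 100 in the other, which is why long sequences make plain RNNs hard to train.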
2. Solutions
Exploding gradient:
Use gradient clipping: when the norm of a gradient exceeds a threshold, the gradient is scaled back down to that threshold
loss = criterion(output, y)
model.zero_grad()
loss.backward()
for p in model.parameters():
    print(p.grad.norm())  # inspect the norm of each gradient
    torch.nn.utils.clip_grad_norm_(p, 10)  # clip this gradient's norm to at most 10
optimizer.step()
Vanishing gradient:
LSTM
11.5 LSTM principle
See also: Fei-Fei Li computer vision - personal notes (week 5)
Problem with the traditional RNN: it only has short-term memory
LSTM: long short-term memory
The three σ (sigmoid) units represent the three gates (forget, input, output)
Note: how far each gate opens is determined by weights learned through backpropagation
Simple addition
Note: the ht computed at the output gate is the output of the cell
Understanding LSTM from another perspective:
11.6 use of LSTM
nn.LSTM:
__init__: (input_size, hidden_size, num_layers)
LSTM.forward:
out, (ht, ct) = lstm(x, (ht_1, ct_1))
x:[seq, b, vec]
h/c:[num_layer, b, h]
out:[seq, b, h]
demo:
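A minimal shape check for nn.LSTM, using hypothetical sizes (sequence length 10, batch 3, word vector 100, hidden size 20, 4 layers).

```python
import torch
import torch.nn as nn

seq, b, vec, h, num_layer = 10, 3, 100, 20, 4  # hypothetical sizes

lstm = nn.LSTM(input_size=vec, hidden_size=h, num_layers=num_layer)
x = torch.randn(seq, b, vec)

# With no initial state given, h0 and c0 default to zeros
out, (ht, ct) = lstm(x)
print(out.shape, ht.shape, ct.shape)
# torch.Size([10, 3, 20]) torch.Size([4, 3, 20]) torch.Size([4, 3, 20])
```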
nn.LSTMCell:
__init__: (input_size, hidden_size); there is no num_layers argument
LSTMCell.forward:
ht, ct = lstmcell(xt, (ht_1, ct_1))
xt:[b, vec]
ht/ct:[b, h]
Single layer demo:
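A minimal sketch of a single nn.LSTMCell unrolled over a sequence by hand. The sizes are hypothetical.

```python
import torch
import torch.nn as nn

seq, b, vec, h = 10, 3, 100, 20  # hypothetical sizes

cell = nn.LSTMCell(input_size=vec, hidden_size=h)
x = torch.randn(seq, b, vec)
ht = torch.zeros(b, h)
ct = torch.zeros(b, h)

for xt in x:                       # xt: [b, vec]
    ht, ct = cell(xt, (ht, ct))    # ht/ct: [b, h]

print(ht.shape, ct.shape)
```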
Two-layer demo:
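A sketch of stacking two nn.LSTMCell layers manually. The intermediate hidden size of 30 and the other sizes are hypothetical; the only requirement is that the second cell's input_size equals the first cell's hidden_size.

```python
import torch
import torch.nn as nn

seq, b, vec = 10, 3, 100  # hypothetical sizes

cell1 = nn.LSTMCell(input_size=vec, hidden_size=30)
cell2 = nn.LSTMCell(input_size=30, hidden_size=20)

x = torch.randn(seq, b, vec)
h1, c1 = torch.zeros(b, 30), torch.zeros(b, 30)
h2, c2 = torch.zeros(b, 20), torch.zeros(b, 20)

for xt in x:
    h1, c1 = cell1(xt, (h1, c1))   # first layer reads the input
    h2, c2 = cell2(h1, (h2, c2))   # second layer reads the first layer's ht

print(h2.shape, c2.shape)
```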
12. Transfer learning
12.1 custom dataset
1. Load dataset
·Inherit from torch.utils.data.Dataset
·__len__: number of samples in the dataset
·__getitem__: return the sample at a given index
from torch.utils.data import Dataset


class NumbersDataset(Dataset):

    def __init__(self, training=True):
        if training:
            self.samples = list(range(1, 1001))
        else:
            self.samples = list(range(1001, 1501))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
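A short usage sketch for the dataset above, showing how __len__ and __getitem__ are used directly and through a DataLoader. The batch size of 4 and shuffle=False are arbitrary choices for the illustration.

```python
from torch.utils.data import Dataset, DataLoader


class NumbersDataset(Dataset):
    """Same toy dataset as above, repeated so the snippet runs on its own."""

    def __init__(self, training=True):
        self.samples = list(range(1, 1001)) if training else list(range(1001, 1501))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


train_db = NumbersDataset(training=True)
print(len(train_db), train_db[0], train_db[999])

# The DataLoader calls __getitem__ for each index and collates the results
loader = DataLoader(train_db, batch_size=4, shuffle=False)
first_batch = next(iter(loader))
print(first_batch)  # tensor([1, 2, 3, 4])
```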
Data preprocessing:
·Image Resize - fixes the size of the image
·Data augmentation - enlarge the dataset with transformed copies to improve performance
·Rotate - rotate
·Crop - crop
·Normalize - makes training more stable and easier to converge
·Mean, std
·ToTensor
import csv
import glob
import os
import random

import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms


class Pokemon(Dataset):

    def __init__(self, root, resize, mode):
        super(Pokemon, self).__init__()

        self.root = root
        self.resize = resize

        self.name2label = {}  # "squirtle": 0
        for name in sorted(os.listdir(os.path.join(root))):  # traverse the folders
            if not os.path.isdir(os.path.join(root, name)):  # skip plain files
                continue
            self.name2label[name] = len(self.name2label.keys())
        # print(self.name2label)

        # image, label
        self.images, self.labels = self.load_csv('images.csv')

        # train:validation:test = 6:2:2
        if mode == 'train':  # 60%
            self.images = self.images[:int(0.6*len(self.images))]
            self.labels = self.labels[:int(0.6*len(self.labels))]
        elif mode == 'val':  # 20% = 60%->80%
            self.images = self.images[int(0.6*len(self.images)):int(0.8*len(self.images))]
            self.labels = self.labels[int(0.6*len(self.labels)):int(0.8*len(self.labels))]
        else:  # 20% = 80%->100%
            self.images = self.images[int(0.8*len(self.images)):]
            self.labels = self.labels[int(0.8*len(self.labels)):]

    def load_csv(self, filename):
        if not os.path.exists(os.path.join(self.root, filename)):
            # the csv file does not exist yet, so create it
            images = []
            for name in self.name2label.keys():
                # 'pokemon\\mewtwo\\00001.png'
                images += glob.glob(os.path.join(self.root, name, '*.png'))
                images += glob.glob(os.path.join(self.root, name, '*.jpg'))
                images += glob.glob(os.path.join(self.root, name, '*.jpeg'))

            # 1167, 'pokemon\\bulbasaur\\00000000.png'
            print(len(images), images)

            random.shuffle(images)
            with open(os.path.join(self.root, filename), mode='w', newline='') as f:
                writer = csv.writer(f)
                for img in images:  # 'pokemon\\bulbasaur\\00000000.png'
                    name = img.split(os.sep)[-2]  # bulbasaur
                    label = self.name2label[name]
                    # 'pokemon\\bulbasaur\\00000000.png', 0
                    writer.writerow([img, label])
                print('written into csv file:', filename)  # saved as a csv file

        # read from csv file
        images, labels = [], []
        with open(os.path.join(self.root, filename)) as f:
            reader = csv.reader(f)
            for row in reader:
                # 'pokemon\\bulbasaur\\00000000.png', 0
                img, label = row
                label = int(label)
                images.append(img)
                labels.append(label)

        assert len(images) == len(labels)
        return images, labels

    def __len__(self):
        return len(self.images)

    def denormalize(self, x_hat):
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]

        # x_hat = (x - mean) / std
        # x = x_hat * std + mean
        # x: [c, h, w]
        # mean: [3] => [3, 1, 1]
        mean = torch.tensor(mean).unsqueeze(1).unsqueeze(1)
        std = torch.tensor(std).unsqueeze(1).unsqueeze(1)
        # print(mean.shape, std.shape)
        x = x_hat * std + mean
        return x

    def __getitem__(self, idx):
        # idx: [0 ~ len(images)]
        # self.images, self.labels
        # img: 'pokemon\\bulbasaur\\00000000.png'
        # label: 0
        img, label = self.images[idx], self.labels[idx]

        tf = transforms.Compose([
            lambda x: Image.open(x).convert('RGB'),  # string path => image data
            transforms.Resize((int(self.resize*1.25), int(self.resize*1.25))),
            transforms.RandomRotation(15),
            transforms.CenterCrop(self.resize),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])

        img = tf(img)
        label = torch.tensor(label)
        return img, label


def main():
    import visdom
    import time
    import torchvision

    viz = visdom.Visdom()

    tf = transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
    ])
    db = torchvision.datasets.ImageFolder(root='pokemon', transform=tf)
    loader = DataLoader(db, batch_size=32, shuffle=True)

    print(db.class_to_idx)

    for x, y in loader:
        viz.images(x, nrow=8, win='batch', opts=dict(title='batch'))
        viz.text(str(y.numpy()), win='label', opts=dict(title='batch-y'))
        time.sleep(10)

    # Another visualization method
    # db = Pokemon('pokemon', 64, 'train')
    #
    # x, y = next(iter(db))
    # print('sample:', x.shape, y.shape, y)
    #
    # viz.image(db.denormalize(x), win='sample_x', opts=dict(title='sample_x'))
    #
    # loader = DataLoader(db, batch_size=32, shuffle=True, num_workers=8)
    #
    # for x, y in loader:
    #     viz.images(db.denormalize(x), nrow=8, win='batch', opts=dict(title='batch'))
    #     viz.text(str(y.numpy()), win='label', opts=dict(title='batch-y'))
    #
    #     time.sleep(10)


if __name__ == '__main__':
    main()
Output:
[3 4 1 2 3 3 4 1 1 1 2 1 4 2 3 3 1 0 3 4 1 2 3 4 1 4 3 3 3 2 4 0]
12.2 creating models
import torch
from torch import nn
from torch.nn import functional as F


class ResBlk(nn.Module):
    """
    resnet block
    """

    def __init__(self, ch_in, ch_out, stride=1):
        """
        :param ch_in: number of input channels
        :param ch_out: number of output channels
        :param stride: stride of the first convolution and of the shortcut
        """
        super(ResBlk, self).__init__()

        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)

        self.extra = nn.Sequential()
        if ch_out != ch_in:
            # [b, ch_in, h, w] => [b, ch_out, h, w]
            self.extra = nn.Sequential(
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=stride),
                nn.BatchNorm2d(ch_out)
            )

    def forward(self, x):
        """
        :param x: [b, ch, h, w]
        :return: [b, ch_out, h', w']
        """
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # short cut.
        # extra module: [b, ch_in, h, w] => [b, ch_out, h, w]
        # element-wise add:
        out = self.extra(x) + out
        out = F.relu(out)

        return out


class ResNet18(nn.Module):

    def __init__(self, num_class):
        super(ResNet18, self).__init__()

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=3, padding=0),
            nn.BatchNorm2d(16)
        )
        # followed by 4 blocks; each block also shrinks h and w by its stride
        # [b, 16, h, w] => [b, 32, h', w']
        self.blk1 = ResBlk(16, 32, stride=3)
        # [b, 32, h, w] => [b, 64, h', w']
        self.blk2 = ResBlk(32, 64, stride=3)
        # [b, 64, h, w] => [b, 128, h', w']
        self.blk3 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h', w']
        self.blk4 = ResBlk(128, 256, stride=2)

        # [b, 256, 3, 3] for a 224x224 input
        self.outlayer = nn.Linear(256*3*3, num_class)

    def forward(self, x):
        """
        :param x: [b, 3, h, w]
        :return: logits of shape [b, num_class]
        """
        x = F.relu(self.conv1(x))

        # [b, 16, h, w] => [b, 256, h', w']
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)

        # print(x.shape)
        x = x.view(x.size(0), -1)
        x = self.outlayer(x)

        return x


def main():
    blk = ResBlk(64, 128)
    tmp = torch.randn(2, 64, 224, 224)
    out = blk(tmp)
    print('block:', out.shape)

    model = ResNet18(5)
    tmp = torch.randn(2, 3, 224, 224)
    out = model(tmp)
    print('resnet:', out.shape)

    p = sum(map(lambda p: p.numel(), model.parameters()))
    print('parameters size:', p)


if __name__ == '__main__':
    main()
Output:
block: torch.Size([2, 128, 224, 224])
resnet: torch.Size([2, 5])
parameters size: 1234885
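The input size 256*3*3 of the final linear layer follows from the strides used in ResNet18 above. A small sketch of the arithmetic, using the standard convolution output-size formula (dilation 1) and a helper name, conv_out, that is introduced here only for illustration:

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, kernel=3, stride=3, padding=0)   # conv1
sizes = [size]
for stride in (3, 3, 2, 2):                            # blk1 .. blk4
    # the first 3x3 conv of each block applies the stride; the second keeps the size
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    sizes.append(size)

print(sizes)  # [74, 25, 9, 5, 3]
```

A 224x224 input therefore reaches the linear layer as a 256-channel 3x3 feature map. A different input resolution would need a different in_features value.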
12.3 training and testing
import torch
from torch import optim, nn
import visdom
import torchvision
from torch.utils.data import DataLoader

from pokemon import Pokemon
from resnet import ResNet18

batchsz = 32
lr = 1e-3
epochs = 10

device = torch.device("cuda")
torch.manual_seed(1234)  # random seed

train_db = Pokemon('pokemon', 224, mode='train')
val_db = Pokemon('pokemon', 224, mode='val')
test_db = Pokemon('pokemon', 224, mode='test')
train_loader = DataLoader(train_db, batch_size=batchsz, shuffle=True, num_workers=4)
val_loader = DataLoader(val_db, batch_size=batchsz, num_workers=2)
test_loader = DataLoader(test_db, batch_size=batchsz, num_workers=2)

viz = visdom.Visdom()


def evaluate(model, loader):
    model.eval()

    correct = 0
    total = len(loader.dataset)

    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            logits = model(x)
            pred = logits.argmax(dim=1)
        correct += torch.eq(pred, y).sum().float().item()

    return correct / total


def main():
    model = ResNet18(5).to(device)  # 5 = number of classes
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # takes raw logits

    best_acc, best_epoch = 0, 0
    global_step = 0
    viz.line([0], [-1], win='loss', opts=dict(title='loss'))
    viz.line([0], [-1], win='val_acc', opts=dict(title='val_acc'))

    for epoch in range(epochs):

        for step, (x, y) in enumerate(train_loader):
            # x: [b, 3, 224, 224], y: [b]
            x, y = x.to(device), y.to(device)

            model.train()
            logits = model(x)
            loss = criterion(logits, y)  # y holds class indices; no one-hot needed
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            viz.line([loss.item()], [global_step], win='loss', update='append')
            global_step += 1

        if epoch % 1 == 0:
            val_acc = evaluate(model, val_loader)
            if val_acc > best_acc:
                best_epoch = epoch
                best_acc = val_acc
                torch.save(model.state_dict(), 'best.mdl')  # keep the best model so far
            viz.line([val_acc], [global_step], win='val_acc', update='append')

    print('best acc:', best_acc, 'best epoch:', best_epoch)

    model.load_state_dict(torch.load('best.mdl'))
    print('loaded from ckpt!')  # checkpoint

    test_acc = evaluate(model, test_loader)
    print('test acc:', test_acc)


if __name__ == '__main__':
    main()
12.4 transfer learning
import torch
from torch import optim, nn
import visdom
import torchvision
from torch.utils.data import DataLoader

from pokemon import Pokemon
# from resnet import ResNet18
from torchvision.models import resnet18

from utils import Flatten

batchsz = 32
lr = 1e-3
epochs = 10

device = torch.device('cuda')
torch.manual_seed(1234)

train_db = Pokemon('pokemon', 224, mode='train')
val_db = Pokemon('pokemon', 224, mode='val')
test_db = Pokemon('pokemon', 224, mode='test')
train_loader = DataLoader(train_db, batch_size=batchsz, shuffle=True, num_workers=4)
val_loader = DataLoader(val_db, batch_size=batchsz, num_workers=2)
test_loader = DataLoader(test_db, batch_size=batchsz, num_workers=2)

viz = visdom.Visdom()


def evaluate(model, loader):
    model.eval()

    correct = 0
    total = len(loader.dataset)

    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            logits = model(x)
            pred = logits.argmax(dim=1)
        correct += torch.eq(pred, y).sum().float().item()

    return correct / total


def main():
    # model = ResNet18(5).to(device)
    # newer torchvision versions use the weights= argument instead of pretrained=
    trained_model = resnet18(pretrained=True)
    model = nn.Sequential(*list(trained_model.children())[:-1],  # all layers except the final fc: [b, 512, 1, 1]
                          Flatten(),  # [b, 512, 1, 1] => [b, 512]
                          nn.Linear(512, 5)
                          ).to(device)
    # x = torch.randn(2, 3, 224, 224)
    # print(model(x).shape)

    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    best_acc, best_epoch = 0, 0
    global_step = 0
    viz.line([0], [-1], win='loss', opts=dict(title='loss'))
    viz.line([0], [-1], win='val_acc', opts=dict(title='val_acc'))

    for epoch in range(epochs):

        for step, (x, y) in enumerate(train_loader):
            # x: [b, 3, 224, 224], y: [b]
            x, y = x.to(device), y.to(device)

            model.train()
            logits = model(x)
            loss = criterion(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            viz.line([loss.item()], [global_step], win='loss', update='append')
            global_step += 1

        if epoch % 1 == 0:
            val_acc = evaluate(model, val_loader)
            if val_acc > best_acc:
                best_epoch = epoch
                best_acc = val_acc
                torch.save(model.state_dict(), 'best.mdl')
            viz.line([val_acc], [global_step], win='val_acc', update='append')

    print('best acc:', best_acc, 'best epoch:', best_epoch)

    model.load_state_dict(torch.load('best.mdl'))
    print('loaded from ckpt!')

    test_acc = evaluate(model, test_loader)
    print('test acc:', test_acc)


if __name__ == '__main__':
    main()
Output:
best acc: 0.9356223175965666 best epoch: 9
loaded from ckpt!
test acc: 0.9444444444444444
12.5 supplementary code
utils.py
Contains the Flatten layer used in 12.4 and a helper function for plotting images
from matplotlib import pyplot as plt
import torch
from torch import nn


class Flatten(nn.Module):

    def __init__(self):
        super(Flatten, self).__init__()

    def forward(self, x):
        shape = torch.prod(torch.tensor(x.shape[1:])).item()
        return x.view(-1, shape)


def plot_image(img, label, name):
    fig = plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        plt.tight_layout()
        plt.imshow(img[i][0]*0.3081+0.1307, cmap='gray', interpolation='none')
        plt.title("{}: {}".format(name, label[i].item()))
        plt.xticks([])
        plt.yticks([])
    plt.show()
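Recent PyTorch versions also ship a built-in nn.Flatten layer. The sketch below, which is not part of the original utils.py, compares it with the custom Flatten on a tensor shaped like the ResNet feature map from 12.4.

```python
import torch
from torch import nn


class Flatten(nn.Module):
    """Same custom layer as in utils.py, repeated so the snippet runs on its own."""

    def forward(self, x):
        shape = torch.prod(torch.tensor(x.shape[1:])).item()
        return x.view(-1, shape)


x = torch.randn(2, 512, 1, 1)
custom = Flatten()(x)
builtin = nn.Flatten()(x)      # flattens from dim 1 onward by default

print(custom.shape, builtin.shape)
print(torch.equal(custom, builtin))
```

If the installed PyTorch provides nn.Flatten, it can presumably replace the custom class in the nn.Sequential model without other changes.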
12.6 problems encountered
1. The visdom page shows only a blue screen during visualization
Solution: replace the static folder under anaconda3\lib\site-packages\visdom. Reference blog: pytorch visdom blue screen; the folder can be downloaded from there and copied over the existing one
2. Plots drawn with viz.line do not update in real time
Solution: switch the visdom page offline and then back online; the line chart then updates correctly