libtorch study notes - MNIST hands-on

A practical MNIST exercise with libtorch

Preparation

First download the MNIST database from http://yann.lecun.com/exdb/mnist/.
After downloading, do not decompress the archives with software such as WinRAR, which renames the files: for example, t10k-images-idx3-ubyte becomes t10k-images.idx3-ubyte. It is best to decompress them with gzip/tar in a Linux environment.
Suppose you extract them to I:\MNIST.

Train and save the results

Define the network

The network is similar to the previous one, but a padding argument needs to be added. The MNIST training and test images are 28x28, while LeNet-5 expects a 32x32 input, so the first convolutional layer needs padding of (32 - 28)/2 = 2. With that padding the shapes flow exactly as LeNet-5 expects: 32 -> C1 (5x5 conv) -> 28 -> 2x2 pool -> 14 -> C3 (5x5 conv) -> 10 -> 2x2 pool -> 5, which is why F5 takes 16 * 5 * 5 inputs. Compared to the previous code, only a small change is needed:

#include <torch/torch.h>

struct LeNet5 : torch::nn::Module
{
	// padding can be passed into the convolutional layer C1 to align the input image to 32x32
	LeNet5(int arg_padding=0)
		: C1(register_module("C1", torch::nn::Conv2d(torch::nn::Conv2dOptions(1, 6, 5).padding(arg_padding))))
		, C3(register_module("C3", torch::nn::Conv2d(6, 16, 5)))
		, F5(register_module("F5", torch::nn::Linear(16 * 5 * 5, 120)))
		, F6(register_module("F6", torch::nn::Linear(120, 84)))
		, OUTPUT(register_module("OUTPUT", torch::nn::Linear(84, 10)))
	{
	}

	~LeNet5()
	{
	}

	int64_t num_flat_features(torch::Tensor input)
	{
		int64_t num_features = 1;
		auto sizes = input.sizes();
		// Skip dim 0 (the batch dimension); flatten only the per-sample C*H*W features
		for (size_t i = 1; i < sizes.size(); i++) {
			num_features *= sizes[i];
		}
		return num_features;
	}

	torch::Tensor forward(torch::Tensor input)
	{
		namespace F = torch::nn::functional;
		// 2x2 Max pooling
		auto x = F::max_pool2d(F::relu(C1(input)), F::MaxPool2dFuncOptions({ 2,2 }));
		// For a square pooling window, a single number is enough
		x = F::max_pool2d(F::relu(C3(x)), F::MaxPool2dFuncOptions(2));
		x = x.view({ -1, num_flat_features(x) });
		x = F::relu(F5(x));
		x = F::relu(F6(x));
		x = OUTPUT(x);
		return x;
	}

	torch::nn::Conv2d	C1;
	torch::nn::Conv2d	C3;
	torch::nn::Linear	F5;
	torch::nn::Linear	F6;
	torch::nn::Linear	OUTPUT;
};

Start training

Please see the following code:

	{
		tm_start = std::chrono::system_clock::now();
		auto dataset = torch::data::datasets::MNIST("I:\\MNIST\\")
			.map(torch::data::transforms::Normalize<>(0.5, 0.5))
			.map(torch::data::transforms::Stack<>());
		auto data_loader = torch::data::make_data_loader(std::move(dataset));

		tm_end = std::chrono::system_clock::now();

		printf("It takes %lld msec to load MNIST handwriting database.\n", 
			std::chrono::duration_cast<std::chrono::milliseconds>(tm_end - tm_start).count());

		tm_start = std::chrono::system_clock::now();
		// The input image is 28x28, so set padding to 2 to make it effectively 32x32
		LeNet5 net1(2);

		auto criterion = torch::nn::CrossEntropyLoss();
		auto optimizer = torch::optim::SGD(net1.parameters(), torch::optim::SGDOptions(0.001).momentum(0.9));
		tm_end = std::chrono::system_clock::now();
		printf("It takes %lld msec to prepare training handwriting.\n",
			std::chrono::duration_cast<std::chrono::milliseconds>(tm_end - tm_start).count());

		tm_start = std::chrono::system_clock::now();
		int64_t kNumberOfEpochs = 2;
		for (int64_t epoch = 1; epoch <= kNumberOfEpochs; ++epoch) {

			int i = 0;
			auto running_loss = 0.;
			for (torch::data::Example<>& batch : *data_loader) {

				auto inputs = batch.data;
				auto labels = batch.target;

				optimizer.zero_grad();
				// feed data to the network
				auto outputs = net1.forward(inputs);
				// Loss calculated by cross entropy
				auto loss = criterion(outputs, labels);
				// Backpropagate: compute gradients of the loss w.r.t. the network parameters
				loss.backward();
				optimizer.step();

				running_loss += loss.item().toFloat();
				if ((i + 1) % 3000 == 0)
				{
					printf("[%lld, %5d] loss: %.3f\n", epoch + 1, i + 1, running_loss / 3000);
					running_loss = 0.;
				}

				i++;
			}
		}

		printf("Finish training!\n");
		torch::serialize::OutputArchive archive;
		net1.save(archive);
		archive.save_to("I:\\mnist.pt");
		printf("Save the training result to I:\\mnist.pt.\n");

		tm_end = std::chrono::system_clock::now();
		printf("It takes %lld msec to finish training handwriting!\n", 
			std::chrono::duration_cast<std::chrono::milliseconds>(tm_end - tm_start).count());
	}

Output result

In the Debug configuration the speed is far too slow; it is best to switch to the Release configuration, which enables compiler optimizations. Even so, training takes some time: there are 60,000 images per epoch, and it took a few minutes on my machine.

The result is not bad: after two epochs of training, the loss has become fairly small.

Code interpretation

The MNIST database format is described at http://yann.lecun.com/exdb/mnist/:
train-images-idx3-ubyte: training set images
train-labels-idx1-ubyte: training set labels
t10k-images-idx3-ubyte: test set images
t10k-labels-idx1-ubyte: test set labels

The statement

torch::data::datasets::MNIST("I:\\MNIST\\")

loads train-images-idx3-ubyte and train-labels-idx1-ubyte. The training image file is laid out as follows:

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

The dataset loader scales each pixel into [0.0, 1.0]; then the following statement:

	.map(torch::data::transforms::Normalize<>(0.5, 0.5))

normalizes each pixel to [-1.0, 1.0] for easier processing. This can also be expressed by the formulas:

\bar I = \overline{image} / 255.0
\bar D = (\bar I - 0.5) / 0.5
Then the following statement collates each batch of rank-3 tensors (1x28x28) into a single rank-4 tensor (Nx1x28x28, where N is the batch size; with the default data loader options, N = 1):

.map(torch::data::transforms::Stack<>())

For a multi-class network like this, the cross-entropy loss function is commonly used:

auto criterion = torch::nn::CrossEntropyLoss();

The following code performs the training. The network outputs and the true labels are fed to the cross-entropy loss function; the loss is backpropagated through the network by differentiation, and the Stochastic Gradient Descent optimizer adjusts the parameters accordingly. This is what training and learning amount to:

	// Optimizer Gradient Zeroing
	optimizer.zero_grad();
	// feed data to the network
	auto outputs = net1.forward(inputs);
	// Loss calculated by cross entropy
	auto loss = criterion(outputs, labels);
	// Backpropagate: compute gradients of the loss w.r.t. the network parameters
	loss.backward();
	// The optimizer does network parameter adjustment
	optimizer.step();

Once training completes, save the result so it can be loaded and used later:

	torch::serialize::OutputArchive archive;
	net1.save(archive);
	archive.save_to("I:\\mnist.pt");

Load the training results and test

The training results obtained above can be loaded and evaluated with the following code:

	{
		tm_start = std::chrono::system_clock::now();
		LeNet5 net1(2);
		torch::serialize::InputArchive archive;
		archive.load_from("I:\\mnist.pt");

		net1.load(archive);

		auto dataset = torch::data::datasets::MNIST("I:\\MNIST\\", torch::data::datasets::MNIST::Mode::kTest)
			.map(torch::data::transforms::Normalize<>(0.5, 0.5))
			.map(torch::data::transforms::Stack<>());
		auto data_loader = torch::data::make_data_loader(std::move(dataset));

		int total_test_items = 0, passed_test_items = 0;
		for (torch::data::Example<>& batch : *data_loader)
		{
			// Process the test data with the trained network
			auto outputs = net1.forward(batch.data);
			// torch::max returns (max values, indices); the index is the predicted digit, 0 ~ 9
			auto predicted = torch::max(outputs, 1);
			// Get label data, 0 ~ 9
			auto labels = batch.target;
			// Compare predicted and actual results and update the statistics
			if (labels[0].item<int>() == std::get<1>(predicted).item<int>())
				passed_test_items++;

			total_test_items++;

			//printf("label: %d.\n", labels[0].item<int>());
			//printf("predicted label: %d.\n", std::get<1>(predicted).item<int>());
			//std::cout << std::get<1>(predicted) << '\n';

			//break;
		}
		tm_end = std::chrono::system_clock::now();
		
		printf("Total test items: %d, passed test items: %d, pass rate: %.3f%%, cost %lld msec.\n", 
			total_test_items, passed_test_items, passed_test_items*100.f/total_test_items,
			std::chrono::duration_cast<std::chrono::milliseconds>(tm_end - tm_start).count());
	}

Output result

The 10,000 test images take about 8 seconds, roughly 0.8 ms per image on average, which is still very fast!
