[basic algorithm] C + + implementation of Huffman tree and Huffman coding (attach relevant knowledge points of Huffman tree and priority queue)

Write in front

The course of algorithm course this semester is actually an extension of undergraduate algorithm course. Although many algorithms were known before, they didn't have a deep understanding, and they didn't have the awareness of hands-on practice before, so this semester try to implement each important algorithm as much as possible, and record the implementation process and code.

code implementation

#include<iostream>
#include<queue>
#include<vector>
#include<map>
#include<string>
using namespace std;
map<int,string>HuffCode;
struct TreeNode{
	int val;
	TreeNode* left;
	TreeNode* right;
	TreeNode(int v):val(v),left(NULL),right(NULL)
	{}
	TreeNode():val(0),left(NULL),right(NULL)
	{}	
//I don't know why there is a problem with the overloaded < operator in the structure 
//	bool operator<(const TreeNode& node)
//	{
//		return val>node.val;
//	}
};
struct cmp{
	//Priority queues are overloaded with the < sign, but note here that val > node - > val is actually the priority with the smallest val value. 
	bool operator()(TreeNode* p1,TreeNode* p2)
	{
		return p1->val>p2->val;
	}
}; 
void PreOrder(TreeNode* node,string s)
{
	if(node!=NULL)
	{
		cout<<node->val<<" ";
	    PreOrder(node->left,s+'0');
	    PreOrder(node->right,s+'1');
	    if(node->left==NULL&&node->right==NULL)
		{
			HuffCode[node->val]=s;
		}
	}
}
int main()
{
	priority_queue<TreeNode*,vector<TreeNode*>,cmp>Huffman;
	int n;
	TreeNode* CurrentNode1,*CurrentNode2,*Head,*NewNode;
	int* input=new int[n];
	cin>>n;
	for(int i=0;i<n;i++)
	{
		cin>>input[i];
	}
	for(int i=0;i<n;i++)
	{
		TreeNode* node=new TreeNode(input[i]);
		Huffman.push(node);
	}
	while(!Huffman.empty())
	{
		CurrentNode1=Huffman.top();
		Huffman.pop();
		if(Huffman.empty())
		{
			Head=CurrentNode1;
			break;
		}
		else
		{
			CurrentNode2=Huffman.top();
			Huffman.pop();
		}
		NewNode=new TreeNode;
		NewNode->left=CurrentNode1;
		NewNode->right=CurrentNode2;
		NewNode->val=CurrentNode1->val+CurrentNode2->val;
		Huffman.push(NewNode);
	}
	PreOrder(Head,"");
	cout<<endl;
	for(auto i=HuffCode.begin();i!=HuffCode.end();i++)
	{
		cout<<(*i).first<<" "<<(*i).second<<endl;
	}
	delete[] input;
	return 0;
}

Supplement of relevant knowledge points

(1) Huffman coding and Huffman tree

(2) Priority queue

Huffman coding problem needs to be solved by greedy algorithm. Greedy algorithm usually selects the smallest two frequencies for merging every time. Based on this problem-solving idea, it can be solved with the priority queue in STL. The priority queue is to select the smallest two frequencies for merging every time. But priority in STL_ There are several pits in the queue, which should be noted:
(1)priority_queue is also a kind of queue, which only needs #include.
(2)priority_ The purpose of queue is different from that of queue. Queue ensures the first in first out of input data, and priority_queue is to ensure that the data with the highest priority is obtained each time. In terms of use, queue can obtain the first and last elements of the queue through front and back, but priority_ The queue has only one top to get the data with the highest priority at the head of the queue (because there is no need to get the tail element). stack also obtains the top element by using top.
(3)priority_ The definition of priority in queue often needs to be customized, especially for objects such as structures. There are two ways to customize. One is to directly overload the < operator in the structure, and the other is to define a cmp structure outside and overload the < operator in this structure. I used the second method this time. I originally wanted to overload the < operator directly in the structure, but I found that overloading directly in the structure did not succeed (the data in the priority queue was not arranged in order from small to large). If you know the reason, you are welcome to comment in the comment area. Thank you very much.
(4)priority_ The format of the queue is priority_queue < Type, Container, Functional >, where Type is the data Type, Container is the Container for saving data, and Functional is the element comparison method. If the latter two parameters are not written, the Container defaults to vector and the comparison method defaults to operator <, that is, the priority queue is the large top heap and the queue head element is the largest.
(5)priority_ The use of overloaded operators in queue is different from that in normal times when we use sort. Take my code for example, the return in the comparison function in the code is return P1 - > val > P2 - > val; But in fact, it is a small top heap instead of a large top heap, that is, it is to place the TreeNode with small val value at the top of the team with high priority. When we usually use sort, return P1 - > val > P2 - > val; It means sorting from large to small. So we must pay attention to the difference between the two.

thinking

The idea is to use a priority queue. First, create a TreeNode node for each input data. The nodes corresponding to these data to be encoded are the leaf nodes of the Huffman tree constructed later. After that, build a while loop, and continue merging as long as there are two or more nodes in the priority queue (it seems that this while loop can also be used here, because the Huffman tree is merged n-1 times, and it can also be merged n-1 times directly). When merging each time, a new non leaf node is created. The left child and right child of the non leaf node are the two merged nodes. However, it should be noted that the merged nodes should continue to be push ed into the queue to participate in the subsequent node merging. Finally, the Huffman tree is constructed. Each time, 0,1 is recorded into the current string (left is 0 and right is 1) through sequential traversal, and the corresponding relationship between each leaf node and Huffman code is stored in the map.

Posted by RedDragon on Mon, 09 May 2022 16:18:02 +0300