Hash table (discrete table)

Hash table is generally a data structure used to reduce the data space.

For example, there are some numbers in the range of -1e9~1e9. We need to know whether a number has appeared, so we need to use the Hash table.

Generally, we use the method of moduling the maximum prime number within an acceptable range to Hash (if our data can be accepted in 1e5, find the first prime number greater than 1e5 and modulo all primitive numbers to this prime number).

Generally, Hash tables have conflicting positions, that is, multiple numbers share one position. Under normal circumstances, there are two ways to reduce conflicts:

1. Zipper method
The zipper method uses array + linked list, that is, if the Hash table has multiple numbers in one position, they can be placed on a linked list in this position. Generally, the number of nodes on the linked list will not be too many, so the expected query complexity is O(1).

(note that the linked list here is implemented by array, because array implementation is very convenient and faster. This method is also used for linked forward stars)

code:

```#include<bits/stdc++.h>
using namespace std;

const int N = 100003; //The first prime greater than 1e5.

int e[N], head[N], ne[N], idx;

void hash_insert(int x)
{
int r = (x % N + N) % N; //Note here that you need to mold n first, then + N, and then
//Modulo N to prevent negative modulo.
e[idx] = x;
head[r] = idx ++;
}

bool hash_find(int x)
{
int r = (x % N + N) % N;
for(int j = head[r]; ~j; j = ne[j])
{
if(e[j] == x)
return true;
}
return false;
}

int main()
{
// For (int j = 100000;; j + +) / / find the first prime number larger than the range.
// {
// 	int count = 0;
// 	for(int k = 2; k * k <= j; k ++)
// 	{
// 		if(j % k == 0)
// 			{
// 				count = 1;
// 				break;
// 			}
// 	}
// 	if(!count)
// 	{
// 		cout << j << endl;
// 		break;
// 	}
// }

int n, m;
cin >> n >> m;
for(int i = 0; i < n; i ++)
{
int num;
cin >> num;
hash_insert(num);
}
while(m --)
{
int p;
cin >> p;
if(hash_find(p))
cout << "Yes" << endl;
else
cout << "No" << endl;
}
return 0;
}
```
This method is a bit like looking for a seat. If the seat I should sit in is already occupied, I will find the first vacant seat after that. In this way to reduce the location of conflict. The advantage of this method is that only one array is used.

Note: generally, this method needs to be opened to about three times the limit range. If the current limit range is 1e5, it needs to be opened to 3e5, and then find the first prime number larger than 3e5, the module prime number. Pay attention to the initialization of the array!

code:

```#include<bits/stdc++.h>
#define inf 0x3f3f3f3f / / a number larger than the data range needs to be set here. The assumption here is - 1e9~1e9,
//0x3f3f3f3f3f is greater than 1e9;
using namespace std;

const int N = 300007; //Find the first prime greater than 3e5.

int kf[N];

int hash_find(int x)
{
int r = (x % N + N) % N;
while(kf[r] != inf && kf[r] != x)  //Find the first empty location.
{
r ++;
if(r == N)
r = 0;    //It should be noted here that if the boundary is not found, you need to continue to find it from the beginning.
}
return r;
}

int main()
{
// For (int j = 300000;; j + +) / / find the first prime number larger than the range.
// {
// 	int count = 0;
// 	for(int k = 2; k * k <= j; k ++)
// 	{
// 		if(j % k == 0)
// 			{
// 				count = 1;
// 				break;
// 			}
// 	}
// 	if(!count)
// 	{
// 		cout << j << endl;
// 		break;
// 	}
// }

int n, m;
cin >> n >> m;
memset(kf, 0x3f, sizeof kf); //The reason why it is not initialized to inf here is that memset is initialized by bytes. An int has four bytes
//Therefore, as long as each byte is initialized to 0x3f, the array can be initialized to 0x3f3f3f3f.
for(int i = 0; i < n; i ++)
{
int num;
cin >> num;
int k = hash_find(num);
kf[k] = num;
}
while(m --)
{
int p;
cin >> p;
int o = hash_find(p);
if(kf[o] != inf)    //If it is not empty, there are several in this position.
cout << "Yes" << endl;
else
cout << "No" << endl;
}
return 0;
}
```

Tags: hash

Posted by katierosy on Sat, 14 May 2022 04:59:47 +0300