Algorithm & Union find algorithm

This article mainly introduces the parallel search algorithm

1, Dynamic connectivity

Before introducing the algorithm, let's talk about what it is from and what problems it solves.
First look at the diagram to understand dynamic connectivity.

There are 10 points in the figure, one point pair is given at a time, for example (4, 3), which means that point 3 and point 4 are linked together. Is connected.

Next, there is another point pair (3, 8). Then 3 and 8 are connected. Similarly, there is a condition hidden at this time, and point 4 and point 8 are also connected. (of course, it is not drawn in the figure, only the connection per unit length is drawn)

At this point, we can summarize the following properties:

  1. Reflexivity: p and p are connected
  2. Symmetry: if p and Q are connected, then q and p are also connected.
  3. Transitivity: if P and q are connected, and q and r are connected, then p and r are also connected.

Next, (6, 5) normal operation. When (9, 4), when 9 and 4 are linked, because 4 and 8 are connected, 8 and 9 are connected. And is of unit length. Can draw. As shown in the figure

When the connection operation (8, 9) occurs again, the connected points set will show that point 8 and point 9 have been connected.
When the number increases, there will be more complex and chaotic situations, so as shown in the figure

In this kind of problems, we can meet the problems that need to be solved in reality, such as networks and mazes. For dynamic connectivity, we must have our own understanding.

2, Solve the problem (and check the set)

Baidu definition:
And look up the set. In some set application problems with N elements, we usually make each element form a single element set at the beginning, and then merge the sets of elements belonging to the same group in a certain order. In the meantime, we should repeatedly find out which set an element is in. This kind of problem has repeatedly appeared in the international and domestic competitions of Informatics in recent years. Its characteristic is that it seems not complex, but the amount of data is huge. If it is described with normal data structure, it is often too large in space for computers to bear; It takes only 3 seconds to describe the results of the test set, even if it is impossible to run the test set in the specified time.

The joint query set is a tree data structure, which is used to deal with the merging and query of some Disjoint Sets. It is often represented by forest in use.

The idea of adding nodes to a set is simply to connect them one by one. If appear The set will also change when the two points are dynamically connected. This set can be used as the data basis for finding the connection between two points. Therefore, each time a new point connection is made, the operation will be as follows:
First go to the collection to find out whether the link exists. If so, do not add it; If not, the link is added and a new link may be formed.


According to the above ideas, the simple idea is to use an array to correspond to its points and point correspondence.

  • Member: a collection that stores connected points.
  • Method: find the connection of a point; Judge whether the two points are linked; Add new link points to the collection
public abstract class UnionFind {
	protected int[] id;
	protected int count;//Number of all points
	
	public UnionFind(int N) {
		//Initialize component id array
		this.count = N;
		this.id = new int[N];
		for (int i = 0; i < N; i++) {
			id[i] = i;
		}
	}
	
	//Find the link of this point
	public abstract int find(int p);
	
	//Merge new links
	public abstract void union(int p, int q);

	public int getCount() {
		return count;
	}
	
	//Judge whether two points are linked
	public boolean connected(int p, int q) {
		return find(p) == find(q);
	}
	
}

Among them, we note that a point can be linked with multiple points to form a corresponding relationship. So how to express this relationship with an array? And is search just a simple search?
Combined with the above two operations, we have different ways to achieve this goal.

3, Quick find method

Yes, the main preference here is to find it quickly.

public class Quick_find extends UnionFind {

	public Quick_find(int N) {
		super(N);
	}

	@Override
	public int find(int p) {
		return id[p];
	}

	@Override
	public void union(int p, int q) {
		int pID = find(p);//Point p linked point
		int qID = find(q);//Point q linked point
		
		if (pID == qID) {
			return;//That is, the connection between these two points has been shown in the query set
		}
		//id[p] != id[q], at this time, the values of all elements equal to ID [P] in the array are changed to the values of id[q].
		for (int i = 0; i < id.length; i++) {
			if (id[i] == pID) {
				id[i] = qID;
			}
		}
		count--;
	}
}

It is very convenient to find here, that is, to obtain the array value directly, but merging the linked object is very troublesome at this time.
(4, 3) (3, 8) (9 , 4)


At this time, 3, 4 and 9 all point to 8, which will make it impossible to directly find the connection points of 3, 4 and 9 other than 8. For example, 3 and 4 are connected, but cannot be displayed directly in the set.

Therefore, the quick find method, where the find method is O (1) time, but the union method does need to traverse the array and modify the value, O(n) time.
Moreover, the more points, the higher the time complexity of the union method. Therefore, this kind of method is not suitable for large-scale connectivity problems.

4, Quick union method

Different from the above method, quick union is more inclined to add new connection point pairs.

Although the corresponding connection relationship is still stored in an array, it forms a tree structure.
This time, the array stores the points connected by each point. At the same time, it is used as the root node of the point in the logically formed tree.

The advantage of this method is that each time a new connection is added, only the value of one point is modified. So quick Union
However, when finding the connection root node of each point, the burden of finding is naturally increased. Different from quick find

Code implementation:

public class Quick_union extends UnionFind {

	public Quick_union(int N) {
		super(N);
	}

	@Override
	public int find(int p) {
		//Find the root node of the group where the p node belongs. The root node has the property id[root] = root
		while (p != id[p]) {
			p = id[p];
		}
		return p;
	}

	@Override
	public void union(int p, int q) {
		int pRoot = find(p);
		int qRoot = find(q);
		
		if (pRoot == qRoot) {//If the root nodes of the two points are the same, it means that they are in the same tree, and the two points are connected
			return;
		}
		id[pRoot] = qRoot; // Change a tree (i.e. a group) into a subtree of another lesson tree (i.e. a group)
		count--;
	}

}

Then came the question. When faced with large-scale data connectivity problems, is quick find necessarily slower than quick Union?
not always. Because if there are some extreme cases, because the depth of the tree is very deep, the find method is O(n). As shown in the figure

Although the above situation is extreme, it does mean that the depth of the tree determines the time complexity of the find method. So is there a solution?
The easy answer is to limit the height of the tree.

5, Weighted quick union method

As shown in the figure, what we solve is the merging of two trees.

The code is as follows:

public class WeightQuick_union {
	private int[] id;
	private int[] sz;//Tree height of each root node
	private int count;
	
	public WeightQuick_union(int N) {
		this.count = N;
		id = new int[N];
		for (int i = 0; i < N; i++) {
			id[i] = i;
		}
		sz = new int[N];
		for (int i = 0; i < N; i++) {
			sz[i] = 1;//At first, the height of each tree was 1
		}
	}

	public int getCount() {
		return count;
	}
	
	public boolean connected(int p, int q){
		return find(p) == find(q);
		
	}
	
	public int find(int p) {
		while(p != id[p]) {
			p = id[p];
		}
		return p;
	}
	
	public void union(int p, int q) {
		int pRoot = find(p);
		int qRoot = find(q);
		
		if (pRoot == qRoot) {
			return;
		}
		//Compare the height of the two numbers, add the small tree to the big tree, and modify the tree height
		if (sz[pRoot] < sz[qRoot]) {
			id[pRoot] = qRoot;
			sz[qRoot] += sz[pRoot];
		}else {
			id[qRoot] = pRoot;
			sz[pRoot] += sz[qRoot];
		}
		count--;
	}
	
}

The following figure shows the comparison of tree heights in weighted and unweighted quick Union

6, Summary

The algorithm of joint search set is an example learned from Algorithms.
Excellent guide: Introduction of union find algorithm
Articles on practical problems of merging and collecting: The maze algorithm and the shortest path solution are realized by find union

The author's level is limited. At present, he can only describe the above problems. If there are other situations, he can leave a message. If there are errors, please give advice. If there are continue to optimize, please share. Thank you!
Is this article rewarding? Is reading comfortable? What improvement suggestions? I hope you can leave me a message or a private letter. Your sharing is my progress. thank you.

September 8, 2020, 2nd floor of Network Security Institute

Tags: Java Algorithm

Posted by adhi_nugraha on Wed, 18 May 2022 06:16:46 +0300