# Algorithm and data structure (java data structure)

Data structure: the structure in which data is stored

Algorithm: how to operate data? It is more efficient and saves resources

## data structure

It can be divided into linear structure, tree structure and graph

Animation effect

### Linear table

Data is arranged like a line, mainly including array, linked list, queue, stack, etc

#### array

Linear table structure, which uses a continuous set of memory space to store the same type of data

Such as [1,2,3,4,5,6,7,8,9..., n]

```Logical structure: array a have n Elements, which can be expressed as( a1,a2,a3..an)

Physical structure: the array is stored in continuous memory space, and the subscript of each element is set for searching

data access :

a[i] = bassAddress + i * dataTypeSize  (Basic data address+subscript*Data type size, i.e int Type is 4 bytes)

The subscript starts at 0 because C The design starts from 0. If you start from 1, the computer will go one more time i-1 Subtraction of

Efficiency: because there are subscripts, data access will be efficient

Data insertion and deletion:

Each insertion and deletion will change the subscript of the data space and move the data, so the operation efficiency becomes inefficient

If you need to delete multiple data in algorithm improvement, if you can perform batch deletion at one time, you only need to move the data once.
Record the deleted data in sequence without deleting and moving. After recording, carry out statistical operation to move the data

as JVM Garbage collection algorithm

Container class java ArrayList    c++ Vector

ArrayList The bottom layer is realized by the array, which can dynamically expand the capacity and provide relevant operations to obtain, add and delete the array...

Source code analysis: the first time add When, the initial space size of the array is 10, add If the element exceeds the original array space, a 1 will be opened.5 Times the size of the new array, copy the new array to the original array
```

Discontinuous and non sequential storage structure. Each node has a data field and a pointer field pointing to the next node

Single linked list: 1 - > 2 - > 3 - > 4 - > 5 - > 6 - >... - > n

```Node: a data field and a pointer field pointing to the next node. When there is no subsequent node, the pointer field is saved null
performance :  There is no limit on the size. It naturally supports capacity expansion, but there is no subscript in random access. You can only traverse the whole linked list for query

add(),Add the new element to the end and create a node node(last,element,null) The predecessor is the previous node and the successor is null，
And set this node as the tail node last,If the precursor is also null Head node first It is also the new node

get(index),Through judgment index < (size>>1) ,If the obtained index is less than the middle value of the linked list, the first Follow up query next；
If greater than or equal to the intermediate value, from last Forward query precursor prev
```

Difference from ArrayList:

Random accessBased on subscript calculation, it can be accessed quickly and has high efficiencyIt is inefficient to query according to the address pointer according to the head or tail node in turn
Insert deleteWhen the memory needs to be expanded, the data needs to be deleted continuouslyThere is no limit on the size. When adding or deleting, you only need to change the pointer address of the predecessor and subsequent ones, which is efficient
Memory occupationOnly need to save data, smallNot only save the data, but also keep the two address pointers of the predecessor and the successor, which takes up a large space
```   //Reverse linked list
class node{
int val;
node next;
node(val){this.val=val;}
}
//Pre definition node
node prev = null;
//Get temporary node
while(temp!=null){
node cuur = temp.next;
temp.next = prev;
prev = temp;
temp = cuur;
}
return prev;
}
```

#### Stack

Features: first in first out, LAST IN FIRST OUT LIFO - "LAST IN FIRST OUT" is a restricted linear table

Data can only be operated at both ends of the stack. Data insertion and deletion are called entering and leaving the stack

Top of stack < – > 1, 2, 3, 4, 5, 6,..., n] bottom of stack

realization:
Stack source code analysis:

```Container class
class Stack<T> {
//common method
length
int size();
Add an element to the top of the stack
void push(T t);
Remove top element
T pop();
View top element
T peek();
}
```

#### queue

Features: FIFO - "FIRST IN FIRST OUT" is a restricted linear table

```Data from the tail
realization:
Applications: thread pool, cache, concurrent access
```

List - > N,..., 5, 4, 3, 2, 1 - > List

#### Hash table (hash table)

Features: a collection of key value structures

Generally, key is defined by an array, and value is defined by a linked list

In the hash table, the key stores the keyhash value. When the keyhash conflicts, it will compare whether the keys are equal,

If the keyhash is equal and the key is not equal during the insertion operation, it will be added directly after the original value chain

If keyhash is equal and key is equal, the original value will be directly replaced;

### tree

Tree is a data structure, which is a finite set of n (n > = 0) nodes. When n=0, it is called an empty tree. n> 0, the elements of a finite set form a hierarchical data structure.

#### nominal construction

• Subtree: in addition to the root node, each child node can be divided into multiple disjoint subtrees.
• Children and parents: if a node has a subtree, the node is called the "parents" of the subtree root, and the root of the subtree is the "child" of the node.
• Brother: nodes with the same parents are brothers to each other.
• Node degree: the number of subtrees a node has.
• Leaves: no subtree, that is, nodes with degree 0.
• Branch node: nodes other than leaf nodes, that is, nodes whose degree is not 0.
• Internal node: a branch node other than the root node.
• Hierarchy: the root node is the first layer, and the hierarchy of other nodes is equal to the hierarchy of their parent nodes plus 1
• Tree height: also known as the depth of the tree, the maximum level of nodes in the tree.
• Ordered tree: the order between the sub trees of the nodes in the tree is important, and the positions cannot be exchanged at will.
• Disordered tree: the order between the sub trees of the tree node is not important. You can exchange positions at will.
• Forest: a collection of 0 or more disjoint trees.

[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-uEU823SS-1609394285762)(D:\documents\study_note \ algorithm and data structure \ tree noun. png)]

#### Binary tree

A tree with at most two subtrees is called a binary tree

1. Skew tree: a binary tree with only left subtree at all nodes is called a left skew tree, and a binary tree with only right subtree at all nodes is called a right skew tree. (the essence is a linked list)
2. Full binary tree: the degree of all non leaf nodes in the binary tree is 2, and the leaf nodes are on the same level
3. Complete binary tree: if a binary tree has the same structure as the first m nodes of the full binary tree, such a binary tree is called a complete binary tree

[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-zMFpyj4P-1609394285764)(D:\documents\study_note \ algorithm and data structure \ binary tree. png)]

##### inorder traversal
1. Preorder traversal root - left - right

```public void preOrder(TreeNode node){
if(node != null){
//TODO
System.out.println(node.val);
preOrder(node.left);
preOrder(node.right);
}
}
```
2. Middle order traversal left root right

```public void inOrder(TreeNode node){
if(node != null){
inOrder(node.left);
//TODO
System.out.println(node.val);
inOrder(node.right);
}
}
```
3. Postorder traversal left right root

```public void postOrder(TreeNode node){
if(node != null){
postOrder(node.left);
postOrder(node.right);
//TODO
System.out.println(node.val);
}
}
```
##### Binary search tree (BST)
• If the left subtree of any node is not empty, the value of all nodes on the left subtree is less than that of its root node;
• If the right subtree of any node is not empty, the value of all nodes on the right subtree is greater than that of its root node;
• The left and right subtrees of any node are also binary search trees;
• There are no nodes with equal key values.

The advantage of binary search tree over other data structures is that the time complexity of search and insertion is low, which is O (log ⁡ n). Binary lookup tree is a basic data structure, which is used to build more abstract data structures, such as sets, multiple sets, associative arrays and so on.  ##### Binary balanced tree AVL

The depth of left and right subtrees shall not exceed 1

Either it is an empty tree, or the absolute value of the depth difference between the left and right subtrees of its root node does not exceed 1;

Its left and right subtrees are also balanced binary trees;

The balance factor of a binary tree node is defined as the depth of the left subtree minus the depth of the right subtree. Then the balance factor of all nodes of the balanced binary tree can only be - 1,0,1. ##### Red black tree (RBT)

RED BLACK tree is a binary search tree. It adds a storage bit to each node to record the color of the node, which can be RED or BLACK; Through the color constraint on any simple path from root to leaf, the RED BLACK tree ensures that the longest path does not exceed twice the shortest path, so it is approximately balanced.

• Each node is either red or black. (red or black)
• The root node is black. (root black)
• Each leaf node (leaf node refers to NIL pointer or NULL node at the end of the tree) is black. (leaf black)
• If a node is red, its two sons are black. (red black)
• For any node, each path to the NIL pointer at the end of the leaf node tree contains the same number of black nodes. (same as black under the path)
• When adding a node, it is a red node by default. If its parent is a black node, it will be added directly. If it is red, adjust the number of black nodes in the subtree

[the external chain image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-bc2jmvUt-1609394285768)(D:\documents\study_note \ algorithm and data structure \ red black tree. png)]

```Container class: ConcurrentHashMap & TreeMap

HashMap of key When the number of conflicting nodes exceeds 8, the linked list->Red black tree, when less than 8, red black tree->Linked list, put Source code analysis
class HashMap{
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
*Implement map And related methods
*
* @param hash hash value of key
* @param key key
* @param value value
* @param onlyIfAbsent If true, do not change the existing value
* @param evict，If false, the table is in create mode.
* @Returns the previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;//The initial key array of the expanded length is 16
if ((p = tab[i = (n - 1) & hash]) == null) //Hash keys do not conflict
tab[i] = newNode(hash, key, value, null); //Add node at the end
else {// hash key conflict
Node<K,V> e; K k;
// When the key values are equal
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;//Get replacement value
// Unequal key values
else if (p instanceof TreeNode)//Judge whether the current value is stored in red black tree
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);//Call tree add node
else {//Otherwise, it is a linked list
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {//When the current node is null
p.next = newNode(hash, key, value, null);//Increase at the bottom
if (binCount >= TREEIFY_THRESHOLD - 1) //When the 8th node is inserted,
treeifyBin(tab, hash);//Convert to red black tree and expand the length of key array at the same time
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;//When the current node is equal to the direct end of the target node
p = e;
}
}

if (e != null) { //e != null.  Existing value
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)//Modify mode or original value is null
e.value = value;//Set to new value
afterNodeAccess(e);//Null function, which is the callback of LinkedHashMap post-processing
return oldValue;//Return original value
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);//Null function, which is the callback of LinkedHashMap post-processing
return null;
}

final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();//Initialize table or expand capacity
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
do {
TreeNode<K,V> p = replacementTreeNode(e, null);//Initialize the current node,
if (tl == null)
else {
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);
if ((tab[index] = hd) != null)
hd.treeify(tab);//Turn red black tree
}
}
/**
*Initialize or increase the size of the table. If null, allocate according to the initial capacity target maintained in the field threshold; Otherwise, since we use a power extension of 2, the elements in each bin must maintain the same index or offset the new table with a power of 2.
*/
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else {               // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
}
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
}
if (hiTail != null) {
hiTail.next = null;
}
}
}
}
}
return newTab;
}

//Red black tree node class
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
TreeNode<K,V> parent;  // red-black tree links
TreeNode<K,V> left;
TreeNode<K,V> right;
TreeNode<K,V> prev;    // needed to unlink next upon deletion
boolean red;

/**
* Current chain header to red black tree
* @return root of tree
*/
final void treeify(Node<K,V>[] tab) {
TreeNode<K,V> root = null;
for (TreeNode<K,V> x = this, next; x != null; x = next) {
next = (TreeNode<K,V>)x.next;
x.left = x.right = null;
if (root == null) {
x.parent = null;
x.red = false;
root = x;
}
else {
K k = x.key;
int h = x.hash;
Class<?> kc = null;
for (TreeNode<K,V> p = root;;) {
int dir, ph;
K pk = p.key;
if ((ph = p.hash) > h)
dir = -1;
else if (ph < h)
dir = 1;
else if ((kc == null &&
(kc = comparableClassFor(k)) == null) ||
(dir = compareComparables(kc, k, pk)) == 0)
dir = tieBreakOrder(k, pk);

TreeNode<K,V> xp = p;
if ((p = (dir <= 0) ? p.left : p.right) == null) {
x.parent = xp;
if (dir <= 0)
xp.left = x;
else
xp.right = x;
root = balanceInsertion(root, x);
break;
}
}
}
}
moveRootToFront(tab, root);
}

/**
* Current red black tree chain transfer header
* @return root of tree
*/
final Node<K,V> untreeify(HashMap<K,V> map) {
Node<K,V> hd = null, tl = null;
for (Node<K,V> q = this; q != null; q = q.next) {
Node<K,V> p = map.replacementNode(q, null);
if (tl == null)
hd = p;
else
tl.next = p;
tl = p;
}
return hd;
}
}
}

```
##### Splay Tree

Self balanced binary search tree

##### Heap heap

Binary tree implemented with array (balanced binary tree), so it does not use parent pointer or child pointer. The heap is sorted according to the "heap attribute", which determines the position of nodes in the tree. If the root index is i (i=0. Is the root node of the tree), then the left subtree index = 2*i+1 and the right subtree index = 2*i+2; ```Container class: PriorityQueue
//insert
public boolean offer(E e) {
if (e == null)
throw new NullPointerException();
modCount++;
int i = size;
if (i >= queue.length)
grow(i + 1);
size = i + 1;
if (i == 0)
queue = e;
else
siftUp(i, e);
return true;
}
//Insert size transposition compared with parent class
private void siftUpUsingComparator(int k, E x) {
while (k > 0) {
// index of child is
int parent = (k - 1) >>> 1;
Object e = queue[parent];
if (comparator.compare(x, (E) e) >= 0)
break;
queue[k] = e;
k = parent;
}
queue[k] = x;
}
public E peek() {
return (size == 0) ? null : (E) queue;
}
public E poll() {
if (size == 0)
return null;
int s = --size;
modCount++;
E result = (E) queue;
E x = (E) queue[s];
queue[s] = null;
if (s != 0)
siftDown(0, x);
return result;
}
private E removeAt(int i) {
// assert i >= 0 && i < size;
modCount++;
int s = --size;
if (s == i) // removed last element
queue[i] = null;
else {
E moved = (E) queue[s];
queue[s] = null;
siftDown(i, moved);
if (queue[i] == moved) {
siftUp(i, moved);
if (queue[i] != moved)
return moved;
}
}
return null;
}
```
##### Huffman tree

The optimal path tree is to calculate the parent node through the leaf node. The sum of the two leaf nodes (from small to large) is the parent node. If this value exists, it is obtained directly. If it does not exist, the parent node is constructed

• Path and path length: the branches from one node to another in the tree form the path between two nodes. The number of branches on the path is called path length. If it is specified that the root node is located in the first layer, the path length from the root node to the node of layer h is H-1. If the path length to 40 is 1; The path length of 30 is 2; The path length of 20 is 3.

• Node weight: assign a value with a certain meaning to the node in the tree as the weight of the node, which is called the weight of the node;

• Weighted path length: the product of the path length from the root node to a node and the weight of the node. For example, the path length of node 10 in the above figure is 3, and its weighted path length is 10 * 3 = 30;

• Weighted path length of the tree: the weighted path length of the tree is the sum of the weighted path lengths of all leaf nodes, which is called WPL. WPL in the above figure = 1x40 + 2x30 + 3x10 + 3x20 = 190, and Huffman tree is the binary tree with the smallest weighted path of the tree. Huffman coding: on the path from the root node to each leaf node, the left branch is marked as 0 and the right branch is marked as 1. Connecting these 0 and 1 is the Huffman coding of leaf nodes

nodeHuffman coding
10100
20101
3011
400

Applying: file compression

#### Number of Trie tree prefixes and dictionaries

Branch extension management based on character prefix, similar to Hash table Application: keyword query

#### B tree

B-tree is a kind of data tree that can keep the balance of English. This data structure enables the actions of searching data, sequential access, inserting data and deleting to be completed in logarithmic time. B-tree, generally speaking, is a self balanced m-order tree. Different from self balanced binary search tree, B-tree is suitable for reading and writing storage systems with relatively large data blocks, such as disks.

• The root node has at least two children (degrees), which are extended and derived according to the maximum value of degrees.
• Each intermediate node contains k-1 elements and K children, where M / 2 < = k < = M
• Each leaf node contains k-1 elements, where M / 2 < = k < = M
• All leaf nodes are located on the same layer.
• The elements in each node are arranged from small to large, and k-1 elements in the node are just the value domain partition of the elements contained by K children.

[the external chain picture transfer fails, and the source station may have anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-to8cH9Kk-1609394285770)(D:\documents\study_note \ algorithm and data structure \ level 4 B-tree. jpg)]

#### B + tree

• B + tree is a kind of tree data structure, which is usually used in relational database (such as Mysql) and file system of operating system. The characteristic of B + tree is that it can keep the data stable and orderly, and its insertion and modification have relatively stable logarithmic time complexity. B + tree elements are inserted from bottom to top, which is just the opposite of binary tree.

• On the basis of B-tree, add the linked list pointer (B-tree + leaf ordered linked list) to the leaf node, all keywords appear in the leaf node, and the non leaf node is used as the index of the leaf node; B + trees always hit the leaf node.

• The non leaf nodes of the b + tree do not save data, but only save the critical value (maximum or minimum) of the subtree. Therefore, for nodes of the same size, the b + tree can have more branches than the b tree, making the tree more compact and less IO operations during query.

Optimize the B-Tree in the previous section. Since the non leaf node of B+Tree only stores key value information, assuming that each disk block can store 4 key values and pointer information, its structure after becoming B+Tree is shown in the following figure

5th order B + tree

[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-D8DB90Uh-1609394285772)(D:\documents\study_note \ algorithm and data structure \ level 5 B + tree. png)]

#### R tree

R-tree is a tree data structure used for spatial data storage. For example, index multidimensional data such as geographical location, rectangle and polygon.

The core idea of R-tree is to aggregate nodes with similar distance and represent them as the minimum circumscribed rectangle (MBR) of these nodes at the upper layer of the tree structure. This minimum circumscribed rectangle becomes a node of the upper layer. Because all nodes are in their smallest circumscribed rectangle, queries that do not intersect a rectangle must not intersect all nodes in the rectangle. Each rectangle on the leaf node represents an object. The nodes are the aggregation of objects, and the more objects are aggregated to the upper layer. Each layer can also be regarded as an approximation to the data set. The leaf node layer is the most fine-grained approximation, with a similarity of 100% with the data set. The coarser it goes to the upper layer.

It is suggested to save the image directly to the external link (v3-774g) of the anti-theft tree (it is suggested to save the image directly to the external link of the. 93g-notes algorithm)

### chart

A Graph is composed of a finite non empty set of vertices and a set of edges between vertices. It is usually expressed as: G(V,E), where g (Graph) represents a Graph, V (Vertices set) is the set of vertices in Graph G, and E(Edges set) is the set of edges in Graph G

noun

• Vertex: each point in the graph is called a vertex

• Edge: the relationship between each vertex and another vertex is called edge.

• Degree: degree, the number of edges each vertex has.

• Classification: according to the pointing relationship of edges, it can be divided into

• Directed graphs: the edge between any two vertices is a directed edge; Edges in a directed graph are represented by angle brackets "< >"; For example, < V1 / >; The degree of a directed graph is divided into

In degree: the number of edges pointing to the vertex

Out degree: the number of edges starting from the vertex

• Undirected graphs: the edge between any two vertices is an undirected edge; Edges in an undirected graph are represented by parentheses "()"; For example (V1,V2)

• Weight, the number of weights each edge has.

• Loop: if two vertices of an edge are the same, the edge is called a loop.

• Path: a path from vertex u to vertex v

• Connected graph: in an undirected graph, if any two vertices vivi and vjvj have paths connected, the undirected graph is called a connected graph.

• Strongly connected graph: in a directed graph, if any two vertices vivi and vjvj have paths connected, the directed graph is called strongly connected graph.

• Connected network: in a connected graph, if the edges of the graph have a certain meaning, each edge corresponds to a number, which is called weight; Weight represents the cost of connecting vertices. This kind of connected graph is called connected network.

• Spanning tree: the spanning tree of a connected graph refers to a connected subgraph, which contains all n vertices in the graph, but only enough n-1 edges to form a tree. A spanning tree with n vertices has only n-1 edges. If another edge is added to the spanning tree, it must be a ring.

• Minimum spanning tree: in all spanning trees of a connected network, the cost of all edges and the minimum spanning tree are called minimum spanning trees.  #### storage structure

The adjacency matrix of a graph is stored in two arrays to represent the graph. A one-dimensional array stores the vertex information in the graph, and a two-dimensional array (called Adjacency Matrix) stores the edge information in the graph.  Disadvantages: because the graph with n vertices needs n*n array elements to store, when the graph is sparse, a large number of 0 elements will appear using the adjacency matrix storage method, which will cause a great waste of space.

Each node in the table header and the adjacent node in the table header are stored in a corresponding vertex array. If the vertex corresponding to the header node has adjacent nodes, the adjacent nodes are successively stored in the one-way linked list pointed to by the header node. Similar to hashMap array + linked list   #### inorder traversal

Starting from the vertex, the adjacent vertices are accessed respectively, and the access is carried out smoothly at the level

##### Depth first algorithm DFS depth first search

Starting from the vertex, access a single adjacent vertex to the lowest layer, and then return to access other unreached vertices to the lowest layer ### String matching algorithm

#### BF Brute Force

Use substrings to match the main string one by one

```/**
* Storm matching Brute Force algorithm
*
* Match one by one for equality
* The main string t is divided into multiple substrings with the length of substring p, which are matched with substring p in turn
* @param t Main string
* @param p Substring
* @return  Returns the index of the first occurrence of the substring in the main string. - 1 indicates that there is no match
*/
public static int indexOfBF(String t,String p){
if(isEmpty(t) && isEmpty(p) && t.length() < p.length()){
return -1;
}
int a = 0;
int b = 0;
while (a < t.length() && b < p.length()){
if(t.charAt(a) == p.charAt(b)){
a++;
b++;
}
else{
a = a - b + 1;
b = 0;
}
}
if(b == p.length()){
return a - p.length();
}
return -1;
}
```

#### Hash table algorithm RK (Rabin, Karp) Rabin – Karp algorithm

The algorithm is used first Rotate hash Quickly screen out the text positions that cannot match the given string, and then check whether the remaining positions can be successfully matched. This algorithm can be extended to search for all matches of a single pattern string in text or multiple pattern strings in text.

```/**
* Hash matching RK algorithm
* It is an improvement of BF algorithm
* Thought: if the values of two strings after hash are different, they must be different; If they have the same value after hash, they are not necessarily the same.
*
* In practice, the main string t is divided into multiple substrings that are the same as the substring p, and the hash value of these substrings is calculated in turn. If the hash value is equal to the matching substring p, check whether each character is equal;
* @param t Main string
* @param p Substring
* @return  Returns the index of the first occurrence of the substring in the main string. - 1 indicates that there is no match
*/
public static int indexOfRK(String t,String p){
if(isEmpty(t) && isEmpty(p) && t.length() < p.length()){
return -1;
}
int p_len = p.length();
char[] t_chars = t.toCharArray();
char[] p_chars = p.toCharArray();
int p_hash = hash(p_chars, 26, 31, 0, p_len);
for (int i = 0; i < t.length() - p_len + 1; i++) {
int t_hash = hash(t_chars, 26, 31, i, p_len);
if(t_hash == p_hash){
for (int j = i; j < i+p_len; j++) {
if(t_chars[j] == p_chars[j-i]){
if(j == i+p_len-1){
return i;
}
}else{
break;
}
}
}
}
return -1;
}

/**
* DP Hash
* @param chars character set
* @param R  Hexadecimal 26, 16, 8, etc. strings generally use hexadecimal 26
* @param K  position
* @param begin  Start indexing
* @param len    length
* @return
*/
private static int hash(char[] chars,int R,int K,int begin,int len){
int hash = 0;
for (int i = begin; i < begin+len; i++) {
hash = hash*K + chars[i];
}
return hash%R ^ (hash >>> 16);
}
```

Posted by php-phan on Thu, 28 Apr 2022 20:28:54 +0300