# Concept of string

String: a finite sequence of zero or more characters, also known as a string

## Some strings

Empty string( null string)，It is zero in length and can be directly represented by two pairs of quotation marks "" The so-called sequence shows that the adjacent characters of the string have the relationship between predecessor and successor. There are still some concepts to explain. A space string is a string that contains only spaces. Note the difference between it and empty string. The space string has content and length, and can have more than one space. Substring and main string. The subsequence composed of any number of continuous characters in the string is called The substring of the string. Accordingly, the string containing the substring is called the main string. The position of the substring in the main string is the sequence number of the first character of the substring in the main string, "over","end" ~ * lie " In fact, it can be considered as"lover " friend" believe" Substrings of these word strings

# Naive matching algorithm

## Examples

## reflection:

The question itself emphasizes c In language strstr function Concept: definition: strstr(str1,str2) Function to determine the string str2 Is it str1 Substring of. If yes, the function returns str2 stay str1 The address that first appears in the; Otherwise, return NULL.

The first thing we think of is a simple and rough solution to match slowly. Start from the first character of the main string and match the string all the way back. If you encounter a different direct stop, then start to match backward from the second element of the main string. You can complete it by knowing that it is a substring and finding its starting position

Here's the code

int strStr (char * haystack, char * needle){ int len1 = strlen(haystack); int len2 = strlen(needle); if(len2 == 0) //The first step is to find the length of two strings. If the length of the shorter string is zero, we directly return 0-false return 0; int i = 0,j = 0; //Pay attention to the conditions for judging the end of the while loop //The reason why we don't use the for loop is that we don't know when i or j will move when reading the string // Double pointer method i and j point to long string and short string respectively while (haystack[i] != '\0' && needle[j] != '\0') { if(haystack[i] == needle[j]) { i++; j++; //If equal, i and j move together } else { i = i - j + 1; //The long string returns to the position where the comparison started j = 0;//Substring starts from scratch // } if (j == len2) //Substring end return i-j; //The long string pointer minus the substring pointer is the initial letter, because the substring length is j } return -1;//If the judgment fails, return - 1 }

This method is too slow! Finding the position of a substring requires too many steps. And many steps are not necessary at all.

This idea is very good!! Many great ideas are born in the process of perfecting and correcting the existing methods step by step. In the best case, the time complexity of this algorithm is O(n). That is, the n characters of the substring are exactly equal to the first n characters of the main string, and in the worst case, the time complexity is O(m*n). In contrast, the space complexity of this algorithm is O(1), that is, it consumes no space but time.

**Now let's get to our topic: how KMP algorithm optimizes these steps. In fact, the main idea of KMP is: "space for time".

**

# KMP matching algorithm

## concept

The longest equal prefix and suffix of a string:

**The formula given in the book is as follows**

Simple textbooks are boring. My understanding is as follows

String abcdab

Set of prefixes: {a,ab,abc,abcd,abcda}

Set of suffixes: {b,ab,dab,cdab,bcdab}

Then the longest equal prefix and suffix is ab, which is the essence of this formula!

## kmp

The so-called KMP algorithm:

**It is to build a next array to judge the backtracking position of J pointer. In our simple algorithm, j is backtracked every time

The initial position of the substring, but kmp is to avoid unnecessary backtracking

**

**The string before each character has the longest prefix, and the length of the longest prefix is the key to our shift, so we use a next array to store the length of the longest prefix of the substring. Moreover, the value of the next array is only related to the substring itself.

Therefore, next[i]=j means that the length of the longest phase of the string before the character with subscript I is j.

We can calculate that the next array of substring t= "abcabcmn" is next [0] = - 1 (there is no separate string processing in the front)

next[1]=0；next[2]=0；next[3]=0；next[4]=1；next[5]=2；next[6]=3；next[7]=0；

**

The code is as follows, which needs to be understood

## code

void getNext(int* next, char* s) { //Initialize next int j = 0; next[0] = j; for (int i = 1; i < strlen(s); i++) { //Note that i starts with 1 //If the front suffixes are different while (j > 0 && s[i] != s[j]) { //Then go back forward j = next[j - 1]; } //If the front suffix is the same if (s[i] == s[j]) { //i and j move backward at the same time (the increase of i is in the for loop) j++; } //Assign j (length of prefix) to next[i] next[i] = j; } } int strStr(char * haystack, char * needle){ int len1 = strlen(haystack); int len2 = strlen(needle); //When the need is an empty string, 0 is returned if (len2 == 0) { return 0; } //Build next array int* next = malloc(sizeof(int) * len2); getNext(next, needle); //The starting position of the next record is 0, so it also starts from 0 here int j = 0; for (int i = 0; i < len1; i++) { //Note that i starts from 0 when matching //If not while (j > 0 && haystack[i] != needle[j]) { //Return to the previous matching position j j = next[j - 1]; } //If match if (haystack[i] == needle[j]) { //i and j move backward at the same time (the increase of i is in the for loop) j++; } //When j is equal to the length of the need, the string need appears in the string haystack if (j == len2) { //Returns the first position of the need string return (i - len2 + 1); } } //If it is not found, it indicates that it does not exist and returns - 1 return -1; }