Text generator (AC automata + DP)

Everyone said that this question was a routine board of dp on AC automata, but they couldn't understand the analysis given by them (it may be because of beginner...)

I finally understood it for a long time and wrote this explanation to benefit my compatriots like me (I should be the only one to cook...)

Simplified meaning:

Give you \ (n \) pattern strings. You need to generate a string with a length of \ (m \) so that at least one pattern string can be matched successfully. Ask the total number of feasible generation schemes and take a module of 10007.

Multi string matching, counting, dp + AC automata (it's quite obvious here)

But how?

thinking

What kind of string makes at least one pattern string match? This thing is too difficult to deal with.

If it is difficult, it will be difficult. ---- the famous four character idiom in OI

It may be converted to find that the number of schemes without a pattern string can match successfully is \ (sum \). (complement conversion)

Then the scheme of a string without a pattern string is \ (26^m - sum \)

Firstly, an AC automata is established with \ (n \) pattern strings

Think about when there will be a text string so that no pattern string can match successfully?
//This is the code for AC automata to match.
//This function will output how many pattern strings match the text string successfully
void GetAns()
{
	int len = strlen(a),now = 0 , ans = 0;
	for(int i = 0 ; i < len  ; i ++)
	{
		int num = a[i] - 'a';
		now = AC[now].son[num];
		for(int u = now ; AC[u].end != -1 && u ; u = AC[u].Fail)
		{
			ans += AC[u].end;
			AC[u].end = -1;
		}
	}
	cout << ans << endl;
	return ;
}

By observing the process of AC automata obtaining answers, we find that:

When accessing a node in a text string, we will keep skipping the \ (Fail \) of this point, the \ (Fail \) of this point (this is called the \ (Fail \) chain) until it jumps to the root or the point where the answer has been calculated (it has been skipped, and it will be repeated after jumping down).

Then the answer is added up to the number of pattern strings ending at the point you jumped to.

Assuming that the \ (Fail \) pointer of \ (i \) points to the point \ (j \), according to the definition of the \ (Fail \) pointer, the string formed by the path from the root node to \ (j \) on trie is the suffix of the string formed by the path from the root node to \ (i \) on trie

Then the answer is obviously feasible

So what if we want the answer to be 0?

That is the current point and the point to jump to. There is no pattern string ending with them. These are the points we want to choose. As for other points, we want to "avoid".

Consider how to DP

According to the routine (there is no way, the routine still needs to know), the general state setting of DP on AC automata is like this: \ (DP [i] [j] \) < ----- it means that AC automata takes \ (I \) step and the last one is \ (j \)

According to the above analysis, $DP[i][k] $will add up \ (DP[i - 1][j]\) (\(k \) is the son of \ (j \), and it is satisfied that \ (j \) does not have a pattern string ending with the point on its \ (Fail \) chain (including \ (j \))

Finally, count all the answers \ (\ sum {I = 0} ^ {I = cnt} {DP [M] [i]} that is, the answer with any node on AC automata as "j", and the text length is required to be m \)

The answer is \ ((26 ^ m - sum) \) \ (MOD \) \ (10007 \)

This is the end

Note that in the sense of modulus, the subtraction should add \ (Mod \) to prevent it from becoming negative. See the code for details.

Code

#include <bits/stdc++.h>
using namespace std;
int n,m,cnt = 0;
const int MAXN = 6005,MAXM = 105,Mod = 10007;//Constant assignment
char s[1005];//The given pattern string is stored with this
struct node{
	int end,Fail;
	int son[26];
}AC[MAXN];//AC automata
int vis[MAXN];//Things to use when creating a Fail pointer
int f[MAXM][MAXN];//DP array
void build()
{
	int len = strlen(s),now = 0;
	for(int i = 0 ; i < len ; i ++)
	{
		int num = s[i] - 'A';
		if(AC[now].son[num] == 0)
			AC[now].son[num] = ++cnt;
		now = AC[now].son[num];
	}
	AC[now].end = 1;
}//Establish AC automata

void GetFail()
{
	int now = 0 , head = 0 , tail = 0;
	for(int i = 0 ; i < 26 ; i ++)
		if(AC[0].son[i])
			tail ++ , vis[tail] = AC[0].son[i];
	while(head < tail)
	{
		head ++;
		int v = vis[head];
		for(int i = 0 ; i < 26 ; i ++)
		{
			if(AC[v].son[i])
			{
				AC[AC[v].son[i]].Fail = AC[AC[v].Fail].son[i];
				tail ++;
				vis[tail] = AC[v].son[i];//Ordinary AC automata can be established
				AC[AC[v].son[i]].end |= AC[AC[AC[v].son[i]].Fail].end;//Here, the or operation is used to find out whether there is a point on the Fail chain that is the end of the pattern string
			}
			else AC[v].son[i] = AC[AC[v].Fail].son[i];
		}
	}
	return ;
}

int quick_power(int x,int y){
	int ans = 1 , op = x;
	if(y == 2)return x*x;
	if(x == 0)return 0;
	while(y){
		if(y % 2 == 1)ans *= op , ans %= Mod;
		op *= op , op %= Mod;
		y = y >> 1;
	}
	return ans % Mod;
}

void DP()
{
	f[0][0] = 1;
	for(int i = 1 ; i <= m ; i ++)
		for(int j = 0 ; j <= cnt ; j ++)
			if(!AC[j].end)//Obviously, we can't dynamically plan illegal points
			{
				for(int k = 0 ; k < 26 ; k ++)
				f[i][AC[j].son[k]] =( f[i][AC[j].son[k]] + f[i - 1][j] )% Mod;
			}
	int ans = 0;
	for(int j = 0 ; j <= cnt ; j ++)
		if(!AC[j].end)ans += f[m][j],ans %= Mod;
	cout <<(quick_power(26,m) - ans + Mod )% Mod;//Add Mod here, or you'll die
}

int main()
{
	cin >> n >> m;
	for(int i = 1 ; i <= n ; i ++)
	{
		cin >> s;
		build();
	}
	GetFail();//This is where Fail is built
	DP();//This is where DP is conducted
	return 0;
}

Posted by richza on Tue, 10 May 2022 05:59:11 +0300