基本介紹
- 中文名:AC自動機
- 外文名:Aho-Corasick automaton
- 拼音:ei si zi dong ji
- 性質:多模匹配算法
- 產生時間:1973年
- 產生地:貝爾實驗室創造
套用,案例,
套用
一個常見的例子就是給出n個單詞,再給出一段包含m個字元的文章,讓你找出有多少個單詞在文章里出現過。
如果你對KMP算法了解的話,應該知道KMP算法中的next函式(shift函式或者fail函式)是乾什麼用的。KMP中我們用兩個指針i和j分別表示,A[i-j+ 1..i]與B[1..j]完全相等。也就是說,i是不斷增加的,隨著i的增加j相應地變化,且j滿足以A[i]結尾的長度為j的字元串正好匹配B串的前 j個字元,當A[i+1]≠B[j+1],KMP的策略是調整j的位置(減小j值)使得A[i-j+1..i]與B[1..j]保持匹配且新的B[j+1]恰好與A[i+1]匹配,而next函式恰恰記錄了這個j應該調整到的位置。同樣AC自動機的失敗指針具有同樣的功能,也就是說當我們的模式串在Trie上進行匹配時,如果與當前節點的關鍵字不能繼續匹配,就應該去當前節點的失敗指針所指向的節點繼續進行匹配。
案例
Problem Description
In the modern time, Search engine came into the life of everybody.Wiskey also wants to bring this feature to his image retrieval system.Every image have a long description, when users type some keywords to find the image, the system will match the keywords with description of image and show the image which the most keywords be matched. To simplify the problem, giving you a description of image, and some keywords, you should tell me how many keywords will be match.
Input
The last line is the description, and the length will be not longer than1000000.
Output
Print how many keywords are contained in the description.
自動機 C++ 原始碼
#include<iostream>using namespace std;const int kind=26;struct node{ node *fail;//失敗指針 node *next[kind];//Trie每個節點的26個子節點(最多26個字母) int count;//是否為該單詞的最後一個節點 node(){//構造函式初始化 fail=NULL; count=0; memset(next,NULL,sizeof(next)); }} *q[1000000];//佇列,方便用於bfs構造失敗指針char keyword[51];//輸入的單詞char str[1000001];//模式串~~~~~~~~~~~~~~~~~~~~~~~~
自動機Pascal模組
POJ1204(AC自動機模板題)
題意:給一個N行長為M的字元串,給你一些需要去匹配的字元串,從任意一個字元串開始可以有八個方向,向上為A,順時針依次是A——H,問你去匹配的字元串在給你的N*M字元串中的坐標是怎么樣的。
代碼:
const maxnodes=500000;var fx:array[1..8]of char=('E','F','G','H','A','B','C','D'); t:array[0..maxnodes,'A'..'Z']of longint; f,q,w:array[0..maxnodes]of longint; e:array[0..1001,0..1001]of char; s:array[0..1001]of char; colu,line:array[0..1]of longint; done:array[0..1001]of boolean; ans:array[0..1001]of record x,y,f:longint; end; n,m,num,u,i,j,k,l,r,root,size,x,y,p,li,co,tmp,len:longint; c:char;function nx(varx,y:longint;f:longint):char;begin case f of 1:dec(x); 2:begin dec(x); inc(y); end; 3:inc(y); 4:begin inc(x); inc(y); end; 5:inc(x); 6:begin inc(x); dec(y); end; 7:dec(y); 8:begin dec(x); dec(y); end; end; if(x<1)or(x>n)or(y<1)or(y>m)then exit('!'); exit(e[x,y]);end; begin readln(n,m,num); line[0]:=1; line[1]:=n; colu[0]:=1; colu[1]:=m; for i:=1 to n do begin for j:=1 to m do read(e[i,j]); readln; end; root:=0; size:=0; for i:=1 to num do begin len:=0; while not eoln do begin inc(len); read(s[len]); end; p:=root; for j:=len downto 1 do begin c:=s[j]; if t[p,c]=0 then begin inc(size); t[p,c]:=size; end; p:=t[p,c]; end; inc(r); q[r]:=t[root,c]; f[q[r]]:=root; end; while l<r do begin inc(l); u:=q[l]; for c:='A' to 'Z' do if t[u,c]<>0 then begin inc(r); q[r]:=t[u,c]; p:=f[u]; while(p<>root)and(t[p][c]=0)do p:=f[p]; f[q[r]]:=t[p,c]; end; end; for k:=1 to 8 do for li:=0 to 1 do for co:=1 to m do begin x:=line[li]; y:=co; p:=root; c:=e[x,y]; while true do begin while(p<>root)and(t[p][c]=0)do p:=f[p]; p:=t[p,c]; tmp:=p; while tmp<>root do begin if(w[tmp]>0)and(not done[w[tmp]])then begin ans[w[tmp]].x:=x; ans[w[tmp]].y:=y; ans[w[tmp]].f:=k; done[w[tmp]]:=true; end; tmp:=f[tmp]; end; c:=nx(x,y,k); if c='!' then break; end; end; for k:=1 to 8 do for co:=0 to 1 do for li:=1 to n do begin x:=li; y:=colu[co]; p:=root; c:=e[x,y]; while true do begin while(p<>root)and(t[p,c]=0)do p:=f[p]; p:=t[p,c]; tmp:=p; while tmp<>root do begin if(w[tmp]>0)and(not done[w[tmp]])then begin ans[w[tmp]].x:=x; ans[w[tmp]].y:=y; ans[w[tmp]].f:=k; done[w[tmp]]:=true; end; tmp:=f[tmp]; end; c:=nx(x,y,k); if c='!'then break; end; end; for i:= 1 to num do writeln(ans[i].x-1,' ',ans[i].y-1,' ',fx[ans[i].f]);end.