基本介紹
- 中文名:Burrows-Wheeler變換
- 外文名:Burrows-Wheeler Transformation
- 簡稱:BWT
- 也稱:BurrowsWheeler變換
- 定義:可逆的字元串順序變換
概述
算法輸入 | SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES |
---|---|
算法輸出 | TEXYDST.E.IXIXIXXSSMPPS.B..E.S.EUSFXDIIOIIIT |
Burrows–Wheeler變換過程
Burrows–Wheeler變換過程 | |||||
---|---|---|---|---|---|
abracadabra | 0 1 2 3 4 5 6 7 8 9 10 | a b r a c a d a b r a b r a c a d a b r a a r a c a d a b r a a b a c a d a b r a a b r c a d a b r a a b r a a d a b r a a b r a c d a b r a a b r a c a a b r a a b r a c a d b r a a b r a c a d a r a a b r a c a d a b a a b r a c a d a b r | a a b r a c a d a b r a b r a a b r a c a d a b r a c a d a b r a a c a d a b r a a b r a d a b r a a b r a c b r a a b r a c a d a b r a c a d a b r a a c a d a b r a a b r a d a b r a a b r a c a r a a b r a c a d a b r a c a d a b r a a b | 10 7 0 3 5 8 1 4 6 9 2 | rdarcaaaabb |
#存入輸入序列@inputString = qw(a b r a c a d a b r a ); #構造全循環排列字元串#因為有字元數目相同的循環排列,所以循環11次for( $i = 0 ; $i <=11 ; $i ++ ){ $rotateStrings[$i] = join( "" , @inputString[ map( $_ - $i , 0 .. 10 ) ] );} #將全循環按照字典序排序@rotateIndex = sort { $rotateStrings[$a] cmp $rotateStrings[$b] } 0 .. 10 ; #取出每個排序後的循環排列字元串的最後一個字母,作為輸出foreach $_ ( @rotateIndex ){ $output .= substr($rotateStrings[$_],10,1);} #得到輸出結果:rdarcaaaabbprint "$output\n";
Burrows–Wheeler變換的逆過程
加列1 | 排序1 | 加列2 | 排序2 | 加列3 | 排序3 | 加列4 | 排序4 |
r d a r c a a a a b b | a a a a a b b c d r r | ra da aa ra ca ab ab ac ad br br | aa ab ab ac ad br br ca da ra ra | raa dab aab rac cad abr abr aca ada bra bra | aab abr abr aca ada bra bra cad dab raa rac | raab dabr aabr raca cada abra abra acad adab braa brac | aabr abra abra acad adab braa brac cada dabr raab raca |
加列5 | 排序5 | 加列6 | 排序6 | 加列7 | 排序7 |
raabr dabra aabra racad cadab abraa abrac acada adabr braab braca | aabra abraa abrac acada adabr braab braca cadab dabra raabr racad | raabra dabraa aabrac racada cadabr abraab abraca acadab adabra braabr bracad | aabrac abraab abraca acadab adabra braabr bracad cadabr dabraa raabra racada | raabrac dabraab aabraca racadab cadabra abraabr abracad acadabr adabraa braabra bracada | aabraca abraabr abracad acadabr adabraa braabra bracada cadabra dabraab raabrac racadab |
加列8 | 排序8 | 加列9 | 排序9 |
raabraca dabraabr aabracad racadabr cadabraa abraabra abracada acadabra adabraab braabrac bracadab | aabracad abraabra abracada acadabra adabraab braabrac bracadab cadabraa dabraabr raabraca racadabr | raabracad dabraabra aabracada racadabra cadabraab abraabrac abracadab acadabraa adabraabr braabraca bracadabr | aabracada abraabrac abracadab acadabraa adabraabr braabraca bracadabr cadabraab dabraabra raabracad racadabra |
加列10 | 排序10 | 加列11 | 排序11 |
raabracada dabraabrac aabracadab racadabraa cadabraabr abraabraca abracadabr acadabraab adabraabra braabracad bracadabra | aabracadab abraabraca abracadabr acadabraab adabraabra braabracad bracadabra cadabraabr dabraabrac raabracada racadabraa | raabracadab dabraabraca aabracadabr racadabraab cadabraabra abraabracad abracadabra acadabraabr adabraabrac braabracada bracadabraa | aabracadabr abraabracad abracadabra acadabraabr adabraabrac braabracada bracadabraa cadabraabra dabraabraca raabracadab racadabraab |
#輸入bwt的輸出結果@ibwtInput = qw(r d a r c a a a a b b); #Perl允許不聲明的構造空串for( $i = 0 ; $i <= 10 ; $i ++ ){ for( $j = 0 ; $j <= 10 ; $j ++){ #將bwt的結果放在串的每一個行的最前一列 $inverseStrings[$j] = $ibwtInput[$j].$inverseStrings[$j];}#將串進行字典序排序 @inverseStrings = sort {$a cmp $b} @inverseStrings;} #按照最後一位是EOF的結果輸出,通常用$標誌字元串結尾,這裡使用已知結果作為展示print $inverseStrings[2]."\n";
算法最佳化
@inputString=qw(a b r a c a d a b r a);@sortIndex = sort {join("",@inputString[$a..($a+$#inputString)]) cmp join("",@inputString[$b..($b+$#inputString)]) } 0..$#inputString;print join(",",@sortIndex)."\n".join("",@inputString[map($_-1,@sortIndex)])."\n";