I'm trying to implement Block Sorting. In the paper Burrows Wheeler Transform, Block Sorting requires appending a k ammount of EOF characters to the original string S, where EOF does not appear in S.
But since I will be processing binary files, there could be any possible combination of bits, so I cant choose in advance a single EOF character that I as开发者_StackOverflow中文版sure it won't be in S.
How do I solve this?
Since that EOF character is used to sort suffixes in a step, I've read that you can sort a suffix tree without the need of that EOF character. Should I use a suffix tree instead?
You can create a "virtual" EOF by using the length of your data containers OR by using a separate EOF table that tracks character positions of your virtual EOF characters.
[update for another idea]... Another option, chose a EOF char, call it 0x00 and a escape char, call it 0xFF. Scan your input and for all 0xFF and 0x00 perpend them with 0xFF. That is, simply escape them. Do the reverse when writing the data back
精彩评论