开发者

most efficient algorithm for word partitioning?

开发者 https://www.devze.com 2023-02-18 21:40 出处:网络
I\'ve been looking for an ef开发者_高级运维ficient word partitioning algorithm but without much success. For example, given the word hello I want to obtain all the possible partitions of that word: {h

I've been looking for an ef开发者_高级运维ficient word partitioning algorithm but without much success. For example, given the word hello I want to obtain all the possible partitions of that word: {h,e,l,l,o},{h,e,l,lo},{h,e,llo},...,{hello}. Everything I found talks about word splitting which isn't what I mean.

Thank you in advance!


You show some examples, where we can concentrate on the commas. Either there is a comma or not.

 Word        Commas
{h,e,l,l,o}  1111
{h,e,l,l o}  1110
{h,e,l l o}  1100
...
{h e l l o}  0000

So it seems obvious, that at 4 positions, there may be a comma or not, independently from each other. You need 4 Bit to encode the partitions, which is 2^4 possibilities, I guess that is 16.

So you can form a loop:

for (int i = 0; i < 15; ++i)
    bitsplit ("hello", i);

and iterate through your word while iterating over the bits of the binary representation of i. For example for 11, you have the bits: 8+2+1 = 1011 set. That means {h,el,l,o}.


The problem is NP complete and needs to be solved by backtracking.

The idea is at each level, you decide whether this character belongs to current partition or should go to a new one. Do this recursively and every time you reach end of the word, you have one partition.


Most likey you want to construct a suffix-trie.

0

精彩评论

暂无评论...
验证码 换一张
取 消