开发者

String Find/Replace Algorithm

开发者 https://www.devze.com 2022-12-10 17:02 出处:网络
I would like to be able to search a string for various words, when I find one, i want to split the string at that point into 3 parts (left, match, right), the matched text would be excluded, and the p

I would like to be able to search a string for various words, when I find one, i want to split the string at that point into 3 parts (left, match, right), the matched text would be excluded, and the process would continue with the new string left+right.

Now, once i have all my matches done, i need to reverse the process by reinserting the matched words (or a replacement for them) at the point they were removed. I have never really found what i wanted in any of my searches, so I thought I would ask for input here on SO.

Please let me know if this question needs further description.

BTW - at the moment, i have a very poor algorithm that replaces matched text with a unique string token, and then replaces the tokens with the replacement text for the appropriate match after all the matches have been done.

This is the goal:

one two three four five six 

match "three" replace with foo (remember we found three, and where we found it)

one two four five six
       |
     three

match "two four" and prevent it from being matched by anything (edited for clarity)

one five six
   |
 two four 
       |
     three

at this point, you cannot match for example "one two"

all the matches have been found, now put their replacements back in (in reverse order)

one two four开发者_如何学Go five six
       |
     three


one two foo four five six

What's the point? Preventing one match's replacement text from being matched by another pattern. (all the patterns are run at the same time and in the same order for every string that is processed)

I'm not sure the language matters, but I'm using Lua in this case.

I'll try rephrasing, I have a list of patterns i want to find in a given string, if I find one, I want to remove that part of the string so it isnt matched by anything else, but I want to keep track of where i found it so I can insert the replacement text there once I am done trying to match my list of patterns

Here's a related question:

Shell script - search and replace text in multiple files using a list of strings


Your algorithm description is unclear. There's no exact rule where the extracted tokens should be re-inserted.

Here's an example:

  1. Find 'three' in 'one two three four five six'
  2. Choose one of these two to get 'foo bar' as result:

    a. replace 'one two' with 'foo' and 'four five six' with 'bar'

    b. replace 'one two four five six' with 'foo bar'

  3. Insert 'three' back in the step 2 resulting string 'foo bar'

At step 3 does 'three' goes before 'bar' or after it?

Once you've come up with clear rules for reinserting, you can easily implement the algorithm as a recursive method or as an iterative method with a replacements stack.


Given the structure of the problem, I'd probably try an algorithm based on a binary tree.


pseudocode:

for( String snippet in snippets )
{
    int location = indexOf(snippet,inputData);
    if( location != -1)
    {
        // store replacement text for a found snippet on a stack along with the
        // location where it was found
        lengthChange = getReplacementFor(snippet).length - snippet.length;
        for each replacement in foundStack
        {
            // IF the location part of the pair is greater than the location just found
            //Increment the location part of the pair by the lengthChange to account
            // for the fact that when you replace a string with a new one the location
            // of all subsequent strings will be shifted 
        }

        //remove snippet
        inputData.replace(snippet, "");
    }
}

for( pair in foundStack )
{
    inputData.insert( pair.text, pair.location);
}

This is basically just doing exactly as you said in your problem description. Step through the algorithm, putting everything on a stack with the location it was found at. You use a stack so when you reinsert in the second half, it happens in reverse order so that the stored "location" applies to the current state of the inputString.

Edited with a potential fix for commenter's criticism. Does the commented for block within the first one account for your criticisms, or is it still buggy in certain scenarios?

0

精彩评论

暂无评论...
验证码 换一张
取 消