I need to mark up a string with identifiers indicating the start and end of a substring that has passed a test.
Assume I had the string "The quick brown fox jumps over the lazy dog" and I wanted to mark up the string with a tag around every word starting with the character 'b' or 'o'. The final string would look like "The quick <tag>brown</tag> fox jumps <tag>over</tag> the lazy dog".
Using a combination of regular expressions and LINQ, I have the correct logic to accomplish what I want, but performance is poor because I am using String.Insert to insert the tags. Our strings can be very long (>200k) and the number of substrings to tag can be close to a hundred. Below is the code I am using to insert the tags. Given that I know the start and length of each substring, how can I update the string 'input' faster?
.ForEach<Match>(m => {
    input = input.Insert(m.Index + m.Length, "</tag>");
    input = input.Insert(m.Index, "<tag>");
});
You should use a StringBuilder.
For optimal performance, set the StringBuilder's capacity before doing anything, then append chunks of the original string between tags.
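The chunk-appending idea above might look roughly like the following. This is only a sketch: the Tagger class, the TagMatches helper, and the \b[bo]\w* pattern used in the usage note are assumptions for illustration, not code from the question.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class Tagger
{
    // Hypothetical helper: wraps every match of `pattern` in <tag>...</tag>.
    public static string TagMatches(string input, Regex pattern)
    {
        MatchCollection matches = pattern.Matches(input);

        // Each match adds "<tag>" (5 chars) plus "</tag>" (6 chars),
        // so the final length is known up front and the buffer never regrows.
        var sb = new StringBuilder(input.Length + matches.Count * 11);

        int last = 0; // index just past the text copied so far
        foreach (Match m in matches)
        {
            sb.Append(input, last, m.Index - last); // untouched text before the match
            sb.Append("<tag>");
            sb.Append(input, m.Index, m.Length);    // the matched word itself
            sb.Append("</tag>");
            last = m.Index + m.Length;
        }

        sb.Append(input, last, input.Length - last); // tail after the final match
        return sb.ToString();
    }
}
```

Calling Tagger.TagMatches(input, new Regex(@"\b[bo]\w*")) copies each character of the original string once, instead of re-copying the whole string for every inserted tag.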
Alternatively, move your logic to a MatchEvaluator lambda expression and call Regex.Replace.
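A minimal sketch of the MatchEvaluator route, assuming the test for a word can be expressed as a pattern such as \b[bo]\w* (the EvaluatorTagger class and its Tag method are made-up names for illustration):

```csharp
using System.Text.RegularExpressions;

public static class EvaluatorTagger
{
    // Assumed pattern: a word boundary followed by 'b' or 'o', then the rest of the word.
    private static readonly Regex WordPattern = new Regex(@"\b[bo]\w*");

    public static string Tag(string input)
    {
        // The lambda is converted to a MatchEvaluator delegate. It runs once per
        // match, and Regex.Replace assembles the result string in a single pass.
        return WordPattern.Replace(input, m => "<tag>" + m.Value + "</tag>");
    }
}
```

Any extra per-match logic (the "test" from the question) can go inside the lambda, which can return m.Value unchanged for matches that should not be tagged.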
Try this:
Regex
Regex.Replace("The quick brown fox jumps over the lazy dog", @"(^|\s)([bo]\w*)", "$1<tag>$2</tag>");
Results
The quick <tag>brown</tag> fox jumps <tag>over</tag> the lazy dog
Regular expressions should provide you with a fairly quick replacement. Whether or not this method is the best depends on the length of the string and how much work is involved in actually matching one of your "words."
You can use Regex directly - it has a Replace method which should allow you to insert the tags around your matches.
I can't vouch for the speed of this, however. You can compile the Regex, which should improve performance, but even with this you will need to test with your specific circumstances.
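Strings in .NET are immutable, so every String.Insert call allocates a new string and copies the whole input, which is slow for large strings. Use a System.Text.StringBuilder instead.
It also has an Insert method.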
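If you want to keep the insert-based shape of the original code, a sketch with StringBuilder.Insert could look like this. ReverseInsertTagger and its Tag method are hypothetical names, and walking the matches from last to first is an assumption made here so that earlier indexes stay valid after each insertion.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class ReverseInsertTagger
{
    public static string Tag(string input, Regex pattern)
    {
        MatchCollection matches = pattern.Matches(input);

        // Start from a copy of the input, with room for 11 extra characters per match.
        var sb = new StringBuilder(input, input.Length + matches.Count * 11);

        // Work from the last match to the first: an insertion only moves text
        // after it, so the indexes of the matches still to be processed stay correct.
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Match m = matches[i];
            sb.Insert(m.Index + m.Length, "</tag>"); // closing tag first
            sb.Insert(m.Index, "<tag>");             // then the opening tag
        }

        return sb.ToString();
    }
}
```

This avoids allocating a new 200k string per tag, but each Insert may still have to move the characters after the insertion point, so appending chunks in order is likely to do less copying overall. Measure both on your own data.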
Also, MSDN has a nice article, Improving String Handling Performance, that compares StringBuilder to normal String operations. It's worth a read if you've never run across this topic before.