I am processing user input on a search page. If the user selects an 'All Words' type search, then I remove any boolean search operators from the search text and stick ' AND ' between each real word. Pretty simple in most cases. However, I can't figure out how to remove two boolean operators in a row.
Here is my code:
// create the regex
private static Regex _cleaner =
    new Regex("(\\s+(and|or|not|near)\\s+)|\"",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

// call the regex
searchText = _cleaner.Replace(searchText, " ");
The problem occurs when a user enters a search string like 'coffee and not tea'. The regex will remove the 'and', but not the 'not'. The resulting string is 'coffee not tea' - what I want is 'coffee tea'.
The white space is required in the regex so I don't remove 'and', 'or', etc when embedded in real words (like 'band' or 'corps').
I have temporarily resolved this by calling the clean method twice, which will remove two operators in a row (which is probably all I would ever need). But it is not very elegant, is it? I would really like to do it right. I feel like I am missing something simple...
Try adding word boundaries:
"\\b(and|or|not|near)\\b|\""
Change your regex to the following, which repeats the operator group and matches the trailing whitespace only once:
private static Regex _cleaner =
    new Regex("(\\s+(and|or|not|near))*\\s+|\"",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);
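Wouldn't adding a + fix the problem, as long as the trailing \s+ is moved outside the repeated group so that adjacent operators can share the whitespace between them?
private static Regex _cleaner =
    new Regex("(\\s+(and|or|not|near))+\\s+|\"",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

// call the regex
searchText = _cleaner.Replace(searchText, " ");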
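Repetition only helps if two adjacent operators do not both need to consume the single space between them. Another way to arrange that, sketched here under the assumption that a lookahead is acceptable, is to check for the trailing whitespace without consuming it. The class name is hypothetical, and quotes are handled in a separate step to keep the pattern small:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class LookaheadCleaner
{
    // (?=\s) requires whitespace after the operator but leaves it in the input,
    // so the same whitespace can begin the next match.
    private static readonly Regex Operators = new Regex(
        @"\s+(?:and|or|not|near)(?=\s)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Clean(string searchText)
    {
        // Quotes become spaces, as in the original pattern.
        string noQuotes = searchText.Replace('"', ' ');
        // Each match is the leading whitespace plus the operator; the space
        // after the operator stays in the text, so nothing is inserted back.
        return Operators.Replace(noQuotes, "");
    }
}
```

For the input 'coffee and not tea' this returns 'coffee tea' in a single pass. Like the original pattern, it leaves an operator at the very start or end of the string untouched.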
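Your regex is not matching because you require whitespace on each side of every term. Writing _ for a space, the input is _and_not_ rather than _and__not_, so the single space between the two operators is consumed by the match on _and_ and nothing is left in front of not. You only match _and_.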
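Consider this regex, which requires whitespace only after the term (the leading \b keeps it from matching inside words like 'band'):
@"\b(?:and|or|not|near)\s+|"""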
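Since the end goal in the question is a list of real words joined by ' AND ', the operators could also be dropped without a regex. The following is only a sketch of that alternative, not something proposed in the thread. The class and method names are hypothetical, and it assumes the search text can be split on whitespace:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not from the original post.
public static class AllWordsQuery
{
    // Case-insensitive set of the operators named in the question.
    private static readonly HashSet<string> Operators = new HashSet<string>(
        new[] { "and", "or", "not", "near" }, StringComparer.OrdinalIgnoreCase);

    public static string Build(string searchText)
    {
        IEnumerable<string> words = searchText
            .Replace('"', ' ')                                              // drop quotes
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)     // null = split on whitespace
            .Where(word => !Operators.Contains(word));                      // whole-word comparison only

        return string.Join(" AND ", words);
    }
}
```

Because each token is compared as a whole word, 'band' and 'corps' survive, and consecutive operators need no special handling: `AllWordsQuery.Build("coffee and not tea")` gives `coffee AND tea`.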