Process boolean phrase with Regex

Developer https://www.devze.com 2023-03-16 12:43 Source: Internet
I am processing user input on a search page. If the user selects an 'All Words' type search, then I remove any boolean search operators from the search text and stick ' AND ' between each real word. Pretty simple in most cases. However, I can't figure out how to remove two boolean operators in a row.

Here is my code:

// create the regex
private static Regex _cleaner =
     new Regex("(\\s+(and|or|not|near)\\s+)|\"", 
          RegexOptions.Compiled | RegexOptions.IgnoreCase);

// call the regex
searchText = _cleaner.Replace(searchText, " ");

The problem occurs when a user enters a search string like coffee and not tea. The regex will remove the 'and', but not the 'not'. The resulting string is 'coffee not tea' - what I want is 'coffee tea'.

The white space is required in the regex so I don't remove 'and', 'or', etc when embedded in real words (like 'band' or 'corps').

I have temporarily resolved this by calling the clean method twice, which will remove two operators in a row (which is probably all I would ever need). But it is not very elegant, is it? I would really like to do it right. I feel like I am missing something simple...


Try adding word boundaries:

"\\b(and|or|not|near)\\b|\""

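As a rough sketch of how the word-boundary pattern could fit into the 'All Words' flow described in the question: the class name SearchCleaner and the method ToAllWords are made up for illustration, and splitting on whitespace before joining with ' AND ' is an assumption about what the surrounding code does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchCleaner
{
    // \b matches at a word boundary without consuming any characters, so
    // adjacent operators do not compete for the whitespace between them.
    private static readonly Regex Cleaner =
        new Regex("\\b(and|or|not|near)\\b|\"",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: strips operators and quotes, then joins the
    // remaining words with " AND ".
    public static string ToAllWords(string searchText)
    {
        string cleaned = Cleaner.Replace(searchText, " ");

        // A null separator array makes Split break on any whitespace;
        // RemoveEmptyEntries drops the gaps left by removed operators.
        string[] words = cleaned.Split(
            (char[])null, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" AND ", words);
    }
}
```

One thing to watch for: a hyphen also counts as a word boundary, so a term such as rock-and-roll would lose its 'and' with this pattern.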

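Change your regex to the following, which keeps the leading whitespace outside the repeated group so that consecutive operators can share the single space between them:

private static Regex _cleaner = new Regex("\\s+((and|or|not|near)\\s+)*|\"", RegexOptions.Compiled | RegexOptions.IgnoreCase);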


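Wouldn't just adding a + fix the problem? The leading whitespace has to sit outside the repeated group, because there is only one space between consecutive operators:

private static Regex _cleaner = 
    new Regex("\\s+((and|or|not|near)\\s+)+|\"", 
              RegexOptions.Compiled | RegexOptions.IgnoreCase);

// call the regex
searchText = _cleaner.Replace(searchText, " ");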


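Your regex does not match the second operator because it requires whitespace on both sides of each term. In ' and not ' there is only one space between 'and' and 'not'. The match on ' and ' consumes that space, so 'not' has no leading whitespace left to match.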

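Consider this regex, which drops the leading whitespace requirement and uses a word boundary so that operators embedded in real words are left alone:

@"\b(?:and|or|not|near)\s+|"""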
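Another way to keep the whitespace requirement from the question and still catch consecutive operators is to check the trailing whitespace with a lookahead instead of consuming it. The sketch below is not the pattern from this answer; the names LookaheadCleaner, CleanOriginal and Clean are hypothetical, and it assumes operators should simply be dropped and quotes turned into spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadCleaner
{
    // The pattern from the question, kept only for comparison.
    private static readonly Regex Original =
        new Regex("(\\s+(and|or|not|near)\\s+)|\"", RegexOptions.IgnoreCase);

    // (?=\s) asserts that whitespace follows without consuming it, so the
    // next operator can still use that whitespace as its leading \s+.
    private static readonly Regex Operators =
        new Regex(@"\s+(?:and|or|not|near)(?=\s)", RegexOptions.IgnoreCase);

    public static string CleanOriginal(string searchText)
    {
        return Original.Replace(searchText, " ");
    }

    public static string Clean(string searchText)
    {
        // Remove each operator together with its leading whitespace; the
        // trailing whitespace stays behind and separates the real words.
        string withoutOperators = Operators.Replace(searchText, "");

        // Quotes are handled separately and become plain spaces.
        return withoutOperators.Replace("\"", " ");
    }
}
```

With the pattern from the question, CleanOriginal("coffee and not tea") leaves 'not' in place, while Clean("coffee and not tea") gives 'coffee tea'. Like the original, this sketch does not touch an operator at the very start or end of the input, since it has no whitespace on one side.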