I have some search queries like so:
George AND NOT Washington OR Abraham
Dog OR cat AND NOT Wolf
for these searches I would want to get back results for George or Abraham but not Washington, etc.
basically I want to take the string and be able to submit a contextual search to my full-text catalog stored procedure search.
I am assuming I should use Regex but I am very unfamiliar with Regex in C#.
I found this article: http://support.microsoft.com/kb/246800 which I think is what I need to do, but I was hoping that I could have some help with the implementation.
Assuming you take a string as parameter and would like to return a string:
string input = 'George Washington AND NOT Martha OR Dog';
private string interpretSearchQuery(input)
{
// HALP!
开发者_开发技巧 /* replace ' AND ' | ' AND NOT ' with
* " AND "
* " AND NOT "
*
* replace ' OR ' | ' OR NOT ' with
* " OR "
* " OR NOT "
*
* add " to beginning of string and " to end of string
*/
return '"George Washington" AND NOT "Martha" OR "Dog"';
}
I would parse your string using Postfix notation (or Polish notation).
**Postfix algorithm**
The algorithm for evaluating any postfix expression is fairly straightforward:
While there are input tokens left
Read the next token from input.
If the token is a value
Push it onto the stack.
Otherwise, the token is an operator (operator here includes both operators, and functions).
It is known a priori that the operator takes n arguments.
If there are fewer than n values on the stack
(Error) The user has not input sufficient values in the expression.
Else, Pop the top n values from the stack.
Evaluate the operator, with the values as arguments.
Push the returned results, if any, back onto the stack.
If there is only one value in the stack
That value is the result of the calculation.
If there are more values in the stack
(Error) The user input has too many values.
So taking your input string:
'George Washington AND NOT Martha OR Dog'
And simplifing it to:
A = George
B = Washington
C = Martha
D = Dog
& = AND
! = NOT
| = OR
We would get the postfix notation of
AB&C!D|
Which means:
- Push value A (George)
- Push value B (Washington)
- AND by popping previous two values and pushing the result (George AND Washington)
- Push value C (Martha)
- NOT by popping previous two values and pushing the result (George AND Washington) NOT (Martha)
- Push value D (Dog)
- OR by popping previous two values and pushing the result ((George AND Washington) NOT (Martha)) OR (Dog)
This might get you started... I would refactor the crap out of this to make it more robust.
string input = "George Washington AND NOT Martha OR Dog";
private string interpretSearchQuery(string input)
{
StringBuilder builder = new StringBuilder();
var tokens = input.Split( ' ' );
bool quoteOpen = false;
foreach( string token in tokens )
{
if( !quoteOpen && !IsSpecial( token ) )
{
builder.AppendFormat( " \"{0}", token );
quoteOpen = true;
}
else if( quoteOpen && IsSpecial( token ))
{
builder.AppendFormat( "\" {0}", token );
quoteOpen = false;
}
else
{
builder.AppendFormat( " {0}", token );
}
}
if( quoteOpen )
{
builder.Append( "\"" );
}
return "'" + builder.ToString().Trim() + "'";
}
public static bool IsSpecial( string token )
{
return string.Compare( token, "AND", true ) == 0 ||
string.Compare( token, "OR", true ) == 0 ||
string.Compare( token, "NOT", true ) == 0;
}
Here is a solution that I came up with. The only issue is that malformed search queries will not parse properly and fail:
private string interpretSearchTerm(string searchTerm)
{
string term = "";
/* replace ' AND ' | ' AND NOT ' with
* " AND "
* " AND NOT "
*
* replace ' OR ' | ' OR NOT ' with
* " OR "
* " OR NOT "
*
* add " to beginning of string and " to end of string
*/
if (searchTerm.IndexOf("AND") > -1
|| searchTerm.IndexOf("OR") > -1
|| searchTerm.IndexOf("AND NOT") > -1
|| searchTerm.IndexOf("OR NOT") > -1)
{
term = searchTerm.Replace(" AND NOT ", "\"AND NOT\"")
.Replace(" AND ", "\"AND\"")
.Replace(" OR NOT", "\"OR NOT\"")
.Replace(" OR ", "\"OR\"");
term = "\"" + term + "\"";
return term;
}
else if (searchTerm.IndexOf("\"") > -1) return searchTerm;
else return "\"" + searchTerm + "\"";
}
I will now go about implementing the postfix algorithm that GalacticJello suggested. I will post it up when I get it working.
精彩评论