I am trying to hash out a strategy for implementing a very simple site search in ASP.NET MVC and SQL Server 2008.
Really, all I want to to do is to be able to rank search results based on the number of times a search word or phrase is found in the webpage. I attempted to do this using LINQtoSQL but I ran into a lot o开发者_如何学编程f issues where some LINQ commands don't have a SQL equivalent. This was a few months ago so I don't remember specific errors.
So, I'm just trying to figure out an approach. What I'm thinking is this:
Approach 1: I should probably write a program to spider the site and somehow index the site's text - I'm thinking I should save information in a table like:
- ID
- Word
- URL
I could then query that and rank based on how many time that word is associated with a certain URL. But then I realized that this technique would completely breakdown if a user was searching for a phrase.
Approach 2: Then I was toying with the idea of using SPROCs to create a temporary table with a record for each URL that would somehow parse the text and determine how many times the phrase or word appeared in each individual URL. and then we would return the results from the temp table. I am thinking the temporary table would look something like this:
- ID
- SearchText
- URL
- Frequency
And then select * from temptable order by Frequency asc
or something like that.
However, I'm not sure if SPROCs are capable of parsing text like that, or if simultanious searching would be possible.
I am looking for something very lightweight. I'm not really interested in using Lucene or Solr or anything like that because the learning curve seems very steep and those applications' features are far away more than what I need.
Any thoughts on how I should approach this problem? Is there a different approach that I should consider?
For your phrase versus word issue, why not use wildcards and LIKE operators?
Select Count(*) from temptable where SearchPhrase LIKE '%Apple%'
Maybe not exactly what you want, but Windows SharePoint Search Server isn't all that bad.
Yes, it has the word 'SharePoint' in it, which would usually make me grab the scissors on my desk and start stabbing my eyes out, but having to use it once in a pinch, I was actually somewhat impressed with it.
It's free, so maybe worth a couple of hours playing with it for comparison to writing something custom.
After a little poking around, it looks like SQL Server 2008's Full Text Search is what I would want to use. I'm not 100% sure yet, but it looks promising.
http://msdn.microsoft.com/en-us/library/ms142547.aspx
If you're considering Full Text Search, then also check out lucene.net.
I used FTS for one project, and later used lucene.net for another, and although the requirements were different from yours, I'd never go back to FTS now.
精彩评论