开发者

How can I sort an SQLite query ignoring articles ("the", "a", etc.)?

开发者 https://www.devze.com 2023-01-16 01:30 出处:网络
I\'m using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I\'m using a custom ListBox class that has a function to sort the text stripping the word \'The\'

I'm using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I'm using a custom ListBox class that has a function to sort the text stripping the word 'The' from the beginning of every item. However, it doesn't exactly seem to be the simplest way to do it, since it calls from the SQLite database and then sorts. I'd prefer to cut it down to just one step, hopefully sorting straight from the database in my "SELECT" query.

I've done some searching on this, and have found some suggestions, including creating an extra sort-by column in the database. While this is certainly a possibility, I'm wondering if there's any simpler options that don't require inserting almost identical duplicate information (especially if the database becomes larger). I'm pretty new to SQLi开发者_开发知识库te, but I've read something about creating a collate function that can be used to create custom ordering. However, I'm not sure if this is appropriate use for it and can't seem to find any help with implementing it in C#.

Was hoping someone might be able to share some guidance. If an extra sorting column is the best way to go, then that is what I shall do.


To avoid inserting duplicate data, what about having two columns: TITLE_PREFIX (usually empty, but sometimes contains "The ", or "A "; no index on this column) and TITLE (contains the title without "The " or "A "; this is the column you create the index on). To display the data, you have to combine TITLE_PREFIX and TITLE. But you just search on TITLE.


Here is the solution:

ORDER BY (CASE 
    WHEN sortTitle LIKE 'the %' THEN substr(sortTitle,5) 
    WHEN sortTitle LIKE 'a %' THEN substr(sortTitle,3) 
    WHEN sortTitle LIKE 'an %' THEN substr(sortTitle,4) 
    ELSE sortTitle END)


You could store each title in 2 parts: title and prefix.

With SQLite you can combine 2 string values via the || operator also known as the concatenate operator.

Here's an example:

SELECT prefix || ' ' || title FROM movies ORDER BY title

You can also use ltrim in case prefix is empty, so you don't have a space at the front:

SELECT ltrim(prefix || ' ' || title) FROM movies ORDER BY title

Another alternative is to store the prefix at the end of the title. For example at a lot of movie stores you will see something like:

Three Musketeers, The


Within C# Code

If you wanted to do this within C#, use LINQ to do the ordering for you. I've posted a full sample on PasteBin. This will allow you to:

  • avoid duplicating data in your database
  • take advantage of DB indexes as you normally would, no matter which RDBMS
  • put in noise words in a config file, thereby reducing downtime/rebuild/redeploy when modifying the list
  • ensure a solution is more readable in your client code
DropDownList1.DataSource = myBooks.OrderBy(n => ReplaceNoise(n.Title))

public string ReplaceNoise(string input)
{
     string[] noise = new string[] { "the", "an", "a" };

     //surely this could be LINQ'd 
     foreach (string n in noise)
     {
         if (input.ToLower().StartsWith(n))
         {
             return input.Substring(n.Length).Trim();
         }
     }
     return input;
}

Within your SQLite statement

How about simply replacing the noise words with blanks in the order by? It's an ugly first step, but strongly consider a new column to store this value for sorting purposes.

ORDER BY REPLACE(REPLACE([title],'the',''), 'a', '')

Admittedly, this gets ugly when you end up with this:

REPLACE(REPLACE(REPLACE(REPLACE([title],'The ',''),'a',''),'of',''),'by','')


You could try building a table that supports full-text searching (using the FTS module) on the title. Then you'll be able to do fast searches on any words in the title without requiring lots of extra work on your part. For example, a user query of good bad ugly might produce “The Good, the Bad and the Ugly” as one of its first results. The extra cost of all this is about a quarter of the length of the text itself in general, but might be more for your dataset, as titles aren't full english text. You also need to spend the time building those extra indices – you don't want to build them on your main dataset on a live system (obviously) – but that shouldn't be too big a problem.


Create a virtual column (result of a function that can be implemented in C#) and sort on this virtual column. The function could move "The" to the end as in "Three Musketeers, The" or discard "The", whatever you want it to do.

0

精彩评论

暂无评论...
验证码 换一张
取 消