开发者

Storing interesting words from a sentence

开发者 https://www.devze.com 2023-02-23 13:18 出处:网络
I have a sentence like the cat sat on the mat stored as a single sql field.I want to periodically search for keywords which are not not in a stop list, in this case cat sat matWhat\'s the best way to

I have a sentence like the cat sat on the mat stored as a single sql field. I want to periodically search for keywords which are not not in a stop list, in this case cat sat mat What's the best way to store them in an SQL table for quick searching?

As far as I can see it I see the following options

  1. Up to [n] additional column开发者_高级运维s per row, one for each word.
  2. Store all of the interesting words in a single, comma separated field.
  3. A new table, linked to the first with either of the above options.
  4. Do nothing and search for a match each time I have a new word to search on.

Which is best practice and which is fastest for searching for word matches? I'm using sqlite in python if that makes a difference.


I would suggest giving your sentences a key, likely IDENTITY. I would then create a second table linking to your sentence table, with a row for each interesting word.

If you'd like to search for say, words starting with ca- if you stored these words in a comma delimited you'd have to wildcard the start and end, whereas if they are each in a separate row you can bypass the beginning wildcard.

Also, assuming you find a match, in a comma separated list you'd have to parse out which word is actually a hit. With the second table you simply return the word itself. Not to mention the fact that storing multiple values in one field a major no-no in a relational database.


Your best bet is probably to do full text searching.

These questions FULL-TEXT Search in SQLite and SQLite full text search catalog will hopefully get you going in the right direction.


I do something similar with SQLite too. In my experience it's not as fast as other db's in this type of situation so it pays to make your schema as simple as possible.

  • Up to [n] additional columns per row, one for each word.
  • Store all of the interesting words in a single, comma separated field.
  • A new table, linked to the first with either of the above options.
  • Do nothing and search for a match each time I have a new word to search on.

Of your 4 options, 2) and 4) may be too slow if you're looking to scale and matching using LIKE. Matching using full text is faster though, so that's worth looking into. 1) looks to be bad database design, what if there's more words than columns ? And if there's less, it's just wasted space. 3) is best IMO, if you make the words the primary key in their own table the searching speed should be acceptably fast.

0

精彩评论

暂无评论...
验证码 换一张
取 消