开发者

Proper mysql datastructure for a fulltext search

开发者 https://www.devze.com 2022-12-18 04:51 出处:网络
Hoping someone can provide some mysql advice... I have 2 tables that look like this: searchTagsTable ID tag

Hoping someone can provide some mysql advice...

I have 2 tables that look like this:

searchTagsTable
    ID
    tag

dataTable
    ID
    title
    desc
    tagID

So the column "tagID" in "dataTable" is a comma-delimmited string of ids pointing to searchTagsTable.

I'd like to use mysql's built in fulltext search capabilities to search title, description, and tags.

I'm wondering: What is considered the "best" solution in a situation like this?

Should I leave the datastructure as it is? If so, how should I structure the sql to allow fulltext search of all three columns - title, desc and tag?

Or would it be preferable just to get rid of keywordsTable and have the actual tags comma delimmited in a "tags" column in dataTable?

Thanks in advance for your he开发者_C百科lp.

Travis


Should I leave the datastructure as it is? If so, how should I structure the sql to allow fulltext search of all three columns - title, desc and tag?

That wouldn't be possible. Indexes can only span columns of a single table.

Or would it be preferable just to get rid of keywordsTable and have the actual tags comma delimmited in a "tags" column in dataTable?

That would certainly be the simplest solution. You are currently not really getting any benefit from giving tags their own identity, since you can't use foreign keys and indexing on them.

However, MySQL's FULLTEXT indexing is not ideal for a tag system:

  • by default, it won't index words shorter than four letters;
  • by default, it has many (many) stopwords it won't index that you might want to use for tags;
  • it'll be less efficient than a normal index;
  • it only works in MyISAM, which is in all other respects a much worse database engine than InnoDB. Except where you really have to, you shouldn't really be using MyISAM today.

You can fix the minimum word length and stopwords by altering the MySQL configuration. This will make your indexes much bigger though. This may be an acceptable solution if you control the database everywhere your app will be deployed, and if you are only using tags as ‘extra words’ in a fulltext search-fodder, rather than a full categorisation system.

Otherwise... comma-delimited anything in a database is suspect IMO. It's usually better to use a one-to-many join table to express the idea that one entity has many tags. Then you can use a simple index to aid lookups instead of the limited FULLTEXT indexing scheme, which will be faster, more reliable, and allows you to use InnoDB and foreign keys. eg.:

dataTable
    ID       (primary key)
    title
    desc

dataTags
    ID       (foreign key -> dataTable)
    tagName  (index this column)

(You could still have the tagID->tagName mapping as well on top of this if you want the tags to have independent identity. I'm not sure if it's doing anything useful in your case though.)

If you need to get a comma-separated list from a one-to-many relation like the above, you can do it using the MySQL-specific GROUP_CONCAT function.

SELECT dataTable.*, GROUP_CONCAT(dataTags.tagName)
FROM dataTable
JOIN dataTags ON dataTags.ID=dataTable.ID
GROUP BY dataTable.ID;

That leaves the fulltext indexing of the title and desc. Which unfortunately does need you to put them in a MyISAM table.

A common alternative to this which you might also consider would be to keep the ‘canonical’ copies in the main table (potentially in an ACID-safe InnoDB table), and store a separate copy of all the title, desc and tags together in a FULLTEXT-indexed MyISAM table purely for fulltext search bait. This does mean you have to do an extra update each time you change the primary data (though if you fail or have to rollback a transaction, at least it's only relatively-unimportant search bait that's now wrong), but the advantage is you can apply extra processing to it, such as stemming and punctuation handling, which MySQL's FULLTEXT indexer doesn't do itself.

0

精彩评论

暂无评论...
验证码 换一张
取 消