Tag based searching with MySQL

Developer https://www.devze.com 2023-03-21 18:14 Source: web
I want to write a tag based search engine in MySQL, but I don't really know how to get to a pleasant result.

I used LIKE, but as I stored over 18k keywords in the database, it's pretty slow.

What I got is a table like this:

id (int, primary key), article_cloud (text), keyword (varchar(40), FULLTEXT INDEX)

So I store one keyword per row and save all the referring article numbers in article_cloud.

I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword. But I also want a suggest search, so that there are relevant articles popping up, while the user is typing. So I still need a similar statement to LIKE, but faster. And I have no idea what I could do.

Maybe this is the wrong concept of tag based searching. If you know a better one, please let me know. I've been fighting with this for days and can't figure out a satisfying solution. Thanks for reading :)


MATCH() AGAINST() / FULLTEXT searching is a quick fix to a problem - but your schema makes no sense at all - surely there are multiple keywords in each article? And using a fulltext index on a column which only contains a single word is rather dumb.

and save all the referring article numbers in article_cloud

No! Storing multiple values in a single column is VERY bad practice. When those values are keys to another table, it's a mortal sin!
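To make that concrete, here is a rough sketch of unpacking such a column into one row per (keyword, article) pair. The question never says how article_cloud is formatted, so the comma-separated layout is an assumption, and explode_article_cloud is a made-up helper name, not anything from the original schema.

```python
def explode_article_cloud(rows):
    """Turn (keyword, "id,id,...") rows into sorted, unique (keyword, article_id) pairs.

    Assumption: article_cloud holds comma-separated integer ids.
    Keywords are lower-cased so lookups later use one consistent case.
    """
    pairs = set()
    for keyword, cloud in rows:
        for token in cloud.split(","):
            token = token.strip()
            # Skip empty fragments (e.g. a trailing comma) and anything non-numeric.
            if token.isdigit():
                pairs.add((keyword.lower(), int(token)))
    return sorted(pairs)
```

Each resulting pair can then be inserted as its own row, which is what lets the database index and join on the article id instead of parsing text.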

It looks like you've got a long journey ahead of you to create something which will work efficiently; the quickest route to the goal is probably to use Google or Yahoo's indexing services on your own data. But if you want to fix it yourself....

See this answer on creating a search engine - the keywords should be in a separate table with an N:1 relationship to your articles, primary key on keyword and article id, e.g.

CREATE TABLE article (
    id INTEGER NOT NULL AUTO_INCREMENT,
    modified TIMESTAMP,
    content TEXT,
    /* ... */
    PRIMARY KEY (id)
);

CREATE TABLE keyword (
    word VARCHAR(20),
    article_id INTEGER, /* references article.id */
    relevance FLOAT DEFAULT 0.5, /* allow users to record relevance of keyword to article */
    PRIMARY KEY (word, article_id)
);

CREATE TEMPORARY TABLE search (
    word VARCHAR(20),
    PRIMARY KEY (word)
);

Then split the words entered by the user, convert them to a consistent case (same as used for populating the keyword table) and populate the search table, then find matches using....

SELECT article.id, SUM(keyword.relevance) AS score
FROM article
JOIN keyword ON article.id = keyword.article_id
JOIN search ON keyword.word = search.word
GROUP BY article.id
ORDER BY score DESC
LIMIT 0,3;
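The whole flow (split the input, normalise case, fill the search table, rank by summed relevance) might look roughly like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in so it runs without a MySQL server; tokenize, search_articles and demo_connection are illustrative names, and the join to article is left out because only the article id is returned.

```python
import re
import sqlite3


def tokenize(query):
    """Split user input into unique lower-case words."""
    return sorted(set(re.findall(r"[a-z0-9]+", query.lower())))


def search_articles(conn, query, limit=3):
    """Return (article_id, summed relevance) rows, best match first."""
    cur = conn.cursor()
    # Temporary table of the words the user typed, refilled on every search.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS search (word TEXT PRIMARY KEY)")
    cur.execute("DELETE FROM search")
    cur.executemany("INSERT INTO search (word) VALUES (?)",
                    [(w,) for w in tokenize(query)])
    cur.execute(
        """
        SELECT keyword.article_id, SUM(keyword.relevance) AS score
        FROM keyword
        JOIN search ON keyword.word = search.word
        GROUP BY keyword.article_id
        ORDER BY score DESC, keyword.article_id
        LIMIT ?
        """,
        (limit,),
    )
    return cur.fetchall()


def demo_connection():
    """In-memory database with a few made-up keyword rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keyword (word TEXT, article_id INTEGER, "
        "relevance REAL DEFAULT 0.5, PRIMARY KEY (word, article_id))"
    )
    conn.executemany(
        "INSERT INTO keyword VALUES (?, ?, ?)",
        [("mysql", 1, 0.9), ("search", 1, 0.5), ("mysql", 2, 0.5), ("tags", 3, 0.5)],
    )
    return conn
```

With the demo data, search_articles(demo_connection(), "MySQL search") ranks article 1 first because it matches both words, then article 2, which matches only one.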

It'll be a lot more efficient if you can maintain a list of words, or rules about words, NOT to use as keywords (e.g. ignoring any word of 3 characters or fewer that is in mixed or lower case will omit stuff like 'a', 'to', 'was', 'and', 'He'...).
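One possible reading of that rule, as a small sketch: short words are dropped unless they are entirely upper case (so an acronym such as 'SQL' survives), and an optional stop list catches longer noise words. The function name and the stop_words parameter are illustrative assumptions.

```python
def is_keyword_candidate(word, stop_words=frozenset()):
    """Decide whether a word is worth storing as a keyword."""
    # Explicit stop list, compared case-insensitively.
    if word.lower() in stop_words:
        return False
    # Words of 3 characters or fewer are kept only if fully upper case.
    if len(word) <= 3 and not word.isupper():
        return False
    return True
```

Applied to 'a', 'to', 'was', 'and' and 'He', the check rejects all five, while 'SQL' and 'search' pass.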


Have a look at Sphinx and Lucene


I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword.

What do you think FULLTEXT means?

I had 40,000 entries in my table, with no indexes (local use), and a search with LIKE '%SOMETHING%' took at most 0.1 sec.

You can also LIMIT your query's output.
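For the suggest-as-you-type case, a pattern anchored at the start of the word combined with LIMIT is one option. In MySQL a pattern with no leading wildcard, such as 'my%', can use an ordinary index on the column, whereas '%my%' cannot. The sketch below runs against sqlite3 as a stand-in and assumes a keyword table with a word column; suggest is a made-up helper name.

```python
import sqlite3


def suggest(conn, prefix, limit=5):
    """Return up to `limit` stored keywords that start with `prefix`."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (prefix.lower()
               .replace("\\", "\\\\")
               .replace("%", "\\%")
               .replace("_", "\\_"))
    cur = conn.execute(
        "SELECT DISTINCT word FROM keyword "
        "WHERE word LIKE ? ESCAPE '\\' "
        "ORDER BY word LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in cur]
```

The ESCAPE clause is written for SQLite here; MySQL treats the backslash as an escape character inside string literals, so the literal would need adjusting there.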