开发者

Order by relevance using boolean fulltext search

开发者 https://www.devze.com 2023-01-11 14:18 出处:网络
I am having problems using MySQL\'s fulltext search and returning the results in order by relevance. I have to use boolean full text search, which does not return results in order by relevance. Howeve

I am having problems using MySQL's fulltext search and returning the results in order by relevance. I have to use boolean full text search, which does not return results in order by relevance. However, I need these results in order of relevance. If I attempt to add an order by clause on the en开发者_C百科d of the query, the query results to using filesort, which makes the query incredibly slow (over 1000 times slower than without). I am not sure what can be done.

Here is my query:

SELECT g.id, MATCH(g.searchable_name) AGAINST ('test*' IN BOOLEAN MODE) AS relevance
 FROM games g
 WHERE MATCH(g.searchable_name) AGAINST ('test*' IN BOOLEAN MODE)
 ORDER BY relevance DESC
 LIMIT 0, 31

Thanks in advance.


At first you should consider that IN BOOLEAN MODE does not return a score, instead it returns binary (1 = found, 0 = not found):

mysql>SELECT
        topic_id,
        MATCH(topic_text) AGAINST('+tuning' IN BOOLEAN MODE) AS binary
    FROM
        topics_search
    LIMIT 10
+----------+----------+
| topic_id | binary   |
+----------+----------+
| 2        | 0        |
| 4        | 0        |
| 5        | 0        |
| 6        | 1        |
| 7        | 0        |
| 8        | 0        |
| 11       | 0        |
| 12       | 0        |
| 13       | 0        |
| 14       | 0        |
+----------+----------+
10 rows in set (9 ms)

Only the natural fulltext search is able to generate a score (the IN NATURAL LANGUAGE MODE modifier is not given as it is the default mode):

mysql>SELECT SQL_NO_CACHE
        topic_id,
        MATCH(topic_text) AGAINST('tuning') AS score
    FROM
        topics_search
    WHERE
        host_id = 1
    ORDER BY
        score DESC
    LIMIT 10
+--------------------+--------------------+
| topic_id           | score              |
+--------------------+--------------------+
| 153257             | 5.161948204040527  |
| 17925              | 4.781417369842529  |
| 66459              | 4.648380279541016  |
| 373176             | 4.570812702178955  |
| 117173             | 4.55166482925415   |
| 167016             | 4.462575912475586  |
| 183286             | 4.4519267082214355 |
| 366132             | 4.348565101623535  |
| 95502              | 4.293642520904541  |
| 29615              | 4.178250789642334  |
+--------------------+--------------------+
10 rows in set (478 ms)

Side note: Incredible slow because score can not have an index.

So you need the natural search to sort by score. But the natural search does not support operators like the * wildcard. And now we have our dilemma because its not useful to search for tunin* in BOOLEAN and do a parallel search in NATURAL using the key tunin as no text will include that partial word.

mysql>SELECT SQL_NO_CACHE
        topic_id,
        MATCH(topic_text) AGAINST('tunin') AS score
    FROM
        topics_search
    WHERE
        MATCH(topic_text) AGAINST('tunin*' IN BOOLEAN MODE)
    AND
        MATCH(topic_text) AGAINST('tunin') > 0
    ORDER BY
        score DESC
    LIMIT 10
Empty set (170 ms)

Conclusion
It is not possible to search with the wildcard operator and sort the results by relevance.

Except you find a way to obtain all words in your fulltext index that were hit by your wildcard search and use them in a second query or you build your own score based on LIKE with counting the amount of words inside of one resulting row. Interesting enough to open a new question.

0

精彩评论

暂无评论...
验证码 换一张
取 消