开发者

mysql - speedup regex

开发者 https://www.devze.com 2022-12-22 11:41 出处:网络
I have a table: +--------+------------------+------+-----+---------+----------------+ Field| Type| Null | Key | Default | Extra|

I have a table:

+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| idurl  | int(11)          | NO   | PRI | NULL    | auto_increment |
| idsite | int(10) unsigned | NO   | MUL | NULL    |                |
| url    | varchar(2048)    | NO   |     | NULL    |                |
+--------+------------------+------+-----+---------+----------------+

the select statement is:

SELECT idurl,
       url
开发者_C百科  FROM URL
 WHERE idsite = 34
   AND url REGEXP '^https\\://www\\.domain\\.com/checkout/step_one\\.php.*'

The query needs 5 seconds on a table with 1000000 rows. Can I achieve a speedup with indexes or something else?


Looks like a LIKE might suffice. LIKE uses % as a wildcard for any number of characters.

AND url LIKE 'https://www.domain.com/checkout/step_one.php%'

LIKE does not require a starting anchor like ^. Only the second example would match:

'Sherlock and Watson' LIKE 'and%'
'Sherlock and Watson' LIKE '%and%'
'Sherlock and Watson' LIKE '%and'


Any index involving the URL column is likely not going to help you because the database engine still has to walk through the contents of that column to check whether the contents match the regex.

What may help you, depending on how many unique values of IDSITE you have, is to either place an index on IDSITE or do an initial select WHERE IDSITE = 34, and use that subquery as the target of your query on URL.

Something like:

select
    idurl,
    url
from
    (select idurl, url from uwe_url where idsite = 34)
where
    url REGEXP '^https\\://www\\.domain\\.com/checkout/step_one\\.php.*'

But I'm pretty sure you can't get around the text parsing for the URL column match.


You could use the LIKE operator instead of a regular expression. But as your regular expression is simple, this may or may not improve performance.

You could split out the domain into a separate field, index it and use that in your where clause. If the URLs that you store are from many different domains then such an index could improve performance considerably.


Looks like you don't really need that REGEXP.

This clause should suffice:

AND eu.url LIKE 'https://www.domain.com/checkout/step_one.php%'
0

精彩评论

暂无评论...
验证码 换一张
取 消