
Performance-wise String Matching

Developer https://www.devze.com 2022-12-15 08:47 Source: Internet

I have a generic DB query function that has to run one of the following checks every time an SQL query is issued:

  1. if (preg_match('~^(?:UPDATE|DELETE)~i', $query) === 1)
  2. if (preg_match('~^(?:UPDATE|DELETE)~iS', $query) === 1)
  3. if ((stripos($query, 'UPDATE') === 0) || (stripos($query, 'DELETE') === 0))
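To compare the three options on equal footing, it can help to wrap each one in a function with the same signature. The function names below are hypothetical; the bodies are the three checks listed above, unchanged apart from returning a bool. The sample queries show that all three agree on these inputs, and that none of them accepts leading whitespace before the keyword.

```php
<?php
// Hypothetical helper names; each wraps one of the three options above.
function isWriteRegex(string $query): bool
{
    // Option 1: anchored, case-insensitive alternation.
    return preg_match('~^(?:UPDATE|DELETE)~i', $query) === 1;
}

function isWriteRegexStudied(string $query): bool
{
    // Option 2: same pattern plus the S ("study") modifier.
    return preg_match('~^(?:UPDATE|DELETE)~iS', $query) === 1;
}

function isWriteStripos(string $query): bool
{
    // Option 3: two case-insensitive searches, each required to hit offset 0.
    return stripos($query, 'UPDATE') === 0 || stripos($query, 'DELETE') === 0;
}

// Print one row per sample query: regex, regex+S, stripos (1 = match, 0 = no match).
foreach (['update users SET a = 1', 'DELETE FROM t', 'SELECT * FROM t', ' UPDATE t'] as $q) {
    printf("%-25s %d %d %d\n", $q, isWriteRegex($q), isWriteRegexStudied($q), isWriteStripos($q));
}
```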

I know that a single strpos() call is generally much faster than a preg_match(); however, since I'm calling stripos() twice, I'm really not sure which one should perform better.

The S pattern modifier in the second option also confuses me. From the manual:

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

In this case speed is not critical (otherwise I wouldn't be using this generic query function), but I would still like to make it run as fast as possible while keeping it simple.

Which of the above options should I choose?


EDIT: I ran a simple benchmark and I still can't decide which method performs better.

Here are the results for 10,000 tries (total time taken, in seconds):

Array
(
    [match] => Array
        (
            [stripos] => 0.0965
            [preg_match] => 0.2445
            [preg_match?] => 0.1227
            [preg_match?S] => 0.0863
        )

    [no-match] => Array
        (
            [stripos] => 0.1165
            [preg_match] => 0.0812
            [preg_match?] => 0.0809
            [preg_match?S] => 0.0829
        )
)

100,000 tries:

Array
(
    [match] => Array
        (
            [stripos] => 1.2049
            [preg_match] => 1.5079
            [preg_match?] => 1.5564
            [preg_match?S] => 1.5857
        )

    [no-match] => Array
        (
            [stripos] => 1.4833
            [preg_match] => 0.8853
            [preg_match?] => 0.8645
            [preg_match?S] => 0.8986
        )
)

1,000,000 tries:

Array
(
    [match] => Array
        (
            [stripos] => 9.4555
            [preg_match] => 8.7634
            [preg_match?] => 9.0834
            [preg_match?S] => 9.1629
        )

    [no-match] => Array
        (
            [stripos] => 13.4344
            [preg_match] => 9.6041
            [preg_match?] => 10.5849
            [preg_match?S] => 8.8814
        )
)

10,000,000 tries:

Array
(
    [match] => Array
        (
            [stripos] => 86.3218
            [preg_match] => 93.6755
            [preg_match?] => 92.0910
            [preg_match?S] => 105.4128
        )

    [no-match] => Array
        (
            [stripos] => 150.9792
            [preg_match] => 111.2088
            [preg_match?] => 100.7903
            [preg_match?S] => 88.1984
        )
)

As you can see, the results vary a lot, which makes me wonder whether this is the correct way to run a benchmark.
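One likely source of the variation is that a single timed run per method also picks up noise from whatever else the machine is doing. A sketch of a somewhat more careful approach, assuming PHP 7.4+ (for arrow functions and hrtime()): warm each candidate up first, time several independent rounds, and report the middle (median) round. The function name benchmark() and the sample queries are hypothetical; the substr candidate is the approach suggested in the answer below.

```php
<?php
// Minimal micro-benchmark sketch (assumes PHP >= 7.4).
// Each candidate is timed over several independent rounds; the median
// round is reported so one noisy round cannot skew the comparison.
function benchmark(array $candidates, array $queries, int $iterations, int $rounds = 5): array
{
    $results = [];
    foreach ($candidates as $name => $fn) {
        // Warm-up pass so compiled regexes are already cached before timing.
        foreach ($queries as $q) {
            $fn($q);
        }
        $times = [];
        for ($r = 0; $r < $rounds; $r++) {
            $start = hrtime(true); // nanoseconds
            for ($i = 0; $i < $iterations; $i++) {
                foreach ($queries as $q) {
                    $fn($q);
                }
            }
            // Closure-call overhead is included, but it is the same for every candidate.
            $times[] = (hrtime(true) - $start) / 1e9; // seconds
        }
        sort($times);
        $results[$name] = $times[intdiv($rounds, 2)]; // middle round
    }
    return $results;
}

$candidates = [
    'stripos' => fn (string $q): bool => stripos($q, 'UPDATE') === 0 || stripos($q, 'DELETE') === 0,
    'preg_match' => fn (string $q): bool => preg_match('~^(?:UPDATE|DELETE)~i', $q) === 1,
    'substr' => function (string $q): bool {
        $p = strtoupper(substr($q, 0, 6));
        return $p === 'UPDATE' || $p === 'DELETE';
    },
];

// Benchmark the matching and non-matching cases separately.
print_r(benchmark($candidates, ['UPDATE t SET a = 1'], 10000));
print_r(benchmark($candidates, ['SELECT * FROM t WHERE id = 1'], 10000));
```

If the medians for two candidates differ by less than the spread between their individual rounds, the difference is probably noise rather than a real speed gap.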


I probably wouldn't use any of those. I can't be sure without benchmarking, but I think substr() would be faster than stripos(), as it wouldn't scan the whole string. Assuming UPDATE and DELETE always occur at the start of a query, and given that both are exactly 6 characters long, you can do it with a single substr() call:

$queryPrefix = strtoupper(substr($query, 0, 6)); // first 6 characters, upper-cased
if ($queryPrefix === 'UPDATE' || $queryPrefix === 'DELETE') {
    // handle UPDATE/DELETE queries here
}

If needed, you could add a trim() call to strip any leading whitespace, but it probably isn't necessary.
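A minimal sketch of the substr() check above with the optional whitespace handling folded in. The function name isUpdateOrDelete() is hypothetical, and ltrim() is used instead of trim() because only the leading side of the query matters for a prefix check.

```php
<?php
// Hypothetical helper: the substr() prefix check from the answer,
// with ltrim() so leading spaces/newlines don't defeat it.
function isUpdateOrDelete(string $query): bool
{
    // UPDATE and DELETE are both exactly 6 characters long,
    // so one 6-character prefix covers both keywords.
    $queryPrefix = strtoupper(substr(ltrim($query), 0, 6));
    return $queryPrefix === 'UPDATE' || $queryPrefix === 'DELETE';
}

var_dump(isUpdateOrDelete("  delete FROM t"));        // bool(true)
var_dump(isUpdateOrDelete("\nUPDATE t SET a = 1"));  // bool(true)
var_dump(isUpdateOrDelete("SELECT 1"));              // bool(false)
```

Like all three options in the question, this only checks a prefix, so any string starting with those six letters (for example "UPDATED") would also match; for SQL statements that is normally harmless.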

If you're running nested queries or subqueries containing UPDATE and DELETE, the above method obviously won't work, and I'd go with the stripos() route instead. If you can avoid regular expressions in favour of plain string functions, the code will usually be faster and less complicated.


I went with the following regexes, since they seemed faster on both matching and non-matching text:

  1. if (preg_match('~^(?:INSERT|REPLACE)~i', $query) === 1)
  2. else if (preg_match('~^(?:UPDATE|DELETE)~i', $query) === 1)
  3. else if (preg_match('~^(?:SELECT|EXPLAIN)~i', $query) === 1)
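The chain of checks above can be packaged as a single classifier. This is only a sketch: the function name queryType() and the returned labels are hypothetical, since the answer does not say what each branch does.

```php
<?php
// Hypothetical classifier built from the three anchored regexes above.
// The return labels are illustrative only.
function queryType(string $query): string
{
    if (preg_match('~^(?:INSERT|REPLACE)~i', $query) === 1) {
        return 'insert';
    } elseif (preg_match('~^(?:UPDATE|DELETE)~i', $query) === 1) {
        return 'modify';
    } elseif (preg_match('~^(?:SELECT|EXPLAIN)~i', $query) === 1) {
        return 'read';
    }
    return 'other'; // e.g. SHOW, SET, CREATE ...
}

foreach (['insert into t values (1)', 'DELETE FROM t', 'explain SELECT 1', 'SHOW TABLES'] as $q) {
    echo $q, ' => ', queryType($q), "\n";
}
```

Because the first matching branch wins, putting the most frequent query type first means the common case needs the fewest preg_match() calls.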
