开发者

Why are certain types of prepared queries using PDO in PHP with MySQL slow?

开发者 https://www.devze.com 2023-01-28 15:32 出处:网络
When using SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with p

When using SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.

More strange details:

  • More typical SELECT statements that don't have the WHERE Id IN( ..) clause perform fine even with 100K+ rows. SELECT * FROM table WHERE Id for example is fast.

  • The performance degradation occurs after prepare()/execute() is complete - it's entirely in PDOStatement::fetch() or PDOStatemen开发者_Go百科t::fetchAll(). The MySQL query execution time is tiny in all cases - this isn't a case of a MySQL optimization.

  • Splitting the 10K query into 10 queries with 1K keys is performant.

  • Using mysql, mysqli with prepared statements, or PDO without prepared statements is performant.

  • PDO w/prepared takes ~6 seconds on the example below, while the others take ~0.5s.

  • It gets worse in a non-linear fashion the more keys you have. Try 100K keys.

Sample code:

// $imageIds is an array with 10K keys
$keyCount = count($imageIds);
$keys = implode(', ', array_fill(0, $keyCount, '?'));
$query = "SELECT * FROM images WHERE ImageID IN ({$keys})";
$stmt = $dbh->prepare($query);
$stmt->execute($imageIds);
// until now, it's been fast.  fetch() is the slow part
while ($row = $stmt->fetch()) {
    $rows[] = $row;
}


Make sure you're telling PDO that the value is an integer not a string; if PDO puts it as a string, then MySQL will have to typecast the values for comparison. Depending on how it goes about this, it could cause major slowdowns by causing MySQL to avoid using an index.

I'm not completely sure about the behaviour here, but I have had this problem with Postgres a few years back...


There are some major mistakes on the sample code. So to be more precise.

// $imageIds is an array with 10K keys
$keyCount = count($imageIds);
$keys = implode(', ', array_fill(0, $keyCount, '?'));
$query = "SELECT * FROM images WHERE ImageID IN ({$keys})";

so far the above code will provide something like this...

SELECT * FROM images WHERE ImageID IN (?, ?, ?, ?, ?, ?,...?, ?, ?, ?)

There is no loop for binding... There should be a small loop in which you would bind all of the parameters being passed to MySQL. You go from prepare to execute. When correct binding is primarily what you want.

$stmt = $dbh->prepare($query);
$stmt->execute($imageIds);
// until now, it's been fast.  fetch() is the slow part
while ($row = $stmt->fetch()) {
    $rows[] = $row;
}

Now i have a simple logic question on this part of the question...

When using SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.

Would it not be better if the same query was re-written so that you would not need to pass 10000 keys as parameters?

PDO and MySQLi do not have major differences in timings. Bad written queries do. Very complex Stored Procedures sometimes might turn out slow if they are not well optimized.

Check if another query could fetch the desired result. For example

Create a small table named test

create table `test` (
  `id` int(10) not null,
  `desc` varchar(255)
  ); 
insert into `test` (`id`,`desc`) values (1,'a'),(10,'a1'),(11,'a2'),(12,'a3'),(13,'a4'),(14,'a5'),(15,'a6'),(2,'ab'),(20,'ab1'),(21,'ab2'),(22,'ab3'),(23,'ab4'),(24,'ab5'),(25,'ab6');

Run those simple queries

select * from `test` where `id` rlike '^1$';
select * from `test` where `id` rlike '^1+';
select * from `test` where `id`=1;
select * from `test` where `id` rlike '^1.$';
select * from `test` where `id` rlike '.2$';
select * from `test` where `id` rlike '^2$';
select * from `test` where `id` rlike '.(2|3)'; // Slower
select * from `test` where `id` IN (12,13,22,23); // Faster
select * from `test` where `id` IN ('12,13,22,23'); // Wrong result
select * from `test` where `id` IN ('12','13','22','23'); // Slower

The last 4 queries have the same result in this example. I think that most of the times if you check it on SQLFiddle you would get query times that correspond to label that they have been given.


Don't have any experience with PDO so can't help with that but this method is pretty performant, although it's a bit ugly in places ;)

PHP

<?php

$nums = array(); $max = 10000;

for($i=0;$i<$max*10;$i++) $nums[] = $i;

$conn = new mysqli("127.0.0.1", "vldb_dbo", "pass", "vldb_db", 3306);

$sql = sprintf("call list_products_by_id('%s',0)", implode(",",array_rand($nums, $max)));

$startTime = microtime(true);

$result = $conn->query($sql);

echo sprintf("Fetched %d rows in %s secs<br/>", 
    $conn->affected_rows, number_format(microtime(true) - $startTime, 6, ".", ""));

$result->close();
$conn->close();

?>

Results

select count(*) from product;
count(*)
========
1000000

Fetched 1000 rows in 0.014767 secs
Fetched 1000 rows in 0.014629 secs

Fetched 2000 rows in 0.027938 secs
Fetched 2000 rows in 0.027929 secs

Fetched 5000 rows in 0.068841 secs
Fetched 5000 rows in 0.067844 secs

Fetched 7000 rows in 0.095199 secs
Fetched 7000 rows in 0.095184 secs

Fetched 10000 rows in 0.138205 secs
Fetched 10000 rows in 0.134356 secs

MySQL

drop procedure if exists list_products_by_id;

delimiter #

create procedure list_products_by_id
(
in p_prod_id_csv text,
in p_show_explain tinyint unsigned
)
proc_main:begin

declare v_id varchar(10);
declare v_done tinyint unsigned default 0;
declare v_idx int unsigned default 1;

    create temporary table tmp(prod_id int unsigned not null)engine=memory; 

    -- split the string into tokens and put into a temp table...

    if p_prod_id_csv is not null then
        while not v_done do
            set v_id = trim(substring(p_prod_id_csv, v_idx, 
                if(locate(',', p_prod_id_csv, v_idx) > 0, 
                        locate(',', p_prod_id_csv, v_idx) - v_idx, length(p_prod_id_csv))));

                if length(v_id) > 0 then
                set v_idx = v_idx + length(v_id) + 1;
                        insert ignore into tmp values(v_id);
                else
                set v_done = 1;
                end if;
        end while;
    end if;

    if p_show_explain then

        select count(*) as count_of_tmp from tmp;

        explain
        select p.* from product p
        inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;

    end if;

    select p.* from product p
        inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;

    drop temporary table if exists tmp;

end proc_main #

delimiter ;
0

精彩评论

暂无评论...
验证码 换一张
取 消