PHP/MYSQL: Iterate over every record in a database

I am new to the whole PHP/MySQL thing. I have a week's worth of server logs (about 300,000 items) and I need to do some analysis. I am planning on reading them all into a MySQL db and then analysing them with PHP.

The thing I am not sure about is how to iterate through them. In Java, reading a file, I would do something like this:

Scanner s = new Scanner(myfile);
while(s.hasNextLine()){
    String line = s.nextLine();
    // Do something with this record.
}

How do I iterate through all records in a mysql db using php? I think that something like this will take a stupid amount of memory.

    $query = "SELECT 开发者_运维知识库* FROM mytable";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);
    for($j = 0; $j < $rows; ++$j){
            $curIndex   = mysql_result($result,$j,"index");
            $curURL     = mysql_result($result,$j,"something");
            ~~ Do something with this record
    }

So I have added a limit to the select statement and I repeat until all records have been cycled through. Is there a more standard way to do this? Is there a built-in that will do this?

while($startIndex < $numberOfRows){

    $query = "SELECT * FROM mytable ORDER BY mytable.index LIMIT $startIndex, $batchSize";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);
    for($j = 0; $j < $rows; ++$j){
            $curIndex   = mysql_result($result, $j, "index");
            $curURL     = mysql_result($result, $j, "something");
            // Do something with this record
    }
    $startIndex = $startIndex + $batchSize;
}


You don't want to do a SELECT * FROM MYTABLE if your table is large; you'll end up with the whole thing in memory. A trade-off between memory overhead and database calls is to batch the requests. You can get the minimum and maximum IDs of the rows in your table:

SELECT MIN(ID) FROM MYTABLE;
SELECT MAX(ID) FROM MYTABLE;

Now loop from minId to maxId, incrementing by say 10,000 each time. In pseudo-code:

for (int i = minId; i <= maxId; i = i + 10000) {
   int x = i;
   int y = i + 10000;
   SELECT * FROM MYTABLE WHERE ID >= x AND ID < y;
}
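
As a concrete sketch, the same batching loop in PHP using PDO might look like the following; the connection details and the mytable/id names are placeholders for illustration, and the table is assumed non-empty:

<?php
// A minimal PDO sketch of the min/max ID batching approach above.
// Connection details and the "mytable"/"id" names are placeholders.
$pdo = new PDO("mysql:dbname=mydb;host=localhost", "user", "password");

$minId = (int) $pdo->query("SELECT MIN(id) FROM mytable")->fetchColumn();
$maxId = (int) $pdo->query("SELECT MAX(id) FROM mytable")->fetchColumn();

$batchSize = 10000;
$stmt = $pdo->prepare("SELECT * FROM mytable WHERE id >= ? AND id < ?");

for ($i = $minId; $i <= $maxId; $i += $batchSize) {
    $stmt->execute([$i, $i + $batchSize]);
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // Do something with this record.
    }
}
?>

This way only one batch of rows is held in memory at any time.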


See here:

http://www.tizag.com/mysqlTutorial/

http://www.tizag.com/mysqlTutorial/mysqlfetcharray.php

<?php
// Make a MySQL Connection
$query = "SELECT * FROM example"; 

$result = mysql_query($query) or die(mysql_error());


while($row = mysql_fetch_array($result)){
    echo $row['name']. " - ". $row['age'];
    echo "<br />";
}
?>

Depending on what you need to do with the resulting rows, you can use a different loop style, whether it's 'while', 'foreach' or 'for'. Most of the time, a simple 'while' iteration will be great, and is efficient.


Use mysql_fetch_*

$result = mysql_query(...);
while($row = mysql_fetch_assoc($result)) {
    $curIndex = $row['index'];
}

Note, though, that mysql_query() buffers the whole result set on the client, so this loop walks rows that are already in memory rather than streaming them from the server; mysql_unbuffered_query() is the variant that actually streams. Either way it beats mysql_result(), which fetches a single field per call.
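
A small sketch of the unbuffered variant, reusing the question's placeholder table name:

<?php
// Unbuffered variant: rows come off the server one at a time instead of
// being copied into client memory first. "mytable" is a placeholder.
$result = mysql_unbuffered_query("SELECT * FROM mytable");
while($row = mysql_fetch_assoc($result)) {
    $curIndex = $row['index'];
    // Do something with this record.
}
// Caveat: all rows must be fetched before issuing another query
// on the same connection.
?>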

Side note: Since you're still new, I'd advise getting into good habits right away and immediately skipping the mysql_ functions in favour of PDO, or at least mysqli.
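
For illustration, here is the same loop written with mysqli; the connection parameters are placeholders:

<?php
// The same row-by-row loop using mysqli instead of the old mysql_ API.
// Host, credentials and database name are placeholders.
$mysqli = new mysqli("localhost", "user", "password", "mydb");
$result = $mysqli->query("SELECT * FROM mytable");
while($row = $result->fetch_assoc()) {
    $curIndex = $row['index'];
    // Do something with this record.
}
?>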


In an ideal world, PHP would generate aggregate queries, send them to MySQL, and only get a small number of rows in return. For instance, if you're counting the number of log items of each severity between two dates:

SELECT COUNT(*), severity 
FROM logs
WHERE date < ? AND date > ?
GROUP BY severity
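
From PHP, running such a query could look like the following sketch; the connection details and example dates are placeholders, and the logs schema is taken from the query above:

<?php
// Executing the aggregate query with bound date parameters.
// DSN, credentials and the example dates are placeholders.
$pdo  = new PDO("mysql:dbname=mydb;host=localhost", "user", "password");
$stmt = $pdo->prepare(
    "SELECT COUNT(*) AS cnt, severity
     FROM logs
     WHERE date < ? AND date > ?
     GROUP BY severity"
);
$stmt->execute(["2023-01-24", "2023-01-17"]);
foreach ($stmt as $row) {
    echo $row['severity'] . ": " . $row['cnt'] . "\n";
}
?>

MySQL does the heavy lifting, and PHP only ever sees one row per severity level.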

Doing the work on the PHP side is quite unusual. If you find that your needs are too complex for SQL queries to handle (which, given that you have control over your database structure, leaves you with a lot of freedom), a better option would be to move to a map-reduce database engine like CouchDB.


I strongly believe that batch processing with Doctrine, or any kind of iteration with MySQL (PDO or mysqli), is just an illusion.

@dimitri-k provided a nice explanation, especially about the unit of work. The problem is the misleading $query->iterate(), which doesn't really iterate over the data source. It's just a \Traversable wrapper around an already fully fetched data source.

Here is an example demonstrating that even with the Doctrine abstraction layer removed completely from the picture, we still run into memory issues:

echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

$pdo  = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();

while ($rawCampaign = $stmt->fetch()) {
    // echo $rawCampaign['id'] . "\n";
}

echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

Output:

Starting with memory usage: 6 MB 
Ending with memory usage: 109.46875 MB

And here is the disappointing getIterator() method:

// In Doctrine\DBAL\Driver\Mysqli\MysqliStatement:

/**
 * {@inheritdoc}
 */
public function getIterator()
{
    $data = $this->fetchAll();

    return new \ArrayIterator($data);
}
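
For what it's worth, plain PDO can genuinely stream if you disable the MySQL driver's client-side result buffering, which the snippet above relies on by default; a minimal sketch with the same placeholder connection:

<?php
// Same query as above, but with client-side buffering disabled so PDO
// streams rows from the server instead of fetching them all up front.
// DSN and credentials are placeholders, as before.
$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();

while ($row = $stmt->fetch()) {
    // Memory usage now stays roughly flat across the loop.
}
?>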

You can use my little library to actually stream heavy tables using PHP Doctrine, DQL, or just pure SQL, however you find appropriate: https://github.com/EnchanterIO/remote-collection-stream
