Break A Large File Into Many Smaller Files With PHP

I have a 209MB .txt file with about 95,000 lines that is automatically pushed to my server once a week to update some content on my website. The problem is I cannot allocate enough memory to process such a large file, so I want to break the large file into smaller files with 5,000 lines each.

I cannot use file() at all until the file is broken into smaller pieces, so I have been working with SplFileObject. But I have gotten nowhere with it. Here's some pseudocode of what I want to accomplish:

read the file contents

while there are still lines left to be read in the file
    create a new file
    write the next 5000 lines to this file
    close this file

for each file created
    run mysql update queries with the new content

delete all of the files that were created

The file is in CSV format.

EDIT: Here is the solution for reading the file by line given the answers below:

function getLine($number) {
    global $handle, $index;
    $offset = $index[$number];           // byte offset where line $number starts
    fseek($handle, $offset);
    return explode("|", fgets($handle)); // split the line on its "|" delimiter
}

$handle = @fopen("content.txt", "r");

$index = array(0); // line 0 starts at byte offset 0

while (false !== ($line = fgets($handle))) {
    $index[] = ftell($handle); // byte offset where the following line starts
}

print_r(getLine(18437));

fclose($handle);


//MySQL Connection Stuff goes here

$handle = fopen('/path/to/bigfile.txt','r');  //open big file with fopen
$f = 1; //new file number

while(!feof($handle))
{
    $newfile = fopen('/path/to/newfile' . $f . '.txt','w'); //create new file to write to with file number
    for($i = 1; $i <= 5000; $i++) //for 5000 lines
    {
        $import = fgets($handle);
        fwrite($newfile,$import);
        if(feof($handle))
        {break;} //If file ends, break loop
    }
    fclose($newfile);
    //MySQL newfile insertion stuff goes here
    $f++; //Increment newfile number
}
fclose($handle);

This should work: the big file is read 5,000 lines at a time, producing output files named newfile1.txt, newfile2.txt, and so on. The chunk size can be adjusted through the $i <= 5000 condition in the for loop.
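
The "//MySQL newfile insertion stuff goes here" placeholder can be filled in several ways; one option is to bulk-load each finished chunk with LOAD DATA LOCAL INFILE instead of inserting row by row. A minimal sketch, assuming a hypothetical content table, a "|" field delimiter (as in the asker's getLine() code), and a MySQL server with local_infile enabled:

$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true); // allow LOCAL INFILE from the client side
$mysqli->real_connect('localhost', 'user', 'pass', 'mydb');

$chunk = '/path/to/newfile' . $f . '.txt';
$sql = "LOAD DATA LOCAL INFILE '" . $mysqli->real_escape_string($chunk) . "'
        INTO TABLE content
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '\\n'";

if (!$mysqli->query($sql)) {
    die('Import of ' . $chunk . ' failed: ' . $mysqli->error);
}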

Oh, I see: you want to insert the data from the big file into the database, not store information about the files. In that case, just use fopen/fgets and insert until you hit feof.


If your big file is in CSV format, I would guess that you can process it line by line and don't actually need to break it into smaller files at all. There should be no need to hold 5,000 or more lines in memory at once! To do that, simply use PHP's "low-level" file functions:

$fp = fopen("path/to/file", "r");

while (false !== ($line = fgets($fp))) {
    // Process $line, e.g split it into values since it is CSV.
    $values = explode(",", $line);

    // Do stuff: Run MySQL updates, ...
}

fclose($fp);
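
If the "Do stuff" step is a set of MySQL updates, preparing the statement once outside the loop and wrapping the whole run in a single transaction keeps ~95,000 single-row queries reasonably fast. A rough sketch, assuming a hypothetical content table with id, title and body columns and three comma-separated fields per line (adjust the names and delimiter to match the real data):

$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $mysqli->prepare("UPDATE content SET title = ?, body = ? WHERE id = ?");

$fp = fopen("path/to/file", "r");

$mysqli->begin_transaction(); // one commit instead of one per row
while (false !== ($line = fgets($fp))) {
    list($id, $title, $body) = explode(",", rtrim($line, "\r\n"));

    $stmt->bind_param("ssi", $title, $body, $id);
    $stmt->execute();
}
$mysqli->commit();

$stmt->close();
fclose($fp);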

If you need random access, e.g. to read a line by its line number, you could create a "line index" for your file:

$fp = fopen("path/to/file", "r");

$index = array(0);

while (false !== ($line = fgets($fp))) {
    $index[] = ftell($fp);  // get the current byte offset
}

Now $index maps line numbers to byte offsets and you can navigate to a line by using fseek():

function get_line($number)
{
    global $fp, $index;
    $offset = $index[$number];
    fseek($fp, $offset);
    return fgets($fp);
}

$line10 = get_line(10);

// ... Once you are done:
fclose($fp);

Note that I started line counting at 0, unlike text editors.


This should do the trick for you. I don't have a very large text file, but I tested with a file that is 1,300 lines long and it split the file into 3 files:

    // Store the line no:
    $i = 0;
    // Store the output file no:
    $file_count = 1;
    // Create a handle for the input file:
    $input_handle = fopen('test.txt', "r") or die("Can't open input file.");
    // Create an output file:
    $output_handle = fopen('test-'.$file_count.'.txt', "w") or die("Can't open output file.");

    // Loop through the file until you get to the end:
    while (!feof($input_handle)) 
    {
        // Read from the file:
        $buffer = fgets($input_handle);
        // Write the read data from the input file to the output file:
        fwrite($output_handle, $buffer);
        // Increment the line no:
        $i++;
        // If on the 5000th line:
        if ($i==5000)
        {
            // Reset the line no:
            $i=0;
            // Close the output file:
            fclose($output_handle);
            // Increment the output file count:
            $file_count++;
            // Create the next output file:
            $output_handle = fopen('test-'.$file_count.'.txt', "w") or die("Can't open output file.");
        }
    }
    // Close the input file:
    fclose($input_handle);
    // Close the output file:
    fclose($output_handle);

The problem you may now run into is that the script's execution time becomes too long when you are dealing with a 200+ MB file.
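
One mitigation, assuming your host allows these overrides, is to lift the time and memory limits for this one import job (or run it from the command line or cron, where the execution time limit does not apply by default):

set_time_limit(0);               // remove the execution time limit for this script
ini_set('memory_limit', '256M'); // only needed if you buffer more than a line at a time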


You can use fgets to read line by line.

You'll need to create a function that writes the read contents to a new file. Example:

function load(startLine) {
    read the original file from a point startline
    puts the content into new file
}

After this, you can call this function repeatedly, passing the next start line on each cycle of reading.
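
A hedged sketch of that idea using SplFileObject, which the asker already has available: seek() jumps straight to a given line number, so load() can copy the next block of lines into a new file. The chunk size and file names here are only illustrative:

function load($startLine, $lineCount = 5000)
{
    $in = new SplFileObject('content.txt', 'r');
    $in->seek($startLine);                // jump straight to line $startLine

    $out = new SplFileObject('chunk-' . $startLine . '.txt', 'w');

    for ($i = 0; $i < $lineCount && !$in->eof(); $i++) {
        $out->fwrite($in->current());     // copy the current line
        $in->next();                      // advance to the next line
    }
}

// Walk the whole file in 5000-line blocks (95,000 is the approximate line count from the question):
for ($start = 0; $start < 95000; $start += 5000) {
    load($start);
}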


If this is running on a Linux server, simply have PHP execute the following on the command line:

split -l 5000 -a 4 test.txt out

Then glob the results for file names which you can fopen on.
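
A hedged sketch of wiring that into PHP; the paths are placeholders and it assumes the split binary is available on the server:

// Split the big file into 5000-line pieces named /tmp/outaaaa, /tmp/outaaab, ...
exec('split -l 5000 -a 4 /path/to/bigfile.txt /tmp/out', $output, $status);
if ($status !== 0) {
    die('split failed');
}

// Process each piece, then clean up.
foreach (glob('/tmp/out*') as $piece) {
    // run the MySQL updates for $piece here ...
    unlink($piece);
}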


I think your algorithm is awkward; it looks like you're breaking up the file for no reason. If you simply fopen the initial data file and read it line by line, you can still perform the MySQL insertion, and then just remove the file.
