I am using PHPExcel to read data out of Excel files.
With the following code, I am able to read one particular worksheet out of a 3MB Excel file in just a couple seconds. Works nicely.
However, I now have 27MB and 88MB Excel files which I need to get data out of. They are so large that even OpenOffice cannot open them.
I have found that I can use an index number instead of a name when I load a sheet, but this seems inconsistent, e.g. in one particular Excel file setLoadSheetsOnly(0)
gave me the third sheet while setLoadSheetsOnly(1)
gave me an error even though there were four worksheets in the file. So that seems unreliable for some reason.
Is there a way I can read out the names of the worksheets from a large file so that I can access only one of its worksheets at a time?
$objReader = PHPExcel_IOFactory::createReaderForFile("data/" . $file_name);
$objReader->setLoadSheetsOnly(array($sheet_name));
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load("data/" . $file_name);
echo '<table border="1">';
for ($row = 1; $row < $number_of_rows; $row++) {
echo '<tr>';
for ($column = 0; $column < $number_of_columns; $column++) {
$value = $objPHPExcel->setActiveSheetIndex(0)->getCellByColumnAndRow($column, $row)->getValue();
echo '<td>';
echo $value . ' ';
echo '</td>';
}
echo '</tr>';
}
echo '</table>';
die;
ADDENDUM:
I found some code that gets close, but it doesn't seem to always be a开发者_StackOverflow社区ccurate, e.g. here it missed the second worksheet in a 27MB file:
and here it only got the third worksheet and missed 3 others:
$objReader = PHPExcel_IOFactory::createReaderForFile("data/" . $file_name);
$objReader->setLoadSheetsOnly(0);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load("data/" . $file_name);
echo $objPHPExcel->getSheetCount(), ' worksheets<hr/>';
$loadedSheetNames = $objPHPExcel->getSheetNames();
foreach ($loadedSheetNames as $sheetIndex => $loadedSheetName) {
echo $sheetIndex, ' -> ', $loadedSheetName, '<br />';
}
die;
Unfortunately, it isn't possible to read the names of the worksheets without loading the whole file.
Using an index number rather than a name when calling setLoadSheetsOnly() won't give a predictable result: the code logic that performs that check uses in_array() to test if the sheetname that it is about to read is in the array of sheetnames to read. e.g.
// check if sheet should be skipped
if (isset($this->_loadSheetsOnly) && !in_array($sheet['name'], $this->_loadSheetsOnly)) {
continue;
}
I would suspect that the comparison of a string against a numeric value will be giving a true result for 0 == "mySheetName" when executing this test (based on PHP's loose typing and comparison casting rules).
I could probably provide a Reader method that would return a list of worksheet names without actually loading the entire file, although there would be a performance hit involved.
EDIT
If you add the following method to Classes/PHPExcel/Reader/Excel2007.php
/**
* Reads names of the worksheets from a file, without loading the whole file to a PHPExcel object
*
* @param string $pFilename
* @throws Exception
*/
public function listWorksheetNames($pFilename)
{
// Check if file exists
if (!file_exists($pFilename)) {
throw new Exception("Could not open " . $pFilename . " for reading! File does not exist.");
}
$worksheetNames = array();
$zip = new ZipArchive;
$zip->open($pFilename);
$rels = simplexml_load_string($this->_getFromZipArchive($zip, "_rels/.rels")); //~ http://schemas.openxmlformats.org/package/2006/relationships");
foreach ($rels->Relationship as $rel) {
switch ($rel["Type"]) {
case "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument":
$xmlWorkbook = simplexml_load_string($this->_getFromZipArchive($zip, "{$rel['Target']}")); //~ http://schemas.openxmlformats.org/spreadsheetml/2006/main");
if ($xmlWorkbook->sheets) {
foreach ($xmlWorkbook->sheets->sheet as $eleSheet) {
// Check if sheet should be skipped
$worksheetNames[] = (string) $eleSheet["name"];
}
}
}
}
$zip->close();
return $worksheetNames;
}
You can call it by using:
$inputFileType = 'Excel2007';
$inputFileName = 'biostat-behfisk-2005.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);
foreach ($worksheetNames as $sheetName) {
echo $sheetName, '<br />';
}
the returned $worksheetNames should contain an array of all worksheet names as UTF-8 strings. Because it's only reading the absolute minimum from the .xlsx to retrieve these names, it should be fairly fast. I'll do some more tests before checking it into the PHPExcel SVN, but (for now) it seems to do what you need.
EDIT2
Equivalent method for the Excel5 Reader
/**
* Reads names of the worksheets from a file, without loading the whole file to a PHPExcel object
*
* @param string $pFilename
* @throws Exception
*/
public function listWorksheetNames($pFilename)
{
// Check if file exists
if (!file_exists($pFilename)) {
throw new Exception("Could not open " . $pFilename . " for reading! File does not exist.");
}
$worksheetNames = array();
// Read the OLE file
$this->_loadOLE($pFilename);
// total byte size of Excel data (workbook global substream + sheet substreams)
$this->_dataSize = strlen($this->_data);
$this->_pos = 0;
$this->_sheets = array();
// Parse Workbook Global Substream
while ($this->_pos < $this->_dataSize) {
$code = self::_GetInt2d($this->_data, $this->_pos);
switch ($code) {
case self::XLS_Type_BOF: $this->_readBof(); break;
case self::XLS_Type_SHEET: $this->_readSheet(); break;
case self::XLS_Type_EOF: $this->_readDefault(); break 2;
default: $this->_readDefault(); break;
}
}
foreach ($this->_sheets as $sheet) {
if ($sheet['sheetType'] != 0x00) {
// 0x00: Worksheet, 0x02: Chart, 0x06: Visual Basic module
continue;
}
$worksheetNames[] = $sheet['name'];
}
return $worksheetNames;
}
Not as efficient as the Excel2007 Reader version, but should still be faster than parsing the whole .xls file just for the sheet names because I'm only parsing the global stream.
I didn't want to amend phpexcel so I went with this:
public function getWorksheetNames($pFilename) {
$worksheetNames = array ();
$zip = zip_open ( $pFilename );
while ( $entry = zip_read ( $zip ) ) {
$entry_name = zip_entry_name ( $entry );
if ($entry_name == 'xl/workbook.xml') {
if (zip_entry_open ( $zip, $entry, "r" )) {
$buf = zip_entry_read ( $entry, zip_entry_filesize ( $entry ) );
$workbook = simplexml_load_string ( $buf );
foreach ( $workbook->sheets as $sheets ) {
foreach( $sheets as $sheet) {
$attributes=$sheet->attributes();
$worksheetNames[]=$attributes['name'];
}
}
zip_entry_close ( $entry );
}
break;
}
}
zip_close ( $zip );
return $worksheetNames;
}
It only works on excel 2007 or later but did what I needed
精彩评论