I have a CSV parser, that takes Outlook 2010 Contact Export .CSV file, and produces an array of values.
I break each row on the new line symbol, and each column on the comma. It works fine, until someone puts a new line inside a field (typically Address). This new line, which I assume is "\n" or "\r\n", explodes the row where it shouldn't, and the whole file becomes messed up from there on.
In my case, it happens when Business Street is written in two lines:
123 Apple Dr.
Unit A
My code:
$file = file_get_contents("outlook.csv");
$rows = explode("\r\n",$file);
foreach($rows as $ro开发者_如何转开发w)
{
$columns = explode(",",$row);
// Further manipulation here.
}
I have tried both "\n" and "\r\n", same result.
I figured I could calculate the number of columns in the first row (keys), and then find a way to not allow a new line until this many columns have been parsed, but it feels shady.
Is there another character for the new line that I can try, that would not be inside the data fields themselves?
The most common way of handling newlines in CSV files is to "quote" fields which contain significant characters such as newlines or commas. It may be worth looking into whether your CSV generator does this.
I recommend using PHP's fgetcsv() function, which is intended for this purpose. As you've discovered, splitting strings on commas works only in the most trivial cases. In cases, where that doesn't work, a more sophisticated, reportedly RFC4180-compliant parser is available here.
I also recommend fgetcsv()
fgetcsv will also take care of commas inside strings ( between quotes ).
Interesting parsing tutorial
+1 to the previous answer ;)
PS: fgetcsv is a bit slower then opening the file and explode the contents etc. But imo it's worth it.
精彩评论