A customer is sending me a .csv file where the line breaks are made up of the sequence 0xD 0xD 0xA. As far as I know, line breaks are either 0xA from Mac or Unix or 0xD 0xA from Windows.
Is 0xD 0xD 0xA any known encoding? Is there any known sequence of saves that corrupts a file's line endings and causes this (I think the customer uses a Mac)?
The file doesn't start with any encoding markers, it starts with the text contents directly. The text is displayed correctly if opened with code page 1252.
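To confirm which terminators a file really contains, it can help to count them on the raw bytes rather than trust an editor. The following is only a minimal sketch: the function name `count_line_endings` and the sample bytes are made up for illustration, and in practice the bytes would come from the file opened in binary mode.

```python
import re
from collections import Counter

# Longest alternatives first, so CR CR LF is not counted as CR plus CR LF.
_EOL = re.compile(rb"\r\r\n|\r\n|\r|\n")
_NAMES = {b"\r\r\n": "CRCRLF", b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data: bytes) -> Counter:
    """Count each kind of line terminator found in raw bytes."""
    return Counter(_NAMES[m.group()] for m in _EOL.finditer(data))


# Illustrative sample; real data would come from open(path, "rb").read().
sample = b"a;b\r\r\nc;d\r\r\ne;f\r\n"
print(count_line_endings(sample))
```

Reading in binary mode matters here, because text-mode readers often translate line endings before the program sees them.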
CRCRLF is known to result from a Windows XP Notepad word-wrap bug.
For future reference, here's an extract of relevance from the linked blog:
When you press the Enter key on Windows computers, two characters are actually stored: a carriage return (CR) and a line feed (LF). The operating system always interprets the character sequence CR LF the same way as the Enter key: it moves to the next line. However when there are extra CR or LF characters on their own, this can sometimes cause problems.
There is a bug in the Windows XP version of Notepad that can cause extra CR characters to be stored in the display window. The bug happens in the following situation:
If you have the word wrap option turned on and the display window contains long lines that wrap around, then saving the file causes Notepad to insert the characters CR CR LF at each wrap point in the display window, but not in the saved file.
The CR CR LF characters can cause oddities if you copy and paste them into other programs. They also prevent Notepad from properly re-wrapping the lines if you resize the Notepad window.
You can remove the CR CR LF characters by turning off the word wrap feature, then turning it back on if desired. However, the cursor is repositioned at the beginning of the display window when you do this.
Netscape ANSI encoded files use 0D 0D 0A for their line breaks.
Apple Mail has also been known to make an encoding error on outbound text and CSV attachments. In essence, it replaces the line terminators with soft line breaks on each line, which look like =0D in the encoding. If the attachment is emailed to Outlook, Outlook sees the soft line breaks, removes the = and then appends real line breaks, i.e. 0D0A, so you get 0D0D0A (CR CR LF) at the end of each line. The encoding should be =0D= if it is a Mac-format file (or any other flavour of Unix) or =0D0A= if it is a Windows-format file.
If you are emailing from Apple Mail (in at least Mavericks or Yosemite), making the attachment something other than a text or CSV file, e.g. by compressing it, is an acceptable workaround.
The bug also occurs if you are running a Windows VM under Parallels and email a .txt file from there using Apple Mail. It is the email encoding. From previous comments here, it looks like Netscape had the same issue.
This typically stems from a bug in a revision control system or similar. It used to be produced by CVS if a file was checked in from Windows to a Unix server and then checked out again...
In other words, it is just broken...
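One plausible way such a round trip yields CR CR LF is a line-ending conversion applied to text that already has Windows line endings. The sketch below is only an illustration of that double conversion, not a reproduction of what CVS actually does; `lf_to_crlf` and the sample bytes are hypothetical.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Naive Unix-to-Windows conversion: every LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


original = b"name;value\r\nfoo;1\r\n"  # already has Windows line endings
converted = lf_to_crlf(original)       # converted once more anyway
print(converted)  # b'name;value\r\r\nfoo;1\r\r\n'
```

Each existing CR LF keeps its CR and gains another one in front of the LF, which is exactly the 0xD 0xD 0xA pattern.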
I've seen this in text files produced by the Solidworks 3D CAD program. For example, if you export the equations of a model to a text file (default name is equations.txt), the line endings use 0x0D 0x0D 0x0A.
I'm sure it's a bug, but it is what it is.
These files (at least those produced by the 2019 version of the software) include the <0xEF, 0xBB, 0xBF> UTF-8 representation of the byte order mark as the leading encoding markers at the start of the file.
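If you have to consume such a file, a cleanup along these lines may be enough. This is a sketch under assumptions: `normalize` is a hypothetical helper, the sample bytes are invented, and it assumes every CR CR LF in the data is a broken line break rather than intentional content.

```python
import codecs


def normalize(data: bytes, eol: bytes = b"\r\n") -> bytes:
    """Drop a leading UTF-8 BOM, if any, and collapse CR CR LF into `eol`."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.replace(b"\r\r\n", eol)


# Invented sample: UTF-8 BOM followed by two lines ending in CR CR LF.
raw = b"\xef\xbb\xbfline one\r\r\nline two\r\r\n"
print(normalize(raw))  # b'line one\r\nline two\r\n'
```

Working on bytes keeps the fix independent of the text encoding, so the same replacement also applies to a code page 1252 file without a BOM.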
Just saying, this is also (kind of...) the value that PHP returns for:
<?php var_dump(urlencode(PHP_EOL)); ?>
// Prints: string '%0D%0A' (length=6) -- seen with 5.4.24 at least