Determine what line ending is used in a text file

Developer https://www.devze.com 2023-01-02 20:14 Source: web

What's the best way in C# to determine the line endings used in a text file (Unix, Windows, Mac)?


Note that text files may have inconsistent line endings, and your program should not choke on that. StreamReader.ReadLine (and similar methods) treats \n, \r and \r\n all as line terminators, so it handles any mix of line endings automatically.
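A minimal sketch of relying on ReadLine, assuming the input comes through a TextReader (a file would use `new StreamReader(path)`; a `StringReader` behaves the same way for testing). The class name `ReadLineDemo` is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

static class ReadLineDemo
{
    // Collects every line from the reader. ReadLine() accepts "\n", "\r" and
    // "\r\n" as terminators, so Unix, classic-Mac and Windows endings can be
    // mixed freely; the terminators themselves are not included in the results.
    public static List<string> ReadAllLines(TextReader reader)
    {
        var lines = new List<string>();
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

For example, `ReadLineDemo.ReadAllLines(new StringReader("unix\nmac\rwindows\r\n"))` yields the three lines `unix`, `mac` and `windows`. Note that this tells you nothing about *which* ending was used; it just tolerates all of them.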

If you manually read lines from a file, make sure to accept any line endings, even if inconsistent. In practice, this is quite easy using the following algorithm:

  • Scan ahead until you find either CR or LF.
  • If you read CR, peek ahead at the next character;
  • If the next character is LF, consume it (otherwise, put it back).
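The three steps above can be sketched in C# as follows, assuming a TextReader source. `TextReader.Peek()` looks at the next character without consuming it, which plays the role of "putting it back". The class name `LineSplitter` is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineSplitter
{
    // Splits the input into lines, accepting LF, CR and CRLF even when mixed.
    public static List<string> ReadLines(TextReader reader)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            if (ch == '\n')
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            else if (ch == '\r')
            {
                // CR found: peek at the next character. If it is LF, consume it
                // so CRLF counts as one line break; otherwise leave it unread.
                if (reader.Peek() == '\n')
                {
                    reader.Read();
                }
                lines.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append((char)ch);
            }
        }
        // Keep a final line that has no terminator.
        if (current.Length > 0)
        {
            lines.Add(current.ToString());
        }
        return lines;
    }
}
```

With this approach `"a\r\r\nb"` becomes `a`, an empty line, and `b`: the first CR is a bare Mac-style break and the following CRLF is a Windows-style one.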


Here is some advanced guesswork: read the file and count the CRs and LFs:

if (CR > LF*2) then "Mac" 
else if (LF > CR*2) then "Unix"
else "Windows"
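A direct C# rendering of the counting heuristic above, assuming the whole file has already been read into a string (for example with `File.ReadAllText`). The class name `LineEndingGuesser` is hypothetical.

```csharp
static class LineEndingGuesser
{
    // Counts CR and LF characters and applies the ratio heuristic:
    // mostly CR means "Mac", mostly LF means "Unix", otherwise "Windows".
    public static string Guess(string text)
    {
        int cr = 0, lf = 0;
        foreach (char c in text)
        {
            if (c == '\r') cr++;
            else if (c == '\n') lf++;
        }
        if (cr > lf * 2) return "Mac";
        if (lf > cr * 2) return "Unix";
        return "Windows";
    }
}
```

As written, text with no line breaks at all (CR = LF = 0) falls through to "Windows", so a caller may want to check for that case separately.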

Also note that newer Macs (Mac OS X and later) use Unix line endings (LF); bare CR was the convention of classic Mac OS.


I'd just search the file for the first \r or \n. If it is a \r, I'd look at the next character: if that is a \n, the file uses \r\n, otherwise it uses a bare \r. If the first one found is a \n, the file uses \n.
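A minimal sketch of this first-match check, assuming the file content is already in a string. It returns the ending itself (`"\r\n"`, `"\r"` or `"\n"`), or null when there is no line break; the class name `FirstLineEnding` is hypothetical.

```csharp
static class FirstLineEnding
{
    // Finds the first CR or LF and classifies the line ending from it.
    public static string? Detect(string text)
    {
        int i = text.IndexOfAny(new[] { '\r', '\n' });
        if (i < 0) return null;              // no line break at all
        if (text[i] == '\n') return "\n";    // LF with no preceding CR
        // text[i] is '\r': it is CRLF only if an LF follows immediately.
        return i + 1 < text.Length && text[i + 1] == '\n' ? "\r\n" : "\r";
    }
}
```

Because only the first break is inspected, a file with inconsistent endings is classified by whichever kind appears first.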


I imagine you can't know for sure; you would have to set this in the editor. You could use some AI; the algorithm would be:

  1. Search for each type of line ending, i.e. the specific character sequences (CRLF, CR, LF).
  2. Measure the distances between occurrences of each type.
  3. If one type tends to repeat, assume that's the type. Count the repeats and use some measure of dispersion.

So, for example, if CRLF repeated at intervals of 38, 40 and 45 characters, and that spread was within tolerance, you'd default to assuming the line ending was CRLF.
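One way to sketch this idea in C#, under stated assumptions: CRLF is matched as a unit before CR and LF, the "measure of dispersion" is taken to be the coefficient of variation (standard deviation divided by mean) of the gaps between occurrences, and `maxVariation` is a hypothetical tolerance chosen for illustration, not a known constant. The class `DispersionGuesser` is likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DispersionGuesser
{
    // Returns "CRLF", "CR" or "LF" for the most frequent type whose spacing is
    // regular enough, or null if no type qualifies.
    public static string? Guess(string text, double maxVariation = 1.0)
    {
        var positions = new Dictionary<string, List<int>>
        {
            ["CRLF"] = new List<int>(),
            ["CR"] = new List<int>(),
            ["LF"] = new List<int>()
        };
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
            {
                positions["CRLF"].Add(i);
                i++; // skip the LF that belongs to this CRLF
            }
            else if (text[i] == '\r') positions["CR"].Add(i);
            else if (text[i] == '\n') positions["LF"].Add(i);
        }

        string? best = null;
        int bestCount = 0;
        foreach (var entry in positions)
        {
            var pos = entry.Value;
            if (pos.Count < 2) continue; // need at least one gap to measure
            // Distances between consecutive occurrences of this type.
            var gaps = pos.Zip(pos.Skip(1), (a, b) => (double)(b - a)).ToList();
            double mean = gaps.Average();
            double std = Math.Sqrt(gaps.Average(g => (g - mean) * (g - mean)));
            // Within tolerance and repeated more often than the current best.
            if (std / mean <= maxVariation && pos.Count > bestCount)
            {
                best = entry.Key;
                bestCount = pos.Count;
            }
        }
        return best;
    }
}
```

The tolerance is the weak point of this approach: real text has very irregular line lengths, so any fixed threshold is a guess that would need tuning on sample files.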


If it were me, I'd just read the file one char at a time until I came across the first \r or \n. This assumes you have sensible input.
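A streaming sketch of the char-at-a-time approach, assuming a TextReader: it stops at the first line break, so only the start of a large file is read (plus whatever StreamReader buffers internally). The class `StreamingDetector` and its methods are hypothetical.

```csharp
using System.IO;

static class StreamingDetector
{
    // Reads one character at a time until the first CR or LF.
    // Returns "\r\n", "\r", "\n", or null if the input has no line break.
    public static string? Detect(TextReader reader)
    {
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            if (ch == '\n') return "\n";
            if (ch == '\r') return reader.Peek() == '\n' ? "\r\n" : "\r";
        }
        return null;
    }

    // Convenience wrapper for a file on disk.
    public static string? DetectInFile(string path)
    {
        using var reader = new StreamReader(path);
        return Detect(reader);
    }
}
```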


When reading most textual formats, I usually split on \n and then Trim() each resulting string (whitespace at the beginning and end, including a leftover \r, is often redundant).
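A minimal sketch of split-then-trim, assuming the text is already in memory; `TrimSplitter` is a hypothetical name. Trim() removes the \r left behind by CRLF endings, but bare-CR files are not split at all, and a trailing \n produces a final empty string.

```csharp
using System.Linq;

static class TrimSplitter
{
    // Splits on '\n' and trims each piece. Trim() also strips the '\r'
    // left over from CRLF, along with any other leading/trailing whitespace.
    public static string[] SplitLines(string text) =>
        text.Split('\n').Select(line => line.Trim()).ToArray();
}
```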


There is Environment.NewLine, but it only tells you the newline used on the current system and won't help with reading files from other sources.

For reading, I usually look for \n (Edit: apparently some files use only \r) and assume that the line ends there.
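To cover the \r-only files mentioned in the edit as well, one common idiom is to split on all three endings at once with `String.Split(string[], StringSplitOptions)`. `"\r\n"` is listed first so that, where both `"\r\n"` and `"\r"` match at the same position, the two-character separator is used. The class name `AllEndingsSplitter` is hypothetical.

```csharp
using System;

static class AllEndingsSplitter
{
    // Splits on CRLF, CR and LF. Listing "\r\n" first keeps a CRLF from
    // being treated as a CR followed by an empty line.
    public static string[] SplitLines(string text) =>
        text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
}
```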
