I am using C to parse a large flat file and output relevant lines into an output file. The output file should be around 70,000 lines.
If I open the file in gedit, it displays exactly as expected, with the correct number of lines and line lengths.
However, running wc -l <file> returns 13,156, and so does grep -c "" <file>.
tail <file> returns the last 10 lines that I see in gedit, and head <file> returns the first 10 lines. But tail -n +8000 <file> | head -n 1, which should return the 8,000th line, returns the text that I see on line 34,804 in gedit.
I'd expect these results if I were missing newline characters in the file, but gedit doesn't seem to have a problem with it. Additionally, wc -L <file>, which displays the maximum line length, returns 142 bytes, as expected. The size of the file is a little over 9,000,000 bytes, also as expected.
If wc -L <file> = 142 and wc -c <file> = 9046609, then how can wc -l <file> = 13156?
Does anyone know what I did wrong when writing to this file?
It's probably some odd combination of carriage return ('\r') and linefeed ('\n') characters.
Assuming you have the GNU Coreutils version of "tr", you can use these commands to count the number of each character in the file (note that tr reads standard input, so the file has to be redirected into it):
tr -d -c '\n' < FILE | wc -c
tr -d -c '\r' < FILE | wc -c
For a normal Unix-style text file, the second command should print 0. For a Windows-style text file, both should print the same number.
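To see the mismatch in action, here's a small contrived demo (the /tmp path and sample data are made up for illustration): a file whose "lines" end in a mix of LF and bare CR. gedit would render all three lines, but wc -l only counts the LFs, which matches the symptom you're describing:

```shell
# Build a 3-"line" sample: one LF, then two bare CR separators.
printf 'one\ntwo\rthree\r' > /tmp/mixed_demo.txt

wc -l < /tmp/mixed_demo.txt                  # prints 1 (only the LF counts)
tr -d -c '\n' < /tmp/mixed_demo.txt | wc -c  # prints 1
tr -d -c '\r' < /tmp/mixed_demo.txt | wc -c  # prints 2
```

If your file shows a pattern like this, the sum of the two tr counts should come out near the 70,000 lines you expected.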
The "file" command will also probably tell you something useful.