I have a 500MB file that contains some non-ASCII characters. I want to find those characters using a Unix command. Ideally I would also get the line number and the position within each line.
Thanks :)
Use the answer given in the other solution, but add -n to grep.
You know, it's weird. Sometimes I find it faster to code up some quick and dirty C than it is to try and navigate the wilderness of UNIX utility command line options :-)
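#include <stdio.h>

int main (void) {
    size_t ln = 1;      /* current line number, 1-based */
    size_t chpos = 0;   /* byte offset within the current line, 1-based */
    int chr;

    /* Read stdin one byte at a time until end of file. */
    while ((chr = fgetc (stdin)) != EOF) {
        if (chr == '\n') {
            /* A newline starts the next line and resets the offset. */
            ln++;
            chpos = 0;
            continue;
        }
        chpos++;
        /* fgetc returns the byte as an unsigned char, so anything
           above 127 is outside the ASCII range. */
        if (chr > 127) {
            /* size_t values need %zu, not %d. */
            printf ("Non-ASCII %02x found at line %zu, offset %zu\n",
                (unsigned) chr, ln, chpos);
        }
    }
    return 0;
}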
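This reads the file from standard input and gives you both the line number and the position within that line of every byte outside the ASCII range. Positions are byte offsets counted from 1, so a multi-byte character is reported once for each of its bytes.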
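If you would rather check a block of data that is already in memory, or want something easy to test, the same scan can be pulled out into a function. This is only a rough sketch, not part of the program above: the names ascii_hit and first_non_ascii are made up for illustration, and it stops at the first hit instead of printing every one.

```c
#include <stddef.h>

/* Hypothetical result type: where the scan stopped. */
typedef struct {
    size_t line;    /* 1-based line number */
    size_t offset;  /* 1-based byte offset within that line */
    int found;      /* 1 if a non-ASCII byte was found, else 0 */
} ascii_hit;

/* Hypothetical helper: scan len bytes of buf and report the position of
   the first byte above 127. If none is found, found is 0 and line/offset
   describe the end of the buffer. */
ascii_hit first_non_ascii (const unsigned char *buf, size_t len) {
    ascii_hit hit = { 1, 0, 0 };
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            /* Same bookkeeping as the stdin version. */
            hit.line++;
            hit.offset = 0;
            continue;
        }
        hit.offset++;
        if (buf[i] > 127) {
            hit.found = 1;
            return hit;
        }
    }
    return hit;
}
```

Calling it on the bytes "abc\nd" followed by a UTF-8 encoded character reports line 2, offset 2, because the offset counts bytes and points at the first byte of that character.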