I scanned a 2.8 GB XML file for the positions (indexes) of particular tags. Then I use the Seek
method to set a start point in that file. The file is UTF-8 encoded.
The indexing looks like this:
using(StreamReader sr = new StreamReader(pathToFile)){
long index = 0;
while(!sr.EndOfStream){
string line = sr.ReadLine();
index += (line.Length + 2); // remember the \r\n chars
if(LineHasTag(line)){
SaveIndex(index-line.Length); //need beginning of the line
}
}
}
Afterwards I have the indexed positions in another file. But when I use Seek, the result is wrong: the position ends up somewhere before where it should be.
I loaded some of the file's content into a char array and manually checked the correct index of a tag I need. It's the same as the one produced by the code above. But the Seek method on StreamReader.BaseStream still places the pointer earlier in the file. Quite strange.
Any suggestions?
Best regards, ventus
Seek deals in bytes - you're assuming there's one byte per character. In UTF-8, one character in the BMP can take up to three bytes.
My guess is that you've got non-ASCII characters in your file - those will take more than one byte.
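To illustrate the difference, here is a small sketch comparing the character count with the UTF-8 byte count. The sample strings and the class name are made up for the example; only `Encoding.UTF8.GetByteCount` is the real API.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    static void Main()
    {
        string ascii = "<tag>";          // ASCII only: one byte per char
        string accented = "<caf\u00E9>"; // U+00E9 takes two bytes in UTF-8
        string euro = "<\u20AC>";        // U+20AC takes three bytes in UTF-8

        foreach (string s in new[] { ascii, accented, euro })
        {
            // Length counts chars; Seek needs the byte count instead.
            Console.WriteLine($"{s.Length} chars, {Utf8Bytes(s)} bytes");
        }
    }
}
```

Every non-ASCII character before a tag makes the character-based index fall a little further short of the real byte offset, which would match Seek landing too early.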
I think there may also be a potential problem with the byte order mark, if there is one. I can't remember offhand whether StreamReader
will swallow that automatically - which would put you 3 bytes out to start with.
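One way to index in bytes rather than characters might look like the sketch below. It is only a sketch under some assumptions: the file is UTF-8, every line ends with \r\n, and a UTF-8 byte order mark (if present) is exactly the three bytes EF BB BF. The names `ByteOffsetIndexer`, `IndexLines`, `ReadLineAt` and the `hasTag` callback are invented for illustration and stand in for the question's `LineHasTag` and `SaveIndex`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ByteOffsetIndexer
{
    // Returns the byte offset of the start of each line for which hasTag is true.
    // Assumes UTF-8 content and "\r\n" line endings throughout the file.
    public static List<long> IndexLines(string path, Func<string, bool> hasTag)
    {
        var offsets = new List<long>();
        // StreamReader does not hand the BOM back as text, so account for it here.
        long position = HasUtf8Bom(path) ? 3 : 0;

        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Record the offset before advancing, so it points at the line start.
                if (hasTag(line)) offsets.Add(position);
                // Advance by the encoded size of the line plus the two bytes of "\r\n".
                position += Encoding.UTF8.GetByteCount(line) + 2;
            }
        }
        return offsets;
    }

    // Seeks a fresh stream to a byte offset and reads one line from there.
    public static string ReadLineAt(string path, long offset)
    {
        using (var stream = File.OpenRead(path))
        {
            stream.Seek(offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadLine();
            }
        }
    }

    // True if the file starts with the UTF-8 byte order mark EF BB BF.
    static bool HasUtf8Bom(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            var head = new byte[3];
            int read = stream.Read(head, 0, 3);
            return read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
        }
    }
}
```

If you seek on the BaseStream of a StreamReader that has already read something, call DiscardBufferedData() on the reader afterwards, otherwise it keeps serving text from its old buffer. Opening a fresh stream per seek, as in ReadLineAt above, sidesteps that.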