
How to find position of word in file?

开发者 https://www.devze.com 2023-03-26 09:27 Source: Internet

For example, I have a file and the word "test". The file is partially binary but contains the string "test". How can I find the position (index) of the word in the file without loading the whole file into memory?


You cannot find the position of text within a file unless you open the file. It is like asking someone to read a newspaper without opening their eyes.

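To answer the first part of your question, finding the index is relatively simple. Note that read() loads the whole file into memory.

# Open in binary mode because the file is partially binary
with open('Path/to/file', 'rb') as f:
    content = f.read()  # reads the entire file into memory
    # index() raises ValueError if the word is absent
    print(content.index(b'test'))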


You can use memory-mapped files and regular expressions.

Memory-mapped file objects behave like both strings and like file objects. Unlike normal string objects, however, these are mutable. You can use mmap objects in most places where strings are expected; for example, you can use the re module to search through a memory-mapped file. Since they’re mutable, you can change a single character by doing obj[index] = 'a', or change a substring by assigning to a slice: obj[i1:i2] = '...'. You can also read and write data starting at the current file position, and seek() through the file to different positions.

Example

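import re
import mmap

# Open in binary mode; the pattern must be bytes when searching an mmap
with open('path/filename', 'rb') as f:
    # Length 0 maps the whole file; pages are loaded on demand
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
        m = re.search(rb'pattern', mf)
        if m:  # m is None when the pattern does not occur
            print(m.start(), m.end())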


Try this:

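import os

SEARCH_WORD = b'test'           # bytes, since the file is read in binary mode
file_dmp_path = 'path/to/file'  # placeholder path
bsize = 4096                    # block size; must be >= len(SEARCH_WORD)

with open(file_dmp_path, 'rb') as file:
    fsize = os.path.getsize(file_dmp_path)
    word_len = len(SEARCH_WORD)
    while True:
        start = file.tell()  # file offset of the block about to be read
        p = file.read(bsize).find(SEARCH_WORD)
        if p > -1:
            pos_dec = start + p
            print(pos_dec)
            # continue searching right after this match
            file.seek(pos_dec + word_len)
        elif file.tell() < fsize:
            # step back so a word split across two blocks is not missed
            file.seek(file.tell() - word_len + 1)
        else:
            break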
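
The same block-by-block idea can be sketched as a generator that never seeks backwards: it reads fixed-size chunks and carries the last len(word) - 1 bytes over to the next chunk, so a word split across a chunk boundary is still found. The names iter_positions and chunk_size are hypothetical, and this version reports overlapping matches too.

```python
import os
import tempfile


def iter_positions(path, word, chunk_size=64 * 1024):
    """Yield the byte offset of every occurrence of word (bytes) in the file."""
    if not word:
        raise ValueError("word must be a non-empty bytes object")
    overlap = len(word) - 1
    with open(path, 'rb') as f:
        tail = b''  # bytes carried over from the previous buffer
        base = 0    # file offset of buf[0]
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            i = buf.find(word)
            while i != -1:
                yield base + i
                i = buf.find(word, i + 1)
            # Keep only the last `overlap` bytes for the next round.
            keep = max(0, len(buf) - overlap)
            tail = buf[keep:]
            base += keep


# Demo with a tiny chunk size so matches straddle chunk boundaries.
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(b'\x00test\x01\x02test\xfftest')
    tmp.close()
    positions = list(iter_positions(tmp.name, b'test', chunk_size=4))
    print(positions)  # [1, 7, 12]
finally:
    os.remove(tmp.name)
```

Memory use stays around chunk_size plus the length of the word, whatever the file size.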