Memory-mapped zip file in Java

https://www.devze.com 2023-02-16 00:45 Source: web
Here is the problem I'm trying to solve:

I have about 100 binary files (158KB in total; they are roughly the same size, +/- 50% of each other). I need to selectively parse only a few of these files (in the worst case maybe 50, in other cases as few as 1 to 5). This is on an Android device, by the way.

What is the fastest way to do this in Java?

One way could be to combine everything into one file and then use file seeks to get to each individual file. That way the file open would only need to happen once, and opening a file is usually slow. However, in order to know where each file is, there would need to be some sort of table at the beginning of the file -- which could be generated by a script -- but the files would also need to be indexed in the table in the order they were concatenated, so the seeks wouldn't have to do much work (correct me if I'm wrong).
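For illustration, a minimal sketch of that scheme (the class name, entry names, and offsets here are made up; in practice the index table would come from the script that concatenates the files):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class ConcatReader {
    // Hypothetical index: entry name -> {offset, length} inside the blob.
    private final Map<String, long[]> index = new HashMap<>();
    private final RandomAccessFile blob;

    public ConcatReader(String path) throws IOException {
        blob = new RandomAccessFile(path, "r"); // opened once for all reads
    }

    public void addEntry(String name, long offset, long length) {
        index.put(name, new long[] {offset, length});
    }

    // Seek straight to the entry and read it; no per-file open() cost.
    public byte[] read(String name) throws IOException {
        long[] entry = index.get(name);
        byte[] data = new byte[(int) entry[1]];
        blob.seek(entry[0]);
        blob.readFully(data);
        return data;
    }
}
```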

A better way would be to memory-map the file; then the table wouldn't have to be in concatenation order, because a memory-mapped file allows random access (again, correct me if I'm wrong).
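A sketch of that memory-mapped variant via FileChannel.map (the class name is made up, and the offsets passed to slice stand in for whatever the index table would supply):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedReader {
    // Map the combined file once. The mapping stays valid after the
    // channel is closed, and any entry can then be read by absolute
    // position, so the index table needs no particular order.
    public static MappedByteBuffer map(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Copy one entry out of the mapping; duplicate() keeps the shared
    // buffer's own position untouched, so reads can happen in any order.
    public static byte[] slice(MappedByteBuffer buf, int offset, int length) {
        byte[] data = new byte[length];
        ByteBuffer dup = buf.duplicate();
        dup.position(offset);
        dup.get(data);
        return data;
    }
}
```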

Creating that table would be unnecessary if zip compression were used, because a zip archive already contains such a table. In addition, the files wouldn't all have to be concatenated: I could zip the directory and then access each individual file through its entry in the zip file. Problem solved.
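That lookup is exactly what java.util.zip.ZipFile provides out of the box; a sketch (the helper class and entry name are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipReader {
    // The zip's central directory acts as the index table: open the
    // archive once, then pull out only the entries that are needed.
    public static byte[] readEntry(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        // getSize() is known here because ZipFile reads the central directory.
        byte[] data = new byte[(int) entry.getSize()];
        try (InputStream in = zip.getInputStream(entry)) {
            int off = 0;
            while (off < data.length) {
                int n = in.read(data, off, data.length - off);
                if (n < 0) throw new IOException("truncated entry");
                off += n;
            }
        }
        return data;
    }
}
```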

Except that if the zip file isn't memory-mapped, it will be slower to read, since system calls are slower than direct memory access (correct me if I'm wrong). So I came to the conclusion that the best solution would be to use a memory-mapped zip archive.

However, ZipFile entries return an InputStream for reading the contents of an entry, and a MappedByteBuffer needs a RandomAccessFile, which takes a filename as input, not an InputStream.

Is there any way to memory-map a zip file for fast reads? Or is there a different solution to this problem of reading a selection of files?
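For what it's worth, one possibility, under the assumption that the entries are stored uncompressed (e.g. the archive is built with zip -0): map the whole archive once and walk the central directory by hand, after which every entry is just a slice of the mapped buffer. A minimal sketch, not a full ZIP reader (no zip64, no multi-disk archives, class name made up):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MappedZip {
    private final ByteBuffer buf;
    private final Map<String, int[]> entries = new HashMap<>(); // name -> {dataOffset, size}

    public MappedZip(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                    .order(ByteOrder.LITTLE_ENDIAN);
        }
        // Find the End Of Central Directory record (signature 0x06054b50)
        // by scanning backwards from the end of the file.
        int eocd = buf.capacity() - 22;
        while (buf.getInt(eocd) != 0x06054b50) eocd--;
        int count = buf.getShort(eocd + 10) & 0xffff;
        int p = buf.getInt(eocd + 16); // central directory offset
        // Walk the central directory entries (signature 0x02014b50).
        for (int i = 0; i < count; i++) {
            int method = buf.getShort(p + 10) & 0xffff; // 0 = STORED
            int size = buf.getInt(p + 24);
            int nameLen = buf.getShort(p + 28) & 0xffff;
            int extraLen = buf.getShort(p + 30) & 0xffff;
            int commentLen = buf.getShort(p + 32) & 0xffff;
            int localOff = buf.getInt(p + 42);
            byte[] nameBytes = new byte[nameLen];
            ByteBuffer dup = buf.duplicate();
            dup.position(p + 46);
            dup.get(nameBytes);
            if (method == 0) {
                // Data begins after the 30-byte local header plus its own
                // name and extra fields (lengths re-read from the local header).
                int ln = buf.getShort(localOff + 26) & 0xffff;
                int le = buf.getShort(localOff + 28) & 0xffff;
                entries.put(new String(nameBytes, StandardCharsets.UTF_8),
                            new int[] {localOff + 30 + ln + le, size});
            }
            p += 46 + nameLen + extraLen + commentLen;
        }
    }

    // A read is now just a memcpy out of the mapping -- no system calls.
    public byte[] read(String name) {
        int[] e = entries.get(name);
        byte[] data = new byte[e[1]];
        ByteBuffer dup = buf.duplicate();
        dup.position(e[0]);
        dup.get(data);
        return data;
    }
}
```

The trade-off is that storing entries uncompressed gives up the space savings of deflate, so it only pays off when read speed matters more than archive size.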

Thanks

EDIT: I timed opening, closing, and parsing the files; here are the statistics I found:

Number of files: 25 (24 for parse, because garbage collection interrupted one timing)

Total open time: 72 ms
Total close time: 1 ms
Total parse time: 515 ms

(this is skewed in parse's favor, because parse is missing a file)

% of total time in open: 12%
% of total time in close: 0.17%
% of total time in parse: 88%

Avg open time per file: 2.88 ms
Avg close time per file: 0.04 ms
Avg parse time per file: 21.46 ms


I would use a simple API like RandomAccessFile for now and revisit the issue if you really need to.

Edit - I didn't know about MappedByteBuffer. That seems like the way to go. Why not do this with separate files first, and think about combining them later?
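A sketch of that separate-files-first approach: map each file lazily and cache the buffer, so a file is opened at most once even if it is parsed repeatedly (class and method names are made up):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;

public class PerFileMapper {
    // path -> mapped buffer; the mapping outlives the closed channel.
    private final Map<String, MappedByteBuffer> cache = new HashMap<>();

    public MappedByteBuffer get(String path) throws IOException {
        MappedByteBuffer buf = cache.get(path);
        if (buf == null) {
            try (RandomAccessFile raf = new RandomAccessFile(path, "r");
                 FileChannel ch = raf.getChannel()) {
                buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
            cache.put(path, buf);
        }
        return buf;
    }
}
```

With ~100 small files this keeps each parser simple, and combining them into one archive stays a later optimization rather than a prerequisite.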
