Java: which of these two methods is more efficient?

devze.com https://www.devze.com 2023-02-10 07:50 Source: web
I have a huge data file and I only need specific data from it. Later on, I will be using this data frequently. So which of these two methods would be more efficient:

  1. save this data in global variables (maybe a LinkedList) and use them every time I need the data
  2. save the data in a file, and read the file every time I need the data

I should mention that this data could be a huge number of integers. Which of the two approaches would give better performance with respect to speed and memory?


If the file I/O overhead is not an issue for you: save the data in a file and create an index file mapping keys to file positions, so you do not have to read the whole huge file on every lookup.
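The index idea could be sketched roughly as follows. This is only an illustration under assumptions that are not in the original answer: the class name `IndexedIntFile`, the string keys, and the record layout (an int count followed by that many ints) are all hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Sketch (hypothetical layout): a data file of int records plus an index of key -> byte offset. */
final class IndexedIntFile {

    /** Writes every record to dataFile and returns key -> byte offset of that record. */
    static Map<String, Long> writeData(Path dataFile, Map<String, int[]> records) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile out = new RandomAccessFile(dataFile.toFile(), "rw")) {
            out.setLength(0); // start from an empty file
            for (Map.Entry<String, int[]> e : records.entrySet()) {
                index.put(e.getKey(), out.getFilePointer());
                out.writeInt(e.getValue().length); // record header: number of ints
                for (int v : e.getValue()) {
                    out.writeInt(v);
                }
            }
        }
        return index;
    }

    /** Persists the index so it can be reloaded without touching the data file. */
    static void saveIndex(Path indexFile, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(indexFile)))) {
            out.writeInt(index.size());
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }
    }

    /** Loads an index previously written by saveIndex. */
    static Map<String, Long> loadIndex(Path indexFile) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(indexFile)))) {
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                String key = in.readUTF();
                long offset = in.readLong();
                index.put(key, offset);
            }
        }
        return index;
    }

    /** Seeks straight to one record instead of scanning the whole data file. */
    static int[] read(Path dataFile, Map<String, Long> index, String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) {
            return null; // unknown key
        }
        try (RandomAccessFile in = new RandomAccessFile(dataFile.toFile(), "r")) {
            in.seek(offset);
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }
}
```

The unbuffered `RandomAccessFile` calls keep the sketch short; for large records, reading the whole record into a byte array in one call would likely be faster.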

If the data fits in your RAM and you want to be able to access it quickly, go with the first approach (no index file needed): read the data into memory at startup, or when it is needed for the first time.


As long as it fits in memory, working in memory is surely some orders of magnitude faster. But do not use LinkedList: it has a huge per-element overhead. And do not use any standard Collection at all, since that means boxing every int and increases the memory footprint by a factor of 3 at least.

You could use int[] or a specialized collection for primitive types.
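As a rough illustration of the "specialized collection for primitive types" idea, here is a minimal growable list backed by an `int[]`. The class name `IntArrayList` is hypothetical and written for this example; it is a sketch, not a replacement for a tested primitive-collections library.

```java
import java.util.Arrays;

/** Sketch: a minimal growable list of primitive ints, so no Integer objects are created. */
final class IntArrayList {
    private int[] values = new int[16];
    private int size;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        values[size++] = value;
    }

    /** Random access in constant time, unlike LinkedList. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    int size() {
        return size;
    }

    /** Returns a right-sized copy of the stored values. */
    int[] toArray() {
        return Arrays.copyOf(values, size);
    }
}
```

Each element costs 4 bytes plus whatever slack is left in the backing array, whereas a boxed collection also pays for an object and a reference per element.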

I'd recommend using a memory-mapped file viewed through java.nio.IntBuffer (for example, FileChannel.map(...).asIntBuffer()). This way the data resides primarily on disk but is mapped into memory too.
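A minimal sketch of the memory-mapped approach might look like this. The class name `MappedInts` and the file layout (raw big-endian 4-byte ints, which is what `DataOutputStream.writeInt` produces and what a `ByteBuffer` reads by default) are assumptions for the example.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: store ints in a binary file and read them back through a memory mapping. */
final class MappedInts {

    /** Writes the values as raw big-endian 4-byte ints (the assumed file layout). */
    static void write(Path file, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    /**
     * Maps the whole file read-only and exposes it as an IntBuffer.
     * The mapping stays valid after the channel is closed.
     */
    static IntBuffer open(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asIntBuffer();
        }
    }
}
```

After `IntBuffer ints = MappedInts.open(path)`, a call such as `ints.get(i)` reads the i-th int, and the operating system pages the file in on demand. A single mapping is limited to `Integer.MAX_VALUE` bytes, so larger files would need several mappings.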


Probably the first one.

But there really isn't enough information there to answer you properly.

Firstly, a linked list is fine if you only ever traverse it in order. However, if you need random access to it (5th element, then 100th, then 12th, then 45th...), it's lousy, and you'd be better off with an ArrayList or something similar. Secondly, if you're storing lots of ints in one of the standard Java collections, each int will be boxed, which may add performance overhead.

Then you haven't said what 'huge' means. Thousands? Millions?

So, yeah, you need to say what kind of numbers you're dealing with, and what the access patterns are likely to be. And is the 'filtering' step a one-off, or is it done quite frequently?


It depends on the system spec. If you are designing your app for one machine, the task is simple; otherwise you should take into account the memory and/or disk space limits on the client's computer.

I think you cannot directly compare the performance of these two approaches, as each one has its own benefits and drawbacks. I'm certain there are techniques you could investigate further, such as reading only part of the file into memory, or creating a cache (when you read a number from the file, store it in memory, so that the next time you need it, it is already there).
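The caching idea could be sketched like this, using a bounded least-recently-used cache built on `LinkedHashMap`. The class name `CachedLookup` and the `slowLookup` function (standing in for whatever code actually reads a value from the file) are hypothetical. Note that this simple version boxes its keys and values, so it only suits a cache that is small relative to the data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Sketch: a small LRU cache in front of a slow lookup such as a file read. */
final class CachedLookup {
    private final IntUnaryOperator slowLookup; // hypothetical: e.g. seeks into the data file
    private final Map<Integer, Integer> cache;
    private int misses;

    CachedLookup(IntUnaryOperator slowLookup, int capacity) {
        this.slowLookup = slowLookup;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Returns the cached value if present, otherwise performs the slow lookup and caches it. */
    int get(int position) {
        Integer cached = cache.get(position);
        if (cached != null) {
            return cached;
        }
        misses++;
        int value = slowLookup.applyAsInt(position);
        cache.put(position, value);
        return value;
    }

    /** Number of times the slow lookup had to be used. */
    int misses() {
        return misses;
    }
}
```

Whether a cache like this pays off depends on the access pattern: it helps when a small set of positions is read repeatedly, and does little for a single sequential pass.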
