
Is casting an expensive operation?

devze.com https://www.devze.com 2023-01-31 15:54 Source: the web

Scenario:

  • I am parsing a big file (a character file), for example a .csv file (not exactly my case).
  • I cannot hold the entire file in memory, so I must implement a buffering strategy.
  • I want to build a generic handler that keeps a constant number of lines in memory (as Strings). This handler fetches further lines when necessary and discards the lines that are no longer needed.
  • On top of this handler I will build a parser that transforms the lines into Java objects and applies changes to those objects. Once the changes are done (updating some fields on the objects), the changes are persisted back to the file.
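The buffering handler described in these points could look roughly like a sliding window over a BufferedReader. This is a minimal sketch under assumptions: the class name LineWindow, its methods and the fixed capacity are hypothetical, and persisting changes back to the file is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sliding-window buffer: keeps at most `capacity` lines in memory.
class LineWindow implements AutoCloseable {
    private final BufferedReader reader;
    private final int capacity;
    private final Deque<String> window = new ArrayDeque<>();

    LineWindow(Reader source, int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.reader = new BufferedReader(source);
        this.capacity = capacity;
    }

    /** Reads the next line into the window, evicting the oldest line when the window is full.
     *  Returns false once the underlying source is exhausted. */
    boolean advance() throws IOException {
        String line = reader.readLine();
        if (line == null) return false;
        if (window.size() == capacity) window.removeFirst(); // drop the line no longer needed
        window.addLast(line);
        return true;
    }

    /** Snapshot of the lines currently held, oldest first. */
    List<String> lines() {
        return new ArrayList<>(window);
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Because ArrayDeque removes from its head in constant time, advancing the window costs the same no matter how large the file is.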

Should I:

  • Instead of keeping the buffer as an array of Strings, keep the buffer directly as objects (doing a single cast)? Or...
  • Keep the buffer as lines and, every time I need to operate on the buffer, cast the data to the right object, make the changes, and persist them back to the file? Sequential operations will need additional casts.

I have to keep things simple. Any suggestions?


Casting doesn't change the amount of memory an object occupies. It only changes the compile-time type of the reference you use to access the object; the object's runtime type stays the same.
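A quick check that a cast copies nothing: casting back from Object returns the very same instance, and getClass() reports the same class before and after. The class and method names are illustrative only.

```java
// Demonstrates that a reference cast neither copies nor converts the object.
class CastDemo {
    /** Returns true if casting the reference to String yields the very same instance. */
    static boolean castReturnsSameInstance(Object original) {
        String viaCast = (String) original; // checked at runtime, no copy is made
        return viaCast == original;         // reference equality: same object in memory
    }

    public static void main(String[] args) {
        Object a = new String("id,name,price");
        String b = (String) a;
        System.out.println(a == b);                       // true
        System.out.println(a.getClass() == b.getClass()); // true: both are java.lang.String
    }
}
```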

If you can perform those operations on a per-row basis, just do each operation immediately inside the loop in which you read a single line:

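String line; // reader is a BufferedReader, writer a PrintWriter
while ((line = reader.readLine()) != null) {
    line = process(line);   // your per-row transformation
    writer.println(line);   // write the result out immediately
}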

This way, only a single line is held in Java's memory at any time, instead of the whole file.
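A self-contained version of that per-row loop, assuming the transformation is a hypothetical process method (here it simply upper-cases the line) and that input and output are passed in as a Reader and a Writer. The output should go to a different file (or a temporary file that is renamed afterwards), because opening a FileWriter on the file you are still reading truncates it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.Locale;

class StreamingRewriter {
    /** Hypothetical per-row transformation; replace with the real update logic. */
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    /** Reads one line at a time, transforms it and writes it out immediately,
     *  so only the current line is held in memory. */
    static void rewrite(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.println(process(line));
        }
        // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
        if (writer.checkError()) {
            throw new IOException("error while writing output");
        }
    }
}
```

With real files, rewrite could be called with Files.newBufferedReader(input) and Files.newBufferedWriter(output) inside a try-with-resources block.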

If instead you need to perform operations that depend on the entire CSV file (i.e., on all rows at once), your most efficient bet is to import the CSV file into a real SQL database, use SQL statements to alter the data, and then export it back to a CSV file.


I'd recommend using a MappedByteBuffer (from NIO), which lets you read a file too big to fit into memory. It maps only a region of the file into memory; once you're done reading this region (say, the first 10 KB), map the next one, and so on, until you've read the whole file. Memory-efficient and quite easy to implement.
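A minimal sketch of that region-by-region idea using FileChannel.map. Assumptions: the file uses a single-byte charset (ASCII/ISO-8859-1), so a region boundary can never split a character; a line that straddles two regions is stitched together by carrying its unfinished part over; the class name and region size are illustrative. Keep in mind that a mapping is only released when its buffer is garbage-collected, not when you move on to the next region.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

class MappedLineReader {
    /** Maps `file` one region of `regionSize` bytes at a time and hands each complete
     *  line to `onLine`. Assumes a single-byte charset and LF or CRLF line endings. */
    static void forEachLine(Path file, int regionSize, Consumer<String> onLine) throws IOException {
        if (regionSize <= 0) throw new IllegalArgumentException("regionSize must be positive");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            StringBuilder pending = new StringBuilder(); // unfinished line carried across regions
            for (long pos = 0; pos < size; pos += regionSize) {
                long length = Math.min(regionSize, size - pos);
                // Only this region is mapped; earlier buffers become unreachable.
                MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
                while (region.hasRemaining()) {
                    char c = (char) (region.get() & 0xFF); // one byte is one char in ISO-8859-1
                    if (c == '\n') {
                        int end = pending.length();
                        if (end > 0 && pending.charAt(end - 1) == '\r') end--; // tolerate CRLF
                        onLine.accept(pending.substring(0, end));
                        pending.setLength(0);
                    } else {
                        pending.append(c);
                    }
                }
            }
            if (pending.length() > 0) {
                onLine.accept(pending.toString()); // last line had no trailing newline
            }
        }
    }
}
```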


Java casts such as

Object a = new String();
String b = (String) a;

are not expensive, no matter whether you cast Strings or any other type.
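The flip side is that a cast cannot turn one type into another: it only checks that the object already is of the target type and throws ClassCastException otherwise. Turning a line of text into a number or a domain object is parsing (for example Integer.parseInt), and that parsing, not a cast, is the work the second option in the question would repeat. A small sketch; the class and method names are illustrative.

```java
class CastVersusParse {
    /** Returns true if casting `value` to Integer throws ClassCastException. */
    static boolean castFails(Object value) {
        try {
            Integer ignored = (Integer) value; // runtime type check only, no conversion
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object field = "42";
        System.out.println(castFails(field));          // true: "42" is a String, not an Integer
        int parsed = Integer.parseInt((String) field); // parsing actually converts the text
        System.out.println(parsed + 1);                // 43
    }
}
```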


The real value comes from reading each line as a String, which is easy in Java. Once the line is in a String, it is trivial to split it on each comma with

String[] row = parsedRow.split(",");

Then you will have a String for each value in the array, which can then be operated on.
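Building on that, a row can be parsed once into an object, updated, and turned back into a line for writing. A sketch with assumptions: the three-column layout and the Item record are hypothetical, fields contain no quoted commas (a plain split does not handle CSV quoting; a CSV library would be needed for that), and split(",", -1) is used because split(",") silently drops trailing empty fields.

```java
import java.math.BigDecimal;

/** Hypothetical row type for a line of the form "id,name,price". */
record Item(int id, String name, BigDecimal price) {

    /** Parses one line; the -1 limit keeps trailing empty fields that split(",") would drop. */
    static Item parse(String line) {
        String[] row = line.split(",", -1);
        if (row.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Item(Integer.parseInt(row[0].trim()), row[1], new BigDecimal(row[2].trim()));
    }

    /** Returns a copy with a new price (records are immutable). */
    Item withPrice(BigDecimal newPrice) {
        return new Item(id, name, newPrice);
    }

    /** Serializes back to the same comma-separated layout for writing to the output file. */
    String toLine() {
        return String.join(",", Integer.toString(id), name, price.toPlainString());
    }
}
```

Inside a per-row loop this could be used as writer.println(Item.parse(line).withPrice(newPrice).toLine()).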
