I am writing a Java application that, among other things, needs to read a dictionary text file (one word per line) and store it in a HashSet. Each time I start the application, the same file is read all over again (a 6 MB Unicode file).
That seemed expensive, so I decided to serialize the resulting HashSet and store it in a binary file. I expected my application to run faster after this. Instead it got slower: from ~2.5 seconds before to ~5 seconds after serialization.
Is this the expected result? I thought that in cases like this serialization should increase speed.
It's not a question of one serialization mechanism or another, it's a question of the data structure you are serializing.
You have one very efficient, natural representation of these words: a simple list, in the text file. That's fast to read.
You have created a data structure to store them which is different: a hash table. It takes more memory to represent a hash table. However the benefit is that it's very fast to look for a word, compared to a simple list.
But that tradeoff makes serialization slower as well: a naive serialization of a hash table writes more data than the plain word list, so the file is larger and takes longer to read back.
I think you should stick with the simple reading of the text file.
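For reference, a minimal sketch of that plain text-file approach might look like the following. The class name DictionaryLoader, the expectedWords parameter, and the UTF-8 encoding are assumptions made for illustration (the question only says the file is "unicode"); pre-sizing the set is an optional tweak to avoid rehashing while the set fills up.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not from the original post.
final class DictionaryLoader {

    // Reads one word per line into a HashSet.
    // expectedWords is a rough estimate used only to pre-size the table.
    static Set<String> load(String path, int expectedWords) throws IOException {
        // HashSet's default load factor is 0.75, so size the table to hold
        // expectedWords entries without resizing.
        Set<String> words = new HashSet<>((int) (expectedWords / 0.75f) + 1);

        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {   // skip blank lines
                    words.add(line);
                }
            }
        }
        return words;
    }
}
```

Calling DictionaryLoader.load("dictionary.txt", 500_000) would return the set; both the file name and the word count here are placeholders.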
@Sean's answer is correct. Java serialization/deserialization has significant performance overheads. If you need to make the dictionary loading faster (or ...), consider the following approaches:
- Using the java.nio.* classes to read the file may speed things up.
- If the application doesn't necessarily need the dictionary to be loaded instantly on startup, consider using a separate thread to do the dictionary loading asynchronously. The dictionary loading is no faster, but (for example) the application's GUI starts faster anyway.
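The two suggestions can be combined in a rough sketch like the one below. It uses Files.lines from java.nio.file to read the file and CompletableFuture.supplyAsync to run the load on a background thread. The class name AsyncDictionary and the UTF-8 encoding are assumptions made for illustration, and whether the NIO-based read is actually faster for a given file would need to be measured.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper, not from the original post.
final class AsyncDictionary {

    private final CompletableFuture<Set<String>> words;

    AsyncDictionary(Path file) {
        // Loading starts on a background thread (the common ForkJoinPool);
        // the constructor returns immediately, so startup is not blocked.
        this.words = CompletableFuture.supplyAsync(() -> readWords(file));
    }

    private static Set<String> readWords(Path file) {
        // Assumption: the file is UTF-8 encoded, one word per line.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty())
                        .collect(Collectors.toCollection(HashSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Blocks only if the dictionary has not finished loading yet.
    boolean contains(String word) {
        return words.join().contains(word);
    }
}
```

The application would create the AsyncDictionary early during startup and call contains only when a lookup is first needed; by then the load has usually finished, and if not, the caller simply waits for it.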