I have a program which reads a text file line by line, and creates a Hibernate entity object from each line, and saves them. I have several such text files to process, each of which has about 300,000 lines. I'm finding that my current implementation is excruciatingly slow, and I'm wondering if there's anything I can do to improve things.
My main method processes the text file line by line like so:
// read the file line by line
FileInputStream fileInputStream = new FileInputStream(new File(fileName));
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
int lineCount = 0;
String line = bufferedReader.readLine();
while (line != null)
{
    // convert the line into an Observations object and persist it
    convertAndPersistObservationsLine(line);

    // if the number of lines we've processed has built up to the JDBC batch size,
    // then flush and clear the session in order to control the size of
    // Hibernate's first-level cache
    lineCount++;
    if (lineCount % JDBC_CACHE_SIZE == 0)
    {
        observationsDao.flush();
        observationsDao.clear();
    }
    line = bufferedReader.readLine();
}
The convertAndPersistObservationsLine() method just splits the text line into tokens, creates a new entity object, populates the entity's fields with data from the tokens, and then saves the object via a DAO that calls Hibernate's Session.saveOrUpdate() method. The DAO methods flush() and clear() are direct calls to the corresponding Hibernate Session methods.
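For reference, a minimal sketch of what that method might look like (the delimiter, token layout, and setter names here are hypothetical; the actual Observations mapping isn't shown above):

// Hypothetical sketch of convertAndPersistObservationsLine(): the delimiter,
// token order, and field names are assumptions, not the real mapping.
private void convertAndPersistObservationsLine(String line)
{
    String[] tokens = line.split("\t");
    Observations observations = new Observations();
    observations.setStationId(tokens[0]);
    observations.setObservationTime(tokens[1]);
    observations.setValue(Double.parseDouble(tokens[2]));
    observationsDao.saveOrUpdate(observations); // delegates to Session.saveOrUpdate()
}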
The Hibernate property 'hibernate.cache.use_second_level_cache' is set to false, and the Hibernate property 'hibernate.jdbc.batch_size' is set to 50, as is the Java constant JDBC_CACHE_SIZE.
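In configuration file form those settings would look something like this (a sketch, assuming hibernate.cfg.xml rather than a properties file):

<!-- batch JDBC statements in groups of 50; second-level cache off for the bulk load -->
<property name="hibernate.jdbc.batch_size">50</property>
<property name="hibernate.cache.use_second_level_cache">false</property>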
Can someone suggest a better way of going about this, or any tweaks to the above which may improve the performance of this batch loading program?
Thanks in advance for your help.
--James
The code itself and the Hibernate configuration look correct (by correct I mean that they follow the batch insert idiom from the documentation). But here are some additional suggestions:
As already mentioned, make absolutely sure that you aren't using an ID generator that defeats batching, like IDENTITY. When using GenerationType.AUTO, the persistence provider picks an appropriate strategy depending on the database, so you might have to change that to a TABLE or SEQUENCE strategy (because Hibernate can cache the IDs using a hi-lo algorithm).
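For example, a mapping along these lines (the sequence name and allocation size are illustrative, not taken from the question):

// Illustrative JPA mapping: a SEQUENCE generator with an allocationSize lets
// Hibernate assign blocks of IDs in memory instead of querying per insert.
import javax.persistence.*;

@Entity
public class Observations
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "obs_seq")
    @SequenceGenerator(name = "obs_seq", sequenceName = "observations_seq", allocationSize = 50)
    private Long id;

    // remaining fields omitted
}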
Also make sure that Hibernate is batching as expected. To do so, activate logging and monitor the BatchingBatcher to track the size of the batches it's executing (it will be logged).
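With log4j, for instance, something like the following should surface those messages (a sketch; the exact logger category can vary between Hibernate versions):

# log4j.properties: debug logging for Hibernate's JDBC batcher
log4j.logger.org.hibernate.jdbc=debug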
In your particular case, you might actually consider using the StatelessSession interface (once the problem is solved, of course).
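A rough sketch of the same loop against a StatelessSession (buildObservations() is a placeholder for the token-splitting logic described in the question):

// Sketch: StatelessSession has no first-level cache, so the flush()/clear()
// bookkeeping from the original loop goes away entirely.
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
try
{
    String line = bufferedReader.readLine();
    while (line != null)
    {
        session.insert(buildObservations(line)); // executed immediately, nothing cached
        line = bufferedReader.readLine();
    }
    tx.commit();
}
finally
{
    session.close();
}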
A few things:
Can you quantify "excruciatingly slow"? How many inserts per second are you achieving? What rate do you think you should have instead? What type of load is the database itself under? Are others reading from the table at the same time?
How are you connecting to the database? Is all of this occurring in a single transaction re-using the same connection?
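As a point of comparison, a single-session, single-transaction structure for the whole file would look roughly like this (processFile() is a placeholder for the loop shown in the question):

// Sketch: one session and one transaction for the entire file, so every
// insert reuses the same connection rather than paying per-line overhead.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try
{
    processFile(fileName, session); // the line-by-line loop from the question
    tx.commit();
}
finally
{
    session.close();
}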
Are you by any chance using an identity identifier? The documentation states that JDBC batching is disabled silently if you are:

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
If you are using MySQL, you might want to turn on rewriteBatchedStatements, since MySQL does not support the batch form of prepared statement parameter binding. It will rewrite your insert statements into the form "INSERT INTO YourEntity VALUES (...), (...), (...)".
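That flag is typically set on the JDBC connection URL, for example (host, port, and database name are placeholders):

jdbc:mysql://localhost:3306/yourdb?rewriteBatchedStatements=true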
Please refer to: http://shengchien.blogspot.com/2010/10/hibernate-batch-processing-with-mysql.html