I'm creating a Hibernate component to persist large volumes of incoming data, both saving (creating) and updating rows, with volumes in the millions of rows.
I am aware of the main differences between flush and commit: flush syncs the "dirty" in-session state to the database without committing, so the transaction can still be rolled back if required, whereas commit makes all the changes permanent in the database.
What's a reasonable size for a batch insert? Is 50 the maximum for reasonable performance, so something like:
for (int i = 0; i < 1000000; i++) {
    // save or update the i-th row here
    if (i % 50 == 0) { // every 50th row
        session.flush();
    }
}
I gather the 50 should match the value of hibernate.jdbc.batch_size, i.e. 50.
It depends on your data. The batch size is a trade-off between the number of objects Hibernate keeps in its session and the latency of the round trips to the database when flushing. If your batch size is too small, you'll end up making many round trips to the database. If it is too large, you'll end up holding many objects in Hibernate's session, which can be a problem if your objects are fat (large in memory).
I would say 50 is a low number: 1M / 50 = 20,000 round trips. I would start with a bigger number and measure the performance. By the way, this applies to batch operations only: a hibernate.jdbc.batch_size of 50 is fine for regular app transactions.
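To put rough numbers on that trade-off, here is a small sketch of the arithmetic. It assumes hibernate.jdbc.batch_size is set to the same value as the flush interval, so that each flush goes out as roughly one JDBC batch, and it ignores any extra round trips for things like ID generation. The class name, method name and candidate sizes are made up for illustration.

```java
final class RoundTripEstimate {

    // Number of flushes needed to push totalRows through in chunks of batchSize.
    // Ceiling division: a final partial batch still costs one flush.
    static long flushesNeeded(long totalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L;
        int[] candidates = {50, 500, 1000, 5000}; // illustrative values only
        for (int size : candidates) {
            System.out.println(size + " rows per flush -> "
                    + flushesNeeded(totalRows, size) + " flushes");
        }
    }
}
```

For 1M rows this gives 20,000 flushes at 50 rows per flush and 1,000 flushes at 1,000 rows per flush. The count only tells you about round trips, not memory, so you still have to measure with your real objects.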
PS: Don't forget to clear the Hibernate session after each flush (session.clear()), or else Hibernate will keep the persisted objects in memory even after the flush.
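The flush-then-clear pattern could look roughly like the sketch below. So that it compiles without Hibernate on the classpath, it uses a hypothetical UnitOfWork interface as a stand-in for the three Session calls involved (save, flush, clear). In real code you would call those on the Hibernate session directly. The class names and the counting implementation are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchWriter {

    // Hypothetical stand-in for the Session methods used here (save, flush, clear).
    interface UnitOfWork {
        void save(Object entity);
        void flush();
        void clear();
    }

    // Toy implementation that only counts what a real session would be holding.
    static final class CountingUnitOfWork implements UnitOfWork {
        final List<Object> held = new ArrayList<>(); // objects still referenced by the "session"
        int flushes = 0;
        int peakHeld = 0;

        @Override public void save(Object entity) {
            held.add(entity);
            peakHeld = Math.max(peakHeld, held.size());
        }
        @Override public void flush() { flushes++; }    // a real flush sends the pending SQL
        @Override public void clear() { held.clear(); } // a real clear detaches managed entities
    }

    // Saves every row, flushing and clearing after each full batch,
    // and once more at the end for any remainder.
    static void writeAll(UnitOfWork uow, List<?> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        for (Object row : rows) {
            uow.save(row);
            pending++;
            if (pending == batchSize) {
                uow.flush();
                uow.clear();
                pending = 0;
            }
        }
        if (pending > 0) {
            uow.flush();
            uow.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_050; i++) {
            rows.add(i);
        }
        CountingUnitOfWork uow = new CountingUnitOfWork();
        writeAll(uow, rows, 100);
        System.out.println("flushes=" + uow.flushes
                + ", peak held=" + uow.peakHeld
                + ", held at end=" + uow.held.size());
    }
}
```

With 1,050 rows and a batch of 100 the toy session is flushed 11 times and never holds more than 100 objects. Without the clear() calls it would end up holding all 1,050, which is the memory problem described above.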