My application needs to archive database tables from Sybase to DB2 and vice versa, as well as within the same database type (DB2 to DB2 and Sybase to Sybase), using Java.
I am trying to understand the best strategies in terms of performance, implementation effort, ease of use, and scalability.
Here is my current process:
- Source and destination tables, together with the parameters they accept from Java, are defined in XML. (The actual query is placed in the XML because parameters are sometimes supplied from Java, for example for a WHERE clause condition.)
- The application reads the source and destination configurations and executes them sequentially.
- A destination is sometimes not needed, for example when the source just deletes data from a specific table or just calls a stored procedure.
- The datasets copied between source and destination are extremely large (millions of rows).
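A minimal sketch of the configuration step described in the list, assuming a hypothetical XML layout (`<archive>`, `<job name="...">`, `<source>`, optional `<destination>`), since the actual schema is not shown. `ArchiveConfig` and `Job` are illustrative names. It uses only the JDK's DOM parser, keeps jobs in document order so they can be executed sequentially, and treats the destination as optional for jobs that only delete rows or call a stored procedure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical parser for an archiving config; the XML layout is an assumption. */
public final class ArchiveConfig {

    /** One archiving step: a source statement plus an optional destination statement. */
    public record Job(String name, String sourceSql, Optional<String> destinationSql) {}

    private ArchiveConfig() {}

    /** Parses jobs in document order so the caller can execute them sequentially. */
    public static List<Job> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList jobNodes = doc.getElementsByTagName("job");
            List<Job> jobs = new ArrayList<>();
            for (int i = 0; i < jobNodes.getLength(); i++) {
                Element job = (Element) jobNodes.item(i);
                String source = childText(job, "source")
                        .orElseThrow(() -> new IllegalArgumentException("job without <source>"));
                // No <destination> when the source only deletes rows or calls a stored procedure.
                jobs.add(new Job(job.getAttribute("name"), source, childText(job, "destination")));
            }
            return jobs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("invalid archive configuration", e);
        }
    }

    /** Returns the trimmed text of the first descendant element with the given tag, if any. */
    private static Optional<String> childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0
                ? Optional.empty()
                : Optional.of(nodes.item(0).getTextContent().trim());
    }
}
```

The SQL keeps `?` placeholders, so parameters supplied from Java (such as a WHERE clause value) can be bound with `PreparedStatement.setObject` instead of being concatenated into the query. Note that a `<` inside the XML has to be written as `&lt;` or wrapped in a CDATA section.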
Off the top of my head, it looks like I could define dependencies between multiple source/destination combinations and execute them in parallel in multiple threads. But will this actually improve performance? (I hope it will.)
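One way to express the dependency idea is to give each job a `CompletableFuture` that is submitted to a thread pool only after the futures of the jobs it depends on have completed. This is a sketch with hypothetical names (`DependencyRunner`, `add`, `runAll`), and the `Runnable` bodies stand in for the real copy, delete, or stored-procedure work. Whether it improves throughput likely depends on where the bottleneck is: independent tables copied over separate connections may benefit, while jobs competing for the same table, disk, or network link may not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical runner: each job starts only after every job it depends on has finished. */
public final class DependencyRunner {
    private final Map<String, Runnable> tasks = new LinkedHashMap<>();
    private final Map<String, List<String>> dependencies = new HashMap<>();

    /** Registers a job; dependsOn names other jobs that must complete first. */
    public DependencyRunner add(String name, Runnable task, String... dependsOn) {
        tasks.put(name, task);
        dependencies.put(name, List.of(dependsOn));
        return this;
    }

    /** Runs all jobs on a fixed-size pool and blocks until every job has completed. */
    public void runAll(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, CompletableFuture<Void>> futures = new HashMap<>();
            for (String name : tasks.keySet()) {
                schedule(name, futures, pool, new HashSet<>());
            }
            // allOf waits for every job; join() throws CompletionException if any job failed.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }

    private CompletableFuture<Void> schedule(String name, Map<String, CompletableFuture<Void>> futures,
                                             ExecutorService pool, Set<String> visiting) {
        CompletableFuture<Void> existing = futures.get(name);
        if (existing != null) {
            return existing; // already scheduled through another dependent
        }
        if (!tasks.containsKey(name)) {
            throw new IllegalArgumentException("unknown job: " + name);
        }
        if (!visiting.add(name)) {
            throw new IllegalStateException("dependency cycle involving: " + name);
        }
        CompletableFuture<?>[] upstream = dependencies.get(name).stream()
                .map(dep -> schedule(dep, futures, pool, visiting))
                .toArray(CompletableFuture[]::new);
        // Submitted to the pool only once all upstream jobs succeed; if one failed,
        // this job is skipped and completes exceptionally, so no pool thread ever blocks.
        CompletableFuture<Void> future = CompletableFuture.allOf(upstream)
                .thenRunAsync(tasks.get(name), pool);
        futures.put(name, future);
        visiting.remove(name);
        return future;
    }
}
```

Each job should use its own JDBC `Connection` (for example from a connection pool), because sharing one connection would make the concurrent jobs share a single transaction.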
Are there any open-source frameworks for data archiving in Java? Any other thoughts on the implementation side would be really helpful.
Thanks
The most powerful open-source framework for Java persistence is Hibernate. You can reverse-engineer a Java model from an existing database (see Hibernate Tools) and perform the replication using Session.replicate(). You can fine-tune performance by using stateless sessions and second-level caching where applicable. Documentation is here
Look at some database replication tools (we use Shadowbase). They might have Java APIs.
Also, check out this IBM whitepaper:
[IBM] offers a solution using JDBC and the SyncML standard to achieve generic database data replication.
Pentaho Data Integration has robust support for copying data between or from databases. Plus, it's open source and allows you to write plugins in Java.
See, for example: Migrate from Oracle to MySQL
The single most important thing you need to do is disable auto-commit in JDBC; otherwise every single insert into the target table is committed on its own, which ruins performance.
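The auto-commit advice can be sketched with plain JDBC: turn auto-commit off on the target, add rows to a `PreparedStatement` batch, and commit once per batch instead of once per row. `BatchCopier` and its batch-size parameter are illustrative assumptions; the INSERT must have one `?` per selected column in the same order, and `setFetchSize` is only a hint that a driver may ignore.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

/** Hypothetical helper that streams rows from a source query into a target INSERT in batches. */
public final class BatchCopier {
    private BatchCopier() {}

    /**
     * Copies every row returned by selectSql into the target with insertSql, which must have one
     * '?' per selected column, in the same order. Returns the number of rows copied.
     */
    public static long copy(Connection source, Connection target, String selectSql,
                            String insertSql, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean originalAutoCommit = target.getAutoCommit();
        target.setAutoCommit(false); // otherwise every INSERT is its own transaction
        long copied = 0;
        try (PreparedStatement select = source.prepareStatement(selectSql);
             PreparedStatement insert = target.prepareStatement(insertSql)) {
            select.setFetchSize(batchSize); // only a hint; drivers may ignore it
            try (ResultSet rows = select.executeQuery()) {
                ResultSetMetaData meta = rows.getMetaData();
                int columns = meta.getColumnCount();
                while (rows.next()) {
                    for (int i = 1; i <= columns; i++) {
                        Object value = rows.getObject(i);
                        if (value == null) {
                            insert.setNull(i, meta.getColumnType(i)); // typed NULL for the target driver
                        } else {
                            insert.setObject(i, value);
                        }
                    }
                    insert.addBatch();
                    if (++copied % batchSize == 0) {
                        insert.executeBatch();
                        target.commit(); // one commit per batch, not per row
                    }
                }
            }
            if (copied % batchSize != 0) {
                insert.executeBatch(); // flush the final, partial batch
            }
            target.commit();
            return copied;
        } catch (SQLException e) {
            try {
                target.rollback(); // undoes only the current, uncommitted batch
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            target.setAutoCommit(originalAutoCommit);
        }
    }
}
```

Committing per batch keeps each transaction bounded when copying millions of rows. The trade-off is that a failure part-way leaves earlier batches committed, so a rerun needs some way to skip rows that were already copied.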
Before deciding how to actually copy the data, though, you have to work out your synchronization scheme so that you can identify which records need to be copied.
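One possible synchronization scheme is a high-watermark: remember the largest key already archived and select only rows above it. This sketch assumes each table has a unique, monotonically increasing numeric key (such as an ID); `WatermarkSync` and its methods are hypothetical. Table and column names are concatenated into the SQL, so they must come from trusted configuration, never user input. `Statement.setMaxRows` caps the chunk size without dialect-specific syntax such as DB2's `FETCH FIRST n ROWS ONLY` or Sybase's `TOP n`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Hypothetical high-watermark tracker: selects only rows whose key is above the largest key
 * already archived. Assumes the key column is unique and increases monotonically.
 */
public final class WatermarkSync {
    private final String table;     // trusted config value, never user input
    private final String keyColumn; // trusted config value, never user input
    private long watermark;

    public WatermarkSync(String table, String keyColumn, long initialWatermark) {
        this.table = table;
        this.keyColumn = keyColumn;
        this.watermark = initialWatermark;
    }

    /** Ordered by the key so that each chunk continues where the previous one ended. */
    public String nextChunkSql() {
        return "SELECT * FROM " + table + " WHERE " + keyColumn + " > ? ORDER BY " + keyColumn;
    }

    /** Prepares the next chunk query; setMaxRows limits how many rows it returns. */
    public PreparedStatement prepareNextChunk(Connection source, int chunkSize) throws SQLException {
        PreparedStatement statement = source.prepareStatement(nextChunkSql());
        try {
            statement.setLong(1, watermark);
            statement.setMaxRows(chunkSize);
            return statement;
        } catch (SQLException e) {
            statement.close(); // avoid leaking the statement if binding fails
            throw e;
        }
    }

    /** Call only after the target commit succeeded; the watermark never moves backwards. */
    public void advance(long lastCommittedKey) {
        watermark = Math.max(watermark, lastCommittedKey);
    }

    public long watermark() {
        return watermark;
    }
}
```

Advancing the watermark only after the target commit means a crash causes rows to be selected again rather than skipped, so the target side would need to tolerate or reject duplicates (for example through a primary key).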