I'm writing an application to analyse a MySQL database, and I need to execute several DMLs simmultaneously; for example:
// In ResultSet rsA: Select * from A;
rsA.beforeFirst();
while (rsA.next()) {
id = rsA.getInt("id");
// Retrieve data from table B: Select * from B where B.Id=" + id;
// Crunch some numbers using the data from B
// Close resultset B
}
I'm declaring an array of data objects, each with its own Connection to the database, which in turn calls several methods for the data analysis. The problem is all threads use the same connection, thus all tasks throw exceptios: "Lock wait timeout exceeded; try restarting transaction"
I believe there is a way to write the code in such a way that any given object has its own connection and executes the required tasks independent from any other object. For example:
DataObject dataObject[0] = new DataObject(id[0]);
DataObject dataObject[1] = new DataObject(id[1]);
DataObject dataObject[2] = new DataObject(id[2]);
...
DataObject da开发者_如何学编程taObject[N] = new DataObject(id[N]);
// The 'DataObject' class has its own connection to the database,
// so each instance of the object should use its own connection.
// It also has a "run" method, which contains all the tasks required.
Executor ex = Executors.newFixedThreadPool(10);
for(i=0;i<=N;i++) {
ex.execute(dataObject[i]);
}
// Here where the problem is: Each instance creates a new connection,
// but every DML from any of the objects is cluttered in just one connection
// (in MySQL command line, "SHOW PROCESSLIST;" throws every connection, and all but
// one are idle).
Can you point me in the right direction?
Thanks
I think the problem is that you've confounded a lot of middle tier, transactional, and persistent logic into one class.
If you're dealing directly with ResultSet, you're not thinking about things in a very object-oriented fashion.
You're smart if you can figure out how to get the database to do some of your calculations.
If not, I'd recommend keeping Connections open for the minimum time possible. Open a Connection, get the ResultSet, map it into an object or data structure, close the ResultSet and Connection in local scope, and return the mapped object/data structure for processing.
You keep persistence and processing logic separate this way. You save yourself a lot of grief by keeping connections short-lived.
If a stored procedure solution is slow it could be due to poor indexing. Another solution will perform equally poorly if not worse. Try running EXPLAIN PLAN and see if any of your queries are using TABLE SCAN. If yes, you have some indexes to add. It could also be due to large rollback logs if your transactions are long-running. There's a lot you could and should do to ensure you've done everything possible with the solution you have before switching. You could go to a great deal of effort and still not address the root cause.
After some time of brain breaking, I figured out my own mistakes... I want to put this new knowledge, so... here I go
I made a very big mistake by declaring the Connection objet as a Static object in my code... so obviously, despite I created a new Connection for each new data object I created, every transaction went through a single, static, connection.
With that first issue corrected, I went back to the design table, and realized that my process was:
- Read an Id from an input table
- Take a block of data related to the Id read in step 1, stored in other input tables
- Crunch numbers: Read the related input tables and process the data stored in them
- Save the results in one or more output tables
- Repeat the process while I have pending Ids in the input table
Just by using a dedicated connection for input reading and a dedicated connection for output writing, the performance of my program increased... but I needed a lot more!
My original approach for steps 3 and 4 was to save into the output each one of the results as soon as I had them... But I found a better approach:
- Read the input data
- Crunch the numbers, and put the results in a bunch of queues (one for each output table)
- A separated thread is checking every second if there's data in any of the queues. If there's data in the queues, write it to the tables.
So, by dividing input and output tasks using different connections, and by redirecting the core process output to a queue, and by using a dedicated thread for output storage tasks, I finally achieved what I wanted: Multithreaded DML execution!
I know there are better approaches to this particular problem, but this one works quite fine.
So... if anyone is stuck with a problem like this... I hope this helps.
精彩评论