I need some help with TimesTen DB query optimization. I made some measures with Java profiler and found the code section that takes most of the time (this code section executes the SQL query). What is strange that this query becomes expensive only for some specific input data.
Here’s the example. We have two tables that we are querying, one represents the objects we want to fetch (T_PROFILEGROUP
), another represents the many-to-many link from some other table (T_PROFILECONTEXT_PROFILEGROUPS
). We are not querying linked table.
These are the queries that I executed with DB profiler running (they are the same except for the ID):
Command> select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
< 1169655247309537280 >
< 1169655249792565248 >
< 1464837997699399681 >
3 rows found.
Command> select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928;
< 1169655247309537280 >
1 row found.
This is what I have in the profiler:
12:14:31.147 1 SQL 2L 6C 10825P Preparing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272
12:14:31.147 2 SQL 4L 6C 10825P sbSqlCmdCompile ()(E): (Found already compiled version: refCount:01, bucket:47) cmdType:100, cmdNum:1146695.
12:14:31.147 3 SQL 4L 6C 10825P Opening: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
12:14:31.147 4 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
12:14:31.148 5 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
12:14:31.148 6 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
12:14:31.228 7 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
12:14:31.228 8 SQL 4L 6C 10825P Closing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272;
12:14:35.243 9 SQL 2L 6C 10825P Preparing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928
12:14:35.243 10 SQL 4L 6C 10825P sbSqlCmdCompile ()(E): (Found already compiled version: refCount:01, bucket:44) cmdType:100, cmdNum:1146697.
12:14:35.243 11 SQL 4L 6C 10825P Opening: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928;
12:14:35.243 12 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928;
12:14:35.243 13 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_开发者_运维问答ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928;
12:14:35.243 14 SQL 4L 6C 10825P Closing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928;
It’s clear that the first query took almost 100ms, while the second was executed instantly. It’s not about queries precompilation (the first one is precompiled too, as same queries happened earlier). We have DB indices for all columns used here: T_PROFILEGROUP.M_ID
, T_PROFILECONTEXT_PROFILEGROUPS.M_ID_OID
and T_PROFILECONTEXT_PROFILEGROUPS.M_ID_EID
.
My questions are:
- Why querying the same set of tables yields such a different performance for different parameters?
- Which indices are involved here?
- Is there any way to improve this simple query and/or the DB to make it faster?
UPDATE: to give the feeling of size:
Command> select count(*) from T_PROFILEGROUP;
< 183840 >
1 row found.
Command> select count(*) from T_PROFILECONTEXT_PROFILEGROUPS;
< 2279104 >
1 row found.
I'm not familiar with TimesTen but assuming it works like other relational DBs (except being in memory), one possible reason is either the absence of an index on the T_PROFILECONTEXT_PROFILEGROUP.M_ID_OID or the T_PROFILEGROUP.M_ID columns or a binary tree index being unbalanced.
Without an index the results will depend on the order of the data as to how quickly it will find them.
In tables with large data I've experienced unbalanced binary tree indexes causing issues as one side of the tree is far larger than the others. In this scenario a rebuild of the index can rebalance the tree.
I can't honestly say if that's applicable to TimesTen having never used it and my web searches are not yielding much information.
I'm not too familiar with the TimesTen database, but since it's less than 1/10th of a second here, could it just be a rounding difference in the two queries or some kind of hiccup?
Here's a link that goes over some of the methods for performance tuning a TimesTen DB. It has some general information (using prepares, etc.) as well as information on tuning specific queries. I hope it helps.
My rough bet is that the first query pulls data from either disk or virtual memory into actual memory, and the second query gets to take advantage of that.
Can you run this with three (or ten?) queries and report back?
精彩评论