This is a follow-up question to Strategy to improve Oracle DELETE performance. To recap, we have a large DB with a hierarchy of tables representing 1D through 4D output data from an optimization system. Reading and writing this data is fast and provides a convenient means for our various systems to utilize the information.
However, deleting unused data has become a bear. The current table hierarchy is below.
/* Metadata tables */
Case(CaseId, DeleteFlag, ...) On Delete Cascade CaseId
OptimizationRun(OptId, CaseId, ...) On Delete Cascade OptId
OptimizationStep(StepId, OptId, ...) On Delete Cascade StepId
/* Data tables */
Files(FileId, CaseId, Blob) /* deletes are near instantaneous here */
/* Data per run */
OnedDataX(OptId, ...)
TwoDDataY1(OptId, ...) /* packed representation of a 1D slice */
/* Data not only per run, but per step */
TwoDDataY2(StepId, ...) /* packed representation of a 1D slice */
ThreeDDataZ(StepId, ...) /* packed representation of a 2D slice */
FourDDataZ(StepId, ...) /* packed representation of a 3D slice */
/* ... About 10 or so of these tables exist */
What I am looking for is a means of partitioning the Case data such that I could drop a partition relating to a case to remove its data. Ideally, OptimizationRun would have an interval partition based on CaseId, and this would filter down through to its children. However, 11g doesn't support the combination of INTERVAL and REF partitioning. I'm fairly certain ENABLE ROW MOVEMENT is out of the question given the DB size and the requirement that the tablespaces live in ASSM. Maybe RANGE partitioning on OptimizationRun and REF partitioning on the rest?
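For concreteness, the RANGE-plus-REF combination would look roughly like this (column lists trimmed; partition and constraint names are illustrative):

```sql
/* Parent is RANGE-partitioned on CaseId */
CREATE TABLE OptimizationRun (
    OptId  NUMBER PRIMARY KEY,
    CaseId NUMBER NOT NULL
)
PARTITION BY RANGE (CaseId) (
    PARTITION case_optpart_1 VALUES LESS THAN (2)
);

/* Children are REF-partitioned off the foreign key, so their
   partitions track the parent's one-for-one */
CREATE TABLE OptimizationStep (
    StepId NUMBER PRIMARY KEY,
    OptId  NUMBER NOT NULL,
    CONSTRAINT fk_step_run FOREIGN KEY (OptId)
        REFERENCES OptimizationRun (OptId)
)
PARTITION BY REFERENCE (fk_step_run);
```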
My guess is with that strategy I would need a trigger that accomplishes something like the following:
CREATE OR REPLACE TRIGGER Case_BeforeInsert_MakePartitions
BEFORE INSERT
ON Case
FOR EACH ROW
DECLARE
v_PartName varchar2(64) := 'CASE_OPTPART_' || :new.CaseId;
v_PartRange Case.CaseId%type := :new.CaseId;
BEGIN
-- Take :new.CaseId and create the partition
ALTER TABLE OptimizationRun
ADD PARTITION v_PartName
VALUES LESS THAN ( v_PartRange );
END;
And then the requisite trigger for before deletion:
CREATE OR REPLACE TRIGGER Case_BeforeDelete_RemovePartitions
BEFORE DELETE
ON Case
FOR EACH ROW
DECLARE
v_PartName varchar2(64) := 'CASE_OPTPART_' || :old.CaseId;
BEGIN
-- Drop the partitions associated with the case
ALTER TABLE OptimizationRun
DROP PARTITION v_PartName;
END;
Good idea? Or is this an idea out of the SNL Bad Idea Jeans commercial?
Update, for size reference:
- 1D data tables ~1.7G
- 2D data tables ~12.5G
- 3D data tables ~117.3G
- 4D data tables ~315.2G
I'm pretty sure that you're on the right track with partitioning to deal with your delete performance problem. However, I don't think you'll be able to mix this with triggers. Complex logic in triggers has always bothered me, but aside from that, here are the problems you are likely to encounter:
- DDL statements break transaction logic since Oracle performs a commit of the current transaction before any DDL statement.
- Furthermore, you can't commit in a trigger (since Oracle is in the middle of an operation and the DB is not in a consistent state).
- Using autonomous transactions to perform DDL would be a (poor?) workaround for the insert but is unlikely to work for the DELETE since this would probably interfere with the ON DELETE CASCADE logic.
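For illustration, the autonomous-transaction workaround for the insert would look roughly like this (EXECUTE IMMEDIATE is needed because static DDL isn't allowed in PL/SQL; the pragma lets the DDL's implicit commit run outside the triggering transaction — which is exactly why it's risky):

```sql
CREATE OR REPLACE TRIGGER Case_BeforeInsert_MakePartitions
BEFORE INSERT ON Case
FOR EACH ROW
DECLARE
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   /* The DDL commits this autonomous transaction immediately,
      independently of the INSERT that fired the trigger */
   EXECUTE IMMEDIATE
      'ALTER TABLE OptimizationRun ADD PARTITION CASE_OPTPART_'
      || :new.CaseId
      || ' VALUES LESS THAN (' || (:new.CaseId + 1) || ')';
END;
```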
It would be easier to code and easier to maintain procedures that deal with the dropping and creation of partitions such as:
CREATE PROCEDURE add_case (p_case_id IN Case.CaseId%TYPE, ...) AS
BEGIN
   EXECUTE IMMEDIATE 'ALTER TABLE OptimizationRun ADD PARTITION ...';
   /* repeat for each child table */
   INSERT INTO Case (CaseId, ...) VALUES (p_case_id, ...);
END;
Concerning the drop of partitions, you'll have to check whether it works with referential integrity: you may need to disable the foreign key constraints before dropping a parent table partition in a parent-child table relationship.
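A sketch of what that might look like (constraint and partition names are illustrative):

```sql
ALTER TABLE OptimizationStep DISABLE CONSTRAINT fk_step_run;

ALTER TABLE OptimizationRun DROP PARTITION case_optpart_42;

/* NOVALIDATE trusts existing rows instead of rechecking them --
   this assumes the orphaned child rows were deleted first */
ALTER TABLE OptimizationStep ENABLE NOVALIDATE CONSTRAINT fk_step_run;
```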
Also note that global indexes will be left in an UNUSABLE state after a partition drop. You'll have to rebuild them unless you specify UPDATE GLOBAL INDEXES in your drop statement (which maintains them automatically, but makes the drop take longer).
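For example (partition name illustrative):

```sql
ALTER TABLE OptimizationRun
    DROP PARTITION case_optpart_42
    UPDATE GLOBAL INDEXES;
```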
Not possible - you can't issue DDL like that in a row-level trigger.
[possible design issue commentary redacted, as addressed]
Have you considered parallelizing your script? Rather than a single sweeper that relies on DELETE CASCADE, leverage DBMS_SCHEDULER to parallelize the job. You can safely run parallel deletes against tables at the same level of the dependency tree.
begin
dbms_scheduler.create_program
(program_name => 'snapshot_purge_cases',
program_type => 'PLSQL_BLOCK',
program_action =>
'BEGIN
delete from purge$Case;
insert into purge$Case
select CaseId
from Case
where deleteFlag = 1;
delete from purge$Opt;
insert into purge$Opt
select OptId
from OptimizationRun
where CaseId in (select CaseId from purge$Case);
delete from purge$Step;
insert into purge$Step
select StepId
from OptimizationStep
where OptId in (select OptId from purge$Opt);
commit;
END;',
enabled => true,
comments => 'Program to snapshot keys for purging'
);
dbms_scheduler.create_program
(program_name => 'purge_case',
program_type => 'PLSQL_BLOCK',
program_action => 'BEGIN
loop
delete from Case
where CaseId in (select CaseId from purge$Case)
and rownum <= 50000;
exit when sql%rowcount = 0;
commit;
end loop;
commit;
END;',
enabled => true,
comments => 'Program to purge the Case Table'
);
-- repeat for each table being purged
end;
/
That only sets up the programs. Next we need to set up a job chain to put them together.
BEGIN
dbms_scheduler.create_chain
(chain_name => 'purge_case_chain');
END;
/
Now we make steps in the job chain using the programs from before:
BEGIN
dbms_scheduler.define_chain_step
(chain_name => 'purge_case_chain',
step_name => 'step_snapshot_purge_cases',
program_name => 'snapshot_purge_cases'
);
dbms_scheduler.define_chain_step
(chain_name => 'purge_case_chain',
step_name => 'step_purge_cases',
program_name => 'purge_case'
);
-- repeat for every table
END;
/
Now we have to link the chain steps together. The jobs fan out like so:
- Snapshot the CaseIds, OptIds and StepIds to purge.
- Purge all the tables dependent on OptimizationStep.
- Purge all the tables dependent on OptimizationRun.
- Purge all the tables dependent on Case.
- Purge Case.
So the code would then be:
begin
dbms_scheduler.define_chain_rule
(chain_name => 'purge_case_chain',
condition => 'TRUE',
action => 'START step_snapshot_purge_cases',
rule_name => 'rule_snapshot_purge_cases'
);
-- repeat for every table dependent on OptimizationStep
dbms_scheduler.define_chain_rule
(chain_name => 'purge_case_chain',
condition => 'step_snapshot_purge_cases COMPLETED',
action => 'START step_purge_TwoDDataY2',
rule_name => 'rule_purge_TwoDDataY2'
);
-- repeat for every table dependent on OptimizationRun
dbms_scheduler.define_chain_rule
(chain_name => 'purge_case_chain',
condition => 'step_purge_TwoDDataY2 COMPLETED and
step_purge_ThreeDDataZ COMPLETED and
... ',
action => 'START step_purge_OnedDataX',
rule_name => 'rule_purge_OnedDataX'
);
-- repeat for every table dependent on Case
dbms_scheduler.define_chain_rule
(chain_name => 'purge_case_chain',
condition => 'step_purge_OnedDataX COMPLETED and
step_purge_TwoDDataY1 COMPLETED and
... ',
action => 'START step_purge_Files',
rule_name => 'rule_purge_Files'
);
dbms_scheduler.define_chain_rule
(chain_name => 'purge_case_chain',
condition => 'step_purge_Files COMPLETED and
step_purge_OptimizationRun COMPLETED and
... ',
action => 'START step_purge_Case',
rule_name => 'rule_purge_Case'
);
-- add a rule to end the chain
dbms_scheduler.define_chain_rule
(chain_name => 'purge_case_chain',
condition => 'step_purge_Case COMPLETED',
action => 'END',
rule_name => 'rule_end_chain'
);
end;
/
Enable the job chain:
BEGIN
DBMS_SCHEDULER.enable ('purge_case_chain');
END;
/
You can run the chain manually:
BEGIN
DBMS_SCHEDULER.RUN_CHAIN
(chain_name => 'purge_case_chain',
job_name => 'chain_purge_case_run'
);
END;
/
Or create a job to schedule it:
BEGIN
DBMS_SCHEDULER.CREATE_JOB (
job_name => 'job_purge_case',
job_type => 'CHAIN',
job_action => 'purge_case_chain',
repeat_interval => 'freq=daily',
start_date => ...,
end_date => ...,
enabled => TRUE);
END;
/