Oracle partitioning solution for DELETE performance problem

This is a follow-up question to Strategy to improve Oracle DELETE performance. To recap, we have a large DB with a hierarchy of tables representing 1D through 4D output data from an optimization system. Reading and writing this data is fast and provides a convenient means for our various systems to utilize the information.

However, deleting unused data has become a bear. The current table hierarchy is below.

/* Metadata tables */
Case(CaseId, DeleteFlag, ...) On Delete Cascade CaseId
OptimizationRun(OptId, CaseId, ...) On Delete Cascade OptId
OptimizationStep(StepId, OptId, ...) On Delete Cascade StepId

/* Data tables */
Files(FileId, CaseId, Blob) /* deletes are near instantaneous here */

/* Data per run */
OnedDataX(OptId, ...)
TwoDDataY1(OptId, ...) /* packed representation of a 1D slice */

/* Data not only per run, but per step */
TwoDDataY2(StepId, ...)  /* packed representation of a 1D slice */
ThreeDDataZ(StepId, ...) /* packed representation of a 2D slice */
FourDDataZ(StepId, ...)  /* packed representation of a 3D slice */
/* ... About 10 or so of these tables exist */

What I am looking for is a means of partitioning the Case data such that I could drop a partition relating to a case to remove its data. Ideally, OptimizationRun would have an interval partition based on CaseId, and this would filter down through to its children. However, 11g doesn't support the combination of INTERVAL and REF partitioning.

I'm fairly certain ENABLE ROW MOVEMENT is out of the question based on the DB size and the requirement that the tablespaces live in ASSM. Maybe RANGE partitioning on OptimizationRun and REF partitioning on the rest?
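
For concreteness, that combination might look something like the sketch below (the partition bound and constraint names are invented for illustration):

/* Sketch: RANGE-partition the parent on CaseId... */
CREATE TABLE OptimizationRun (
    OptId  NUMBER PRIMARY KEY,
    CaseId NUMBER NOT NULL REFERENCES Case (CaseId)
    /* ... */
)
PARTITION BY RANGE (CaseId)
    (PARTITION case_optpart_1 VALUES LESS THAN (2));

/* ...and let each child inherit its partitioning through the FK. */
CREATE TABLE OnedDataX (
    OptId NUMBER NOT NULL,
    /* ... */
    CONSTRAINT fk_oneddatax_opt FOREIGN KEY (OptId)
        REFERENCES OptimizationRun (OptId)
)
PARTITION BY REFERENCE (fk_oneddatax_opt);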

My guess is that with that strategy I would need a trigger that accomplishes something like the following:

CREATE OR REPLACE TRIGGER Case_BeforeInsert_MakePartitions
BEFORE INSERT
    ON Case
    FOR EACH ROW
DECLARE
    v_PartName  varchar2(64)      := 'CASE_OPTPART_' || :new.CaseId;
    v_PartRange Case.CaseId%type  := :new.CaseId;
BEGIN
    -- Take :new.CaseId and create the partition
    -- (DDL is not valid directly in PL/SQL, so it has to go through
    -- EXECUTE IMMEDIATE with the partition name concatenated in)
    EXECUTE IMMEDIATE
        'ALTER TABLE OptimizationRun ADD PARTITION ' || v_PartName ||
        ' VALUES LESS THAN (' || v_PartRange || ')';
END;

And then the requisite trigger for before deletion:

CREATE OR REPLACE TRIGGER Case_BeforeDelete_RemovePartitions
BEFORE DELETE
    ON Case
    FOR EACH ROW
DECLARE
    v_PartName varchar2(64) := 'CASE_OPTPART_' || :old.CaseId;
BEGIN
    -- Drop the partition associated with the case
    EXECUTE IMMEDIATE
        'ALTER TABLE OptimizationRun DROP PARTITION ' || v_PartName;
END;

Good idea? Or is this an idea out of the SNL Bad Idea Jeans commercial?

Update, for size reference:

  • 1D data tables ~1.7G
  • 2D data tables ~12.5G
  • 3D data tables ~117.3G
  • 4D data tables ~315.2G


I'm pretty sure that you're on the right track with partitioning to deal with your delete performance problem. However, I don't think you'll be able to mix this with triggers. Complex logic in triggers has always bothered me, but aside from that, here are the problems you are likely to encounter:

  • DDL statements break transaction logic since Oracle performs a commit of the current transaction before any DDL statement.
  • Fortunately, you can't commit in a trigger (since Oracle is in the middle of an operation and the DB is not in a consistent state).
  • Using autonomous transactions to perform DDL would be a (poor?) workaround for the insert but is unlikely to work for the DELETE, since it would probably interfere with the ON DELETE CASCADE logic (a sketch of that workaround follows).
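
For reference, the autonomous-transaction workaround would look roughly like this (a sketch only; the procedure name is invented):

/* The DDL commits in its own, autonomous transaction,
   independent of the caller's transaction (hence "poor workaround"). */
CREATE OR REPLACE PROCEDURE add_opt_partition (p_case_id IN NUMBER) AS
    PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
    EXECUTE IMMEDIATE
        'ALTER TABLE OptimizationRun ADD PARTITION case_optpart_' || p_case_id ||
        ' VALUES LESS THAN (' || (p_case_id + 1) || ')';
END;
/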

It would be easier to code and easier to maintain procedures that deal with the dropping and creation of partitions such as:

CREATE OR REPLACE PROCEDURE add_case (p_case_id IN NUMBER, ...) AS
BEGIN
   EXECUTE IMMEDIATE 'ALTER TABLE OptimizationRun ADD PARTITION ...';
   /* repeat for each child table */
   INSERT INTO Case VALUES (...);
END;

Concerning the dropping of partitions, you'll have to check whether it works with referential integrity. You may need to disable the foreign key constraints before dropping a parent-table partition in a parent-child table relationship.

Also note that global indexes will be left in an unusable state after a partition drop. You'll have to rebuild them unless you specify UPDATE GLOBAL INDEXES in your drop statement (the indexes are then maintained as part of the drop, which takes more time).
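
Putting those together, the deletion counterpart of add_case might look like this (a sketch, assuming the children are REF-partitioned so the parent's partition drop cascades to them; names are invented):

CREATE OR REPLACE PROCEDURE drop_case (p_case_id IN NUMBER) AS
BEGIN
    /* With reference partitioning, dropping the parent's partition
       also drops the corresponding partitions of the child tables. */
    EXECUTE IMMEDIATE
        'ALTER TABLE OptimizationRun DROP PARTITION case_optpart_' || p_case_id ||
        ' UPDATE GLOBAL INDEXES';
    DELETE FROM Case WHERE CaseId = p_case_id;
END;
/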


Not possible - you can't issue DDL like that in a row-level trigger: even through EXECUTE IMMEDIATE, the implicit commit that DDL performs is illegal inside a trigger (ORA-04092).

[possible design issue commentary redacted, as addressed]

Have you considered parallelizing your script? Rather than a sweeper that relies on delete cascade, leverage DBMS_SCHEDULER to parallelize the job. You can safely run parallel deletes against tables at the same level of the dependency tree.
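
The snapshot step below populates three purge$ staging tables, which are assumed to exist up front; a minimal sketch:

/* Assumed staging tables holding the snapshot of keys to purge */
CREATE TABLE purge$Case (CaseId NUMBER);
CREATE TABLE purge$Opt  (OptId  NUMBER);
CREATE TABLE purge$Step (StepId NUMBER);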

begin
  dbms_scheduler.create_program
    (program_name => 'snapshot_purge_cases',
     program_type => 'PLSQL_BLOCK',
     program_action => 
      'BEGIN
         delete from purge$Case;
         insert into purge$Case
         select CaseId 
           from Case
          where deleteFlag = 1;

         delete from purge$Opt;
         insert into purge$Opt
         select OptId 
           from OptimizationRun
          where CaseId in (select CaseId from purge$Case);

         delete from purge$Step;
         insert into purge$Step
         select StepId 
           from OptimizationStep
          where OptId in (select OptId from purge$Opt);

         commit;
       END;',
     enabled => true,
     comments => 'Program to snapshot keys for purging'
    );

  dbms_scheduler.create_program 
    (program_name => 'purge_case',
     program_type => 'PLSQL_BLOCK',
     program_action => 'BEGIN 
                          loop
                            delete from Case 
                             where CaseId in (select CaseId from purge$Case)
                               and rownum <= 50000;
                            exit when sql%rowcount = 0;
                            commit;
                          end loop;
                          commit;
                        END;',
     enabled => true,
     comments => 'Program to purge the Case Table'
    );

  -- repeat for each table being purged

end;
/
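
For illustration, one of the repeated child-table programs might look like the following (hypothetical: it mirrors purge_case but drives off purge$Step):

begin
  dbms_scheduler.create_program
    (program_name => 'purge_TwoDDataY2',
     program_type => 'PLSQL_BLOCK',
     program_action => 'BEGIN
                          loop
                            delete from TwoDDataY2
                             where StepId in (select StepId from purge$Step)
                               and rownum <= 50000;
                            exit when sql%rowcount = 0;
                            commit;
                          end loop;
                          commit;
                        END;',
     enabled => true,
     comments => 'Program to purge the TwoDDataY2 table'
    );
end;
/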

That only sets up the programs. Next we need to build a job chain to put them together.

BEGIN
  dbms_scheduler.create_chain 
   (chain_name => 'purge_case_chain');
END;
/

Now we make steps in the job chain using the programs from before:

BEGIN
  dbms_scheduler.define_chain_step
   (chain_name => 'purge_case_chain',
    step_name  => 'step_snapshot_purge_cases',
    program_name => 'snapshot_purge_cases'
   );

  dbms_scheduler.define_chain_step
   (chain_name => 'purge_case_chain',
    step_name  => 'step_purge_cases',
    program_name => 'purge_case'
   );

  -- repeat for every table
END;
/

Now we have to link the chain steps together. The jobs would fan out, like so:

  1. Snapshot the CaseIds, OptIds and StepIds to purge.
  2. Purge all the tables dependent on OptimizationStep.
  3. Purge all the tables dependent on OptimizationRun.
  4. Purge all the tables dependent on Case.
  5. Purge Case.

So the code would then be:

begin
  dbms_scheduler.define_chain_rule
   (chain_name => 'purge_case_chain',
    condition  => 'TRUE',
    action     => 'START step_snapshot_purge_cases',
    rule_name  => 'rule_snapshot_purge_cases'
   );

  -- repeat for every table dependent on OptimizationStep
  dbms_scheduler.define_chain_rule
   (chain_name => 'purge_case_chain',
    condition  => 'step_snapshot_purge_cases COMPLETED',
    action     => 'START step_purge_TwoDDataY2',
    rule_name  => 'rule_purge_TwoDDataY2'
   );

  -- repeat for every table dependent on OptimizationRun     
  dbms_scheduler.define_chain_rule
   (chain_name => 'purge_case_chain',
    condition  => 'step_purge_TwoDDataY2  COMPLETED and
                   step_purge_ThreeDDataZ COMPLETED and
                   ... ',
    action     => 'START step_purge_OnedDataX',
    rule_name  => 'rule_purge_OnedDataX'
   );

  -- repeat for every table dependent on Case  
  dbms_scheduler.define_chain_rule
   (chain_name => 'purge_case_chain',
    condition  => 'step_purge_OnedDataX  COMPLETED and
                   step_purge_TwoDDataY1 COMPLETED and
                   ... ',
    action     => 'START step_purge_Files',
    rule_name  => 'rule_purge_Files'
   );

  dbms_scheduler.define_chain_rule
   (chain_name => 'purge_case_chain',
    condition  => 'step_purge_Files           COMPLETED and
                   step_purge_OptimizationRun COMPLETED and 
                   ... ',
    action     => 'START step_purge_Case',
    rule_name  => 'rule_purge_Case'
   );

  -- add a rule to end the chain
  dbms_scheduler.define_chain_rule
   (chain_name => 'purge_case_chain',
    condition  => 'step_purge_Case COMPLETED',
    action     => 'END',
    rule_name  => 'rule_end_chain'
   );

end;
/

Enable the job chain:

BEGIN
  DBMS_SCHEDULER.enable ('purge_case_chain');
END;
/

You can run the chain manually:

BEGIN
  DBMS_SCHEDULER.RUN_CHAIN
   (chain_name => 'purge_case_chain',
    job_name   => 'purge_case_chain_run'
   );
END;
/

Or create a job to schedule it:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB (
    job_name        => 'job_purge_case',
    job_type        => 'CHAIN',
    job_action      => 'purge_case_chain',
    repeat_interval => 'freq=daily',
    start_date      => ...,
    end_date        => ...,
    enabled         => TRUE);
END;
/
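
Either way, you can watch the chain's progress while it runs via the scheduler chain views, e.g. (a sketch; adjust for your privileges and job owner):

SELECT step_name, state, error_code
  FROM user_scheduler_running_chains
 WHERE job_name = 'JOB_PURGE_CASE';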