I have a postgresql database with about 150 tables(it's a Django 1.2 project). Django adds ON DELETE NO ACTION
and ON UPDATE NO ACTION
to foreign keys at the time of table creation.
Now I need to bulk delete data (about 800,000 records) from a bunch of tables based on certain condition.
Using Model.objects.filter().delete()
is not an options because data is huge and it takes a开发者_C百科 lot of time.
Only sanest options seems a cascading delete, but since Django has add "ON DELETE NO ACTION" it seem like a no option.
So my question: Is there any way to change all foreing keys to ON DELETE CASCADE in an easy way(there are many of them) or something similar.
(I am aware that I can manually write the SQL queries for each table, but that would be a monumental and difficult to maintain task.)
https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.ForeignKey.on_delete
As pointed out in the link which comprises Andrew's answer, if you set this to CASCADE
in Django, then Django will go and do the deletes "retail". If it is set to NO ACTION
you can create a database-level foreign key definition to handle things. That sounds like a reasonable plan to me.
Be sure you have an index defined on the referencing set of columns for every foreign key; otherwise you're going to see very slow performance. Some database products will automatically create such an index when you define a foreign key, but there are situations where that is not advantageous, so PostgreSQL puts the matter in your hands to optimize as you see fit. (Just as one example, it might not be worth the cost of maintaining the index during normal operations, but be worth building it before a purge and dropping it after.)
One note: ON DELETE CASCADE performs miserably on bulk operations. The reason is that this is done as a trigger. Consequently the way it looks from an algorithmic perspective is:
for row in delete_set:
for dependent row in (scan for referencing rows):
delete dependent row
If you are deleting 800000 rows in a parent table this translates into 800000 separate delete scans on the dependent tables. Even at your best case, with indexes usable 800000 separate index scans will be much slower than one sequential scan.
A better way to do this is to use a writeable common table expression in 9.1 or higher, or to just do separate delete statements in the same transaction. Something like:
WITH rows_to_delete (id) AS (
SELECT id FROM mytable WHERE where_condition
),
deleted_rows (id) AS (
DELETE FROM referencing_table WHERE mytable_id IN (select id FROM rows_to_delete)
RETURNING mytable_id
),
DELETE FROM mytable WHERE id IN (select id FROM deleted_rows);
This Reduces to something like, algorithmically:
scan for rows to delete as delete_set for dependent in scan for rows dependent to delete: delete dependent for to_delete in scan for rows referenced by deleted dependents: delete to_delete
Getting rid of the forced nested loop scan will greatly speed things up.
精彩评论