First of all, I am using Postgres 9.1.
I have a table named filepaths and other tables that have rows that point to the id of their corresponding filepaths. Is there a common design approach for reference counting the filepaths so that when other rows are dropped and there are no longer any references to a particular filepath row, it can be dropped as well?
Example:
[filepaths]
1 | c:\windows\system32\test.exe
2 | c:\windows\calc.exe
[events_2011_08_30]
[1][timestamp][other data] [ filepaths = 1]
[2][timestamp][other data] [ filepaths = 2]
[events_2011_08_31]
[1][timestamp][other data] [ filepaths = 1]
So I will be storing data in tables to partition it, and I want to delete old tables when they are older than say 30 days (will still have them archived). In the above example, let's assume开发者_开发问答 there are only those two events_ tables. If I delete the 2011_08_30, I would desire a way to know that nothing is pointing to filepaths '2' and therefore remove it, but know that a row is still pointing to filepaths '1' and therefore retain it.
Any thoughts, suggestions, etc? I believe some of the strategies I have read, at least for postgres and triggers, still have race conditions about which thread got a lock first on the primary key column and other related issues.
Thanks!
First, I would challenge the desire to partition your data over one table per date. I would suggest, instead, simply having a column in your table named effective_date
or similar.
In terms of having a reference count on filepaths
, that can be accomplished in several ways, but to ensure encapsulation of code, I'd recommend one of the following two approaches...
1. A Stored Procedure API
By ensuring that all INSERT, UPDATE and DELETE operations are processed through Stored Procedures you can then also encapsulate increment and decrement of the reference counter on filepaths
. No user/login then needs access to the tables for write activities, instead they just use the Stored Procedure.
2. Triggers
A trigger can be created on each events
table to encapsulate the reference counter increment and decrement code. Whenever the table contents are modified, the trigger fires and propagates the consequences of your business logic.
Many people shy away from triggers as they feel they are hidden away, and can have complex interactions if over used. I, however, think they are extremely useful.
In either case, everything is MUCH simplified by having just a single table, rather than multiple tables.
Additionally, most SQL Design patterns do not account for "deleting a table" to be a Business As Usual action. Creating or Removing tables, in my opinion, should be treated as design changes, not data activities.
You don't need reference counting. (you could implement it using triggers, if you really want it) What you want is called a foreign key [constraint]
The simplest way to clean up the filepath table is by using a NOT EXISTS construct, eg
DELETE FROM filepaths fp
WHERE NOT EXISTS ( SELECT * FROM events ev where ev.file_id = fp.file_id)
AND NOT EXISTS (SELECT * FROM events_version_xxx ev where ev.file_id = fp.file_id)
AND NOT EXISTS (SELECT * FROM events_version_yyy ev where ev.file_id = fp.file_id)
...
;
This is ugly. But the database model is ugly, too.
精彩评论