
Database - data versioning in single table

开发者 https://www.devze.com 2023-03-30 19:45 Source: Internet

I'm developing a CMS which has some version control features. It's based on a MySQL database.

The idea is to show public site visitors a "certain revision" of the data and backoffice users a preview of the "latest revision". Publishing just means setting the "certain revision" equal to the latest one (and possibly deleting the data of older revisions).

I've read some Q&As about the topic on SO; most of them suggest that holding "old" and "new" rows in the same table is bad. But since I need to join tables, all of them "versioned", splitting old and new rows into different tables isn't ideal either (how would the app know whether "content" from a given revision is old or new, and hence whether to look for it in a "_history" table?).

So I decided to use just one table for each "content type".

The design I used: every table holds a "revision INT NOT NULL" column (part of the primary key, together with an ID column).

Modifying something means inserting a new row with the modified values, an incremented revision, but the same ID.

Inserting something means inserting a new row with incremented ID and incremented revision.

Deleting something means inserting an empty row with the same ID, an incremented revision and a "tombstone" flag set to "true".

Example: there are pages and there are "views" ("view" not in the MVC sense, but in an application-specific meaning). "Views" are versioned. One page has many views. This is (part of) the "views" table.

CREATE TABLE `_views` (
  `_id` int(11) NOT NULL,
  `_rev` int(11) NOT NULL,
  `_ts` BIT(1) DEFAULT b'0',
  `page` int(11) NOT NULL,
  `order` int(11) NOT NULL,
  PRIMARY KEY (`_id`,`_rev`)
)
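
Applied to this table, the three write rules described earlier (insert, modify, delete) might look like the minimal sketch below. It makes several assumptions that are not part of the original post: it uses Python's built-in sqlite3 module with an in-memory SQLite database in place of MySQL so it can run anywhere, `write_row` is a hypothetical helper, the revision numbers are hard-coded rather than taken from a counter, and the tombstone row keeps its `page` value because that column is NOT NULL and the read query filters on it.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is runnable;
# the table mirrors `_views` from the question (`_ts` = tombstone flag).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE `_views` (
      `_id`   INTEGER NOT NULL,
      `_rev`  INTEGER NOT NULL,
      `_ts`   INTEGER NOT NULL DEFAULT 0,
      `page`  INTEGER NOT NULL,
      `order` INTEGER NOT NULL,
      PRIMARY KEY (`_id`, `_rev`)
    )
""")

def write_row(view_id, rev, page, order, tombstone=0):
    """Hypothetical helper: every change is an INSERT, rows are never updated."""
    conn.execute(
        "INSERT INTO `_views` (`_id`, `_rev`, `_ts`, `page`, `order`) "
        "VALUES (?, ?, ?, ?, ?)",
        (view_id, rev, tombstone, page, order),
    )

write_row(1, rev=1, page=10, order=1)               # insert: new ID, new revision
write_row(2, rev=2, page=10, order=2)               # insert: new ID, new revision
write_row(1, rev=3, page=10, order=5)               # modify: same ID, new revision
write_row(2, rev=4, page=10, order=0, tombstone=1)  # delete: tombstone row, same ID

rows = conn.execute(
    "SELECT `_id`, `_rev`, `_ts` FROM `_views` ORDER BY `_id`, `_rev`"
).fetchall()
print(rows)
```

The full history of each view stays in the table; nothing is overwritten, so any earlier revision can still be reconstructed.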

I need to select all views that a page contains, up to a "certain revision", in the order specified by "order".

This query works:

SELECT * FROM (
 SELECT *
 FROM `_views`
 WHERE `page` = :page
 AND `_rev` <= :revision
 ORDER BY `_rev` DESC
) AS `all`
GROUP BY `_id`
HAVING `_ts` = 0
ORDER BY `order`

The subquery selects all views of a page that were once "published" (whose revision is less than or equal to the "published" revision). The outer query groups them down to their latest revision, removes the groups that have a tombstone, and orders them by application-specific criteria.
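
One caveat about the query above: it relies on MySQL returning the first row of the ordered subquery for each group. The MySQL manual describes the values chosen for non-aggregated columns under GROUP BY as nondeterministic, and the ONLY_FULL_GROUP_BY mode (enabled by default since MySQL 5.7.5) rejects such queries. A minimal sketch of a read that does not depend on that behaviour joins each view to its highest qualifying revision instead. The sample rows and the `views_at` helper are assumptions for illustration, and SQLite (via Python's sqlite3) again stands in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `_views` (
      `_id`   INTEGER NOT NULL,
      `_rev`  INTEGER NOT NULL,
      `_ts`   INTEGER NOT NULL DEFAULT 0,
      `page`  INTEGER NOT NULL,
      `order` INTEGER NOT NULL,
      PRIMARY KEY (`_id`, `_rev`)
    );
    INSERT INTO `_views` (`_id`, `_rev`, `_ts`, `page`, `order`) VALUES
      (1, 1, 0, 10, 1),   -- view 1 created
      (2, 2, 0, 10, 2),   -- view 2 created
      (1, 3, 0, 10, 5),   -- view 1 modified
      (2, 4, 1, 10, 0),   -- view 2 deleted (tombstone)
      (3, 5, 0, 10, 3);   -- view 3 created
""")

LATEST_VIEWS = """
    SELECT v.`_id`, v.`_rev`, v.`order`
    FROM `_views` AS v
    JOIN (
        -- highest revision per view that is visible at :revision
        SELECT `_id`, MAX(`_rev`) AS `_rev`
        FROM `_views`
        WHERE `page` = :page AND `_rev` <= :revision
        GROUP BY `_id`
    ) AS latest
      ON latest.`_id` = v.`_id` AND latest.`_rev` = v.`_rev`
    WHERE v.`_ts` = 0          -- drop views whose latest row is a tombstone
    ORDER BY v.`order`
"""

def views_at(page, revision):
    """Hypothetical helper: views of `page` as of `revision`, in display order."""
    return conn.execute(
        LATEST_VIEWS, {"page": page, "revision": revision}
    ).fetchall()

print(views_at(10, 2))  # state when revision 2 was the published one
print(views_at(10, 5))  # state at the latest revision
```

This still uses a derived table, so it addresses correctness rather than the performance concern; the existing primary key on (`_id`, `_rev`) is the index the inner aggregate would most likely use.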

Since scalability and performance are crucial for a CMS, isn't there a better, more elegant way than subqueries?

... or should I just focus on caching?


Using subqueries to determine the current revision is not the best approach; you really don't want to go there.

A simpler method is to add a flag which tells you about the most current revision:

   `_rev` int(11) NOT NULL,
   `_current` BIT(1),

This requires a manual UPDATE to set the _current flag whenever a new revision is added or the _ts flag is changed. But at least it avoids executing the subquery on each page display.
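
A minimal sketch of how that flag could be maintained, assuming the clear-then-insert pair runs inside one transaction so readers never see two current rows for the same view. The `add_revision` and `current_views` helpers are hypothetical, and Python's sqlite3 with an in-memory SQLite database stands in for MySQL so that the sketch is runnable.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; `_current` is the flag
# suggested above, added to the `_views` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE `_views` (
      `_id`      INTEGER NOT NULL,
      `_rev`     INTEGER NOT NULL,
      `_ts`      INTEGER NOT NULL DEFAULT 0,
      `_current` INTEGER NOT NULL DEFAULT 0,
      `page`     INTEGER NOT NULL,
      `order`    INTEGER NOT NULL,
      PRIMARY KEY (`_id`, `_rev`)
    )
""")

def add_revision(view_id, rev, page, order, tombstone=0):
    """Hypothetical helper: insert a revision and move the `_current` flag to it."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE `_views` SET `_current` = 0 "
            "WHERE `_id` = ? AND `_current` = 1",
            (view_id,),
        )
        conn.execute(
            "INSERT INTO `_views` "
            "(`_id`, `_rev`, `_ts`, `_current`, `page`, `order`) "
            "VALUES (?, ?, ?, 1, ?, ?)",
            (view_id, rev, tombstone, page, order),
        )

def current_views(page):
    """Read path: a plain filtered SELECT, no subquery or GROUP BY."""
    return conn.execute(
        "SELECT `_id`, `_rev`, `order` FROM `_views` "
        "WHERE `page` = ? AND `_current` = 1 AND `_ts` = 0 "
        "ORDER BY `order`",
        (page,),
    ).fetchall()

add_revision(1, rev=1, page=10, order=1)
add_revision(2, rev=2, page=10, order=2)
add_revision(1, rev=3, page=10, order=5)               # modifies view 1
add_revision(2, rev=4, page=10, order=0, tombstone=1)  # deletes view 2

print(current_views(10))
```

As written, the flag tracks the latest revision, which matches the backoffice preview. Serving the published revision the same way would presumably need a second flag that is moved at publish time.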

As an alternative, you could still split your data into a _current and a _history table. You'd then just create a view over both for those cases where you need to join the result sets again:

 CREATE VIEW pages_all AS
      SELECT * FROM pages_current
      UNION ALL SELECT * FROM pages_history
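
A minimal sketch of the split-table variant, assuming a save moves the previous current row into the history table before writing the new one. The `title` column and the `save_page` helper are hypothetical, and Python's sqlite3 with an in-memory SQLite database stands in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; `title` is a made-up column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages_current (
      _id   INTEGER NOT NULL,
      _rev  INTEGER NOT NULL,
      title TEXT    NOT NULL,
      PRIMARY KEY (_id)            -- exactly one current row per page
    );
    CREATE TABLE pages_history (
      _id   INTEGER NOT NULL,
      _rev  INTEGER NOT NULL,
      title TEXT    NOT NULL,
      PRIMARY KEY (_id, _rev)
    );
    CREATE VIEW pages_all AS
      SELECT _id, _rev, title FROM pages_current
      UNION ALL
      SELECT _id, _rev, title FROM pages_history;
""")

def save_page(page_id, rev, title):
    """Hypothetical helper: archive the current row, then write the new one."""
    with conn:  # one transaction for the whole move
        conn.execute(
            "INSERT INTO pages_history (_id, _rev, title) "
            "SELECT _id, _rev, title FROM pages_current WHERE _id = ?",
            (page_id,),
        )
        conn.execute("DELETE FROM pages_current WHERE _id = ?", (page_id,))
        conn.execute(
            "INSERT INTO pages_current (_id, _rev, title) VALUES (?, ?, ?)",
            (page_id, rev, title),
        )

save_page(1, 1, "Home")
save_page(1, 2, "Home v2")
save_page(2, 3, "About")

current = conn.execute(
    "SELECT _id, _rev, title FROM pages_current ORDER BY _id"
).fetchall()
all_rows = conn.execute(
    "SELECT _id, _rev, title FROM pages_all ORDER BY _id, _rev"
).fetchall()
print(current)
print(all_rows)
```

Everyday reads hit only the small current table, while queries that need every revision go through `pages_all`.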

Likewise, it might be possible to create a subtable of all active (non-tombstoned) revisions if you need to group them frequently, although that would incur even more manual micromanagement than a _current flag or just a view over the _history table.
