Consider a blog application which has tables for posts, categories and a lookup table linking a post with one or more categories. Categories are hierarchical. Posts can be assigned to any category, not just leaf nodes.
The categories table has a post_count
field which caches the number of posts assigned to the specific category. It also has parent_id
, lft
and rght
columns for MPTT.
But it also has a under_post_count
field which caches the number of distinct posts assigned to it or any of it's child categories. This is useful so you can display a hierarchical list of categories with the number of posts assigned to it, or one of its children, next to it.
My application has got to the point where after a post is created with categories, or it开发者_运维问答's categories are edited or one which had categories is deleted, I have a list of category IDs of the old and new categories, whose post_count
field needs updating. What I was hoping I could do next is in a single query, update the under_post_count
fields for all those categories identified, and all their parents, with the number of distinct posts assigned to each category or any of it's children.
Here's the SQL required to create the tables and some test data for categories:
CREATE TABLE `categories` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rght` int(11) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`post_count` int(11) NOT NULL DEFAULT '0',
`under_post_count` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
CREATE TABLE `categories_posts` (
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`category_id`,`post_id`)
) ENGINE=MyISAM;
INSERT INTO `categories` (`id`, `parent_id`, `lft`, `rght`, `name`) VALUES
(1, NULL, 1, 8, 'Cat 1'),
(4, 1, 2, 3, 'Cat 1.1'),
(5, 1, 4, 5, 'Cat 1.2'),
(6, 1, 6, 7, 'Cat 1.3'),
(2, NULL, 9, 16, 'Cat 2'),
(7, 2, 10, 11, 'Cat 2.1'),
(8, 2, 12, 13, 'Cat 2.2'),
(9, 2, 14, 15, 'Cat 2.3'),
(3, NULL, 17, 24, 'Cat 3'),
(10, 3, 18, 19, 'Cat 3.1'),
(11, 3, 20, 21, 'Cat 3.2'),
(12, 3, 22, 23, 'Cat 3.3');
Run this a few times to create some test data for the categories_posts
table:
INSERT IGNORE INTO `categories_posts` (`category_id`, `post_id`)
SELECT `id`, CEILING(10 * RAND()) FROM `categories` ORDER BY RAND() LIMIT 6
Can anyone figure this out, your help would be much appreciated?
well there are a few ways to skin a cat here (assuming 5.1 and triggers)
you can update everything from application layer
you can trigger updates to
post_count
fromcategories_posts
and trigger updates (cascade) tounder_post_count
fromcategories
finally, you can trigger all updates from
categories_posts
Also depending on the actual number of categories, you might not need to denormalize under_post_count
as it should be rather easy and inexpensive to fetch it with
SELECT c.id, SUM(cc.post_count)
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
GROUP BY c.id;
Fetching the actual counts on exact match is
SELECT c.id, COUNT(*)
FROM categories c
LEFT JOIN categories_posts cp ON c.id = cp.post_id
GROUP BY c.id;
Combining the two gives the counts including the hierarchies
SELECT c.id, COUNT(*)
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
LEFT JOIN categories_posts cp ON cc.id = cp.post_id
GROUP BY c.id;
EDIT
Constructing update statements from the above should not be so hard
UPDATE categories
SET post_count = (SELECT COUNT(*)
FROM categories_posts cp
WHERE cp.post_id = categories.id)
should work for the post_count
Situation for the under_post_count
is different since mysql does not like to hear that target table is mentioned in the where part, therefore you have to do some monstrosity like this
UPDATE categories LEFT JOIN
(SELECT c.id, COUNT(*) AS result
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
INNER JOIN categories_posts cp ON cc.id = cp.post_id
GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = x.result
EDIT2
Actually there is an error in all of the above queries - whenever I joined categories and posts I should have joined on cc.id = cp.category_id
and not cp.post_id
, which then I didn't check. Don't feel like correcting... but only in this last query
UPDATE categories LEFT JOIN
(SELECT c.id, COUNT(*) AS result
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
INNER JOIN categories_posts cp ON cc.id = cp.category_id
INNER JOIN posts p ON cp.post_id = p.id
WHERE p.status = 'published'
GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = x.result,
post_count = (SELECT COUNT(*)
FROM categories_posts cp
WHERE cp.category_id = categories.id)
EDIT3
Just few notes:
- the above query will fix the
under_post_count
andpost_count
regardless of the state of the data, - there are queries that are cheaper which if you data access layers are properly abstracted, secured and if you can assure the atomicity - these queries would only do
post_count = post_count +/- 1
on appropriate records in statuses (similary forunder_post_count
), - in case you can not reliably emulate triggers from application level it might still be cheaper to check if you need to run the above queries (even though mysql is quite good in this respect, but if you want to be DB agnostic), or adopt some strategy that you normally just increment/decrement the counters and only periodically recalculate the number.
精彩评论