开发者

Update many-to-many counter cache for multiple rows and all their parents in 1 query

开发者 https://www.devze.com 2023-01-26 18:00 出处:网络
Consider a blog application which has tables forposts, categories and a lookup table linking a post with one or more categories. Categories are hierarchical. Posts can be assigned to any category, not

Consider a blog application which has tables for posts, categories and a lookup table linking a post with one or more categories. Categories are hierarchical. Posts can be assigned to any category, not just leaf nodes.

The categories table has a post_count field which caches the number of posts assigned to the specific category. It also has parent_id, lft and rght columns for MPTT.

But it also has a under_post_count field which caches the number of distinct posts assigned to it or any of it's child categories. This is useful so you can display a hierarchical list of categories with the number of posts assigned to it, or one of its children, next to it.

My application has got to the point where after a post is created with categories, or it开发者_运维问答's categories are edited or one which had categories is deleted, I have a list of category IDs of the old and new categories, whose post_count field needs updating. What I was hoping I could do next is in a single query, update the under_post_count fields for all those categories identified, and all their parents, with the number of distinct posts assigned to each category or any of it's children.

Here's the SQL required to create the tables and some test data for categories:

CREATE TABLE `categories` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL,
  `lft` int(11) DEFAULT NULL,
  `rght` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `post_count` int(11) NOT NULL DEFAULT '0',
  `under_post_count` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

CREATE TABLE `categories_posts` (
  `category_id` int(11) NOT NULL,
  `post_id` int(11) NOT NULL,
  PRIMARY KEY (`category_id`,`post_id`)
) ENGINE=MyISAM;

INSERT INTO `categories` (`id`, `parent_id`, `lft`, `rght`, `name`) VALUES
(1, NULL, 1, 8, 'Cat 1'),
(4, 1, 2, 3, 'Cat 1.1'),
(5, 1, 4, 5, 'Cat 1.2'),
(6, 1, 6, 7, 'Cat 1.3'),
(2, NULL, 9, 16, 'Cat 2'),
(7, 2, 10, 11, 'Cat 2.1'),
(8, 2, 12, 13, 'Cat 2.2'),
(9, 2, 14, 15, 'Cat 2.3'),
(3, NULL, 17, 24, 'Cat 3'),
(10, 3, 18, 19, 'Cat 3.1'),
(11, 3, 20, 21, 'Cat 3.2'),
(12, 3, 22, 23, 'Cat 3.3');

Run this a few times to create some test data for the categories_posts table:

INSERT IGNORE INTO `categories_posts` (`category_id`, `post_id`) 
SELECT `id`, CEILING(10 * RAND()) FROM `categories` ORDER BY RAND() LIMIT 6

Can anyone figure this out, your help would be much appreciated?


well there are a few ways to skin a cat here (assuming 5.1 and triggers)

  • you can update everything from application layer

  • you can trigger updates to post_count from categories_posts and trigger updates (cascade) to under_post_count from categories

  • finally, you can trigger all updates from categories_posts

Also depending on the actual number of categories, you might not need to denormalize under_post_count as it should be rather easy and inexpensive to fetch it with

SELECT c.id, SUM(cc.post_count) 
FROM categories c 
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
GROUP BY c.id;

Fetching the actual counts on exact match is

SELECT c.id, COUNT(*) 
FROM categories c 
LEFT JOIN categories_posts cp ON c.id = cp.post_id 
GROUP BY c.id;

Combining the two gives the counts including the hierarchies

SELECT c.id, COUNT(*) 
FROM categories c 
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
LEFT JOIN categories_posts cp ON cc.id = cp.post_id
GROUP BY c.id;

EDIT

Constructing update statements from the above should not be so hard

UPDATE categories 
SET post_count = (SELECT COUNT(*) 
                  FROM categories_posts cp 
                  WHERE cp.post_id = categories.id)

should work for the post_count

Situation for the under_post_count is different since mysql does not like to hear that target table is mentioned in the where part, therefore you have to do some monstrosity like this

UPDATE categories LEFT JOIN 
       (SELECT c.id, COUNT(*) AS result 
        FROM categories c 
        LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
        INNER JOIN categories_posts cp ON cc.id = cp.post_id
        GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = x.result

EDIT2
Actually there is an error in all of the above queries - whenever I joined categories and posts I should have joined on cc.id = cp.category_id and not cp.post_id, which then I didn't check. Don't feel like correcting... but only in this last query

UPDATE categories LEFT JOIN 
       (SELECT c.id, COUNT(*) AS result 
        FROM categories c 
        LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
        INNER JOIN categories_posts cp ON cc.id = cp.category_id
        INNER JOIN posts p ON cp.post_id = p.id
        WHERE p.status = 'published'
        GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = x.result,
    post_count = (SELECT COUNT(*) 
                  FROM categories_posts cp 
                  WHERE cp.category_id = categories.id)

EDIT3
Just few notes:

  • the above query will fix the under_post_count and post_count regardless of the state of the data,
  • there are queries that are cheaper which if you data access layers are properly abstracted, secured and if you can assure the atomicity - these queries would only do post_count = post_count +/- 1 on appropriate records in statuses (similary for under_post_count),
  • in case you can not reliably emulate triggers from application level it might still be cheaper to check if you need to run the above queries (even though mysql is quite good in this respect, but if you want to be DB agnostic), or adopt some strategy that you normally just increment/decrement the counters and only periodically recalculate the number.
0

精彩评论

暂无评论...
验证码 换一张
取 消