Join operation duplication_问答_开发者_运维开发者技术经验分享

Let's imagine we have two tables: Users (UserId, UserName, UserPhoto) and Articles (ArticleId, UserId, Artic开发者_开发百科leText). Now we execute inner join query to retrieve users with articles:

SELECT UserId, UserName, UserPhoto, ArticleId, ArticleText 
FROM Users as u INNER JOIN Articles as a ON u.UserId = a.UserId

The structure of the query result will be the following:

UserId1 UserName1 UserPhoto1 ArticleId1 ArticleText1
UserId1 UserName1 UserPhoto1 ArticleId2 ArticleText2

So for the first user we have two articles and UserName1 and UserPhoto1 are duplicated. And what if UserPhoto stores several gigabytes blob?

I hope database protocols have some optimizations for such situations (may be some mapping telling that UserPhoto is equal for first and second lines) but I never met any notes about this. So I just want to be sure that such kind of optimization exists and I don't need to workaround it by myself

First, create a third table for Photos and associate UserId with Photo. Second, you'll need to run two separate queries in order to retrieve:

Each photo submitted by a user
Each article associated with a specific user/photo

You'll loop over all user/photo pairs, and query the articles inside your loop.

You could run two queries, one to get the User data (so each photo will travel once):

SELECT u.UserId
     , u.UserName
     , u.UserPhoto
FROM Users as u

and another to get the rest (Article) data:

SELECT a.UserId               <--- only UserId this time
     , a.ArticleId
     , a.ArticleText 
FROM Users as u
  INNER JOIN Articles as a
    ON u.UserId = a.UserId

Finally, combine the results in your application code, using the userids.

You can avoid fetching the photos multiple times like this:

SELECT * FROM (
  SELECT UserId, UserName, UserPhoto, ArticleId, ArticleText 
    FROM Users as u INNER JOIN Articles as a ON u.UserId = a.UserId
    WHERE ArticleId IN (SELECT MIN(ArticleId) FROM Articles GROUP BY UserId)
  UNION ALL
  SELECT UserId, UserName, NULL, ArticleId, ArticleText
  FROM Users as u INNER JOIN Articles as a ON u.UserId = a.UserId
  WHERE ArticleId NOT IN (SELECT MIN(ArticleId) FROM Articles GROUP BY UserId)
) base
ORDER BY ArticleId;  // UserId,ArticleId will also work if you want it sorted by users.

This only fetches the photo with the first article fetched, and returns NULL for subsequent articles. Your application can cache the photo on first read.

1) No matter how many times the photoblob appears in your result set it will be read(from Disk to memory in the server) only once, There are optimizations built in to make sure this is happening.

2) However it can be transported(from server to client) multiple times, there are no optimization built in for that.

3) The best solution would be to wrap this as a stored procedure that returns 2 record sets, and you do the join in the clinet code, this approach is different from running 2 queries which needs two round trip.

4) if you dont want to do that you can get all the article ids of the user in a CSV format, and then you can easily split the csv into separate strings in the client code.

Here is the sample output

UserId  UserName  UserPhoto   CSV_ArticleId               CSV_ArticleText
------- --------- ----------  ------------------------    ----------------------------
UserId1 UserName1 UserPhoto1  ",ArticleId1,ArticleId2"    ",ArticleText1,ArticleText2"
UserId2 UserName2 UserPhoto2  ",ArticleId3"               ",ArticleText3"

here is how you can do it. Run the code verbatim on a test database and you can see the result

CREATE TABLE Users(UserId int , UserName nvarchar(256), UserPhoto nvarchar(256))

CREATE TABLE Articles (ArticleId int , UserId int , ArticleText nvarchar(256))

INSERT INTO Users(UserId,UserName,UserPhoto)
VALUES (2,'2a','2pa')
INSERT INTO Users(UserId,UserName,UserPhoto)
VALUES (1,'a','pa')

INSERt INTO Articles (ArticleId, UserId, ArticleText)
VALUES (2,2,'text2')
INSERt INTO Articles (ArticleId, UserId, ArticleText)
VALUES (1,2,'text1')

;WITH tArticles AS (SELECT ArticleId, UserId, ArticleText FROM Articles)
SELECT 
    UserId, 
    UserName, 
    UserPhoto,
    (SELECT TOP 1 LTRIM(
                        (SELECT ',' + CONVERT(nvarchar(256),A.ArticleId) FROM Articles A WHERE U.UserId = A.UserId ORDER BY A.ArticleId FOR XML PATH(''))
                        )) as CSV_ArticleId,
    (SELECT TOP 1 LTRIM(
                        (SELECT ',' + CONVERT(nvarchar(256),A.ArticleText) FROM Articles A WHERE U.UserId = A.UserId ORDER BY A.ArticleId FOR XML PATH(''))
                        )) as CSV_ArticleText                   

FROM Users U