Is there a way to get different results for the same SQL query if the data stays the same?_问答_开发者

I get a different result set for this query intermittently when I run it...sometimes it gives 1363, sometimes 1365 and sometimes 1366 results. The data doesn't change. What could be causing this and is there a way to prevent it? Query looks something like this:

SELECT * 
FROM 
    (
            SELECT  
                        RC.UserGroupId,
                        RC.UserGroup,
                        RC.ClientId AS CLID,                     
                        CASE WHEN T1.MultipleClients = 1 THEN RC.Salutation1 ELSE RC.DisplayName1 END AS szDisplayName,
                        T1.MultipleClients,
                        RC.IsPrimaryRecord,
                        RC.RecordTypeId,
                        RC.ClientTypeId,
                        RC.ClientType,
                        RC.IsDeleted,
                        RC.IsCompany,                                            
                        RC.KnownAs,
                        RC.Salutation1,
                        RC.FirstName, 
                        RC.Surname,                   
                        Relationship, 
                        C.DisplayName Client,
                        RC.DisplayName RelatedClient, 
                        E.Email,                                                            
                        RC.DisplayName  + ' is the ' + R.Relationship + ' of ' + C.DisplayName Description,
                        ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id


            FROM 
                        SSDS.Client.ClientExtended C 
                                                                              INNER JOIN 
                        SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID 
                                                                              INNER JOIN 
                        SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
                                                                              LEFT OUTER JOIN
                        SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId                 
                                                                              LEFT OUTER JOIN 
                        SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId 
                                                                              INNER JOIN                                                                 

                        (
                              SELECT 
                                          E.Email, 
                                          CASE WHEN (COUNT(DISTINCT RC.DisplayName) > 1) THEN 1 ELSE 0 END AS MultipleClients 

                              FROM
                                          SSDS.Client.ClientExtended C 
                                                                                                INNER JOIN 
                                         开发者_JAVA技巧 SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID 
                                                                                                INNER JOIN 
                                          SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
                                                                                                LEFT OUTER JOIN
                                          SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId                 
                                                                                                LEFT OUTER JOIN 
                                          SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId

                              WHERE 
                                          Relationship IN ('z-Group Principle', 'z-Group Member ')           
                                          AND E.Email IS NOT NULL

                              GROUP BY E.Email 

                        ) T1 ON E.Email = T1.Email


            WHERE 

                                                Relationship IN ('z-Group Principle', 'z-Group Member ')           
                                                AND E.Email IS NOT NULL                                         
    ) T


WHERE       
            sequence_id = 1
            AND T.UserGroupId IN (Select * from iCentral.dbo.GetSubUserGroups('471b9cbd-2312-4a8a-bb20-35ea53d30340',0))         
            AND T.IsDeleted = 0           
            AND T.RecordTypeId = 1 
            AND T.ClientTypeId IN
            (
                        '1',              --Client
                        '-1652203805'    --NTU                      
            )        

        AND T.CLID NOT IN
          (
            SELECT DISTINCT  
                               UDDF.CLID
            FROM
                   SLacsis_SLM.dbo.T_UserDef UD WITH (NOLOCK) 
                          INNER JOIN
                   SLacsis_SLM.dbo.T_UserDefData UDDF WITH (NOLOCK)
                   ON UD.UserDef_ID = UDDF.UserDef_ID
                          INNER JOIN
                   SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK)
                   ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID

            WHERE 
                           UD.UserDef_ID in    
                                    (
                                    'F68F31CE-525B-4455-9D50-6DA77C66FEE5',                                    
                                    'A7CECB03-866C-4F1F-9E1A-CEB09474FE47'  
                                    )

                           AND UDDF.Data = 'NO'
           ) 


ORDER BY T.Surname

EDIT:

I have removed all NOLOCK's (including the ones in views and UDFs) and I'm still having the same issue. I get the same results every time for the nested select (T) and if I put the result set of T into a temp table in the beginning of the query and join onto the temp table instead of the nested select then the final result set is the same every time I run the query.

EDIT2:

I have been doing some more reading on ROW_NUMBER()...I'm partitioning by email (of which there are duplicates) and ordering by Relationship (where there are only 1 of 2 relationships). Could this cause the query to be non-deterministic and would there be a way to fix that?

EDIT3:

Here are the actual execution plans if anyone is interested http://www.mediafire.com/?qo5gkh5dftxf0ml. Is it possible to see that it is running as read committed from the execution plan? I've compared the files using WinMerge and the only differences seem to be the counts (ActualRows="").

EDIT4:

This works:

SELECT * FROM 
(
   SELECT *, 
             ROW_NUMBER() OVER (PARTITION BY B.Email ORDER BY Relationship DESC) AS sequence_id
             FROM
            (

                                    SELECT DISTINCT  
                                    RC.UserGroupId,
                                    ...
            ) B...

EDIT5:

When running the same ROW_NUMBER() query (T in the original question, just selecting RC.DisplayName and ROW_NUMBER) twice in a row I get different rank for some people:

Is there a way to get different results for the same SQL query if the data stays the same?

Does anyone have a good explanation/example of why or how ROW_NUMBER() over a result set that contains duplicates can rank differently each time it is run and ultimately change the number of results?

EDIT6:

Ok I think this makes sense to me now. This occurs when people 2 people have the same email address (e.g a husband and wife pair) and relationship. I guess in this case their ROW_NUMBER() ranking is arbitrary and can change every time it is run.

Your use of NOLOCK all over means you are doing dirty reads and will see uncommitted data, data that will be rolled back, transient and inconsistent data etc

Take these off, try again, report back pleas

Edit: some options with NOLOCKS removed

Data is really changing
Some parameter or filter is changing (eg GETDATE)
Some float comparisons running on different cores each time
See this on dba.se https://dba.stackexchange.com/q/4810/630
Embedded NOLOCKs in udfs or views (eg iCentral.dbo.GetSubUserGroups)
...

As I said yesterday in the comments the row numbering for rows with duplicate E.Email, Relationship values will be arbitrary.

To make it deterministic you would need to do PARTITION BY B.Email ORDER BY Relationship DESC, SomeUniqueColumn . Interesting that it changes between runs though using the same execution plan. I assume this is a consequence of the hash join.

I think your problem is the first row over the partition is not deterministic. I suspect that Email and Relationship is not unique.

    ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id

Later you examine the first row of the partition.

    WHERE   T.sequence_id = 1
        AND T.UserGroupId ...

If that first row is arbitrary then you are going to get an arbitrary where comparison. You need to add to the ORDER BY to include a complete unique key. If there is no unique key then you need to make one or live with arbitrary results. Even on a table with a clustered PK the select row order is not guaranteed unless the entire PK is in the sort clause.

This probably has to do with ordering. You have a sequence_id defined as a row_number ordered by Relationship. You'll always get a sensible order by relationship, but other than that your row_number will be random. So you can get different rows with sequence_id 1 each time. That in turn will affect your where clause, and you can get different numbers of results. To fix this to get a consistent result, add another field to your row_number's order by. Use a primary key to be certain of consistent results.

There's a recent KB that addresses problems with ROW_NUMBER() ... see FIX: You receive an incorrect result when you run a query that uses the row_number function in SQL Server 2008 for the details.

However this KB indicates that it's a problem when parallelism is invoked for execution, and looking at your execution plans I can't see this kicking in. But the fact that MS have found a problem with it in one situation makes me a little bit wary - i.e., could the same issue occur for a sufficiently complicated query (and your execution plan does look sufficiently large).

So it may be worth checking your patch levels of SQL Server 2008.

U Must only use

Order by

without prtition by.

ROW_NUMBER() OVER (ORDER BY Relationship DESC) AS sequence_id

Is there a way to get different results for the same SQL query if the data stays the same?

精彩评论

关注公众号

热门标签

图文推荐

Is there a way to get different results for the same SQL query if the data stays the same?

更多 问答 相关资讯：

精彩评论

关注公众号

热门标签

图文推荐

更多问答相关资讯：