I have a question on why the output of these two queries differ. I would have expected them to work the same.
Query 1:
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');
while @i1 <= ASCII('9')
begin
insert into @table1 (c)
select (CHAR(@i1))
set @i1 = @i1 +1;
end
insert into @cache (originalValue, obfuscateValue)
select [firstname],
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
from Customer
where [firstname] is not null
select * from @cache;
Query 2:
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');
while @i1 <= ASCII('9')
begin
insert into @table1 (c)
select (CHAR(@i1))
set @i1 = @i1 +1;
end
insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null
update c
set c.obfuscateValue = t.Value
from @cache c
join
(
select originalValue,
(
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
) as Value
from @cache
) t on t.originalValue = c.originalValue
select * from @cache;
They should do the same, but first query returns following results:
Jonathon 73
Everett 73
Janet 73
Andy 73
Shauna 73
And second:
Jonathon 82
Evere开发者_JS百科tt 40
Janet 68
Andy 79
Shauna 29
As you noticed, the second column in second result has different values, while first - same values.
It looks like in first query the
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
is called only once.
Can someone explain this mystery?
I think random values can be generated in another way.
This is how to generated [a-zA-Z]{3,6}
declare @min int, @max int;
declare @alpha varchar(max)
set @min = 3;
set @max = 6;
set @alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
insert into @cache (originalValue, obfuscateValue)
select [firstname], LEFT(t.Value, case when t.maxLen < @min then @min else t.maxLen end)
from Customer
join
(
select ABS(CHECKSUM(NEWID()))%@max + 1 as maxLen,
SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) as Value
)t on t.Value is not null
where [firstname] is not null
select * from @cache;
One line?
SELECT
RIGHT( --number of zeros to match expected max length. Or use REPLICATE.
'000000' + CAST(
--The 2 newid() expression means we'll get a larger number
--less chance of using leading static zeroes
CAST(CHECKSUM(NEWD_ID()) as bigint) * CAST(CHECKSUM(NEWD_ID()) as bigint)
as varchar(30))
--The 3 gives us the desired mask. Currently 3 digits.
, 3)
You are correct in your assumption that the first query is only running the ‘select top’ once. The behavior is happening because of how the optimizer chose to optimize the query. It decided because the subqueries (the select top queries) are self-contained and are not correlated with the outside select query it uses a Tablespool (Lazy Spool) operator in the execution plan. This causes the select top value to be placed in the tempdb for reuse.
Because the optimizer chooses to use a Nested Loops operator to bring all the data together no rebinding is needed, the spooled value is used instead of reapplying the queries for each input outer row.
During the second query the optimizer chose not to use a Tablespool operator (I believe do to fact the input table being from tempdb). So you have the select top subqueries being reapplied for each input row from the temporary table.
If needed, you may be able to use a table/query hints if you want to force the execution plan to perform as desired.
精彩评论