
SQL Server 2008 and "with" and "unpivot" keywords

We work with some very large databases (300 GB to 1 TB). Tables can contain from 10M to 5B records. We do some not very complex data transformations involving with and unpivot statements. The problem is that the data log file and tempdb grow huge, and eventually the server stops working.

Now I'm leaning toward the idea that with and even unpivot constructions are expensive in terms of resource usage, and that we should consider some simplifications here:

  • splitting the work into several steps that use temp tables instead of a single with
  • using union instead of unpivot (a rough sketch of both ideas follows this list)
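
For illustration, this is roughly what I have in mind; it is only a sketch, using placeholder names (a HugeTable with columns F1..F6, a #stage temp table, and a Val column), and the real query has more joins and filters:

-- Step 1: unpivot by hand with UNION ALL and materialize the result into a temp table.
select F1 as Val
into #stage
from HugeTable where F1 is not null
union all select F2 from HugeTable where F2 is not null
union all select F3 from HugeTable where F3 is not null
union all select F4 from HugeTable where F4 is not null
union all select F5 from HugeTable where F5 is not null
union all select F6 from HugeTable where F6 is not null;

-- Step 2: run the joins and extra filtering against the (hopefully much smaller) temp table.
select ...
from #stage
join ...
where ...;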

Does anybody have experience with something like this?


UNION is much faster than UNPIVOT. Also, you can periodically clear the log programmatically if you do not need it often. The performance of PIVOT will also improve if you do not use a join in the selection criteria; you can use union instead. These links will help (especially the second one):
http://www.sql-server-performance.com/2007/ssis-pivot-unpivot/
http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND1P2.PDF
http://www.fotia.co.uk/fotia/DY.17.TheUnpivotOperator.aspx
Take care :)
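
For example, something roughly like this (a sketch only; MyDb and MyDb_log are placeholder names for the database and its log file, and switching to SIMPLE recovery breaks log backups, so only do it if you really do not need point-in-time recovery):

use MyDb;
alter database MyDb set recovery simple;   -- stop keeping the full log chain (no log backups possible)
checkpoint;                                -- lets SIMPLE recovery mark the inactive log as reusable
dbcc shrinkfile (MyDb_log, 1024);          -- shrink the log file (logical name) to ~1 GB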


Sincere thanks to everyone. Now it's pretty obvious to me that this was UNPIVOT misuse. Indeed, CTEs are just views, so they don't hurt much unless you use them improperly.

So the root of our problem was that our server (32 GB RAM, 8 CPUs, 2 TB HDD) was simply unable to manage the huge number of records that UNPIVOT produced.

Let's say we have HugeTable with fields (F1, F2, F3, F4, F5, F6). RecordCount = 1,000,000,000

We use it this way (pseudocode):

with tmp as (
  -- FX holds the unpivoted value; FieldName says which of F1..F6 it came from
  select FieldName, FX
  from HugeTable
  unpivot (FX for FieldName in (F1, F2, F3, F4, F5, F6)) as u
)
select * from tmp
join ...
where FX is not null
and ...

The query plan estimates that our UNPIVOT produces 6,000,000,000 records to be processed by our where clause. It gets even worse because in reality we also join some additional tables and do extra filtering, and all of that happens against those 6 billion rows. Meanwhile, the transaction log and tempdb stayed almost untouched and rather small. I've found no documentation saying that UNPIVOT/joins (hash joins, to be precise) use only RAM to manage their operations, but from what we experienced I understand that our SQL Server 2008 R2 Enterprise was simply trying to fit that bulk recordset into RAM, and since we didn't have 1 TB of RAM, the operating system was doing huge amounts of swapping.
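
(Such estimates can be seen without actually executing the statement by requesting the estimated plan, e.g. in SSMS, or with a sketch like this, where GO separates batches:)

set showplan_xml on;
go
-- the statement below is not executed; SQL Server just returns its estimated plan
select * from ... ;
go
set showplan_xml off;
go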

The interesting thing here is that it may start up very quickly and process about 1,800,000,000 records in the first 6 hours, but then it hangs (well, it produces 100K records per 24 hours, which is not acceptable at all).

If we turn it into a manual UNION ALL like this:

with tmp as (
  select F1 from HugeTable where F1 is not null
  union all select F2 from HugeTable where F2 is not null
  union all select F3 from HugeTable where F3 is not null
  union all select F4 from HugeTable where F4 is not null
  union all select F5 from HugeTable where F5 is not null
  union all select F6 from HugeTable where F6 is not null
)
select ... from tmp
join ...
where ...

the query plan showed that the CTE produced about 2 billion records, so all further joins were done against a much smaller recordset than in the first case. This took less than 10 hours to do the job (against days in the first case).

BTW, we use the SSIS/VS2008 environment to process our data loads.
