开发者

Best way to optimize SQL for Pivoting a table in Sybase Ase

开发者 https://www.devze.com 2023-01-27 07:52 出处:网络
I am looking to optimize the following SQL statement that is creating a Pivoted Result set from a Historical Table.This may already be the most per-formant way of doing it but I keep thinking that the

I am looking to optimize the following SQL statement that is creating a Pivoted Result set from a Historical Table. This may already be the most per-formant way of doing it but I keep thinking that there has to be a more per-formant way of doing this.

SQL Statement that I am trying to optimize

select Col1, Col2,
Max(case when TypeId = 1 then ColValue end) as Pivot1,
Max(case when TypeId = 2 then ColValue end) as Pivot2,
Max(case when TypeId = 3 then ColValue end) as Pivot3,
Max(case when TypeId = 4 then ColValue end) as Pivot4,
Max(case when TypeId = 5 then ColValue end) as Pivot5,
Max(ca开发者_开发问答se when TypeId = 6 then ColValue end) as Pivot6,
Max(case when TypeId = 7 then ColValue end) as Pivot7,
Max(case when TypeId = 8 then ColValue end) as Pivot8,
Max(case when TypeId = 9 then ColValue end) as Pivot9,
Max(case when TypeId = 10 then ColValue end) as Pivot10,
Max(case when TypeId = 11 then ColValue end) as Pivot11
from RowTable
group by Col1, Col2

UPDATE: Below is the table definition

CREATE TABLE dbo.RowTable  ( 
    Id                  int NOT NULL,
    Col1                char(8) NOT NULL,
    Col2                tinyint NOT NULL,
    TypeId              int NOT NULL,
    ColValue            datetime NOT NULL,
    CreatedBy           varchar(50) NOT NULL,
    Rowstamp            timestamp NOT NULL 
    )
LOCK DATAROWS
GO
ALTER TABLE dbo.RowTable
    ADD CONSTRAINT ukRowTable
    UNIQUE (Col1, Col2, TypeId)
    WITH max_rows_per_page = 0, reservepagegap = 0


Response to Original Question

1. Performance with Table As Is.
Well, before anyone can evaluate that code, perfomance-wise, we need the create table statement, including the indices.

  1. Higher Class of Performance.
    Pivoting is a function of expressing the data available in rows, in columns. If the database (table) is say, normalised to 3NF or 5NF, which is row-oriented, then performing columnar functions on row objects is going to be slow. Nothing to do with the product. If you want columnar access at speed (for Pivoting or any other columnar function), you need the data in 6NF. That also happens to make the SQL required for the task much more straight-forward.

    If your data modeller prepared the table for pivoting (typically data warehouse type usage; Dimension-Fact structure), then it may not be true 6NF, but at least it will be better than 5NF, and easier to extract Pivoted values. When I see the DDL, I will be able to determine what it is (true 6NF; better than 5NF but not 6NF). Then I can determine if you are using the best code to obtain what you need.

    It is only slow or "expensive" when the table is not in 6NF.

  2. At this stage, from your code, it does not even look like a Pivot (using the standard meaning of the term), it looks like a MAX()of various values (calling the resulting column Pivotx does not make it a Pivot); and you are reading every row, once. That is, you have a procedural mindset, not a Pivoting or set-oriented mindset. Therefore, the code is likely not to get the values you require (whether it performs well or not, is a separate issue).

    Your use of GROUP BY confirms the procedural approach to the non-procedural set, and that is going to be slow (creates worktables; which will be huge if your data is huge), and the same info can be obtained much faster via the Dimensions. Why don't you use the dimension tables for this pivotable table ? Post either the DDL for all Dimension tables that are related to this table, or the Data Model.

Response to Comments

I am trying to help you, but there are two obstacles. First, 19 days between interactions. Second, your posted SQL will not work: for each row, it returns the same ColValue in 11 columns; I cannot figure out the purpose of your use of MAX(). Ok, The MAX() is required to beat the GROUP BY into submission. Therefore I am still at a loss as to what you intend (not what you have coded). The obfuscation is fair enough, but here we have lost the meaning.

Yes, there are faster ways, but I need to understand the intent, and the parent tables (eg. do you have a table where (Col1, Col2) is Unique ? If it is a database, then tables do not stand alone, they are related, and the relations have some purpose. I realise you do not think they are relevant, but that limitation has produced the code you have posted; the solution is beyond that limitation.

Anyway, in order to avoid further delay, please try this code. It is just a guess, doesn't appear correct to me, because (Col1, Col2, TypeId) is Unique; therefore there will be only one set of TypeId (column heading in the result set) for each Col1, Col2 result row:

[Superceded, refer below]

And perhaps your can give me feedback on that.

Response to Updated Question

Ok, now we have one unnormalised table. New set of steps. It is a constructed result set, using Correlated Subqueries which return Scalars. It is not a re-arrangement of rows vs columns; it is not a standard Pivot (therefore the code provided is not a pivot). Dead easy. You may wish to change the Question Heading, because people are looking for a true Pivot. And yes, this will perform much better (assuming your DDL is a true representation of the real tables).

To be clear, a Pivot (ala MS SQLPIVOT function) is a different animal. I can provide an ugly-and-slow Pivot for an Unnormalised database; or a clean-but-slow Pivot from a 5NF database; or a clean-and-fast Pivot from a 6NF database. This is not it.

  1. Let's assume it is a Relational Database. Given the DDL provided, there would be a ParentTable in which (Col1, Col2) is Unique.

  2. The code:

    SELECT  Col1, 
        Col2,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 1 ) as Latest_1,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 2 ) as Latest_2,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 3 ) as Latest_3,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 4 ) as Latest_4,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 5 ) as Latest_5,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 6 ) as Latest_6,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 7 ) as Latest_7,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 8 ) as Latest_8,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 9 ) as Latest_9,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId=10 ) as Latest_10,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId=11 ) as Latest_11
    FROM ParentTable  OUTER

  3. If there is not a ParentTable (ie. it is not a Relational Database), create one on the fly with SELECT-INTO, or use a Derived Table:

    SELECT  Col1, 
        Col2,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 1 ) as Latest_1,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 2 ) as Latest_2,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 3 ) as Latest_3,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 4 ) as Latest_4,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 5 ) as Latest_5,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 6 ) as Latest_6,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 7 ) as Latest_7,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 8 ) as Latest_8,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId= 9 ) as Latest_9,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId=10 ) as Latest_10,
        ( SELECT ColValue FROM RowTable WHERE Col1=OUTER.Col1 AND Col2=OUTER.Col AND TypeId=11 ) as Latest_11
    FROM (
        SELECT DISTINCT
                Col1,
                Col2
            FROM RowTable
        )  OUTER

  4. You can get rid of the Id column in RowTable, it is a 100% redundant column and index, that serves no purpose.


Pivoting is an inherently expensive operation. I don't think that this can be optimized.

0

精彩评论

暂无评论...
验证码 换一张
取 消