SELECT TerritoryId,
[2003-01],[2003-02],[2003-03],[2003-04],[2003-05],[2003-06],
[2003-07],[2003-08],[2003-09],[2003-10],[2003-11],[2003-12],
[2004-01],[2004-02],[2004-03],[2004-04],[2004-05],[2004-06],
[2004-07]
from ( select TerritoryId,
CONVERT(CHAR(7), h.OrderDate, 120) AS theDate, d.LineTotal
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
) p
pivot (
sum(LineTotal) for theDate in ([2003-01],[2003-02],[2003-03],[2003-04],[2003-05],[2003-06],
[2003-07],[2003-08],[2003-09],[2003-10],[2003-11],[2003-12],
[2004-01],[2004-02],[2004-03],[2004-04],[2004-05],[2004-06],[2004-07])
) pvt
order by TerritoryId
The code is based on a sample from Pro SQL2005 (Apress). The book is great BTW.
While modifying it for a testdrive I think I made an error but where. Inner select executes in less than a second. However applying the pivot it takes 5-6 seconds.
Total row count is around 121K. All it does with PIVOT is to sum and crosstab. If I do this taking raw data from inner select outside and do pivoting myself it's much faster. So is it my code here or PIVOT is really an unoptimized addition?
TIA
Some more statistics on this process. If execute this pivot and then just inner select (after a few times) here is client statistics from SSMS:
*********************************************************************
Client Execution Time 18:32:27 18:31:33
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 0 0 0.0000
Rows affected by INSERT, DELETE, or UPDATE statements 0 0 0.0000
Number of SELECT statements 2 2 2.0000
Rows returned by SELECT statements 121318 11 60664.5000
Number of transactions 0 0 0.0000
Network Statistics
Number of server roundtrips 3 3 3.0000
TDS packets sent from client 3 3 3.0000
TDS packets received from server 635 41 338.0000
Bytes sent from client 538 1528 1033.0000
Bytes received from server 2591244 157727 1374486.0000
Time Statistics
Client processing time 375 0 187.5000
Total execution time 390 5187 2788.5000
Wait time on server replies 15 5187 2601.0000
*********************************************************************
To summarize it, SQL server selects the raw data in 390 milliseconds then spends considerable time on pivoting for a total time of 5187 milliseconds.
Here is the timings if I instead just ask for the inner select data (121317 rows) via ODBC, sum and pivot myself externally, again creating the same result table:
For same period range:
Got data-elapsed:457
XTabbed-total elapsed:729
For all 37 months:
Got data-elapsed:455
XTabbed-total elapsed:741
Timings are in milliseconds.
I believe I'm using PIVOT syntax wrong.
|||I don't see that much performance difference between PIVOT and custom query. The PIVOT operator translates to pretty much same expressions. There is little bit additional cost but not significant. And the IO is identical. I tried this on SQL Server 2005 build# 2153 (with cumulative hotfix for SP1).
use AdventureWorks
go
set statistics io on
set statistics time on
go
SELECT TerritoryID,
[2003-01],[2003-02],[2003-03],[2003-04],[2003-05],[2003-06],
[2003-07],[2003-08],[2003-09],[2003-10],[2003-11],[2003-12],
[2004-01],[2004-02],[2004-03],[2004-04],[2004-05],[2004-06],
[2004-07]
from ( select TerritoryID,
CONVERT(CHAR(7), h.OrderDate, 120) AS theDate, d.LineTotal
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
) p
pivot (
sum(LineTotal) for theDate in ([2003-01],[2003-02],[2003-03],[2003-04],[2003-05],[2003-06],
[2003-07],[2003-08],[2003-09],[2003-10],[2003-11],[2003-12],
[2004-01],[2004-02],[2004-03],[2004-04],[2004-05],[2004-06],[2004-07])
) pvt
order by TerritoryID;
SELECT TerritoryID
,sum(case theDate when '2003-01' then LineTotal end) as [2003-01]
,sum(case theDate when '2003-02' then LineTotal end) as [2003-02]
,sum(case theDate when '2003-03' then LineTotal end) as [2003-03]
,sum(case theDate when '2003-04' then LineTotal end) as [2003-04]
,sum(case theDate when '2003-05' then LineTotal end) as [2003-05]
,sum(case theDate when '2003-06' then LineTotal end) as [2003-06]
,sum(case theDate when '2003-07' then LineTotal end) as [2003-07]
,sum(case theDate when '2003-08' then LineTotal end) as [2003-08]
,sum(case theDate when '2003-09' then LineTotal end) as [2003-09]
,sum(case theDate when '2003-10' then LineTotal end) as [2003-10]
,sum(case theDate when '2003-11' then LineTotal end) as [2003-11]
,sum(case theDate when '2003-12' then LineTotal end) as [2003-12]
,sum(case theDate when '2004-01' then LineTotal end) as [2004-01]
,sum(case theDate when '2004-02' then LineTotal end) as [2004-02]
,sum(case theDate when '2004-03' then LineTotal end) as [2004-03]
,sum(case theDate when '2004-04' then LineTotal end) as [2004-04]
,sum(case theDate when '2004-05' then LineTotal end) as [2004-05]
,sum(case theDate when '2004-06' then LineTotal end) as [2004-06]
,sum(case theDate when '2004-07' then LineTotal end) as [2004-07]
from ( select TerritoryID,
CONVERT(CHAR(7), h.OrderDate, 120) AS theDate, d.LineTotal
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
) p
group by TerritoryID
order by TerritoryID;
go
set statistics io off
set statistics time off
go
|||Thanks for trying. I thought my PIVOT syntax was wrong.
The point is that I really didn't expect PIVOT to be only a syntax enhancement over all those many sum(case ....) coding but as "how it should be done" logic as well. In other words I expected it would create a more clever algorithm behind the scenes and return the results fast. I'm not good at T-SQL but even with my limited knowledge check this one:
use AdventureWorks
go
with myPivot as
(
select territoryID, theDate, sum(LineTotal) as LineTotal
from ( select TerritoryID,
CONVERT(CHAR(7), h.OrderDate, 120) AS theDate, d.LineTotal
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
) p
group by TerritoryID, theDate
)
SELECT TerritoryID
,sum(case theDate when '2003-01' then LineTotal end) as [2003-01]
,sum(case theDate when '2003-02' then LineTotal end) as [2003-02]
,sum(case theDate when '2003-03' then LineTotal end) as [2003-03]
,sum(case theDate when '2003-04' then LineTotal end) as [2003-04]
,sum(case theDate when '2003-05' then LineTotal end) as [2003-05]
,sum(case theDate when '2003-06' then LineTotal end) as [2003-06]
,sum(case theDate when '2003-07' then LineTotal end) as [2003-07]
,sum(case theDate when '2003-08' then LineTotal end) as [2003-08]
,sum(case theDate when '2003-09' then LineTotal end) as [2003-09]
,sum(case theDate when '2003-10' then LineTotal end) as [2003-10]
,sum(case theDate when '2003-11' then LineTotal end) as [2003-11]
,sum(case theDate when '2003-12' then LineTotal end) as [2003-12]
,sum(case theDate when '2004-01' then LineTotal end) as [2004-01]
,sum(case theDate when '2004-02' then LineTotal end) as [2004-02]
,sum(case theDate when '2004-03' then LineTotal end) as [2004-03]
,sum(case theDate when '2004-04' then LineTotal end) as [2004-04]
,sum(case theDate when '2004-05' then LineTotal end) as [2004-05]
,sum(case theDate when '2004-06' then LineTotal end) as [2004-06]
,sum(case theDate when '2004-07' then LineTotal end) as [2004-07]
from myPivot
group by TerritoryID
order by TerritoryID
go
Or this one (that I based on how you wrote the comparison T-SQL):
with myPivot as
( select territoryID, theDate, sum(LineTotal) as LineTotal
from (
select TerritoryID,
CONVERT(CHAR(7), h.OrderDate, 120) AS theDate, d.LineTotal
FROM Sales.SalesOrderHeader h JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
) p
group by TerritoryID, theDate
)
SELECT TerritoryID,
[2003-01],[2003-02],[2003-03],[2003-04],[2003-05],[2003-06],
[2003-07],[2003-08],[2003-09],[2003-10],[2003-11],[2003-12],
[2004-01],[2004-02],[2004-03],[2004-04],[2004-05],[2004-06],
[2004-07]
from myPivot
pivot (
sum(LineTotal) for theDate in ([2003-01],[2003-02],[2003-03],[2003-04],[2003-05],[2003-06],
[2003-07],[2003-08],[2003-09],[2003-10],[2003-11],[2003-12],
[2004-01],[2004-02],[2004-03],[2004-04],[2004-05],[2004-06],[2004-07])
) pvt
order by TerritoryID;
They both now have closer timing to my "get the raw data via ODBC and pivot yourself" approach (those 2 execute in 511 and 500 milliseconds vs to my 470 milliseconds which I accept to be the same). That leaded me to think why I need to specify the aggregate in PIVOT if I have to do aggregation myself outside for it to perform faster. Actually I think it would be way cooler if PIVOT didn't need an aggregate function but simply made a row to columns transposion.
However I checked my version and it's built 1399. Maybe it was corrected in 2153. I'll update and retest.
|||I applied SP1 and now my version reads 2047 (not 2153).
Microsoft SQL Server Management Studio 9.00.2047.00
Microsoft Analysis Services Client Tools 2005.090.2047.00
Microsoft Data Access Components (MDAC) 2000.085.1117.00 (xpsp_sp2_rtm.040803-2158)
Microsoft MSXML 2.6 3.0 4.0 5.0 6.0
Microsoft Internet Explorer 6.0.2900.2180
Microsoft .NET Framework 2.0.50727.42
Operating System 5.1.2600
No improvement with this one. Probably pivot was only meant to be a somewhat less typing syntax. OK I was concerned if my syntax was the culprit.
|||Yes, currently PIVOT only provides syntax level simplification. It does pretty much the same as hand coded SQL statement. However, this may change in a future version of SQL Server. So if you have a situation where you can use PIVOT then use it. Currently, you cannot pivot on multiple expressions so for that you can go the traditional SQL route.|||Thank you very much. Knowing how it works is sufficient:)|||BTW somewhere I saw you were collecting enhancement feature ideas. Even if the PIVOT stays same I'd like a version that looks like:
-- rowColumnList is typically group by columns (like TerritoryID) except the one that would be new columns (like theDate)
PIVOT ( (rowColumnList) (valueColumn) for filterColumn in (filter_columnNameList) )
and/or:
-- Same as above but doesn't need a columnnames list ( all unique values of theDate would be new columns)
PIVOT ( (rowColumnList) (valueColumn) on (columnNamesColumn) )
with no aggregation function in pivot_clause. PIVOT then merely do a row to column transposition only. Aggregation would be in select that pivot uses if need be (so users would not need tricks like max() on character/datetime data when there is really nothing to aggregate).
Thanks.
No comments:
Post a Comment