I am trying to build one procedure to take a large amount of data and create 5 range buckets to display the data. the buckets ranges will have to be set according to the result开发者_高级运维s.
Here is my existing SP
GO
/****** Object: StoredProcedure [dbo].[sp_GetRangeCounts] Script Date: 03/28/2010 19:50:45 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_GetRangeCounts]
@idMenu int
AS
declare
@myMin decimal(19,2),
@myMax decimal(19,2),
@myDif decimal(19,2),
@range1 decimal(19,2),
@range2 decimal(19,2),
@range3 decimal(19,2),
@range4 decimal(19,2),
@range5 decimal(19,2),
@range6 decimal(19,2)
SELECT @myMin=Min(modelpropvalue), @myMax=Max(modelpropvalue)
FROM xmodelpropertyvalues where modelPropUnitDescriptionID=@idMenu
set @myDif=(@myMax-@myMin)/5
set @range1=@myMin
set @range2=@myMin+@myDif
set @range3=@range2+@myDif
set @range4=@range3+@myDif
set @range5=@range4+@myDif
set @range6=@range5+@myDif
select @myMin,@myMax,@myDif,@range1,@range2,@range3,@range4,@range5,@range6
select t.range as myRange, count(*) as myCount
from (
select case
when modelpropvalue between @range1 and @range2 then 'range1'
when modelpropvalue between @range2 and @range3 then 'range2'
when modelpropvalue between @range3 and @range4 then 'range3'
when modelpropvalue between @range4 and @range5 then 'range4'
when modelpropvalue between @range5 and @range6 then 'range5'
end as range
from xmodelpropertyvalues where modelpropunitDescriptionID=@idmenu) t
group by t.range order by t.range
This calculates the min and max value from my table, works out the difference between the two and creates 5 buckets. The problem is that if there are a small amount of very high (or very low) values then the buckets will appear very distorted - as in these results...
range1 2806
range2 296
range3 75
range5 1
Basically I want to rebuild the SP so it creates buckets with equal amounts of results in each. I have played around with some of the following approaches without quite nailing it...
SELECT modelpropvalue, NTILE(5) OVER (ORDER BY modelpropvalue) FROM xmodelpropertyvalues - this creates a new column with either 1,2,3,4 or 5 in it
ROW_NUMBER()OVER (ORDER BY modelpropvalue) between @range1 and @range2
ROW_NUMBER()OVER (ORDER BY modelpropvalue) between @range2 and @range3
- or maybe i could allocate every record a row number then divide into ranges from this?
You can use the ranking function ntile
to split a result set in equal parts. This example creates a table with values 1...100, and splits them in 5 ranges:
set nocount on
declare @t table (value int)
declare @i int
set @i = 0
while @i < 100
begin
insert into @t (value) values (@i)
set @i = @i + 1
end
select
NTILE(5) over (order by value) as range
, value
from @t
By using ntile
in a subquery, you can do groups and aggregate math on the ranges. For example, to print the sum of numbers 1..19, 20..39, 40..59, etc.
select range, SUM(value)
from (
select
NTILE(5) over (order by value) as range
, value
from @t
) sub
group by range
You can use top 20%
to get the first fifth of the records, that will get you the end of the first range:
select @range1 = max(modelpropvalue)
from (
select top 20% modelpropvalue
from xmodelpropertyvalues where modelPropUnitDescriptionID = @idMenu
order by modelpropvalue
) x
Then you can use that value to exclude the first range, and use top 25%
to get the next range:
select @range2 = max(modelpropvalue)
from (
select top 25% modelpropvalue
from xmodelpropertyvalues where modelPropUnitDescriptionID = @idMenu
where modelpropvalue > @range1
order by modelpropvalue
) x
And so on with 33.3333% and 50% for the third and fourth ranges.
Note that to get the correct count, you should not use between
. As both the start and end values are inclusive, you will count the edge values for both the range where it ends and the range where it starts.
select t.range as myRange, count(*) as myCount
from (
select case
when modelpropvalue <= @range1 then 'range1'
when modelpropvalue <= @range2 then 'range2'
when modelpropvalue <= @range3 then 'range3'
when modelpropvalue <= @range4 then 'range4'
else 'range5'
end as range
from xmodelpropertyvalues where modelpropunitDescriptionID=@idmenu) t
group by t.range order by t.range
(Well, actually the case
would protect you in this case as it would pick the first match, but if you would count them individually you would count some records twice. To get a single range you would exclude the first value and include the second: where modelpropvalue > @range1 and modelpropvalue <= @range2
.)
OK, I got this working using both approaches but the problem seems to be with my data. In certain areas I have huge amounts of records with the same value on the field I am querying (it is the weight of trucks and one particular model is extremely popular), so there is really no way to divide this up evenly!
I have decided to go back to the original stored procedure, which calculated the ranges simply by dividing min and max value by 5, as it ran much quicker. However, as this SP is run up to 8 times in the page (for weight, fuel capacity, engine size, etc.), I could do with speeding it up a bit, as it takes about 0.5 secs to run. Here is the full SP - any ideas on how I can optimize its speed would be greatly appreciated...
ALTER PROCEDURE [dbo].[sp_GetRangeCounts]
@idMenu int,
@myFilters varchar(5000),
@myStore int,
@myLabel varchar(50) OUTPUT,
@myUnit varchar(50) OUTPUT,
@range1 int OUTPUT,
@range2 int OUTPUT,
@range3 int OUTPUT,
@range4 int OUTPUT,
@range5 int OUTPUT,
@range6 int OUTPUT,
@range1count int OUTPUT,
@range2count int OUTPUT,
@range3count int OUTPUT,
@range4count int OUTPUT,
@range5count int OUTPUT
AS
declare
@myMin int,
@myMax int,
@myDif int
declare @myInfoTable table(
myMin integer,
myMax integer,
myLabel varchar(50),
myUnit varchar(50)
)
insert @myInfoTable (myMin,myMax,myLabel,myUnit) exec('SELECT Min(ConvertedValue) as myMin, Max(ConvertedValue) as myMax,unitDescriptionTrans as myLabel,unitUnitTransDescription as myUnit
FROM LenPropValsView where UnitDescriptionID='+@idMenu+' and xStoreID='+@myStore+@myFilters+' group by unitdescriptionTrans,unitUnitTransDescription')
select @myMin=myMin-1 from @myInfoTable
select @myMax=myMax+1 from @myInfoTable
select @myLabel=myLabel from @myInfoTable
select @myUnit=myUnit from @myInfoTable
set @myDif=(@myMax-@myMin)/5
set @range1=@myMin
set @range2=@myMin+@myDif
set @range3=@range2+@myDif
set @range4=@range3+@myDif
set @range5=@range4+@myDif
set @range6=@myMax
select @myLabel,@myUnit,@myMin,@myMax,@myDif,@range1,@range2,@range3,@range4,@range5,@range6
declare @myData table(
myRange varchar(50),
myCount integer
)
insert @myData(myRange,myCount)
exec ('select t.range as myRange, count(*) as myCount
from (
select case
when ConvertedValue <='+@range2+' then ''range1''
when ConvertedValue >'+@range2+' and ConvertedValue<='+@range3+' then ''range2''
when ConvertedValue >'+@range3+' and ConvertedValue<='+@range4+' then ''range3''
when ConvertedValue >'+@range4+' and ConvertedValue<='+@range5+' then ''range4''
else ''range5''
end as range
from LenPropValsView where unitDescriptionID='+@idmenu+' and xStoreID='+@myStore+@myFilters+') t
group by t.range order by t.range')
select @range1count=myCount from @myData where myRange='range1'
select @range2count=myCount from @myData where myRange='range2'
select @range3count=myCount from @myData where myRange='range3'
select @range4count=myCount from @myData where myRange='range4'
select @range5count=myCount from @myData where myRange='range5'
select @range1count,@range2count,@range3count,@range4count,@range5count
精彩评论