I've written two simple methods in Ruby that match the way Microsoft Excel calculates the upper and lower quartiles for a given set of data, which is not the same as the generally accepted method (surprise).
My question is: how far, and how best, can these methods be refactored for maximum DRYness?
# Return an upper quartile value on the same basis as Microsoft Excel (Freund+Perles method)
def excel_upper_quartile(array)
  return nil if array.empty?
  sorted_array = array.sort
  u = (0.25*(3*sorted_array.length+1))
  if (u-u.truncate).is_a?(Integer)
    return sorted_array[(u-u.truncate)-1]
  else
    sample = sorted_array[u.truncate.abs-1]
    sample1 = sorted_array[(u.truncate.abs)]
    return sample+((sample1-sample)*(u-u.truncate))
  end
end

# Return a lower quartile value on the same basis as Microsoft Excel (Freund+Perles method)
def excel_lower_quartile(array)
  return nil if array.empty?
  sorted_array = array.sort
  u = (0.25*(sorted_array.length+3))
  if (u-u.truncate).is_a?(Integer)
    return sorted_array[(u-u.truncate)-1]
  else
    sample = sorted_array[u.truncate.abs-1]
    sample1 = sorted_array[(u.truncate.abs)]
    return sample+((sample1-sample)*(u-u.truncate))
  end
end
I'll start by generalizing a little and providing one method that handles both cases.
def excel_quartile(array, quartile)
  # Returns nil if array is empty and covers the case of array.length == 1
  return array.first if array.length <= 1
  sorted = array.sort
  # The 4th quartile is always the last element in the sorted list.
  return sorted.last if quartile == 4
  # Source: http://mathworld.wolfram.com/Quartile.html
  quartile_position = 0.25 * (quartile*sorted.length + 4 - quartile)
  quartile_int = quartile_position.to_i
  lower = sorted[quartile_int - 1]
  upper = sorted[quartile_int]
  lower + (upper - lower) * (quartile_position - quartile_int)
end
Then you can define convenience methods:
def excel_lower_quartile(array)
  excel_quartile(array, 1)
end

def excel_upper_quartile(array)
  excel_quartile(array, 3)
end
Note: the excel_quartile method matches expectations for quartile in {1, 2, 3, 4}. Anything else, I guarantee failure.
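One way to make that failure explicit is to validate the argument up front. The following is only a sketch: it repeats the method above so that it runs on its own, and the wrapper name `checked_excel_quartile`, the sample data, and the choice of `ArgumentError` are my assumptions rather than anything the question asked for.

```ruby
# Same calculation as excel_quartile above, repeated so this runs standalone.
def excel_quartile(array, quartile)
  return array.first if array.length <= 1
  sorted = array.sort
  return sorted.last if quartile == 4
  quartile_position = 0.25 * (quartile * sorted.length + 4 - quartile)
  quartile_int = quartile_position.to_i
  lower = sorted[quartile_int - 1]
  upper = sorted[quartile_int]
  lower + (upper - lower) * (quartile_position - quartile_int)
end

# Hypothetical wrapper: rejects anything other than 1, 2, 3 or 4.
def checked_excel_quartile(array, quartile)
  unless [1, 2, 3, 4].include?(quartile)
    raise ArgumentError, "quartile must be 1, 2, 3 or 4 (got #{quartile.inspect})"
  end
  excel_quartile(array, quartile)
end

data = [7, 1, 5, 3, 8, 2, 6, 4]        # made-up sample data
puts checked_excel_quartile(data, 1)   # lower quartile
puts checked_excel_quartile(data, 3)   # upper quartile
```

Keeping the guard in a wrapper leaves the core calculation untouched, so the two convenience methods can call either version.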
Update:
The formula I used is not expressly given at the website I cited, but it is the abstraction for the Freund and Perles method of calculating the quartile position.
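To see how the single formula reduces to the two expressions in the question, the position can be computed on its own. This is a small sketch; `quartile_position` is a hypothetical helper name used only for illustration.

```ruby
# Hypothetical helper: 1-based (possibly fractional) position of quartile q
# in a sorted list of n values, using the same formula as excel_quartile.
def quartile_position(n, q)
  0.25 * (q * n + 4 - q)
end

n = 8
puts quartile_position(n, 1)  # same as 0.25 * (n + 3), the lower quartile
puts quartile_position(n, 2)  # same as 0.5 * (n + 1), the median position
puts quartile_position(n, 3)  # same as 0.25 * (3 * n + 1), the upper quartile
```

The integer part of the position picks the lower sample and the fractional part is the interpolation weight toward the next sample.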
Further update:
There is an error in your original code, though you should never encounter it: u - u.truncate is always within the interval [0.0, 1.0), so the only time it would resemble an integer is when u - u.truncate = 0. However, (u - u.truncate) is still an instance of Float whenever u is a Float, so your code never reaches the miscalculated index. Incidentally, if u - u.truncate were an integer, the index would be 0 - 1 = -1 and your method would return the last element of the array.
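This is easy to check in irb. The values below are made up for illustration (n = 8 and n = 5 for the upper quartile); the point is only the class of the fractional part.

```ruby
u = 0.25 * (3 * 8 + 1)             # 6.25, a Float
fraction = u - u.truncate          # Float minus Integer is a Float

puts fraction.is_a?(Integer)       # false
puts fraction.class                # Float

# Even when the fractional part is zero, it is 0.0 (a Float), not 0.
whole = 0.25 * (3 * 5 + 1)         # 4.0
zero_fraction = whole - whole.truncate
puts zero_fraction.is_a?(Integer)  # false

# Had the branch ever run with a fraction of 0, the index would be -1,
# which in Ruby addresses the last element of the array.
puts [10, 20, 30][0 - 1]           # 30
```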
Some might disagree on the refactoring, but here's how I'd handle it:
def excel_quartile(extreme, array)
  return nil if array.empty?
  sorted_array = array.sort
  u = case extreme
      when :upper then 3 * sorted_array.length + 1
      when :lower then sorted_array.length + 3
      else raise ArgumentError, "extreme must be :upper or :lower"
      end
  u *= 0.25
  if (u-u.truncate).is_a?(Integer)
    return sorted_array[(u-u.truncate)-1]
  else
    sample = sorted_array[u.truncate.abs-1]
    sample1 = sorted_array[(u.truncate.abs)]
    return sample+((sample1-sample)*(u-u.truncate))
  end
end
def excel_upper_quartile(array)
  excel_quartile(:upper, array)
end

def excel_lower_quartile(array)
  excel_quartile(:lower, array)
end