MySQL: limit results by a calculated step interval

Developer https://www.devze.com 2023-03-10 21:23 Source: Internet

I have a need to return a specific number of rows from a query within a given start and stop time at a dynamically calculated step interval.

I've kept it simple here with a table consisting of a unix timestamp and a corresponding integer value.

In my example, I need to have 200 rows returned with an INCLUSIVE start time of 1307455099 and an INCLUSIVE end time of 1307462455.

Here's the query I've developed so far. It divides the time span by the number of rows I want to get a step interval, then keeps only the timestamps that are an exact multiple of that step:

SELECT timestamp, value FROM soh_data
WHERE timestamp % (CAST((1307462455 - 1307455099)/200 AS SIGNED INTEGER)) = 0
AND timestamp BETWEEN 1307455099 AND 1307462455 
ORDER BY timestamp;

The first problem is that because I'm using a modulus, the start and end times aren't always inclusive (that's solvable with an extra query... I'm fine with that).

The second, and more difficult, issue to tackle is that the total number of rows returned in this case is only 196. In most queries, it's n-1.
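The shortfall can be checked with plain integer arithmetic, independent of what is in the table. The sketch below counts how many integers in the inclusive range are exact multiples of the step. The helper name `multiples_in_range` is made up for illustration, and the sketch assumes the step is the quotient rounded to the nearest integer, which is what the CAST appears to do here (36.78 becomes 37).

```ruby
# Count the integers in the inclusive range [lower, upper] that are
# exact multiples of step (all values assumed to be positive integers).
def multiples_in_range(lower, upper, step)
  upper / step - (lower - 1) / step
end

lower = 1307455099
upper = 1307462455
span  = upper - lower            # 7356 seconds

# Assumption: CAST(... AS SIGNED) rounds 36.78 to the nearest integer.
step  = (span / 200.0).round
slots = multiples_in_range(lower, upper, step)

puts "step=#{step} slots=#{slots}"
```

With a step of 37 there are only 199 such values between the two endpoints, so the query cannot return 200 rows however dense the data is, and every second that has no row lowers the count further.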

FYI, this is on a MySQL database with millions of rows of data.

Any insights?


Since I'm fine with throwing away a few rows, but I'm not alright with too little data, I've come up with two different approaches.

First: I've decided to adapt my query to use FLOOR instead of CAST. In this example (the range 1307455099 to 1307459460, a span of 4361 seconds), the quotient of the division was 21.805, and CAST rounded that up to 22. The step interval needed to gather at least 200 results was 21 (yielding 205 results), which is exactly what FLOOR gives me. Unfortunately, I haven't fully tested this to ensure consistent results across larger sets:

SELECT DISTINCT timestamp FROM soh_data 
WHERE timestamp % (FLOOR((1307459460 - 1307455099)/200)) = 0 
AND timestamp BETWEEN 1307455099 AND 1307459460 
ORDER BY timestamp;
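Whether FLOOR is consistently safe can be reasoned about without the database. The sketch below compares a rounded and a floored step for a few row limits, counting only the possible matching timestamps (not the rows actually stored). The helper `multiples_in_range` and the list of limits are illustrative assumptions, not part of the query.

```ruby
# Count the integers in the inclusive range [lower, upper] that are
# exact multiples of step (all values assumed to be positive integers).
def multiples_in_range(lower, upper, step)
  upper / step - (lower - 1) / step
end

lower = 1307455099
upper = 1307459460

# Hypothetical limits, chosen only to exercise the comparison.
[50, 100, 200, 500].each do |limit|
  quotient = (upper - lower) / limit.to_f
  rounded  = multiples_in_range(lower, upper, quotient.round)
  floored  = multiples_in_range(lower, upper, quotient.floor)
  puts "limit=#{limit} rounded step: #{rounded} slots, floored step: #{floored} slots"
end
```

For the 200-row case this reports 198 slots for the rounded step (22) and 208 for the floored step (21). A floored step is never larger than span/limit, so a span of that length always contains at least `limit` multiples of it. Whether every multiple has a row still depends on the data.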

The more reliable solution is to pre-calculate the step in code. This way, I can zero in on the step programmatically. In the following example, I use Ruby for readability, but my ultimate solution will be coded in C++:

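lower = 1307455099
upper = 1307459460

limit = 200
range = lower..upper
matches = 0
# Initial guess: integer division of the span strictly between the endpoints.
step_factor = ((upper - 1) - (lower + 1)) / limit

# Shrink the step until at least limit - 1 timestamps in the range divide evenly.
# The step_factor > 0 guard avoids a division by zero on very short ranges.
while matches <= (limit - 2) && step_factor > 0
  # Count every timestamp in the range that is a multiple of the current step.
  matches = range.count { |ts| (ts % step_factor).zero? }

  puts "Step factor = #{step_factor}"
  puts "Matches = #{matches}"

  step_factor -= 1 # For the next attempt
end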
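The loop above visits every second in the range for each candidate step. Since only the count matters, the same search can be sketched with a closed-form count instead. The names `multiples_in_range` and `find_step` are illustrative. Unlike the loop above, this version stops at `limit` matches rather than `limit - 1`, and it counts possible timestamps rather than rows that exist, so the result would still need checking against the table.

```ruby
# Count the integers in the inclusive range [lower, upper] that are
# exact multiples of step (all values assumed to be positive integers).
def multiples_in_range(lower, upper, step)
  upper / step - (lower - 1) / step
end

# Search downward from the floored quotient for a step whose multiples
# give at least `limit` candidate timestamps. Never goes below 1.
def find_step(lower, upper, limit)
  step = [(upper - lower) / limit, 1].max
  step -= 1 while step > 1 && multiples_in_range(lower, upper, step) < limit
  step
end

lower = 1307455099
upper = 1307459460

step = find_step(lower, upper, 200)
puts "Step factor = #{step}"
puts "Slots = #{multiples_in_range(lower, upper, step)}"
```

Each candidate step costs a couple of integer divisions however wide the time range is, which should carry over directly to a C++ version.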


The number of rows returned would depend entirely on how many timestamps match your condition, of course. Let's say your step value comes out to 2, so your modulo math boils down to "only even-numbered timestamps". If by chance all items in your table have odd timestamps, then you're going to get 0 rows returned, even though there are (say) 500+ items within the time range.

If you need exactly 200, you'd probably be better off using LIMIT in some way.
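One way to read that suggestion is to choose rows by their position in the sorted result rather than by the value of the timestamp. The sketch below does this in application code on an in-memory array. The method name `sample_evenly` and the sample data are hypothetical, and with millions of rows the same idea would have to be pushed into the query (for example with LIMIT and an offset) instead of loading everything first.

```ruby
# Pick exactly `count` entries from `rows` (already sorted by timestamp),
# evenly spaced by position, always keeping the first and the last row.
def sample_evenly(rows, count)
  return rows.dup if rows.size <= count
  return [rows.first] if count == 1
  (0...count).map { |i| rows[i * (rows.size - 1) / (count - 1)] }
end

# Hypothetical data: only odd timestamps, the case that defeats an
# "even timestamps only" modulus filter.
rows = (1307455099..1307457099).step(2).to_a

picked = sample_evenly(rows, 200)
puts "picked #{picked.size} of #{rows.size} rows"
puts "first=#{picked.first} last=#{picked.last}"
```

Because the selection is by index, the row count is exact and both endpoints are included even when no timestamp happens to be a multiple of the step.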
