开发者

Why doesn't this sum of percentages add up to 100%?

开发者 https://www.devze.com 2023-03-18 04:09 出处:网络
I have a series of calculation times in a DB2 SQL DB that are stored as float with a default value of 0.0.

I have a series of calculation times in a DB2 SQL DB that are stored as float with a default value of 0.0.

The table being updated is as follows:

CREATE TABLE MY_CALC_DATA_TABLE
(
    CALCDATE                 TIMESTAMP,
    INDIV_CALC_DURATION_IN_S FLOAT WITH DEFAULT 0.0,
    CALC_TIME_PERCENTAGE     FLOAT WITH DEFAULT 0.0
)

Using a sproc. I am calculating the sum as follows:

CREATE OR REPLACE PROCEDURE MY_SCHEMA.MY_SPROC (IN P_DATE TIMESTAMP)
    LANGUAGE SQL
    NO EXTERNAL ACTION
BEGIN
    DECLARE V_TOTAL_CALC_TIME_IN_S FLOAT DEFAULT 0.0;

    -- other stuff setting up and joining data

    -- Calculate the total time taken to perform the
    -- individual calculations

    SET V_TOTAL_CALC_TIME_IN_S =
        (
         开发者_运维知识库   SELECT
                SUM(C.INDIV_CALC_DURATION_IN_S)
            FROM
                MY_SCHEMA.MY_CALC_DATA_TABLE C
            WHERE
                C.CALCDATE = P_DATE
        )

    -- Now calculate each individual calculation's percentage
    -- of the toal time.

    UPDATE
        MY_SCHEMA.MY_CALC_DATA_TABLE C
    SET
        C.CALC_TIME_PERCENTAGE =
            (C.INDIV_CALC_DURATION_IN_S / V_TOTAL_CALC_TIME_IN_S) * 100
    WHERE
        C.CALCDATE = P_DATE;

END@

Trouble is, when I do a sum of all the CALC_TIME_PERCENTAGE values for the specified CALC_DATE it is always less than 100% with the sum being values like 80% or 70% for different CALC_DATES.

We are talking between 35k and 55k calculations here with the maximum individual calculation's percentage of the total, as calculated above, being 11% and lots of calculations in the 0.00000N% range.

To calculate the total percentage I am using the simple query:

SELECT
    SUM(C.CALC_TIME_PERCENTAGE)
FROM
    MY_SCHEMA.MY_CALC_DATA_TABLE C
WHERE
    C.CALCDATE = P_DATE;

Any suggestions?

Update: Rearranging the calc. as suggested fixed the problem. Thanks. BTW In DB2 FLOAT and DOUBLE are the same type. And now to read that suggested paper on floats.


If the field C.INDIV_CALC_DURATION_IN_S were Integer, I would assume it's a rounding error. Reading again, that is not the problem as the datatype is FLOAT.

You can still try using this. I wouldn't be surprised if this yielded (slighly) different results than the previous method:

SET
    C.CALC_TIME_PERCENTAGE =
        (C.INDIV_CALC_DURATION_IN_S * 100.0 / V_TOTAL_CALC_TIME_IN_S)

But you mention that there a lot of rows in a calculation for a certain date, so it may be a rounding error due to that. Try with DOUBLE datatype in both fields (or at least the CALC_TIME_PERCENTAGE field) and see if the difference from 100% gets smaller.

I'm not sure if DB2 has DECIMAL(x,y) datatype. It may be more appropriate in this case.


Another problem is how you find the sum of CALC_TIME_PERCENTAGE. I suppose you (and everyone else) would use the:

        SELECT
            P_DATE, SUM(CALC_TIME_PERCENTAGE)
        FROM
            MY_SCHEMA.MY_CALC_DATA_TABLE C
        GROUP BY P_DATE

This way, you have no way to determine in what order the summation will be done. It may not be even possible to determine that but you can try:

        SELECT
            P_DATE, SUM(CALC_TIME_PERCENTAGE)
        FROM
          ( SELECT
                P_DATE, CALC_TIME_PERCENTAGE
            FROM
                MY_SCHEMA.MY_CALC_DATA_TABLE C
            ORDER BY P_DATE
                   , CALC_TIME_PERCENTAGE ASC
          ) AS tmp
        GROUP BY P_DATE

The optimizer may disregard the interior ORDER BY but it's worth a shot.


Another possibility for this big difference is that rows are deleted from the table between the UPDATE and the SHOW percent SUM operations.

You can test if that happens by running the calculations (without UPDATE) and summing up:

        SELECT
            P_DATE
          , SUM( INDIV_CALC_DURATION_IN_S * 100.0 / T.TOTAL )                   
            AS PERCENT_SUM
        FROM
            MY_SCHEMA.MY_CALC_DATA_TABLE C
          , ( SELECT SUM(INDIV_CALC_DURATION_IN_S) AS TOTAL
              FROM MY_SCHEMA.MY_CALC_DATA_TABLE
            ) AS TMP
        GROUP BY P_DATE


Might be a rounding problem. Try C.INDIV_CALC_DURATION_IN_S * 100 / V_TOTAL_CALC_TIME_IN_S instead.


If C.INDIV_CALC_DURATION_IN_S is very small but you have a large number of rows (and thus V_TOTAL_CALC_TIME_IN_S becomes large in comparison) then

(C.INDIV_CALC_DURATION_IN_S / V_TOTAL_CALC_TIME_IN_S) * 100

is likely to lose precision, especially if you're using FLOATs.

If this is the case, then changing the calculation (as mentioned elsewhere) to

(C.INDIV_CALC_DURATION_IN_S * 100) / V_TOTAL_CALC_TIME_IN_S 

should increase the total, although it may not get you all the way to 100%

If that's the case and a lot of the measurements are small fractions of a second, I'd consider looking beyond this procedure: could the times be recorded in, say, milli- or micro-seconds? Either would give you some headroom for additional significant digits.

0

精彩评论

暂无评论...
验证码 换一张
取 消