开发者

Choose a preferred row out of partially duplicate data

开发者 https://www.devze.com 2023-03-27 13:38 出处:网络
I have the following query: select mb.id as meter_id ,ds.mydate as mydate ,mb.name as metergroup ,sum(ms.stand) as measured_cum_value

I have the following query:

select 
  mb.id as meter_id
  ,ds.mydate as mydate
  ,mb.name as metergroup
  ,sum(ms.stand) as measured_cum_value 
  ,me.name as energy_medium
  ,e.name as unit_of_measure
  ,min(ms.source) as source
  ,count(*) as debugcount
FROM datumselect ds                            <<-- mem table with dates to query.
INNER JOIN metergroup mb ON (mb.building_id = 1)   
INNER JOIN meter m ON (m.metergroup_id = mb.id)  <<-- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id)   <<-- lookuptables for normalization
INNER JOIN unit e ON (e.id = mb.unit_id)         <<-- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
group by ds.mydate, mb.id, ms.source  <<-- this is prob. broken.
having source = MIN(ms.source)   <<-- this `having` does not work !
ORDER BY mb.id, ds.mydate 

I'm selecting from the following table:

CREATE TABLE meterstand(
  id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  meter_id INT(11) UNSIGNED NOT NULL,
  mydate DATETIME NOT NULL,
  stand DECIMAL(16, 5) NOT NULL,
  source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
  PRIMARY KEY (id),
  INDEX FK_meterstand_meter_id (meter_id),
  UNIQUE INDEX UK_meterstand (datum, meter_id, bron),
  CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
  REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
开发者_如何学运维AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;

A simpler query given the below data would be:

SELECT 
  meter_id
  , mydate
  , sum(stand)
  , count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, my_date
HAVING the_best(source) 

Given the current data debugcount should be 1 always, but if there are multiple meter's in a group in the above query debugcount should be the number of meters in the group.

I can choose between values from different source, I have:

- manual source, this is golden;

- read sources from a datasource, a meter in a building somewhere;

- calculated data, interpolated to make up for missing data.

A single datapoint having the same meter_id+mydate can have multiple sources.

The query should favour manual sources over read and only select calculated data if no other data is available.

Here is a sample of the data in meterstand:

id  meter_id mydate stand       source
------------------------------------------------------
179 6   1-12-2010   94,75886    calculated
180 7   1-12-2010   256,02618   calculated
164 7   1-1-2011    285,41800   manual <<--- Query should only consider this row.
183 7   1-1-2011    0,00000     read   <<-- and forget about this one

What's the proper query syntax to use to select the best data points?


From the looks of it, MySQL defines sort ordering for enums as being the order in which they were listed in the definition. Given that you've defined the order as the reverse of which they are to appear, I believe the following will work as expected (no instance to test against, though):

SELECT * 
FROM meterstand as a
JOIN (SELECT meter_id, mydate, MAX(source) as source
      FROM meterstand
      GROUP BY meter_id, mydate) as b
ON b.meter_id = a.meter_id
AND b.mydate = a.mydate
AND b.source = a.source

(Assuming that [meter_id, mydate, source] is unique, of course).

It does look like there was a bug at one point that was causing the enums to be sorted by their string values (which won't help you at all, given the strings).
If it sill exists (or you want a little more control over use-order), you may want to define a table:

Meter_Reading_Type
========================
Id   Description   Priority
1    Manual        10
2    Calculated    30
3    Read          20

Then reference it as a fk and sort by (min) priority.

0

精彩评论

暂无评论...
验证码 换一张
取 消