Thread: simple LEFT JOIN giving wrong results ...
I've got to be missing something obvious here ... I have two tables, on containing traffic stats, the other disk usage ... I want to do a JOIN in one query to give me the sum of traffic and average of storage ... seperately, the results are right .. put together, traffic values are way off, while storage is still correct ... So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to what/how .. :( ams=# select ct.ip_id, sum(ct.bytes) as traffic, avg(cs.bytes)::bigint as storage from company_00186.trafficct left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND month_trunc(cs.runtime) = '2003-12-01') where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id;ip_id| traffic | storage -------+--------------+------------- 1194 | 76761728 | 1839676259 1226 | 5744576925 | 1134 | 17042528 | 247945531089 | 311779796360 | 10814211187 1200 | 82535202840 | 3165073628 1088 | 1969333472 | 2119206061 1227 | 44816947957| 4891683299 1179 | 3867502285 | (8 rows) where, individually, the results should be: ams=# select ip_id, avg(bytes)::bigint from company_00186.storage where month_trunc(runtime) = '2003-12-01' group byip_id;ip_id | avg -------+------------- 1227 | 4891683299 1255 | 0 1134 | 24794553 1194 | 1839676259 1089 | 10814211187 1088| 2119206061 1200 | 3165073628 (7 rows) and ams=# select ip_id, sum(bytes) from company_00186.traffic where month_trunc(runtime) = '2003-12-01' group by ip_id;ip_id| sum -------+------------- 1194 | 9595216 1226 | 5744576925 1134 | 2130316 1089 | 38972474545 1200 | 10316900355 1088| 246166684 1227 | 44816947957 1179 | 3867502285 (8 rows) the storage/avg values come out right in the JOIN, but the traffic/sum values get royally screwed ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@hub.org> writes: > So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to > what/how .. :( > ams=# select ct.ip_id, sum(ct.bytes) as traffic, > avg(cs.bytes)::bigint as storage > from company_00186.traffic ct > left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND > month_trunc(cs.runtime) = '2003-12-01') > where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id; I suspect you do not want the month_trunc constraint to be included in the JOIN ON condition, only in WHERE. regards, tom lane
On Thu, 11 Dec 2003, Tom Lane wrote: > "Marc G. Fournier" <scrappy@hub.org> writes: > > So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to > > what/how .. :( > > > ams=# select ct.ip_id, sum(ct.bytes) as traffic, > > avg(cs.bytes)::bigint as storage > > from company_00186.traffic ct > > left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND > > month_trunc(cs.runtime) = '2003-12-01') > > where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id; > > I suspect you do not want the month_trunc constraint to be included > in the JOIN ON condition, only in WHERE. 'k, but then would that take in all storage for all dates, since I'm only then joining on the ip_id? right now, I only have storage #s for Dec, so it wouldn't make any differences for this one, but .. results are still way off though, even with removing it: ip_id | traffic | storage -------+--------------+------------- 1088 | 1979325872 | 2119206061 1200 | 84004842024 | 3165073628 1227 | 45591571353| 4891683299 1179 | 3893192839 | 1194 | 77360968 | 1839676259 1134 | 17357504 | 24794553 1226 | 5836213601 | 1089 | 315424415080 | 10814211187 (8 rows) By changing the query to: ams=# select ip_id, sum(bytes), (select avg(bytes) from company_00186.storage cs wheremonth_trunc(runtime) = '2003-12-01' and cs.ip_id = ct.ip_id)::bigint as storage from company_00186.trafficct where month_trunc(runtime) = '2003-12-01' group by ip_id;ip_id | sum | storage -------+-------------+------------- 1194 | 9670121 | 1839676259 1134 | 2169688 | 24794553 1226 | 5836213601| 1089 | 39428051885 | 10814211187 1088 | 247415734 | 2119206061 1200 | 10500605253 | 3165073628 1227 | 45591571353| 4891683299 1179 | 3893192839 | (8 rows) I can get the right results again, it jus doesn't seem as clean ;( ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Thu, 11 Dec 2003, Marc G. Fournier wrote: > > I've got to be missing something obvious here ... I have two tables, on > containing traffic stats, the other disk usage ... I want to do a JOIN in > one query to give me the sum of traffic and average of storage ... > seperately, the results are right .. put together, traffic values are way > off, while storage is still correct ... > > So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to > what/how .. :( > > ams=# select ct.ip_id, sum(ct.bytes) as traffic, > avg(cs.bytes)::bigint as storage > from company_00186.traffic ct > left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND > month_trunc(cs.runtime) = '2003-12-01') > where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id; If there are multiple cs rows that match a particular ct row with those constraints you're going to be getting a multiple of the ct.bytes value with sum because you're going to have an extra ct.bytes for each matching cs row. If you had two traffic rows with ip_id 1 with bytes 10 and 5 and two storage rows with ip_id 1 with bytes 5 and 3 (and assuming that they both are in the right month), the join should give you a set like: ((ip_id=1, ct.bytes=10, cs.bytes=5), (ip_id=1, ct.bytes=10, cs.bytes=3), (ip_id=1, ct.bytes=5, cs.bytes=5), (ip_id=1, ct.bytes=5, cs.bytes=3)) I don't think a join between the two tables is what you really want. Maybe a join between the two group by querys, something like: select ip_id, traffic, storagefrom (select ip_id, sum(ct.bytes) as traffic from company_00186.traffic ct where month_trunc(runtime)='2003-12-01')a left outer join(select ip_id, avg(ct.bytes) as storage from company_00186.storage ctwhere month_trunc(runtime)='2003-12-01') b using (ip_id);