Thread: simple LEFT JOIN giving wrong results ...

simple LEFT JOIN giving wrong results ...

From
"Marc G. Fournier"
Date:
I've got to be missing something obvious here ... I have two tables, on
containing traffic stats, the other disk usage ... I want to do a JOIN in
one query to give me the sum of traffic and average of storage ...
seperately, the results are right .. put together, traffic values are way
off, while storage is still correct ...

So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to
what/how .. :(

ams=# select ct.ip_id, sum(ct.bytes) as traffic,            avg(cs.bytes)::bigint as storage       from
company_00186.trafficct  left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND
         month_trunc(cs.runtime) = '2003-12-01')      where month_trunc(ct.runtime) = '2003-12-01' group by
ct.ip_id;ip_id|   traffic    |   storage
 
-------+--------------+------------- 1194 |     76761728 |  1839676259 1226 |   5744576925 | 1134 |     17042528 |
247945531089 | 311779796360 | 10814211187 1200 |  82535202840 |  3165073628 1088 |   1969333472 |  2119206061 1227 |
44816947957|  4891683299 1179 |   3867502285 |
 
(8 rows)

where, individually, the results should be:

ams=# select ip_id, avg(bytes)::bigint from company_00186.storage      where month_trunc(runtime) = '2003-12-01' group
byip_id;ip_id |     avg
 
-------+------------- 1227 |  4891683299 1255 |           0 1134 |    24794553 1194 |  1839676259 1089 | 10814211187
1088|  2119206061 1200 |  3165073628
 
(7 rows)

and

ams=# select ip_id, sum(bytes) from company_00186.traffic      where month_trunc(runtime) = '2003-12-01' group by
ip_id;ip_id|     sum
 
-------+------------- 1194 |     9595216 1226 |  5744576925 1134 |     2130316 1089 | 38972474545 1200 | 10316900355
1088|   246166684 1227 | 44816947957 1179 |  3867502285
 
(8 rows)


the storage/avg values come out right in the JOIN, but the traffic/sum
values get royally screwed ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: simple LEFT JOIN giving wrong results ...

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@hub.org> writes:
> So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to
> what/how .. :(

> ams=# select ct.ip_id, sum(ct.bytes) as traffic,
>              avg(cs.bytes)::bigint as storage
>         from company_00186.traffic ct
>    left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND
>                                            month_trunc(cs.runtime) = '2003-12-01')
>        where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id;

I suspect you do not want the month_trunc constraint to be included
in the JOIN ON condition, only in WHERE.
        regards, tom lane


Re: simple LEFT JOIN giving wrong results ...

From
"Marc G. Fournier"
Date:
On Thu, 11 Dec 2003, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@hub.org> writes:
> > So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to
> > what/how .. :(
>
> > ams=# select ct.ip_id, sum(ct.bytes) as traffic,
> >              avg(cs.bytes)::bigint as storage
> >         from company_00186.traffic ct
> >    left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND
> >                                            month_trunc(cs.runtime) = '2003-12-01')
> >        where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id;
>
> I suspect you do not want the month_trunc constraint to be included
> in the JOIN ON condition, only in WHERE.

'k, but then would that take in all storage for all dates, since I'm only
then joining on the ip_id?  right now, I only have storage #s for Dec, so
it wouldn't make any differences for this one, but ..

results are still way off though, even with removing it:
ip_id |   traffic    |   storage
-------+--------------+------------- 1088 |   1979325872 |  2119206061 1200 |  84004842024 |  3165073628 1227 |
45591571353|  4891683299 1179 |   3893192839 | 1194 |     77360968 |  1839676259 1134 |     17357504 |    24794553 1226
|  5836213601 | 1089 | 315424415080 | 10814211187
 
(8 rows)

By changing the query to:

ams=# select ip_id, sum(bytes),            (select avg(bytes)               from company_00186.storage cs
wheremonth_trunc(runtime) = '2003-12-01'                and cs.ip_id = ct.ip_id)::bigint as storage       from
company_00186.trafficct      where month_trunc(runtime) = '2003-12-01' group by ip_id;ip_id |     sum     |   storage
 
-------+-------------+------------- 1194 |     9670121 |  1839676259 1134 |     2169688 |    24794553 1226 |
5836213601| 1089 | 39428051885 | 10814211187 1088 |   247415734 |  2119206061 1200 | 10500605253 |  3165073628 1227 |
45591571353|  4891683299 1179 |  3893192839 |
 
(8 rows)

I can get the right results again, it jus doesn't seem as clean ;(
----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: simple LEFT JOIN giving wrong results ...

From
Stephan Szabo
Date:
On Thu, 11 Dec 2003, Marc G. Fournier wrote:

>
> I've got to be missing something obvious here ... I have two tables, on
> containing traffic stats, the other disk usage ... I want to do a JOIN in
> one query to give me the sum of traffic and average of storage ...
> seperately, the results are right .. put together, traffic values are way
> off, while storage is still correct ...
>
> So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to
> what/how .. :(
>
> ams=# select ct.ip_id, sum(ct.bytes) as traffic,
>              avg(cs.bytes)::bigint as storage
>         from company_00186.traffic ct
>    left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND
>                                            month_trunc(cs.runtime) = '2003-12-01')
>        where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id;

If there are multiple cs rows that match a particular ct row with those
constraints you're going to be getting a multiple of the ct.bytes value
with sum because you're going to have an extra ct.bytes for
each matching cs row.

If you had two traffic rows with ip_id 1 with bytes 10 and 5 and two
storage rows with ip_id 1 with bytes 5 and 3 (and assuming that they both
are in the right month), the join should give you a set like:
((ip_id=1, ct.bytes=10, cs.bytes=5), (ip_id=1, ct.bytes=10, cs.bytes=3),
(ip_id=1, ct.bytes=5, cs.bytes=5), (ip_id=1, ct.bytes=5, cs.bytes=3))
I don't think a join between the two tables is what you really want.

Maybe a join between the two group by querys, something like:
select ip_id, traffic, storagefrom (select ip_id, sum(ct.bytes) as traffic from company_00186.traffic ct where
month_trunc(runtime)='2003-12-01')a left outer join(select ip_id, avg(ct.bytes) as storage from company_00186.storage
ctwhere month_trunc(runtime)='2003-12-01') b using (ip_id);