Re: Weird join result - Mailing list pgsql-general

From Stephan Szabo
Subject Re: Weird join result
Date
Msg-id 20060816102308.Q7680@megazone.bigpanda.com
Whole thread Raw
In response to Weird join result  (Peter Nixonn <listuser@peternixon.net>)
List pgsql-general
On Wed, 16 Aug 2006, Peter Nixonn wrote:

> I am getting a result for an JOIN that I think is wrong. Maybe its my
> understanding that is wrong here however, so please be gentle :-)
>
> The "phones" table contains a list of phone numbers and an associated
> customer ID. The radacct table contains a list of all calls made (RADIUS
> Accounting records).
>
> I am doing the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY
> phones.CALLINGSTATIONID;
>
> This query as expected returns 1386 rows (for customer ID 1) which includes
> a number of rows which have a NULL sum as they have not ever connected.
>
> Now, what I want to do is to return the same result set (of 1386 phones),
> but only for a particular time period. I therefore do the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME
> BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY
> phones.CALLINGSTATIONID;
>
> This returns 1280 rows, none of which are have a NULL value for sum. This
> surprised me at first as I thought the WHERE clause should apply before the
> OUTER JOIN but apparently not.

No, in fact it explicitly happens after the join (the order of evaluation
in the theoretical model is basically evaluate the from clause as a table
then apply where on that table to make a new table and so on). You can
either use a subselect in from list or put the condition into the ON to
make its evaluation earlier.

> I then tried the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and
> (radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16
> 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY
> phones.CALLINGSTATIONID;
>
> This query returns 1368 rows, which includes some NULL values for sum,
> however still short of the 1386 rows I am looking for. Close, but no cigar!

Imagine you had
phones (callingstationid = 1, custid = 1)
phones (callingstationid = 2, custid = 1)
phones (callingstationid = 3, custid = 1)
radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

Now, I believe the outer join should above give you
(callingstationid = 1, custid = 1, acctstarttime 2006-05-10 00:00:00)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)

Then apply the where clause
(callingstationid = 2, custid =1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid =1, acctstarttime NULL)

---

With the subselect in from you'd have
 phones (callingstationid = 1, custid = 1)
 phones (callingstationid = 2, custid = 1)
 phones (callingstationid = 3, custid = 1)
 radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
 radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

The first radacct row doesn't pass, so it's not in the subselect output
which should then look like:
 subsel_radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

Then, I believe the outer join should give you
(callingstationid = 1, custid = 1, acctstarttime NULL)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)

And then you apply the where clause again and all the rows go through.

pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: limitations of constraints, possible bugs
Next
From: "Harpreet Dhaliwal"
Date:
Subject: Triggers invoking a stored procedure or a C function