Thread: Weird join result

Weird join result

From
Peter Nixonn
Date:
Hi Guys

I am getting a result for an JOIN that I think is wrong. Maybe its my
understanding that is wrong here however, so please be gentle :-)

The "phones" table contains a list of phone numbers and an associated
customer ID. The radacct table contains a list of all calls made (RADIUS
Accounting records).

I am doing the following:

SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY
phones.CALLINGSTATIONID;

This query as expected returns 1386 rows (for customer ID 1) which includes
a number of rows which have a NULL sum as they have not ever connected.

Now, what I want to do is to return the same result set (of 1386 phones),
but only for a particular time period. I therefore do the following:

SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME
BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY
phones.CALLINGSTATIONID;

This returns 1280 rows, none of which are have a NULL value for sum. This
surprised me at first as I thought the WHERE clause should apply before the
OUTER JOIN but apparently not. I then tried the following:

SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and
(radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16
15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY
phones.CALLINGSTATIONID;

This query returns 1368 rows, which includes some NULL values for sum,
however still short of the 1386 rows I am looking for. Close, but no cigar!

Can someone please point out to me what I (or alternatively Postgresql) is
doing wrong so that I can get a list of all my customer's phones with the
usage for the month beside them (and NULL/Zero if they haven't been used
that month)

select version();
                                       version
-------------------------------------------------------------------------------------
 PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.0
(SUSE Linux)


Thanks in Advance
--

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc

Re: Weird join result

From
"Harald Armin Massa"
Date:
Peter,

This returns 1280 rows, none of which are have a NULL value for sum. This
surprised me at first as I thought the WHERE clause should apply before the
OUTER JOIN but apparently not. I then tried the following:

what gives you the impression that WHERE clauses should be applied before the JOINs ?
Clearly that is different. First joins, then where

propably you want sth like:


SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) 
FROM phones
LEFT OUTER JOIN
(select * from raddact where radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' OR radacct.ACCTSTARTTIME ISNULL)  raddact
ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID ) WHERE 
phones.CUSTID = 1

GROUP BY phones.CALLINGSTATIONID;

best wishes Harald


--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.

Re: Weird join result

From
Stephan Szabo
Date:
On Wed, 16 Aug 2006, Peter Nixonn wrote:

> I am getting a result for an JOIN that I think is wrong. Maybe its my
> understanding that is wrong here however, so please be gentle :-)
>
> The "phones" table contains a list of phone numbers and an associated
> customer ID. The radacct table contains a list of all calls made (RADIUS
> Accounting records).
>
> I am doing the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY
> phones.CALLINGSTATIONID;
>
> This query as expected returns 1386 rows (for customer ID 1) which includes
> a number of rows which have a NULL sum as they have not ever connected.
>
> Now, what I want to do is to return the same result set (of 1386 phones),
> but only for a particular time period. I therefore do the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME
> BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY
> phones.CALLINGSTATIONID;
>
> This returns 1280 rows, none of which are have a NULL value for sum. This
> surprised me at first as I thought the WHERE clause should apply before the
> OUTER JOIN but apparently not.

No, in fact it explicitly happens after the join (the order of evaluation
in the theoretical model is basically evaluate the from clause as a table
then apply where on that table to make a new table and so on). You can
either use a subselect in from list or put the condition into the ON to
make its evaluation earlier.

> I then tried the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and
> (radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16
> 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY
> phones.CALLINGSTATIONID;
>
> This query returns 1368 rows, which includes some NULL values for sum,
> however still short of the 1386 rows I am looking for. Close, but no cigar!

Imagine you had
phones (callingstationid = 1, custid = 1)
phones (callingstationid = 2, custid = 1)
phones (callingstationid = 3, custid = 1)
radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

Now, I believe the outer join should above give you
(callingstationid = 1, custid = 1, acctstarttime 2006-05-10 00:00:00)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)

Then apply the where clause
(callingstationid = 2, custid =1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid =1, acctstarttime NULL)

---

With the subselect in from you'd have
 phones (callingstationid = 1, custid = 1)
 phones (callingstationid = 2, custid = 1)
 phones (callingstationid = 3, custid = 1)
 radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
 radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

The first radacct row doesn't pass, so it's not in the subselect output
which should then look like:
 subsel_radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

Then, I believe the outer join should give you
(callingstationid = 1, custid = 1, acctstarttime NULL)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)

And then you apply the where clause again and all the rows go through.