Thread: Weird join result
Hi Guys I am getting a result for an JOIN that I think is wrong. Maybe its my understanding that is wrong here however, so please be gentle :-) The "phones" table contains a list of phone numbers and an associated customer ID. The radacct table contains a list of all calls made (RADIUS Accounting records). I am doing the following: SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY phones.CALLINGSTATIONID; This query as expected returns 1386 rows (for customer ID 1) which includes a number of rows which have a NULL sum as they have not ever connected. Now, what I want to do is to return the same result set (of 1386 phones), but only for a particular time period. I therefore do the following: SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY phones.CALLINGSTATIONID; This returns 1280 rows, none of which are have a NULL value for sum. This surprised me at first as I thought the WHERE clause should apply before the OUTER JOIN but apparently not. I then tried the following: SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and (radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY phones.CALLINGSTATIONID; This query returns 1368 rows, which includes some NULL values for sum, however still short of the 1386 rows I am looking for. Close, but no cigar! Can someone please point out to me what I (or alternatively Postgresql) is doing wrong so that I can get a list of all my customer's phones with the usage for the month beside them (and NULL/Zero if they haven't been used that month) select version(); version ------------------------------------------------------------------------------------- PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.0 (SUSE Linux) Thanks in Advance -- Peter Nixon http://www.peternixon.net/ PGP Key: http://www.peternixon.net/public.asc
Peter,
what gives you the impression that WHERE clauses should be applied before the JOINs ?
Clearly that is different. First joins, then where
propably you want sth like:
best wishes Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.
This returns 1280 rows, none of which are have a NULL value for sum. This
surprised me at first as I thought the WHERE clause should apply before the
OUTER JOIN but apparently not. I then tried the following:
what gives you the impression that WHERE clauses should be applied before the JOINs ?
Clearly that is different. First joins, then where
propably you want sth like:
SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets)FROM phones
LEFT OUTER JOIN
(select * from raddact where radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) raddact
ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID ) WHERE
phones.CUSTID = 1
GROUP BY phones.CALLINGSTATIONID;
ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID ) WHERE
phones.CUSTID = 1
GROUP BY phones.CALLINGSTATIONID;
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.
On Wed, 16 Aug 2006, Peter Nixonn wrote: > I am getting a result for an JOIN that I think is wrong. Maybe its my > understanding that is wrong here however, so please be gentle :-) > > The "phones" table contains a list of phone numbers and an associated > customer ID. The radacct table contains a list of all calls made (RADIUS > Accounting records). > > I am doing the following: > > SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones > LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = > radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY > phones.CALLINGSTATIONID; > > This query as expected returns 1386 rows (for customer ID 1) which includes > a number of rows which have a NULL sum as they have not ever connected. > > Now, what I want to do is to return the same result set (of 1386 phones), > but only for a particular time period. I therefore do the following: > > SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones > LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = > radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME > BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY > phones.CALLINGSTATIONID; > > This returns 1280 rows, none of which are have a NULL value for sum. This > surprised me at first as I thought the WHERE clause should apply before the > OUTER JOIN but apparently not. No, in fact it explicitly happens after the join (the order of evaluation in the theoretical model is basically evaluate the from clause as a table then apply where on that table to make a new table and so on). You can either use a subselect in from list or put the condition into the ON to make its evaluation earlier. > I then tried the following: > > SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones > LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = > radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and > (radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 > 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY > phones.CALLINGSTATIONID; > > This query returns 1368 rows, which includes some NULL values for sum, > however still short of the 1386 rows I am looking for. Close, but no cigar! Imagine you had phones (callingstationid = 1, custid = 1) phones (callingstationid = 2, custid = 1) phones (callingstationid = 3, custid = 1) radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00) radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00) Now, I believe the outer join should above give you (callingstationid = 1, custid = 1, acctstarttime 2006-05-10 00:00:00) (callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00) (callingstationid = 3, custid = 1, acctstarttime NULL) Then apply the where clause (callingstationid = 2, custid =1, acctstarttime 2006-08-10 00:00:00) (callingstationid = 3, custid =1, acctstarttime NULL) --- With the subselect in from you'd have phones (callingstationid = 1, custid = 1) phones (callingstationid = 2, custid = 1) phones (callingstationid = 3, custid = 1) radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00) radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00) The first radacct row doesn't pass, so it's not in the subselect output which should then look like: subsel_radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00) Then, I believe the outer join should give you (callingstationid = 1, custid = 1, acctstarttime NULL) (callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00) (callingstationid = 3, custid = 1, acctstarttime NULL) And then you apply the where clause again and all the rows go through.