Re: Weird join result - Mailing list pgsql-general
From | Stephan Szabo |
---|---|
Subject | Re: Weird join result |
Date | |
Msg-id | 20060816102308.Q7680@megazone.bigpanda.com Whole thread Raw |
In response to | Weird join result (Peter Nixonn <listuser@peternixon.net>) |
List | pgsql-general |
On Wed, 16 Aug 2006, Peter Nixonn wrote: > I am getting a result for an JOIN that I think is wrong. Maybe its my > understanding that is wrong here however, so please be gentle :-) > > The "phones" table contains a list of phone numbers and an associated > customer ID. The radacct table contains a list of all calls made (RADIUS > Accounting records). > > I am doing the following: > > SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones > LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = > radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY > phones.CALLINGSTATIONID; > > This query as expected returns 1386 rows (for customer ID 1) which includes > a number of rows which have a NULL sum as they have not ever connected. > > Now, what I want to do is to return the same result set (of 1386 phones), > but only for a particular time period. I therefore do the following: > > SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones > LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = > radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME > BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY > phones.CALLINGSTATIONID; > > This returns 1280 rows, none of which are have a NULL value for sum. This > surprised me at first as I thought the WHERE clause should apply before the > OUTER JOIN but apparently not. No, in fact it explicitly happens after the join (the order of evaluation in the theoretical model is basically evaluate the from clause as a table then apply where on that table to make a new table and so on). You can either use a subselect in from list or put the condition into the ON to make its evaluation earlier. > I then tried the following: > > SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones > LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID = > radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and > (radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 > 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY > phones.CALLINGSTATIONID; > > This query returns 1368 rows, which includes some NULL values for sum, > however still short of the 1386 rows I am looking for. Close, but no cigar! Imagine you had phones (callingstationid = 1, custid = 1) phones (callingstationid = 2, custid = 1) phones (callingstationid = 3, custid = 1) radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00) radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00) Now, I believe the outer join should above give you (callingstationid = 1, custid = 1, acctstarttime 2006-05-10 00:00:00) (callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00) (callingstationid = 3, custid = 1, acctstarttime NULL) Then apply the where clause (callingstationid = 2, custid =1, acctstarttime 2006-08-10 00:00:00) (callingstationid = 3, custid =1, acctstarttime NULL) --- With the subselect in from you'd have phones (callingstationid = 1, custid = 1) phones (callingstationid = 2, custid = 1) phones (callingstationid = 3, custid = 1) radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00) radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00) The first radacct row doesn't pass, so it's not in the subselect output which should then look like: subsel_radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00) Then, I believe the outer join should give you (callingstationid = 1, custid = 1, acctstarttime NULL) (callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00) (callingstationid = 3, custid = 1, acctstarttime NULL) And then you apply the where clause again and all the rows go through.
pgsql-general by date: