Re: BUG #1528: Rows returned that should be excluded by WHERE clause - Mailing list pgsql-bugs

From Peter Wright
Subject Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date
Msg-id 20050311054518.GA5375@cartman.flooble.net.au
Whole thread Raw
In response to Re: BUG #1528: Rows returned that should be excluded by WHERE clause  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #1528: Rows returned that should be excluded by WHERE clause  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi Tom, others,

First I must say that I appreciate the effort you've invested already
into finding the best "correct" solution. It's very encouraging. :)

I think I understand your analysis of the problem being that HAVING is
erroneously optimised/simplified to WHERE in some cases - and so the
initial "bug" I reported is technically the correct behaviour(?).

....Okay, maybe I'm not completely sure I've understood you correctly. :)

On 08/03 03:07:13, Tom Lane wrote:
> "Peter Wright" <pete@flooble.net> writes:
> > Description:        Rows returned that should be excluded by WHERE clause
>
> Interesting point.  The view and union don't seem to be the issue;

I think the view _is_ the issue (well, at least for me and my
(limited) understanding of how things should work :)). See below.

> I think the problem can be expressed as
[ snip ]
> Now, if this were a WHERE clause, I think the answer would be right:
>
> regression=# select 2 as id, max(b) from t2 where 2 = 1;
>  id | max
> ----+-----
>   2 |
> (1 row)
>
> but since it's HAVING I think this is probably wrong.
[ ... ]

On 08/03 12:14:35, Tom Lane wrote:
> "Gill, Jerry T." <JTGill@west.com> writes:
> > Just an interesting side note here, this behavior is identical to
> > DB2. I am not sure if that makes it correct or not, but here is an
> > example.
> > [gill@c2n2 gill]$ db2 "select 2 as id, max(apn3) from phoenix.client where 2 =1"
>
> > ID          2
> > ----------- ------
> >           2      -
>
> >   1 record(s) selected.
>
> In the WHERE case I think there's no question that the above is
> correct: WHERE is defined to filter rows before application of
> aggregates, so zero rows arrive at the MAX aggregate, and that means
> it produces a NULL.

Now this _does_ make sense - but in the case of a view (or
sub-select), how can it be correct that a WHERE _outside_ the view can
affect the behaviour of that view?

At the very least I'd call that grossly anti-intuitive:

----------------------------------------------------------------------
test1=# create table tab1 ( a integer, b integer );
CREATE TABLE
test1=# insert into tab1 values ( 1, 1 );
INSERT 118421921 1
test1=# insert into tab1 values ( 2, 2 );
INSERT 118421922 1
test1=# create view qry1 as select 2 as id, max(b) as b from tab1;
CREATE VIEW
test1=# create table tab2 ( id integer, b integer );
CREATE TABLE
test1=# insert into tab2 values ( 2, 2);
INSERT 118421931 1
test1=# select * from tab2;
 id | b
----+---
  2 | 2
(1 row)

test1=# select * from qry1;
 id | b
----+---
  2 | 2
(1 row)

test1=# select * from qry1 where id = 1;
 id | b
----+---
  2 |
(1 row)

test1=# select * from tab2 where id = 1;
 id | b
----+---
(0 rows)

test1=#
----------------------------------------------------------------------

You say, "WHERE is defined to filter rows before application of
aggregates", but I'd _think_ that should be interpreted to apply only
to aggregates in the _current_ query (ie. not in sub-queries).

In my example just above, I'd _expect_ the view should be fully
evaluated and the results (of that view) treated as though it were
just another table.

Perhaps I'm just showing my limited experience with database theory
here :-), but if you can explain why it makes sense that WHERE must be
applied before aggregation in _all_ subqueries, that'd be good *wry grin*.

> But HAVING is supposed to filter after aggregation, so I think
> probably there should be no row out in that case.

I have no problem with this.

>             regards, tom lane

Thanks again for your efforts with this issue, Tom.

Currently I'm working around it by adding an extra kludge-clause,
effectively "AND b is not null", but it'd be preferable to have
Postgres do the Right Thing(tm), whatever that might be.... :)

Pete.
--
http://akira.apana.org.au/~pete/
And anyway, we know that 2 + 2 = 5, for very large values of two...

pgsql-bugs by date:

Previous
From: John R Pierce
Date:
Subject: Re: We are not following the spec for HAVING without GROUP
Next
From: Tom Lane
Date:
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause