Re: Rules and Views - Mailing list pgsql-hackers

From Zeugswetter Andreas SB SD
Subject Re: Rules and Views
Date
Msg-id 46C15C39FEB2C44BA555E356FBCD6FA4961E38@m0114.s-mxs.net
Whole thread Raw
In response to Rules and Views  (Curt Sampson <cjs@cynic.net>)
List pgsql-hackers
> Hmm.  I think this consideration boils down to whether the WHERE clause
> can give different results for rows that appear equal under the rules of
> UNION/EXCEPT/INTERSECT.  If it gives the same result for any two such
> rows, then it's safe to push down; otherwise not.
>
> It's not too difficult to come up with examples.  I invite you to play
> with
>
> select z,length(z) from
> (select 'abc    '::char(7) as z intersect
> select 'abc    '::char(8) as z) ss;
>
> and contemplate the effects of pushing down a qual involving
> length(z).

I guess that is why e.g. Informix returns 3 for both of them. Imho that
makes a lot of sense. The trailing spaces in char's are supposed to be
irrellevant. (But iirc this has already been discussed and rejected)

> Whether this particular case is very important in the real world is hard
> to say.  But there might be more-important cases out there.
>
> And yet, I think we can do it anyway.  The score card looks
> like this to
> me:
>
> UNION ALL: always safe to push down, since the rows will be passed
> independently to the outer WHERE anyway.

Yes, that would imho also be the most important optimization.

> UNION: it's unspecified which of a set of "equal" rows will be returned,
> and therefore the behavior would be unspecified anyway if the outer
> WHERE can distinguish the rows - you might get 1 row of the set out or
> none.  If we push down, then we create a situation where the returned
> row will always be one that passes the outer WHERE, but that
> is a legal behavior.
>
> INTERSECT: again it's unspecified which of a set of "equal" rows will be
> returned, and so you might get 1 row out or none.  If we push down then
> it's still unspecified whether you get a row out (example: if the outer
> WHERE will pass only for rows of the left table and not the right, then
> push down will result in no rows of the "equal" set being emitted, but
> that's a legal behavior).
>
> INTERSECT ALL: if a set of "equal" rows contains M rows from the left
> table and N from the right table, you're supposed to get min(M,N) rows
> of the set out of the INTERSECT ALL.  Again you can't say which of the
> set you will get, so the outer WHERE might let anywhere between 0 and
> min(M,N) rows out.  With push down, M and N will be reduced by the WHERE
> before we do the intersection, so you still have 0 to
> min(M,N) rows out.
> The behavior will change, but it's still legal per spec AFAICT.
>
> EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

The imho difficult question is, which select locks down the datatype to use
for this column. In a strict sense char(6) and char(7) are not the same
type. Since I would certainly not want to be that strict, it imho has to be
decided what type the union/intersect... is supposed to use.
Informix converts them both to the longer char. I do not think it is
valid to return variable length char's.

e.g.:
create table atab1 (a char(6));
create table atab2 (a char(8));
insert into atab1 values ('abc');
insert into atab2 values ('abc');
create view aview as select * from atab1 union all select * from atab2;
select '<'||a||'>' from aview;
Informix:
(expression)
<abc     >
<abc     >
PostgreSQL: ?column?
------------<abc   ><abc     >

I am not sure eighter answer is strictly correct. I would probably have
expected <abc   > <abc   > (char(6)) since the first select is supposed to
lock down the type, no ?

> In short, it looks to me like the spec was carefully designed to allow
> push down.  Pushing down a condition of this sort *does* change the
> behavior, but the new behavior is still within spec.

I think this would be a great performance boost for views and thus
worth a change in results that are within spec.
Would you want to push down always ? There could be outer where clauses,
that are so expensive that you would not want to do them twice.
If it is all or nothing, I do think pushing down always is better than not.

Andreas


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Open 7.3 items
Next
From: Hannu Krosing
Date:
Subject: Re: Why is MySQL more chosen over PostgreSQL?