Thread: Rules and Views

Rules and Views

From
Curt Sampson
Date:
I'm having a weird problem on my "PostgreSQL 7.2.1 on i386--netbsdelf,
compiled by GCC 2.95.3" system. Executing these commands:

CREATE TABLE test_one (id int PRIMARY KEY, value_one text);
CREATE TABLE test_two (id int PRIMARY KEY, value_two text);
CREATE VIEW test AS
    SELECT test_one.id, value_one, value_two
    FROM test_one
    JOIN test_two USING (id);
CREATE RULE test_insert AS
    ON INSERT TO test
    DO (
        INSERT INTO test_one (id, value_one) VALUES (NEW.id, NEW.value_one);
        INSERT INTO test_two (id, value_two) VALUES (NEW.id, NEW.value_two);
    );
 
INSERT INTO test VALUES (1, 'one', 'onemore');

returns "ERROR:  Cannot insert into a view without an appropriate rule"
for that last statement. The rule does show up in pg_rules, though.

What am I doing wrong here? Is there a bug?

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
 



Re: Rules and Views

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> CREATE VIEW test AS ...
> CREATE RULE test_insert AS
>     ON INSERT TO test
>     DO  ...
> INSERT INTO test VALUES (1, 'one', 'onemore');
> ERROR:  Cannot insert into a view without an appropriate rule

> What am I doing wrong here? Is there a bug?

Make that "ON INSERT DO INSTEAD".  As coded, the rule leaves the
original insertion into the view still active.

Perhaps the error message could be phrased better --- any thoughts?
        regards, tom lane
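
A minimal sketch of the corrected rule, assuming the test_one, test_two
and test definitions from the first message and that the earlier
non-INSTEAD rule has been dropped. Only the INSTEAD keyword is new:

```sql
-- Same rule as in the original post, but marked INSTEAD so that it
-- replaces the insertion into the view rather than running alongside it.
CREATE RULE test_insert AS
    ON INSERT TO test
    DO INSTEAD (
        INSERT INTO test_one (id, value_one) VALUES (NEW.id, NEW.value_one);
        INSERT INTO test_two (id, value_two) VALUES (NEW.id, NEW.value_two);
    );

-- With the INSTEAD rule in place this should populate both base tables.
INSERT INTO test VALUES (1, 'one', 'onemore');
SELECT * FROM test;
```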


Re: Rules and Views

From
Curt Sampson
Date:
On Tue, 30 Jul 2002, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > CREATE VIEW test AS ...
> > CREATE RULE test_insert AS
> >     ON INSERT TO test
> >     DO  ...
> > INSERT INTO test VALUES (1, 'one', 'onemore');
> > ERROR:  Cannot insert into a view without an appropriate rule
>
> > What am I doing wrong here? Is there a bug?
>
> Make that "ON INSERT DO INSTEAD".  As coded, the rule leaves the
> original insertion into the view still active.

Ah, I see! My model of how this was working was wrong.

> Perhaps the error message could be phrased better --- any thoughts?

Maybe a message that says something along the lines of "cannot insert
into views; you need to override this behaviour with a rule"? Also, some
examples in the manual would be helpful.

cjs



Re: Rules and Views

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> ERROR:  Cannot insert into a view without an appropriate rule

>> Perhaps the error message could be phrased better --- any thoughts?

> Maybe a message that says something along the lines of "cannot insert
> into views; you need to override this behaviour with a rule"?

Well, to my mind that's what the error message says now.  The reason
it didn't help you was that you *did* have a rule ... but it didn't
completely override the view insertion.

I'm not sure how to phrase a more useful message.  Note that the place
where the error can be detected doesn't have any good way to know that
a non-INSTEAD rule was in fact processed, so we can't say anything quite
as obvious as "You needed to use INSTEAD in your rule, luser".  Can we
cover both the no-rule-at-all case and the had-a-rule-but-it-wasn't-
INSTEAD case in a single, reasonably phrased error message?  (Just
to make life interesting, there's also the case where you made an
INSTEAD rule but it's conditional.)

> Also, some examples in the manual would be helpful.

Aren't there several already?  But feel free to contribute more...
        regards, tom lane


Re: Rules and Views

From
Curt Sampson
Date:
On Wed, 31 Jul 2002, Tom Lane wrote:

> Well, to my mind that's what the error message says now.  The reason
> it didn't help you was that you *did* have a rule ... but it didn't
> completely override the view insertion.

Right, like I said, my model was wrong. I didn't think of the error
message as being an "insert behaviour" that had to be overridden; I
thought of it as a "there is no behaviour right now" message.

Maybe it's just me not reading the docs all that well; I wouldn't worry
about this if it's not been a problem for others.

> > Also, some examples in the manual would be helpful.
>
> Aren't there several already?  But feel free to contribute more...

Yeah, but nothing showing these rules on a view across two tables.
I'll try to work it out and send it here for comments.

cjs



Re: Rules and Views

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> On Wed, 31 Jul 2002, Tom Lane wrote:
>> Well, to my mind that's what the error message says now.  The reason
>> it didn't help you was that you *did* have a rule ... but it didn't
>> completely override the view insertion.

> Right, like I said, my model was wrong. I didn't think of the error
> message as being an "insert behaviour" that had to be overridden; I
> thought of it as a "there is no behaviour right now" message.

Hm.  How about

ERROR:  Cannot insert into a view
    You need an unconditional ON INSERT DO INSTEAD rule
        regards, tom lane


Re: Rules and Views

From
Curt Sampson
Date:
On Wed, 31 Jul 2002, Tom Lane wrote:

> ERROR:  Cannot insert into a view
>     You need an unconditional ON INSERT DO INSTEAD rule

Sounds great to me!

cjs



Re: Rules and Views

From
Hannu Krosing
Date:
On Wed, 2002-07-31 at 10:22, Tom Lane wrote:
> Curt Sampson <cjs@cynic.net> writes:
> > On Wed, 31 Jul 2002, Tom Lane wrote:
> >> Well, to my mind that's what the error message says now.  The reason
> >> it didn't help you was that you *did* have a rule ... but it didn't
> >> completely override the view insertion.
> 
> > Right, like I said, my model was wrong. I didn't think of the error
> > message as being an "insert behaviour" that had to be overridden; I
> > thought of it as a "there is no behaviour right now" message.
> 
> Hm.  How about
> 
> ERROR:  Cannot insert into a view
>     You need an unconditional ON INSERT DO INSTEAD rule

Seems more accurate, but actually you may also have two or more
conditional rules that cover all possibilities if taken together.

Maybe

ERROR:  Cannot insert into a view
        You need an ON INSERT DO INSTEAD rule that matches your INSERT

Which covers both cases.

-----------------
Hannu



Re: Rules and Views

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> On Wed, 2002-07-31 at 10:22, Tom Lane wrote:
>> Hm.  How about
>> 
>> ERROR:  Cannot insert into a view
>> You need an unconditional ON INSERT DO INSTEAD rule

> Seems more accurate, but actually you may also have two or more
> conditional rules that cover all possibilities if taken together.
> Maybe
> ERROR:  Cannot insert into a view
>         You need an ON INSERT DO INSTEAD rule that matches your INSERT
> Which covers both cases.

Actually not: the system insists that you provide an unconditional
DO INSTEAD rule.  The other would require trying to prove (during
rule expansion) a theorem that the conditions of the available
conditional rules cover all possible cases.

Alternatively we could move the test for insertion-into-a-view out of
the rewriter and into a low level of the executor, producing an error
message only if some inserted tuple actually gets past the rule
conditions.  I don't much care for that answer because (a) it turns a
once-per-query overhead check into once-per-tuple overhead, and
(b) if you fail to span the full space of possibilities in your rule
conditions, you might not find out about it until your application goes
belly-up in production.  There's some version of Murphy's Law that says
rare conditions arise with very low probability during testing, and very
high probability as soon as you go live...
        regards, tom lane


Re: Rules and Views

From
"Zeugswetter Andreas SB SD"
Date:
> > Seems more accurate, but actually you may also have two or more
> > conditional rules that cover all possibilities if taken together.
> > Maybe
> > ERROR:  Cannot insert into a view
> >         You need an ON INSERT DO INSTEAD rule that matches your INSERT
> > Which covers both cases.
>
> Actually not: the system insists that you provide an unconditional
> DO INSTEAD rule.  The other would require trying to prove (during
> rule expansion) a theorem that the conditions of the available
> conditional rules cover all possible cases.
>
> Alternatively we could move the test for insertion-into-a-view out of
> the rewriter and into a low level of the executor, producing an error
> message only if some inserted tuple actually gets past the rule
> conditions.  I don't much care for that answer because (a) it turns a
> once-per-query overhead check into once-per-tuple overhead, and

Since I see a huge benefit in allowing conditional rules for a view,
I think it is worth finding a solution.

The current rewriter test could still catch the case where no instead rule
exists at all.

The utility is "Table Partitioning by expression".

Basically you have a union view like:
create view history as
select * from history2000 where yearcol=2000
union all
select * from history2001 where yearcol=2001

You get the idea.
Now you need conditional insert and update rules to act on the
correct table.

Maybe we would also need additional intelligence in the planner
to eliminate the history2000 table in a select * from history where
yearcol=2001.

But that is all you need for a really useful feature for large databases.

> (b) if you fail to span the full space of possibilities in your rule
> conditions, you might not find out about it until your application goes
> belly-up in production.  There's some version of Murphy's Law that says
> rare conditions arise with very low probability during testing, and very
> high probability as soon as you go live...

This is true for other db's table partitioning capabilities as well, and they
still implement the feature.

Andreas


Re: Rules and Views

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Since I see a huge benefit in allowing conditional rules for a view,
> I think it is worth finding a solution.

We do allow conditional rules for a view.  You just have to write an
unconditional one too (which can be merely DO INSTEAD NOTHING).
        regards, tom lane
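
A sketch of that arrangement for a partitioned view like the "history"
example quoted earlier in the thread. The column list (yearcol, note)
and the rule names are assumptions made up for illustration. The
conditional rules are written without INSTEAD, so in the cases where
they apply they add to the unconditional INSTEAD NOTHING rule:

```sql
-- Hypothetical partition tables; the real column list is not given
-- in the thread.
CREATE TABLE history2000 (yearcol int, note text);
CREATE TABLE history2001 (yearcol int, note text);

CREATE VIEW history AS
    SELECT * FROM history2000 WHERE yearcol = 2000
    UNION ALL
    SELECT * FROM history2001 WHERE yearcol = 2001;

-- Unconditional rule: satisfies the rewriter's check and does nothing
-- by itself.
CREATE RULE history_insert_default AS
    ON INSERT TO history
    DO INSTEAD NOTHING;

-- Conditional rules: each routes matching rows to one partition.
CREATE RULE history_insert_2000 AS
    ON INSERT TO history WHERE NEW.yearcol = 2000
    DO INSERT INTO history2000 (yearcol, note)
       VALUES (NEW.yearcol, NEW.note);

CREATE RULE history_insert_2001 AS
    ON INSERT TO history WHERE NEW.yearcol = 2001
    DO INSERT INTO history2001 (yearcol, note)
       VALUES (NEW.yearcol, NEW.note);

INSERT INTO history VALUES (2001, 'routed to history2001');
```

Note that with this setup a row whose yearcol matches none of the
conditions is discarded without any error.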


Re: Rules and Views

From
"Zeugswetter Andreas SB SD"
Date:
> > Since I see a huge benefit in allowing conditional rules for a view,
> > I think it is worth finding a solution.
>
> We do allow conditional rules for a view.  You just have to write an
> unconditional one too (which can be merely DO INSTEAD NOTHING).

Hmm, but you cannot then throw an error, but that is probably a minor issue.
Good that we can do Table Partitioning :-)

Andreas


Re: Rules and Views

From
Curt Sampson
Date:
On Wed, 31 Jul 2002, Zeugswetter Andreas SB SD wrote:

> The utility is "Table Partitioning by expression".
>
> Basically you have a union view like:
> create view history as
> select * from history2000 where yearcol=2000
> union all
> select * from history2001 where yearcol=2001

You want to be careful with this sort of stuff, since the query planner
sometimes won't do the view as efficiently as it would do the fully
specified equivalent query. I've posted about this here before.

cjs



Re: Rules and Views

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> You want to be careful with this sort of stuff, since the query planner
> sometimes won't do the view as efficiently as it would do the fully
> specified equivalent query. I've posted about this here before.

Please provide an example.  AFAIK a view is a query macro, and nothing
else.
        regards, tom lane


Re: Rules and Views

From
Curt Sampson
Date:
On Thu, 1 Aug 2002, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > You want to be careful with this sort of stuff, since the query planner
> > sometimes won't do the view as efficiently as it would do the fully
> > specified equivalent query. I've posted about this here before.
>
> Please provide an example.  AFAIK a view is a query macro, and nothing
> else.

I already did provide an example, and you even replied to it. :-)
See the appended message.

BTW, this page
   http://archives.postgresql.org/pgsql-general/2002-06/threads.php

does not display in Navigator 4.78. Otherwise I would have provided a
reference to the thread in the archive.

Maybe we need a web based form for reporting problem pages in the archives.

cjs



Re: Rules and Views

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> On Thu, 1 Aug 2002, Tom Lane wrote:
>> Curt Sampson <cjs@cynic.net> writes:
>>> You want to be careful with this sort of stuff, since the query planner
>>> sometimes won't do the view as efficiently as it would do the fully
>>> specified equivalent query. I've posted about this here before.
>> 
>> Please provide an example.  AFAIK a view is a query macro, and nothing
>> else.

> I already did provide an example, and you even replied to it. :-)

But that isn't an "equivalent query".  You've manually transformed
    SELECT * FROM (SELECT something UNION SELECT somethingelse) WHERE foo;
into
    (SELECT something WHERE foo) UNION (SELECT somethingelse WHERE foo);
As has been pointed out repeatedly, it's not entirely obvious whether
this is a valid transformation in the general case.  (The knee-jerk
reaction that it's obviously right should be held in check, since SQL's
three-valued notion of boolean logic tends to trip up the intuition.)
If you can provide a proof that it's always safe, or that it's safe
under such-and-such conditions, I'll see what I can do about making it
happen.
        regards, tom lane


Re: Rules and Views

From
Curt Sampson
Date:
On Thu, 1 Aug 2002, Tom Lane wrote:

> But that isn't an "equivalent query".  You've manually transformed
>     SELECT * FROM (SELECT something UNION SELECT somethingelse) WHERE foo;
> into
>     (SELECT something WHERE foo) UNION (SELECT somethingelse WHERE foo);

Right.

> As has been pointed out repeatedly, it's not entirely obvious whether
> this is a valid transformation in the general case.

Right. And I agreed with that as soon as you first pointed it out.
And still do.

But the message I was replying to was a similar union query, and I was
thinking that that person might be having a similar initial intuitive
reaction, "well, it looks kinda the same." I just wanted to note that
you need to check this stuff with explain, rather than blindly assuming
you know what's going on.

> If you can provide a proof that it's always safe, or that it's safe
> under such-and-such conditions, I'll see what I can do about making it
> happen.

It's on my list of things to do, but not high enough that it's
likely I'll ever get to it. :-)

BTW, if anybody can think of a way to make a view that really does
represent my original query, I'd appreciate a hint.

cjs



Re: Rules and Views

From
"Zeugswetter Andreas SB SD"
Date:
> But the message I was replying to was a similar union query, and I was
> thinking that that person might be having a similar initial intuitive
> reaction, "well, it looks kinda the same." I just wanted to note that
> you need to check this stuff with explain, rather than
> blindly assuming
> you know what's going on.

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

Andreas


Re: Rules and Views

From
Curt Sampson
Date:
On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:

> I had a "union all" view, which is actually a quite different animal than
> a "union" view which needs to eliminate duplicates before further processing.

I had the same problem with UNION ALL.

cjs



Re: Rules and Views

From
Hannu Krosing
Date:
On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:
> On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:
> 
> > I had a "union all" view, which is actually a quite different animal than
> > a "union" view which needs to eliminate duplicates before further processing.
> 
> I had the same problem with UNION ALL.
>

Could someone give an example where it is not safe to push the WHERE
clause down to individual parts of UNION (or UNION ALL) where these parts
are simple (non-aggregate) queries?

I can see that it has to be made into HAVING in subquery if UNION's
subqueries are aggregate (GROUP BY) queries, but can anyone give an
example where the meaning of the query changes for non-aggregate
subqueries.

---------------
Hannu


Re: Rules and Views

From
Stephan Szabo
Date:
On 1 Aug 2002, Hannu Krosing wrote:

> On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:
> > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:
> >
> > > I had a "union all" view, which is actually a quite different animal than
> > > a "union" view which needs to eliminate duplicates before further processing.
> >
> > I had the same problem with UNION ALL.
> >
>
> Could someone give an example where it is not safe to push the WHERE
> clause down to individual parts of UNION (or UNION ALL) where these parts
> are simple (non-aggregate) queries?

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting.  Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal.  In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

I haven't come up with any useful examples, and not really any for
union all, however.




Re: Rules and Views

From
Stephan Szabo
Date:
On Thu, 1 Aug 2002, Stephan Szabo wrote:

> On 1 Aug 2002, Hannu Krosing wrote:
>
> > On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:
> > > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:
> > >
> > > > I had a "union all" view, which is actually a quite different animal than
> > > > a "union" view which needs to eliminate duplicates before further processing.
> > >
> > > I had the same problem with UNION ALL.
> > >
> >
> > Could someone give an example where it is not safe to push the WHERE
> > clause down to individual parts of UNION (or UNION ALL) where these parts
> > are simple (non-aggregate) queries?
>
> For union, queries that want to do something like use a temporary
> sequence to act sort of like rownum and do row limiting.  Admittedly
> that's already pretty much unspecified behavior, but it does change
> the behavior in the place of duplicate removal.  In addition, I think
> using bits of the spec we don't completely support you can have the
> same issue with the undefined behavior of which duplicate is returned
> for values that aren't the same but are equal, for example where the
> duplicate removal is in one collation but the outer comparison has
> a different explicitly given one.

Replying to myself, you can do this right now with char columns if you
just push the conditions down blindly, something like:

create table t1(a char(5));
create table t2(a char(6));

insert into t1 values ('aaaaa');
insert into t2 values ('aaaaa');

select * from (select * from t2 union select * from t1) as f
  where a::text='aaaaa';
select * from (select * from t2 where a::text='aaaaa' union
  select * from t1 where a::text='aaaaa') as f;

The first select gives no rows, the second gives one.  We'd have
to transform the second where clause to something like
cast(a as char(6))::text='aaaaa' in order to get the same effect
I think.



Re: Rules and Views

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> For union, queries that want to do something like use a temporary
> sequence to act sort of like rownum and do row limiting.  Admittedly
> that's already pretty much unspecified behavior, but it does change
> the behavior in the place of duplicate removal.  In addition, I think
> using bits of the spec we don't completely support you can have the
> same issue with the undefined behavior of which duplicate is returned
> for values that aren't the same but are equal, for example where the
> duplicate removal is in one collation but the outer comparison has
> a different explicitly given one.

Hmm.  I think this consideration boils down to whether the WHERE clause
can give different results for rows that appear equal under the rules of
UNION/EXCEPT/INTERSECT.  If it gives the same result for any two such
rows, then it's safe to push down; otherwise not.

It's not too difficult to come up with examples.  I invite you to play
with

select z,length(z) from
(select 'abc    '::char(7) as z intersect
select 'abc    '::char(8) as z) ss;

and contemplate the effects of pushing down a qual involving length(z).

Whether this particular case is very important in the real world is hard
to say.  But there might be more-important cases out there.

And yet, I think we can do it anyway.  The score card looks like this to
me:

UNION ALL: always safe to push down, since the rows will be passed
independently to the outer WHERE anyway.

UNION: it's unspecified which of a set of "equal" rows will be returned,
and therefore the behavior would be unspecified anyway if the outer
WHERE can distinguish the rows - you might get 1 row of the set out or
none.  If we push down, then we create a situation where the returned
row will always be one that passes the outer WHERE, but that is a legal
behavior.

INTERSECT: again it's unspecified which of a set of "equal" rows will be
returned, and so you might get 1 row out or none.  If we push down then
it's still unspecified whether you get a row out (example: if the outer
WHERE will pass only for rows of the left table and not the right, then
push down will result in no rows of the "equal" set being emitted, but
that's a legal behavior).

INTERSECT ALL: if a set of "equal" rows contains M rows from the left
table and N from the right table, you're supposed to get min(M,N) rows
of the set out of the INTERSECT ALL.  Again you can't say which of the
set you will get, so the outer WHERE might let anywhere between 0 and
min(M,N) rows out.  With push down, M and N will be reduced by the WHERE
before we do the intersection, so you still have 0 to min(M,N) rows out.
The behavior will change, but it's still legal per spec AFAICT.

EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

In short, it looks to me like the spec was carefully designed to allow
push down.  Pushing down a condition of this sort *does* change the
behavior, but the new behavior is still within spec.

The above analysis assumes that the WHERE condition is "stable", ie its
results for a row don't depend on the order in which the rows are tested
or anything as weird as that.  But we're assuming that already when we
push down a qual in a non-set-operation case, I think.

Comments?  Are there any other considerations to worry about?
        regards, tom lane
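
To make the UNION ALL entry of the score card concrete, here is a
small sketch. The table names follow the "history" example from
earlier in the thread; the column list (yearcol, note) is an
assumption for illustration. The two SELECTs are intended to return
the same rows, and EXPLAIN can be used to check whether the planner
treats the first form as efficiently as the second:

```sql
-- Hypothetical partition tables; the column list is assumed.
CREATE TABLE history2000 (yearcol int, note text);
CREATE TABLE history2001 (yearcol int, note text);

-- Qual applied on top of the UNION ALL result.
SELECT * FROM (
    SELECT * FROM history2000
    UNION ALL
    SELECT * FROM history2001
) AS h
WHERE yearcol = 2001;

-- Same qual pushed down into each arm. Because UNION ALL hands every
-- row to the outer WHERE independently, filtering first or last makes
-- no difference to which rows come out.
SELECT * FROM history2000 WHERE yearcol = 2001
UNION ALL
SELECT * FROM history2001 WHERE yearcol = 2001;

-- Compare the plans rather than assuming they are the same.
EXPLAIN
SELECT * FROM (
    SELECT * FROM history2000
    UNION ALL
    SELECT * FROM history2001
) AS h
WHERE yearcol = 2001;
```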


Re: Rules and Views

From
Stephan Szabo
Date:
On Thu, 1 Aug 2002, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > For union, queries that want to do something like use a temporary
> > sequence to act sort of like rownum and do row limiting.  Admittedly
> > that's already pretty much unspecified behavior, but it does change
> > the behavior in the place of duplicate removal.  In addition, I think
> > using bits of the spec we don't completely support you can have the
> > same issue with the undefined behavior of which duplicate is returned
> > for values that aren't the same but are equal, for example where the
> > duplicate removal is in one collation but the outer comparison has
> > a different explicitly given one.
>
> Hmm.  I think this consideration boils down to whether the WHERE clause
> can give different results for rows that appear equal under the rules of
> UNION/EXCEPT/INTERSECT.  If it gives the same result for any two such
> rows, then it's safe to push down; otherwise not.
>
> It's not too difficult to come up with examples.  I invite you to play
> with
>
> select z,length(z) from
> (select 'abc    '::char(7) as z intersect
> select 'abc    '::char(8) as z) ss;
>
> and contemplate the effects of pushing down a qual involving length(z).
>
> Whether this particular case is very important in the real world is hard
> to say.  But there might be more-important cases out there.
>
> And yet, I think we can do it anyway.  The score card looks like this to
> me:
>
> UNION ALL: always safe to push down, since the rows will be passed
> independently to the outer WHERE anyway.
>
> UNION: it's unspecified which of a set of "equal" rows will be returned,
> and therefore the behavior would be unspecified anyway if the outer
> WHERE can distinguish the rows - you might get 1 row of the set out or
> none.  If we push down, then we create a situation where the returned
> row will always be one that passes the outer WHERE, but that is a legal
> behavior.
>
> INTERSECT: again it's unspecified which of a set of "equal" rows will be
> returned, and so you might get 1 row out or none.  If we push down then
> it's still unspecified whether you get a row out (example: if the outer
> WHERE will pass only for rows of the left table and not the right, then
> push down will result in no rows of the "equal" set being emitted, but
> that's a legal behavior).
>
> INTERSECT ALL: if a set of "equal" rows contains M rows from the left
> table and N from the right table, you're supposed to get min(M,N) rows
> of the set out of the INTERSECT ALL.  Again you can't say which of the
> set you will get, so the outer WHERE might let anywhere between 0 and
> min(M,N) rows out.  With push down, M and N will be reduced by the WHERE
> before we do the intersection, so you still have 0 to min(M,N) rows out.
> The behavior will change, but it's still legal per spec AFAICT.
>

> EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

Actually I think in except you may only push down to the left, since in
this case you know that any duplicate from the right will not be
returned (since there must be none).  So, you can't potentially drop
a row from the right side that may have been a duplicate of a left
side row that does match the condition.

If we assume two collations one case sensitive one not with the
except in the non-sensitive and the where in the sensitive and
a left with 'A' and right with 'a', it'd be incorrect to push a
case sensitive where foo='A' down to the right since that'd change the
output from zero rows to one.

Something similar for except all since lowering the number of rows
on the right can increase the number of returned rows above
m-n (if say all m dups match the condition and none of n do)


> The above analysis assumes that the WHERE condition is "stable", ie its
> results for a row don't depend on the order in which the rows are tested
> or anything as weird as that.  But we're assuming that already when we
> push down a qual in a non-set-operation case, I think.

In which case we don't have to worry about the nextval() case.



Re: Rules and Views

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> Actually I think in except you may only push down to the left, since in
> this case you know that any duplicate from the right will not be
> returned (since there must be none).  So, you can't potentially drop
> a row from the right side that may have been a duplicate of a left
> side row that does match the condition.

But we *want* to push down --- the point is to get some selectivity
into the bottom queries.  You're right that in a plain EXCEPT it would
be possible to push only to the left, but that doesn't give the
performance improvement we want.

> If we assume two collations one case sensitive one not with the
> except in the non-sensitive and the where in the sensitive and
> a left with 'A' and right with 'a', it'd be incorrect to push a
> case sensitive where foo='A' down to the right since that'd change the
> output from zero rows to one.

You missed my point.  Per spec, either zero or one rows out of the whole
thing is okay, because either the 'A' or the 'a' row might be returned
as the representative row for the group by the EXCEPT.  Yes, the
behavior may change, but it's still within spec.

> In which case we don't have to worry about the nextval() case.

Yeah, I think nextval() and random() and so forth can be ignored;
the transformations we already do will confuse the results for such
cases, so one more isn't gonna make it worse.
        regards, tom lane


Re: Rules and Views

From
Hannu Krosing
Date:
On Thu, 2002-08-01 at 18:02, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > For union, queries that want to do something like use a temporary
> > sequence to act sort of like rownum and do row limiting.  Admittedly
> > that's already pretty much unspecified behavior, but it does change
> > the behavior in the place of duplicate removal.  In addition, I think
> > using bits of the spec we don't completely support you can have the
> > same issue with the undefined behavior of which duplicate is returned
> > for values that aren't the same but are equal, for example where the
> > duplicate removal is in one collation but the outer comparison has
> > a different explicitly given one.
> 
> Hmm.  I think this consideration boils down to whether the WHERE clause
> can give different results for rows that appear equal under the rules of
> UNION/EXCEPT/INTERSECT.

Yes. I originally started to ponder this when trying to draw up a plan
for automatic generation of ON UPDATE DO INSTEAD rules for views. While
pushing down the WHERE clause is just a performance thing for SELECT it
is essential for ON UPDATE rules.

> If it gives the same result for any two such
> rows, then it's safe to push down; otherwise not.
> 
> It's not too difficult to come up with examples.  I invite you to play
> with
> 
> select z,length(z) from
> (select 'abc    '::char(7) as z intersect
> select 'abc    '::char(8) as z) ss;
> 
> and contemplate the effects of pushing down a qual involving length(z).

I guess the pushdown must also push implicit conversions done to parts
of union.

if that conversion were applied to z's in both parts of UNION then the
result should be the same.


select z,length(z) from
( select 'abc    '::char(7) as z
  union
  select 'abc   '::char(8) as z
) ss
where length(z) = 7;

becomes:

select z,length(z) from
( select 'abc    '::char(7) as z
    where length(cast('abc    '::char(7) as char(7))) = 7
  union
  select 'abc   '::char(8) as z
    where length(cast('abc   '::char(8) as char(7))) = 7
) ss;
 

which both return 'abc    ', 7

Of course it is beneficial to detect when the conversion is not needed,
so that indexes will be used if available. 

---------------
Hannu



Re: Rules and Views

From
Stephan Szabo
Date:
On Thu, 1 Aug 2002, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> > If we assume two collations one case sensitive one not with the
> > except in the non-sensitive and the where in the sensitive and
> > a left with 'A' and right with 'a', it'd be incorrect to push a
> > case sensitive where foo='A' down to the right since that'd change the
> > output from zero rows to one.
>
> You missed my point.  Per spec, either zero or one rows out of the whole
> thing is okay, because either the 'A' or the 'a' row might be returned
> as the representative row for the group by the EXCEPT.  Yes, the
> behavior may change, but it's still within spec.

EXCEPT can't return either 'A' or 'a'; there is no representative row
because n>0. That's the difference from UNION and INTERSECT.

"If EXCEPT is specified, then
Case:
A) If m>0 and n=0, then T contains exactly one duplicate of R.
B) Otherwise, T contains no duplicate of R."


So if T1 has a #dups>0 and T2 has a #dups>0 we should get
no rows, but what if T1' (with the clause) has a #dups>0 but
T2' has a #dups=0?
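
The 'A' / 'a' case can be played through with a small Python model. This is
only a sketch: except_distinct is a hypothetical helper that follows the quoted
spec text, str.lower stands in for the case-insensitive collation, and the
outer WHERE is the case-sensitive foo='A'.

```python
def except_distinct(left, right, key):
    """EXCEPT: one row per group that occurs on the left (m>0) and not on the right (n=0)."""
    right_keys = {key(r) for r in right}
    seen = set()
    out = []
    for row in left:
        k = key(row)
        if k not in right_keys and k not in seen:
            seen.add(k)
            out.append(row)
    return out

fold = str.lower                 # stand-in for the case-insensitive collation
where = lambda v: v == 'A'       # case-sensitive outer WHERE

left, right = ['A'], ['a']

# WHERE applied on top of the EXCEPT: the group has n>0, so nothing comes out.
not_pushed = [r for r in except_distinct(left, right, fold) if where(r)]

# WHERE pushed into both sides: the right side loses 'a', so n drops to 0.
pushed = except_distinct([r for r in left if where(r)],
                         [r for r in right if where(r)], fold)
```

Here not_pushed is empty while pushed contains 'A', which is the zero-rows
versus one-row change described above.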




Re: Rules and Views

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> So if T1 has a #dups>0 and T2 has a #dups>0 we should get
> no rows, but what if T1' (with the clause) has a #dups>0 but
> T2' has a #dups=0?

Um, you're right --- pushing down into the right-hand side would reduce
N, thereby possibly *increasing* the number of output rows not reducing
it.  My mistake ... should have worked out the EXCEPT case in more
detail.

This says that we can't push down at all in the EXCEPT ALL case, I
think, and I'm leery about whether we should push for EXCEPT.  But
the UNION and INTERSECT cases are probably the important ones anyway.
        regards, tom lane


Re: Rules and Views

From
Stephan Szabo
Date:
On Thu, 1 Aug 2002, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > So if T1 has a #dups>0 and T2 has a #dups>0 we should get
> > no rows, but what if T1' (with the clause) has a #dups>0 but
> > T2' has a #dups=0?
>
> Um, you're right --- pushing down into the right-hand side would reduce
> N, thereby possibly *increasing* the number of output rows not reducing
> it.  My mistake ... should have worked out the EXCEPT case in more
> detail.
>
> This says that we can't push down at all in the EXCEPT ALL case, I
> think, and I'm leery about whether we should push for EXCEPT.  But
> the UNION and INTERSECT cases are probably the important ones anyway.

I think that we can push to the left in both (should is a separate issue).

If the condition is true for all of the left hand dups, we can
choose to have emitted such rows as the output of the EXCEPT ALL in
the theoretical case so that the output is the same, max(0, m-n) rows.
If the condition is false for any of the left hand dups, we can safely
return any number of rows between 0 and max(0,m-n) rows since we can
say that the difference were rows that failed the where clause.  If
we push the condition down, we'll get some number m1 of rows that satisfy
the condition (with m1<m), so returning max(0, m1-n) should be safe.
If the condition is false for all of the rows, m1=0 so we'll correctly
return no rows.
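
The counting argument can be checked by brute force for a single group of
"equal" rows. The Python below is a sketch under the assumptions stated in
this message; legal_counts and pushed_left_count are hypothetical names, and
the flags record which left-hand duplicates pass the WHERE.

```python
from itertools import combinations

def legal_counts(left_flags, n):
    """Row counts the outer WHERE may let through when EXCEPT ALL runs first.

    left_flags[i] says whether left duplicate i passes the WHERE; n is the
    number of equal rows on the right. EXCEPT ALL keeps max(0, m-n) of the
    left duplicates, and which ones is unspecified, so try every choice.
    """
    m = len(left_flags)
    keep = max(0, m - n)
    return {sum(choice) for choice in combinations(left_flags, keep)}

def pushed_left_count(left_flags, n):
    """Row count when the WHERE is pushed into the left side only."""
    m1 = sum(left_flags)
    return max(0, m1 - n)

flags = [True, True, False, False, True]   # m = 5, m1 = 3

# For every n, the pushed-down count is one of the counts the spec allows.
always_legal = all(pushed_left_count(flags, n) in legal_counts(flags, n)
                   for n in range(0, 7))
```

For this group always_legal comes out true: max(0, m1-n) is always among the
outcomes the unpushed query could legally produce.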

I think.




Re: Rules and Views

From
"Zeugswetter Andreas SB SD"
Date:
> Hmm.  I think this consideration boils down to whether the WHERE clause
> can give different results for rows that appear equal under the rules of
> UNION/EXCEPT/INTERSECT.  If it gives the same result for any two such
> rows, then it's safe to push down; otherwise not.
>
> It's not too difficult to come up with examples.  I invite you to play
> with
>
> select z,length(z) from
> (select 'abc    '::char(7) as z intersect
> select 'abc    '::char(8) as z) ss;
>
> and contemplate the effects of pushing down a qual involving
> length(z).

I guess that is why e.g. Informix returns 3 for both of them. Imho that
makes a lot of sense. The trailing spaces in char's are supposed to be
irrelevant. (But iirc this has already been discussed and rejected)

> Whether this particular case is very important in the real world is hard
> to say.  But there might be more-important cases out there.
>
> And yet, I think we can do it anyway.  The score card looks
> like this to
> me:
>
> UNION ALL: always safe to push down, since the rows will be passed
> independently to the outer WHERE anyway.

Yes, that would imho also be the most important optimization.
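
As a quick illustration of why the UNION ALL case is safe, here is a small
Python model (not PostgreSQL code; union_all and where are made-up names and
the integer rows are arbitrary sample data). Filtering after the UNION ALL and
filtering each branch before it give the same rows.

```python
def union_all(left, right):
    """UNION ALL: every row of both inputs, duplicates kept."""
    return list(left) + list(right)

def where(row):
    # Hypothetical outer WHERE: keep even values.
    return row % 2 == 0

left = [1, 2, 2, 4]
right = [2, 3, 6]

# WHERE evaluated on top of the UNION ALL.
filtered_after = [r for r in union_all(left, right) if where(r)]

# WHERE pushed down into each branch.
filtered_before = union_all([r for r in left if where(r)],
                            [r for r in right if where(r)])
```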

> UNION: it's unspecified which of a set of "equal" rows will be returned,
> and therefore the behavior would be unspecified anyway if the outer
> WHERE can distinguish the rows - you might get 1 row of the set out or
> none.  If we push down, then we create a situation where the returned
> row will always be one that passes the outer WHERE, but that
> is a legal behavior.
>
> INTERSECT: again it's unspecified which of a set of "equal" rows will be
> returned, and so you might get 1 row out or none.  If we push down then
> it's still unspecified whether you get a row out (example: if the outer
> WHERE will pass only for rows of the left table and not the right, then
> push down will result in no rows of the "equal" set being emitted, but
> that's a legal behavior).
>
> INTERSECT ALL: if a set of "equal" rows contains M rows from the left
> table and N from the right table, you're supposed to get min(M,N) rows
> of the set out of the INTERSECT ALL.  Again you can't say which of the
> set you will get, so the outer WHERE might let anywhere between 0 and
> min(M,N) rows out.  With push down, M and N will be reduced by the WHERE
> before we do the intersection, so you still have 0 to
> min(M,N) rows out.
> The behavior will change, but it's still legal per spec AFAICT.
>
> EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

The imho difficult question is, which select locks down the datatype to use
for this column. In a strict sense char(6) and char(7) are not the same
type. Since I would certainly not want to be that strict, it imho has to be
decided what type the union/intersect... is supposed to use.
Informix converts them both to the longer char. I do not think it is
valid to return variable length char's.

e.g.:
create table atab1 (a char(6));
create table atab2 (a char(8));
insert into atab1 values ('abc');
insert into atab2 values ('abc');
create view aview as select * from atab1 union all select * from atab2;
select '<'||a||'>' from aview;
Informix:
(expression)
<abc     >
<abc     >
PostgreSQL:
 ?column?
------------
 <abc   >
 <abc     >

I am not sure either answer is strictly correct. I would probably have
expected <abc   > <abc   > (char(6)) since the first select is supposed to
lock down the type, no?

> In short, it looks to me like the spec was carefully designed to allow
> push down.  Pushing down a condition of this sort *does* change the
> behavior, but the new behavior is still within spec.

I think this would be a great performance boost for views and thus
worth a change in results that are within spec.
Would you want to push down always? There could be outer WHERE clauses
that are so expensive that you would not want to evaluate them twice.
If it is all or nothing, I do think pushing down always is better than not.

Andreas