Thread: Rules and Views
I'm having a weird problem on my " PostgreSQL 7.2.1 on i386--netbsdelf, compiled by GCC 2.95.3" system. Executing these commands: CREATE TABLE test_one (id int PRIMARY KEY, value_one text); CREATE TABLE test_two (id int PRIMARY KEY, value_two text); CREATE VIEW test AS SELECT test_one.id, value_one, value_two FROM test_one JOIN test_two USING (id); CREATE RULE test_insert AS ON INSERT TO test DO (INSERT INTO test_one (id, value_one) VALUES (NEW.id, NEW.value_one);INSERTINTO test_two (id, value_two) VALUES (NEW.id, NEW.value_two);); INSERT INTO test VALUES (1, 'one', 'onemore'); returns "ERROR: Cannot insert into a view without an appropriate rule" for that last statement. The rule does show up in pg_rules, though. What am I doing wrong here? Is there a bug? cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
Curt Sampson <cjs@cynic.net> writes: > CREATE VIEW test AS ... > CREATE RULE test_insert AS > ON INSERT TO test > DO ... > INSERT INTO test VALUES (1, 'one', 'onemore'); > ERROR: Cannot insert into a view without an appropriate rule > What am I doing wrong here? Is there a bug? Make that "ON INSERT DO INSTEAD". As coded, the rule leaves the original insertion into the view still active. Perhaps the error message could be phrased better --- any thoughts? regards, tom lane
On Tue, 30 Jul 2002, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > CREATE VIEW test AS ... > > CREATE RULE test_insert AS > > ON INSERT TO test > > DO ... > > INSERT INTO test VALUES (1, 'one', 'onemore'); > > ERROR: Cannot insert into a view without an appropriate rule > > > What am I doing wrong here? Is there a bug? > > Make that "ON INSERT DO INSTEAD". As coded, the rule leaves the > original insertion into the view still active. Ah, I see! My model of how this was working was wrong. > Perhaps the error message could be phrased better --- any thoughts? Maybe a message that says something along the lines of "cannot insert into views; you need to override this behaviour with a rule"? Also, some examples in the manual would be helpful. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
Curt Sampson <cjs@cynic.net> writes: > ERROR: Cannot insert into a view without an appropriate rule >> Perhaps the error message could be phrased better --- any thoughts? > Maybe a message that says something along the lines of "cannot insert > into views; you need to override this behaviour with a rule"? Well, to my mind that's what the error message says now. The reason it didn't help you was that you *did* have a rule ... but it didn't completely override the view insertion. I'm not sure how to phrase a more useful message. Note that the place where the error can be detected doesn't have any good way to know that a non-INSTEAD rule was in fact processed, so we can't say anything quite as obvious as "You needed to use INSTEAD in your rule, luser". Can we cover both the no-rule-at-all case and the had-a-rule-but-it-wasn't- INSTEAD case in a single, reasonably phrased error message? (Just to make life interesting, there's also the case where you made an INSTEAD rule but it's conditional.) > Also, some examples in the manual would be helpful. Aren't there several already? But feel free to contribute more... regards, tom lane
On Wed, 31 Jul 2002, Tom Lane wrote: > Well, to my mind that's what the error message says now. The reason > it didn't help you was that you *did* have a rule ... but it didn't > completely override the view insertion. Right, like I said, my model was wrong. I didn't think of the error message as being an "insert behaviour" that had to be overridden; I thought of it as a "there is no behaviour right now" message. Maybe it's just me not reading the docs all that well; I wouldn't worry about this if it's not been a problem for others. > > Also, some examples in the manual would be helpful. > > Aren't there several already? But feel free to contribute more... Yeah, but nothing showing these rules on a view across two tables. I'll try to work it out and send it here for comments. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
Curt Sampson <cjs@cynic.net> writes: > On Wed, 31 Jul 2002, Tom Lane wrote: >> Well, to my mind that's what the error message says now. The reason >> it didn't help you was that you *did* have a rule ... but it didn't >> completely override the view insertion. > Right, like I said, my model was wrong. I didn't think of the error > message as being an "insert behaviour" that had to be overridden; I > thought of it as a "there is no behaviour right now" message. Hm. How about ERROR: Cannot insert into a viewYou need an unconditional ON INSERT DO INSTEAD rule regards, tom lane
On Wed, 31 Jul 2002, Tom Lane wrote: > ERROR: Cannot insert into a view > You need an unconditional ON INSERT DO INSTEAD rule Sounds great to me! cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
On Wed, 2002-07-31 at 10:22, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > On Wed, 31 Jul 2002, Tom Lane wrote: > >> Well, to my mind that's what the error message says now. The reason > >> it didn't help you was that you *did* have a rule ... but it didn't > >> completely override the view insertion. > > > Right, like I said, my model was wrong. I didn't think of the error > > message as being an "insert behaviour" that had to be overridden; I > > thought of it as a "there is no behaviour right now" message. > > Hm. How about > > ERROR: Cannot insert into a view > You need an unconditional ON INSERT DO INSTEAD rule Seems more accurate, but actually you may also have two or more conditional rules that cover all possibilities if taken together. Maybe ERROR: Cannot insert into a view You need an ON INSERT DO INSTEAD rule that matches your INSERT Which covers both cases. ----------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: > On Wed, 2002-07-31 at 10:22, Tom Lane wrote: >> Hm. How about >> >> ERROR: Cannot insert into a view >> You need an unconditional ON INSERT DO INSTEAD rule > Seems more accurate, but actually you may also have two or more > conditional rules that cover all possibilities if taken together. > Maybe > ERROR: Cannot insert into a view > You need an ON INSERT DO INSTEAD rule that matches your INSERT > Which covers both cases. Actually not: the system insists that you provide an unconditional DO INSTEAD rule. The other would require trying to prove (during rule expansion) a theorem that the conditions of the available conditional rules cover all possible cases. Alternatively we could move the test for insertion-into-a-view out of the rewriter and into a low level of the executor, producing an error message only if some inserted tuple actually gets past the rule conditions. I don't much care for that answer because (a) it turns a once-per-query overhead check into once-per-tuple overhead, and (b) if you fail to span the full space of possibilities in your rule conditions, you might not find out about it until your application goes belly-up in production. There's some version of Murphy's Law that says rare conditions arise with very low probability during testing, and very high probability as soon as you go live... regards, tom lane
> > Seems more accurate, but actually you may also have two or more > > conditional rules that cover all possibilities if taken together. > > Maybe > > ERROR: Cannot insert into a view > > You need an ON INSERT DO INSTEAD rule that matches your INSERT > > Which covers both cases. > > Actually not: the system insists that you provide an unconditional > DO INSTEAD rule. The other would require trying to prove (during > rule expansion) a theorem that the conditions of the available > conditional rules cover all possible cases. > > Alternatively we could move the test for insertion-into-a-view out of > the rewriter and into a low level of the executor, producing an error > message only if some inserted tuple actually gets past the rule > conditions. I don't much care for that answer because (a) it turns a > once-per-query overhead check into once-per-tuple overhead, and Since I see a huge benefit in allowing conditional rules for a view, I think it is worth finding a solution. The current rewriter test could still catch the case where no instead rule exists at all. The utility is "Table Partitioning by expression". Basically you have a union view like: create view history as select * from history2000 where yearcol=2000 union all select * from history2001 where yearcol=2001 You get the idea. Now you need conditional insert and update rules to act on the correct table. Maybe we would also need additional intelligence in the planner to eliminate the history2000 table in a select * from history where yearcol=2001. But that is all you need for a really useful feature for large databases. > (b) if you fail to span the full space of possibilities in your rule > conditions, you might not find out about it until your application goes > belly-up in production. There's some version of Murphy's Law that says > rare conditions arise with very low probability during testing, and very > high probability as soon as you go live... This is true for other db's table partitioning capabilities as well, and they still implement the feature. Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: > Since I see a huge benefit in allowing conditional rules for a view, > I think it is worth finding a solution. We do allow conditional rules for a view. You just have to write an unconditional one too (which can be merely DO INSTEAD NOTHING). regards, tom lane
> > Since I see a huge benefit in allowing conditional rules for a view, > > I think it is worth finding a solution. > > We do allow conditional rules for a view. You just have to write an > unconditional one too (which can be merely DO INSTEAD NOTHING). Hmm, but you cannot then trow an error, but that is prbbly a minor issue. Good that we can do Table Partitioning :-) Andreas
On Wed, 31 Jul 2002, Zeugswetter Andreas SB SD wrote: > The utility is "Table Partitioning by expression". > > Basically you have a union view like: > create view history as > select * from history2000 where yearcol=2000 > union all > select * from history2001 where yearcol=2001 You want to be careful with this sort of stuff, since the query planner sometimes won't do the view as efficiently as it would do the fully specified equivalant query. I've posted about this here before. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
Curt Sampson <cjs@cynic.net> writes: > You want to be careful with this sort of stuff, since the query planner > sometimes won't do the view as efficiently as it would do the fully > specified equivalant query. I've posted about this here before. Please provide an example. AFAIK a view is a query macro, and nothing else. regards, tom lane
On Thu, 1 Aug 2002, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > You want to be careful with this sort of stuff, since the query planner > > sometimes won't do the view as efficiently as it would do the fully > > specified equivalant query. I've posted about this here before. > > Please provide an example. AFAIK a view is a query macro, and nothing > else. I already did provide an example, and you even replied to it. :-) See the appended message. BTW, this page http://archives.postgresql.org/pgsql-general/2002-06/threads.php does not display in Navigator 4.78. Otherwise I would have provided a reference to the thread in the archive. Maybe we need a web based form for reporting problem pages in the archives. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
Curt Sampson <cjs@cynic.net> writes: > On Thu, 1 Aug 2002, Tom Lane wrote: >> Curt Sampson <cjs@cynic.net> writes: > You want to be careful with this sort of stuff, since the query planner > sometimes won't do the view as efficiently as it would do the fully > specified equivalant query. I've posted about this here before. >> >> Please provide an example. AFAIK a view is a query macro, and nothing >> else. > I already did provide an example, and you even replied to it. :-) But that isn't an "equivalent query". You've manually transformed SELECT * FROM (SELECT something UNION SELECT somethingelse)WHERE foo; into (SELECT something WHERE foo) UNION (SELECT somethingelse WHERE foo); As has been pointed out repeatedly, it's not entirely obvious whether this is a valid transformation in the general case. (The knee-jerk reaction that it's obviously right should be held in check, since SQL's three-valued notion of boolean logic tends to trip up the intuition.) If you can provide a proof that it's always safe, or that it's safe under such-and-such conditions, I'll see what I can do about making it happen. regards, tom lane
On Thu, 1 Aug 2002, Tom Lane wrote: > But that isn't an "equivalent query". You've manually transformed > SELECT * FROM (SELECT something UNION SELECT somethingelse) WHERE foo; > into > (SELECT something WHERE foo) UNION (SELECT somethingelse WHERE foo); Right. > As has been pointed out repeatedly, it's not entirely obvious whether > this is a valid transformation in the general case. Right. And I agreed that it as soon as you first pointed it out. And still do. But the message I was replying to was a similar union query, and I was thinking that that person might be having a similar initial intuitive reaction, "well, it looks kinda the same." I just wanted to note that you need to check this stuff with explain, rather than blindly assuming you know what's going on. > If you can provide a proof that it's always safe, or that it's safe > under such-and-such conditions, I'll see what I can do about making it > happen. It's on my list of things to do, but not high enough that it's likely I'll ever get to it. :-) BTW, if anybody can think of a way to make a view that really does represent my original query, I'd appreciate a hint. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
> But the message I was replying to was a similar union query, and I was > thinking that that person might be having a similar initial intuitive > reaction, "well, it looks kinda the same." I just wanted to note that > you need to check this stuff with explain, rather than > blindly assuming > you know what's going on. I had a "union all" view, which is actually a quite different animal than a "union" view which needs to eliminate duplicates before further processing. Andreas
On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote: > I had a "union all" view, which is actually a quite different animal than > a "union" view which needs to eliminate duplicates before further processing. I had the same problem with UNION ALL. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
On Thu, 2002-08-01 at 12:29, Curt Sampson wrote: > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote: > > > I had a "union all" view, which is actually a quite different animal than > > a "union" view which needs to eliminate duplicates before further processing. > > I had the same problem with UNION ALL. > Could someone give an example where it is not safe to push the WHERE clause down to individual parts of UNION (or UNION ALL) wher these parts are simple (non-aggregate) queries? I can see that it has to be made into HAVING in subquery if UNION's subqueries are aggregate (GROUP BY) queries, but can anyone give an example where the meaning of the query changes for non-aggregate subqueries. --------------- Hannu
On 1 Aug 2002, Hannu Krosing wrote: > On Thu, 2002-08-01 at 12:29, Curt Sampson wrote: > > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote: > > > > > I had a "union all" view, which is actually a quite different animal than > > > a "union" view which needs to eliminate duplicates before further processing. > > > > I had the same problem with UNION ALL. > > > > Could someone give an example where it is not safe to push the WHERE > clause down to individual parts of UNION (or UNION ALL) wher these parts > are simple (non-aggregate) queries? For union, queries that want to do something like use a temporary sequence to act sort of like rownum and do row limiting. Admittedly that's already pretty much unspecified behavior, but it does change the behavior in the place of duplicate removal. In addition, I think using bits of the spec we don't completely support you can have the same issue with the undefined behavior of which duplicate is returned for values that aren't the same but are equal, for example where the duplicate removal is in one collation but the outer comparison has a different explicitly given one. I haven't come up with any useful examples, and not really any for union all, however.
On Thu, 1 Aug 2002, Stephan Szabo wrote: > On 1 Aug 2002, Hannu Krosing wrote: > > > On Thu, 2002-08-01 at 12:29, Curt Sampson wrote: > > > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote: > > > > > > > I had a "union all" view, which is actually a quite different animal than > > > > a "union" view which needs to eliminate duplicates before further processing. > > > > > > I had the same problem with UNION ALL. > > > > > > > Could someone give an example where it is not safe to push the WHERE > > clause down to individual parts of UNION (or UNION ALL) wher these parts > > are simple (non-aggregate) queries? > > For union, queries that want to do something like use a temporary > sequence to act sort of like rownum and do row limiting. Admittedly > that's already pretty much unspecified behavior, but it does change > the behavior in the place of duplicate removal. In addition, I think > using bits of the spec we don't completely support you can have the > same issue with the undefined behavior of which duplicate is returned > for values that aren't the same but are equal, for example where the > duplicate removal is in one collation but the outer comparison has > a different explicitly given one. Replying to myself, you can do this right now with char columns if you just push the conditions down blindly, something like: create table t1(a char(5)); create table t2(a char(6)); insert into t1 values ('aaaaa'); insert into t2 values ('aaaaa'); select * from (select * from t2 union select * from t1) as f wherea::text='aaaaa'; select * from (select * from t2 where a::text='aaaaa' unionselect * from t1 where a::text='aaaaa') as f; The first select gives no rows, the second gives one. We'd have to transform the second where clause to something like cast(a as char(6))::text='aaaaa' in order to get the same effect I think.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > For union, queries that want to do something like use a temporary > sequence to act sort of like rownum and do row limiting. Admittedly > that's already pretty much unspecified behavior, but it does change > the behavior in the place of duplicate removal. In addition, I think > using bits of the spec we don't completely support you can have the > same issue with the undefined behavior of which duplicate is returned > for values that aren't the same but are equal, for example where the > duplicate removal is in one collation but the outer comparison has > a different explicitly given one. Hmm. I think this consideration boils down to whether the WHERE clause can give different results for rows that appear equal under the rules of UNION/EXCEPT/INTERSECT. If it gives the same result for any two such rows, then it's safe to push down; otherwise not. It's not too difficult to come up with examples. I invite you to play with select z,length(z) from (select 'abc '::char(7) as z intersect select 'abc '::char(8) as z) ss; and contemplate the effects of pushing down a qual involving length(z). Whether this particular case is very important in the real world is hard to say. But there might be more-important cases out there. And yet, I think we can do it anyway. The score card looks like this to me: UNION ALL: always safe to push down, since the rows will be passed independently to the outer WHERE anyway. UNION: it's unspecified which of a set of "equal" rows will be returned, and therefore the behavior would be unspecified anyway if the outer WHERE can distinguish the rows - you might get 1 row of the set out or none. If we push down, then we create a situation where the returned row will always be one that passes the outer WHERE, but that is a legal behavior. INTERSECT: again it's unspecified which of a set of "equal" rows will be returned, and so you might get 1 row out or none. If we push down then it's still unspecified whether you get a row out (example: if the outer WHERE will pass only for rows of the left table and not the right, then push down will result in no rows of the "equal" set being emitted, but that's a legal behavior). INTERSECT ALL: if a set of "equal" rows contains M rows from the left table and N from the right table, you're supposed to get min(M,N) rows of the set out of the INTERSECT ALL. Again you can't say which of the set you will get, so the outer WHERE might let anywhere between 0 and min(M,N) rows out. With push down, M and N will be reduced by the WHERE before we do the intersection, so you still have 0 to min(M,N) rows out. The behavior will change, but it's still legal per spec AFAICT. EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold. In short, it looks to me like the spec was carefully designed to allow push down. Pushing down a condition of this sort *does* change the behavior, but the new behavior is still within spec. The above analysis assumes that the WHERE condition is "stable", ie its results for a row don't depend on the order in which the rows are tested or anything as weird as that. But we're assuming that already when we push down a qual in a non-set-operation case, I think. Comments? Are there any other considerations to worry about? regards, tom lane
On Thu, 1 Aug 2002, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > For union, queries that want to do something like use a temporary > > sequence to act sort of like rownum and do row limiting. Admittedly > > that's already pretty much unspecified behavior, but it does change > > the behavior in the place of duplicate removal. In addition, I think > > using bits of the spec we don't completely support you can have the > > same issue with the undefined behavior of which duplicate is returned > > for values that aren't the same but are equal, for example where the > > duplicate removal is in one collation but the outer comparison has > > a different explicitly given one. > > Hmm. I think this consideration boils down to whether the WHERE clause > can give different results for rows that appear equal under the rules of > UNION/EXCEPT/INTERSECT. If it gives the same result for any two such > rows, then it's safe to push down; otherwise not. > > It's not too difficult to come up with examples. I invite you to play > with > > select z,length(z) from > (select 'abc '::char(7) as z intersect > select 'abc '::char(8) as z) ss; > > and contemplate the effects of pushing down a qual involving length(z). > > Whether this particular case is very important in the real world is hard > to say. But there might be more-important cases out there. > > And yet, I think we can do it anyway. The score card looks like this to > me: > > UNION ALL: always safe to push down, since the rows will be passed > independently to the outer WHERE anyway. > > UNION: it's unspecified which of a set of "equal" rows will be returned, > and therefore the behavior would be unspecified anyway if the outer > WHERE can distinguish the rows - you might get 1 row of the set out or > none. If we push down, then we create a situation where the returned > row will always be one that passes the outer WHERE, but that is a legal > behavior. > > INTERSECT: again it's unspecified which of a set of "equal" rows will be > returned, and so you might get 1 row out or none. If we push down then > it's still unspecified whether you get a row out (example: if the outer > WHERE will pass only for rows of the left table and not the right, then > push down will result in no rows of the "equal" set being emitted, but > that's a legal behavior). > > INTERSECT ALL: if a set of "equal" rows contains M rows from the left > table and N from the right table, you're supposed to get min(M,N) rows > of the set out of the INTERSECT ALL. Again you can't say which of the > set you will get, so the outer WHERE might let anywhere between 0 and > min(M,N) rows out. With push down, M and N will be reduced by the WHERE > before we do the intersection, so you still have 0 to min(M,N) rows out. > The behavior will change, but it's still legal per spec AFAICT. > > EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold. Actually I think in except you may only push down to the left, since in this case you know that any duplicate from the right will not be returned (since there must be none). So, you can't potentially drop a row from the right side that may have been a duplicate of a left side row that does match the condition. If we assume two collations one case sensitive one not with the except in the non-sensitive and the where in the sensitive and a left with 'A' and right with 'a', it'd be incorrect to push a case sensitive where foo='A' down to the right since that'd change the output from zero rows to one. Something similar for except all since lowering the number of rows on the right can increase the number of returned rows above m-n (if say all m dups match the condition and none of n do) > The above analysis assumes that the WHERE condition is "stable", ie its > results for a row don't depend on the order in which the rows are tested > or anything as weird as that. But we're assuming that already when we > push down a qual in a non-set-operation case, I think. In which case we don't have to worry about the nextval() case.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > Actually I think in except you may only push down to the left, since in > this case you know that any duplicate from the right will not be > returned (since there must be none). So, you can't potentially drop > a row from the right side that may have been a duplicate of a left > side row that does match the condition. But we *want* to push down --- the point is to get some selectivity into the bottom queries. You're right that in a plain EXCEPT it would be possible to push only to the left, but that doesn't give the performance improvement we want. > If we assume two collations one case sensitive one not with the > except in the non-sensitive and the where in the sensitive and > a left with 'A' and right with 'a', it'd be incorrect to push a > case sensitive where foo='A' down to the right since that'd change the > output from zero rows to one. You missed my point. Per spec, either zero or one rows out of the whole thing is okay, because either the 'A' or the 'a' row might be returned as the representative row for the group by the EXCEPT. Yes, the behavior may change, but it's still within spec. > In which case we don't have to worry about the nextval() case. Yeah, I think nextval() and random() and so forth can be ignored; the transformations we already do will confuse the results for such cases, so one more isn't gonna make it worse. regards, tom lane
On Thu, 2002-08-01 at 18:02, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > For union, queries that want to do something like use a temporary > > sequence to act sort of like rownum and do row limiting. Admittedly > > that's already pretty much unspecified behavior, but it does change > > the behavior in the place of duplicate removal. In addition, I think > > using bits of the spec we don't completely support you can have the > > same issue with the undefined behavior of which duplicate is returned > > for values that aren't the same but are equal, for example where the > > duplicate removal is in one collation but the outer comparison has > > a different explicitly given one. > > Hmm. I think this consideration boils down to whether the WHERE clause > can give different results for rows that appear equal under the rules of > UNION/EXCEPT/INTERSECT. Yes. I originally started to ponder this when trying to draw up a plan for automatic generation of ON UPDATE DO INSTEAD rules for views. While pushing down the WHERE clause is just a performance thing for SELECT it is essential for ON UPDATE rules. > If it gives the same result for any two such > rows, then it's safe to push down; otherwise not. > > It's not too difficult to come up with examples. I invite you to play > with > > select z,length(z) from > (select 'abc '::char(7) as z intersect > select 'abc '::char(8) as z) ss; > > and contemplate the effects of pushing down a qual involving length(z). I guess the pushdown must also push implicit conversions done to parts of union. if that conversion were applied to z's in both parts of UNION then the result should be the same. select z,length(z) from( select 'abc '::char(7) as z union select 'abc '::char(8) as z) ss where length(z) = 7; becomes: select z,length(z) from( select 'abc '::char(7) as z where length(cast('abc '::char(7) as char(7))) = 7 union select'abc '::char(8) as z where length(cast('abc '::char(8) as char(7))) = 7) ss ; which both return 'abc ', 7 Of course it is beneficial to detect when the conversion is not needed, so that indexes will be used if available. --------------- Hannu
On Thu, 1 Aug 2002, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > If we assume two collations one case sensitive one not with the > > except in the non-sensitive and the where in the sensitive and > > a left with 'A' and right with 'a', it'd be incorrect to push a > > case sensitive where foo='A' down to the right since that'd change the > > output from zero rows to one. > > You missed my point. Per spec, either zero or one rows out of the whole > thing is okay, because either the 'A' or the 'a' row might be returned > as the representative row for the group by the EXCEPT. Yes, the > behavior may change, but it's still within spec. Except can't return 'A' or 'a', there is no representative row because n>0. That's the difference with UNION and INTERSECT. "If EXCEPT is specified, thenCase: A) If m>0 and n=0, then T contains exactly one duplicate of R. B) Otherwise, T containsno duplicate of R." So if T1 has a #dups>0 and T2 has a #dups>0 we should get no rows, but what if T1' (with the clause) has a #dups>0 but T2' has a #dups=0?
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > So if T1 has a #dups>0 and T2 has a #dups>0 we should get > no rows, but what if T1' (with the clause) has a #dups>0 but > T2' has a #dups=0? Um, you're right --- pushing down into the right-hand side would reduce N, thereby possibly *increasing* the number of output rows not reducing it. My mistake ... should have worked out the EXCEPT case in more detail. This says that we can't push down at all in the EXCEPT ALL case, I think, and I'm leery about whether we should push for EXCEPT. But the UNION and INTERSECT cases are probably the important ones anyway. regards, tom lane
On Thu, 1 Aug 2002, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > So if T1 has a #dups>0 and T2 has a #dups>0 we should get > > no rows, but what if T1' (with the clause) has a #dups>0 but > > T2' has a #dups=0? > > Um, you're right --- pushing down into the right-hand side would reduce > N, thereby possibly *increasing* the number of output rows not reducing > it. My mistake ... should have worked out the EXCEPT case in more > detail. > > This says that we can't push down at all in the EXCEPT ALL case, I > think, and I'm leery about whether we should push for EXCEPT. But > the UNION and INTERSECT cases are probably the important ones anyway. I think that we can push to the left in both (should is a separate issue). If the condition is true for all of the left hand dups, we can choose to have emitted such rows as the output of the EXCEPT ALL in the theoretical case so that the output is the same, max(0, m-n) rows. If the condition is false for any of the left hand dups, we can safely return any number of rows between 0 and max(0,m-n) rows since we can say that the difference were rows that failed the where clause. If we push the condition down, we'll get some number m1 rows that succeed the condition (with m1<m), so returning max(0, m1-n) should be safe. If the condition is false for all of the rows, m1=0 so we'll correctly return no rows. I think.
> Hmm. I think this consideration boils down to whether the WHERE clause > can give different results for rows that appear equal under the rules of > UNION/EXCEPT/INTERSECT. If it gives the same result for any two such > rows, then it's safe to push down; otherwise not. > > It's not too difficult to come up with examples. I invite you to play > with > > select z,length(z) from > (select 'abc '::char(7) as z intersect > select 'abc '::char(8) as z) ss; > > and contemplate the effects of pushing down a qual involving > length(z). I guess that is why e.g. Informix returns 3 for both of them. Imho that makes a lot of sense. The trailing spaces in char's are supposed to be irrellevant. (But iirc this has already been discussed and rejected) > Whether this particular case is very important in the real world is hard > to say. But there might be more-important cases out there. > > And yet, I think we can do it anyway. The score card looks > like this to > me: > > UNION ALL: always safe to push down, since the rows will be passed > independently to the outer WHERE anyway. Yes, that would imho also be the most important optimization. > UNION: it's unspecified which of a set of "equal" rows will be returned, > and therefore the behavior would be unspecified anyway if the outer > WHERE can distinguish the rows - you might get 1 row of the set out or > none. If we push down, then we create a situation where the returned > row will always be one that passes the outer WHERE, but that > is a legal behavior. > > INTERSECT: again it's unspecified which of a set of "equal" rows will be > returned, and so you might get 1 row out or none. If we push down then > it's still unspecified whether you get a row out (example: if the outer > WHERE will pass only for rows of the left table and not the right, then > push down will result in no rows of the "equal" set being emitted, but > that's a legal behavior). > > INTERSECT ALL: if a set of "equal" rows contains M rows from the left > table and N from the right table, you're supposed to get min(M,N) rows > of the set out of the INTERSECT ALL. Again you can't say which of the > set you will get, so the outer WHERE might let anywhere between 0 and > min(M,N) rows out. With push down, M and N will be reduced by the WHERE > before we do the intersection, so you still have 0 to > min(M,N) rows out. > The behavior will change, but it's still legal per spec AFAICT. > > EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold. The imho difficult question is, which select locks down the datatype to use for this column. In a strict sense char(6) and char(7) are not the same type. Since I would certainly not want to be that strict, it imho has to be decided what type the union/intersect... is supposed to use. Informix converts them both to the longer char. I do not think it is valid to return variable length char's. e.g.: create table atab1 (a char(6)); create table atab2 (a char(8)); insert into atab1 values ('abc'); insert into atab2 values ('abc'); create view aview as select * from atab1 union all select * from atab2; select '<'||a||'>' from aview; Informix: (expression) <abc > <abc > PostgreSQL: ?column? ------------<abc ><abc > I am not sure eighter answer is strictly correct. I would probably have expected <abc > <abc > (char(6)) since the first select is supposed to lock down the type, no ? > In short, it looks to me like the spec was carefully designed to allow > push down. Pushing down a condition of this sort *does* change the > behavior, but the new behavior is still within spec. I think this would be a great performance boost for views and thus worth a change in results that are within spec. Would you want to push down always ? There could be outer where clauses, that are so expensive that you would not want to do them twice. If it is all or nothing, I do think pushing down always is better than not. Andreas