Thread: Another reason to redesign querytree representation
Try this one: create table fltsrc (f1 float8); insert into fltsrc values (1.0); insert into fltsrc values (1.1); create table intdest (f1 int4); insert into intdest select distinct f1 from fltsrc; Currently, this coredumps because it tries to apply float8lt to integer values; the result of column f1 has been coerced to the eventual destination format (int4) before it ever gets out of the SELECT stage. But the parser assigned the sortop to use for DISTINCT while f1 was still float :-(. In 6.4.2, there's no coredump, but only one tuple gets inserted into intdest, because the DISTINCT processing is done on int4 values. I claim that is wrong too, because "select distinct f1 from fltsrc" yields two tuples not one. As far as I can see, there is no way to represent doing the Right Thing with the current querytree representation. Either the targetlist expression includes a coercion to int4 or it doesn't; we can't represent "do the DISTINCT sort and filter on float8, *then* coerce to int4" in a single-level targetlist. Meanwhile, Jan has been muttering that he can't really do rules right without an RTE entry for the result of a sub-select. And the more I look at the UNION/EXCEPT/INTERSECT code, the less I like it. Maybe it is time to swallow hard and redesign the querytree representation? I think we really need a multi-level-plan data structure right out of the starting gate... a format closer to the plantree structure that the executor uses would probably be less trouble all around. regards, tom lane
> As far as I can see, there is no way to represent doing the Right > Thing with the current querytree representation. Either the > targetlist expression includes a coercion to int4 or it doesn't; > we can't represent "do the DISTINCT sort and filter on float8, > *then* coerce to int4" in a single-level targetlist. > > Meanwhile, Jan has been muttering that he can't really do rules > right without an RTE entry for the result of a sub-select. And > the more I look at the UNION/EXCEPT/INTERSECT code, the less I > like it. > > Maybe it is time to swallow hard and redesign the querytree > representation? I think we really need a multi-level-plan > data structure right out of the starting gate... a format > closer to the plantree structure that the executor uses would > probably be less trouble all around. The current system was designed by me. It was very little code, and I was quite suprised it worked as well as it does. Feel free to redesign it. I did it to be small and nimble. Seems it is not flexible enough. Got to admit, it is a nice trick to get UNIONS, which is all it was really designed to do. At the time, it was neat to get any feature out of so little code. Ah, the buggy pre-6.5 days. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Maybe it is time to swallow hard and redesign the querytree > representation? I think we really need a multi-level-plan > data structure right out of the starting gate... a format > closer to the plantree structure that the executor uses would > probably be less trouble all around. Well, while you do that keep in mind outer joins. We've been discussing it, and it is pretty clear that rte's cannot hold the join info, since the same source table can participate in more than one join. So we might have to carry the info in a special kind of qualification node, which allows the planner/optimizer to generate subselects below and above that node, but does not allow it to try the full range of different combinations of joins across it (since it appears to me that outer joins have only limited possibilities for rearrangement of the query). But perhaps a redesigned querytree could carry the info more naturally... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> Maybe it is time to swallow hard and redesign the querytree >> representation? > Well, while you do that keep in mind outer joins. While *I* redesign it? I barely know what an outer join is. I'm happy to kibitz while someone else redesigns it, but I don't think I'm qualified to be the lead dog... regards, tom lane
> I'm happy to kibitz while someone else redesigns it, but I don't > think I'm qualified to be the lead dog... Well, that's OK, I barely know what a query tree is. But I think I know what I want for outer join behavior. Let's keep talking if you are mucking with the query tree stuff... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > >> Maybe it is time to swallow hard and redesign the querytree > >> representation? > > > Well, while you do that keep in mind outer joins. > > While *I* redesign it? I barely know what an outer join is. > > I'm happy to kibitz while someone else redesigns it, but I don't > think I'm qualified to be the lead dog... Did you want to do the change for UNION, or were you just suggesting it be done? I can easily add it to the TODO list. Done: * redesign UNION structures to have separarate target lists. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Did you want to do the change for UNION, or were you just suggesting it > be done? I can easily add it to the TODO list. > Done: > * redesign UNION structures to have separarate target lists. Actually, it's not so much UNION that's busted as it is INSERT. The parser problems could be dealt with by having a two-level structure for INSERT ... SELECT ..., so that the targetlist for the eventual INSERT could be described without changing the semantics of the underlying SELECT. There might be other extensions needed for rules (paging Jan...) but as far as what I've been looking at goes, the TODO entry could be just * redesign INSERT ... SELECT to have two levels of target list. Thomas, what do you think is needed for outer joins? regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > Did you want to do the change for UNION, or were you just suggesting it > > be done? I can easily add it to the TODO list. > > Done: > > > * redesign UNION structures to have separarate target lists. > > Actually, it's not so much UNION that's busted as it is INSERT. > The parser problems could be dealt with by having a two-level structure > for INSERT ... SELECT ..., so that the targetlist for the eventual > INSERT could be described without changing the semantics of the > underlying SELECT. > > There might be other extensions needed for rules (paging Jan...) but > as far as what I've been looking at goes, the TODO entry could be just > > * redesign INSERT ... SELECT to have two levels of target list. Removed: * Be smarter about promoting types when UNION merges different data types * SELECT ... UNION ... GROUP BY fails if column types disagree * INSERT ... SELECT ... UNION is not reliable And added: * redesign INSERT ... SELECT to have two levels of target list -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Removed: > * Be smarter about promoting types when UNION merges different data types > * SELECT ... UNION ... GROUP BY fails if column types disagree > * INSERT ... SELECT ... UNION is not reliable > And added: > * redesign INSERT ... SELECT to have two levels of target list Er ... wait. The first two of those TODO items were separate issues. I think they can be fixed without changing the querytree representation. The third can be superseded by this item, though. regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > Removed: > > * Be smarter about promoting types when UNION merges different data types > > * SELECT ... UNION ... GROUP BY fails if column types disagree > > * INSERT ... SELECT ... UNION is not reliable > > > And added: > > > * redesign INSERT ... SELECT to have two levels of target list > > Er ... wait. The first two of those TODO items were separate issues. > I think they can be fixed without changing the querytree representation. > The third can be superseded by this item, though. > Done. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Thomas, what do you think is needed for outer joins? Bruce and I have talked about it some already: For outer joins, tables must be combined in a particular order. For example, a left outer join requires that any entries in the left-side table which do not have a corresponding entry in the right-side table be expanded with nulls during the join. The information on the outer join can't be carried by the rte since the same table can appear twice in an outer join expression: select * from t1 left join t2 using (i) left join t1 on (i = t1.j); For a query like select * from t1 left join t2 using (i) where t2.j = 3; istm that the outer join must be done before the t2 qualification is applied, and that another ordering may produce the wrong result. >From what I understand Bruce to say, the planner/optimizer is allowed to try all kinds of permutations of plans, choosing the one with the lowest cost. But if the info for the join is carried in a qualification node, then the planner/optimizer must know that it can't reorder the query as freely as it does now. I was thinking of having a new qualification node to carry this info, and it could be transformed into a mergejoin node which has a couple of new fields indicating left and/or right outer join behavior. A hashjoin method may be possible for queries which are structured as a left outer join; other outer joins will need to use the mergejoin method. Also, some poorly-qualified outer joins reduce to inner joins, and perhaps the optimizer can be smart enough to realize this. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> Thomas, what do you think is needed for outer joins? > The information on the outer > join can't be carried by the rte since the same table can appear twice > in an outer join expression: > > select * from t1 left join t2 using (i) > left join t1 on (i = t1.j); Is that actually a valid query? Wouldn't you at least need to rename one or the other appearance of t1? (Nitpick, probably, but maybe I am not understanding something...) > For a query like > select * from t1 left join t2 using (i) where t2.j = 3; > istm that the outer join must be done before the t2 qualification is > applied, and that another ordering may produce the wrong result. It's not immediately obvious what the semantics of that ought to be... but I agree it makes a difference if you eliminate t2 rows before rather than after the join. This looks to me like the syntactic notion is that t1-left-join-t2 is already a single table as far as the rest of the SELECT is concerned. (But what happens if they have column names in common, other than i?) In which case you're right, join first then apply the WHERE condition is presumably what's supposed to happen. If that's the way it works, I think that an RTE describing the joined table is the natural way to handle it. Obviously this would not be a primitive node; it would have to be some kind of structure of nodes. > I was thinking of having a new qualification node to carry this info, You would need a qual clause to carry the join condition (t1.i = t2.i in your first example, i = t1.j in your second). This would have to dangle off a node that represents the specially joined tables, I think. There's no such thing as a "qualification node"; qual clauses are just expressions that happen to be in WHERE. If the "specially joined tables" node isn't in the RTE then I think we need to invent some new place to put it. The WHERE expression isn't a natural place for it. > From what I understand Bruce to say, the planner/optimizer is allowed > to try all kinds of permutations of plans, choosing the one with the > lowest cost. But if the info for the join is carried in a > qualification node, then the planner/optimizer must know that it can't > reorder the query as freely as it does now. Yes, the join order would be forced W.R.T. the outer-joined tables, at least. The other alternative we should consider is the notion that the parser outputs are already a multilevel plan structure, where we'd have a whole lower plan item representing the outer-join table result. This might end up being the same thing as above, since quite possibly the RTE would be the natural place for the upper plan's link to the lower one. We need to get Jan involved in this, since this sounds like the same kind of stuff he's been saying is needed for rules. In fact Jan probably ought to be leading the discussion, not me... > A hashjoin method may be possible for queries which are structured as > a left outer join; other outer joins will need to use the mergejoin > method. I don't see why plain ol' nested loop couldn't be used too. mergejoin is not always better than nested loop, or even always feasible. It requires the availability of sort operators, for one thing. > Also, some poorly-qualified outer joins reduce to inner joins, > and perhaps the optimizer can be smart enough to realize this. OK, now tell me about inner joins... regards, tom lane
> > select * from t1 left join t2 using (i) > > left join t1 on (i = t1.j); > Is that actually a valid query? Wouldn't you at least need to rename > one or the other appearance of t1? (Nitpick, probably, but maybe I > am not understanding something...) afaik it is a valid query. The first outer join combines t1 and t2, and by definition the intermediate result loses its table-specific labeling. > > For a query like > > select * from t1 left join t2 using (i) where t2.j = 3; > > istm that the outer join must be done before the t2 qualification is > > applied, and that another ordering may produce the wrong result. > It's not immediately obvious what the semantics of that ought to be... > but I agree it makes a difference if you eliminate t2 rows before > rather than after the join. > This looks to me like the syntactic notion is that t1-left-join-t2 > is already a single table as far as the rest of the SELECT is > concerned. (But what happens if they have column names in common, > other than i?) In which case you're right, join first then apply > the WHERE condition is presumably what's supposed to happen. > If that's the way it works, I think that an RTE describing the joined > table is the natural way to handle it. Obviously this would not be > a primitive node; it would have to be some kind of structure of nodes. Your statements are all correct. Maybe defining an RTE which does not refer to a single specific table is the way to go; is there anything like that already? If not, then putting the equivalent down in the qualifications might work. > > I was thinking of having a new qualification node to carry this > > info, > You would need a qual clause to carry the join condition (t1.i = t2.i > in your first example, i = t1.j in your second). This would have to > dangle off a node that represents the specially joined tables, I > think. > There's no such thing as a "qualification node"; qual clauses are just > expressions that happen to be in WHERE. If the "specially joined > tables" node isn't in the RTE then I think we need to invent some new > place to put it. The WHERE expression isn't a natural place for it. Right, I wasn't remembering the right terminology. > > From what I understand Bruce to say, the planner/optimizer is > > allowed to try all kinds of permutations of plans, choosing the one > > with the lowest cost. But if the info for the join is carried in a > > qualification node, then the planner/optimizer must know that it > > can't reorder the query as freely as it does now. > Yes, the join order would be forced W.R.T. the outer-joined tables, > at least. > The other alternative we should consider is the notion that the parser > outputs are already a multilevel plan structure, where we'd have a > whole lower plan item representing the outer-join table result. This > might end up being the same thing as above, since quite possibly the > RTE would be the natural place for the upper plan's link to the lower > one. > We need to get Jan involved in this, since this sounds like the same > kind of stuff he's been saying is needed for rules. In fact Jan > probably ought to be leading the discussion, not me... > > A hashjoin method may be possible for queries which are structured > > as a left outer join; other outer joins will need to use the > > mergejoin method. > I don't see why plain ol' nested loop couldn't be used too. mergejoin > is not always better than nested loop, or even always feasible. It > requires the availability of sort operators, for one thing. Right. I'm learning as I go. I did put some code into the mergejoin routines to walk the tables properly for outer joins (it is marked by #ifdef ENABLE_OUTER_JOINS). But the flags for outer joins are not yet passed in, and the result tuples are not constructed (they need some null fields added to the left- or right-side table). > > Also, some poorly-qualified outer joins reduce to inner joins, > > and perhaps the optimizer can be smart enough to realize this. > OK, now tell me about inner joins... As you know that is what Postgres already does. You can specify inner joins using this newer join syntax: select * from t1 join t2 using (i); which I just convert to normal query tree nodes in the parser so it's equivalent to select * from t1, t2 where t1.i = t2.i; I probably don't do a complete job, but it's been a while since I've looked at it. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California