Thread: Another reason to redesign querytree representation

Another reason to redesign querytree representation

From
Tom Lane
Date:
Try this one:

create table fltsrc (f1 float8);
insert into fltsrc values (1.0);
insert into fltsrc values (1.1);
create table intdest (f1 int4);
insert into intdest select distinct f1 from fltsrc;

Currently, this coredumps because it tries to apply float8lt
to integer values; the result of column f1 has been coerced to
the eventual destination format (int4) before it ever gets out
of the SELECT stage.  But the parser assigned the sortop to use
for DISTINCT while f1 was still float :-(.

In 6.4.2, there's no coredump, but only one tuple gets inserted
into intdest, because the DISTINCT processing is done on int4 values.
I claim that is wrong too, because "select distinct f1 from fltsrc"
yields two tuples not one.

As far as I can see, there is no way to represent doing the Right
Thing with the current querytree representation.  Either the
targetlist expression includes a coercion to int4 or it doesn't;
we can't represent "do the DISTINCT sort and filter on float8,
*then* coerce to int4" in a single-level targetlist.

Meanwhile, Jan has been muttering that he can't really do rules
right without an RTE entry for the result of a sub-select.  And
the more I look at the UNION/EXCEPT/INTERSECT code, the less I
like it.

Maybe it is time to swallow hard and redesign the querytree
representation?  I think we really need a multi-level-plan
data structure right out of the starting gate... a format
closer to the plantree structure that the executor uses would
probably be less trouble all around.
        regards, tom lane


Re: [HACKERS] Another reason to redesign querytree representation

From
Bruce Momjian
Date:
> As far as I can see, there is no way to represent doing the Right
> Thing with the current querytree representation.  Either the
> targetlist expression includes a coercion to int4 or it doesn't;
> we can't represent "do the DISTINCT sort and filter on float8,
> *then* coerce to int4" in a single-level targetlist.
> 
> Meanwhile, Jan has been muttering that he can't really do rules
> right without an RTE entry for the result of a sub-select.  And
> the more I look at the UNION/EXCEPT/INTERSECT code, the less I
> like it.
> 
> Maybe it is time to swallow hard and redesign the querytree
> representation?  I think we really need a multi-level-plan
> data structure right out of the starting gate... a format
> closer to the plantree structure that the executor uses would
> probably be less trouble all around.

The current system was designed by me.  It was very little code, and I
was quite suprised it worked as well as it does.  Feel free to redesign
it.  I did it to be small and nimble.  Seems it is not flexible enough.

Got to admit, it is a nice trick to get UNIONS, which is all it was
really designed to do.  At the time, it was neat to get any feature out
of so little code.  Ah, the buggy pre-6.5 days.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another reason to redesign querytree representation

From
Thomas Lockhart
Date:
> Maybe it is time to swallow hard and redesign the querytree
> representation?  I think we really need a multi-level-plan
> data structure right out of the starting gate... a format
> closer to the plantree structure that the executor uses would
> probably be less trouble all around.

Well, while you do that keep in mind outer joins. We've been
discussing it, and it is pretty clear that rte's cannot hold the join
info, since the same source table can participate in more than one
join. So we might have to carry the info in a special kind of
qualification node, which allows the planner/optimizer to generate
subselects below and above that node, but does not allow it to try the
full range of different combinations of joins across it (since it
appears to me that outer joins have only limited possibilities for
rearrangement of the query).

But perhaps a redesigned querytree could carry the info more
naturally...
                 - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Another reason to redesign querytree representation

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Maybe it is time to swallow hard and redesign the querytree
>> representation?

> Well, while you do that keep in mind outer joins.

While *I* redesign it?  I barely know what an outer join is.

I'm happy to kibitz while someone else redesigns it, but I don't
think I'm qualified to be the lead dog...
        regards, tom lane


Re: [HACKERS] Another reason to redesign querytree representation

From
Thomas Lockhart
Date:
> I'm happy to kibitz while someone else redesigns it, but I don't
> think I'm qualified to be the lead dog...

Well, that's OK, I barely know what a query tree is. But I think I
know what I want for outer join behavior. Let's keep talking if you
are mucking with the query tree stuff...
                  - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Another reason to redesign querytree representation

From
Bruce Momjian
Date:
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> >> Maybe it is time to swallow hard and redesign the querytree
> >> representation?
> 
> > Well, while you do that keep in mind outer joins.
> 
> While *I* redesign it?  I barely know what an outer join is.
> 
> I'm happy to kibitz while someone else redesigns it, but I don't
> think I'm qualified to be the lead dog...

Did you want to do the change for UNION, or were you just suggesting it
be done?  I can easily add it to the TODO list.

Done:
* redesign UNION structures to have separarate target lists.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another reason to redesign querytree representation

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Did you want to do the change for UNION, or were you just suggesting it
> be done?  I can easily add it to the TODO list.
> Done:

>     * redesign UNION structures to have separarate target lists.

Actually, it's not so much UNION that's busted as it is INSERT.
The parser problems could be dealt with by having a two-level structure
for INSERT ... SELECT ..., so that the targetlist for the eventual
INSERT could be described without changing the semantics of the
underlying SELECT.

There might be other extensions needed for rules (paging Jan...) but
as far as what I've been looking at goes, the TODO entry could be just
* redesign INSERT ... SELECT to have two levels of target list.

Thomas, what do you think is needed for outer joins?
        regards, tom lane


Re: [HACKERS] Another reason to redesign querytree representation

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Did you want to do the change for UNION, or were you just suggesting it
> > be done?  I can easily add it to the TODO list.
> > Done:
> 
> >     * redesign UNION structures to have separarate target lists.
> 
> Actually, it's not so much UNION that's busted as it is INSERT.
> The parser problems could be dealt with by having a two-level structure
> for INSERT ... SELECT ..., so that the targetlist for the eventual
> INSERT could be described without changing the semantics of the
> underlying SELECT.
> 
> There might be other extensions needed for rules (paging Jan...) but
> as far as what I've been looking at goes, the TODO entry could be just
> 
>     * redesign INSERT ... SELECT to have two levels of target list.

Removed:
* Be smarter about promoting types when UNION merges different data types
* SELECT ... UNION ... GROUP BY fails if column types disagree
* INSERT ... SELECT ... UNION is not reliable

And added:

* redesign INSERT ... SELECT to have two levels of target list

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another reason to redesign querytree representation

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Removed:
> * Be smarter about promoting types when UNION merges different data types
> * SELECT ... UNION ... GROUP BY fails if column types disagree
> * INSERT ... SELECT ... UNION is not reliable

> And added:

> * redesign INSERT ... SELECT to have two levels of target list

Er ... wait.  The first two of those TODO items were separate issues.
I think they can be fixed without changing the querytree representation.
The third can be superseded by this item, though.
        regards, tom lane


Re: [HACKERS] Another reason to redesign querytree representation

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Removed:
> > * Be smarter about promoting types when UNION merges different data types
> > * SELECT ... UNION ... GROUP BY fails if column types disagree
> > * INSERT ... SELECT ... UNION is not reliable
> 
> > And added:
> 
> > * redesign INSERT ... SELECT to have two levels of target list
> 
> Er ... wait.  The first two of those TODO items were separate issues.
> I think they can be fixed without changing the querytree representation.
> The third can be superseded by this item, though.
> 

Done.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another reason to redesign querytree representation

From
Thomas Lockhart
Date:
> Thomas, what do you think is needed for outer joins?

Bruce and I have talked about it some already:

For outer joins, tables must be combined in a particular order. For
example, a left outer join requires that any entries in the left-side
table which do not have a corresponding entry in the right-side table
be expanded with nulls during the join. The information on the outer
join can't be carried by the rte since the same table can appear twice
in an outer join expression:
 select * from t1 left join t2 using (i)               left join t1 on (i = t1.j);

For a query like
 select * from t1 left join t2 using (i) where t2.j = 3;

istm that the outer join must be done before the t2 qualification is
applied, and that another ordering may produce the wrong result.

>From what I understand Bruce to say, the planner/optimizer is allowed
to try all kinds of permutations of plans, choosing the one with the
lowest cost. But if the info for the join is carried in a
qualification node, then the planner/optimizer must know that it can't
reorder the query as freely as it does now.

I was thinking of having a new qualification node to carry this info,
and it could be transformed into a mergejoin node which has a couple
of new fields indicating left and/or right outer join behavior.

A hashjoin method may be possible for queries which are structured as
a left outer join; other outer joins will need to use the mergejoin
method. Also, some poorly-qualified outer joins reduce to inner joins,
and perhaps the optimizer can be smart enough to realize this.
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Another reason to redesign querytree representation

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Thomas, what do you think is needed for outer joins?

> The information on the outer
> join can't be carried by the rte since the same table can appear twice
> in an outer join expression:
>
>   select * from t1 left join t2 using (i)
>                 left join t1 on (i = t1.j);

Is that actually a valid query?  Wouldn't you at least need to rename
one or the other appearance of t1?  (Nitpick, probably, but maybe I
am not understanding something...)

> For a query like
>   select * from t1 left join t2 using (i) where t2.j = 3;
> istm that the outer join must be done before the t2 qualification is
> applied, and that another ordering may produce the wrong result.

It's not immediately obvious what the semantics of that ought to be...
but I agree it makes a difference if you eliminate t2 rows before
rather than after the join.

This looks to me like the syntactic notion is that t1-left-join-t2
is already a single table as far as the rest of the SELECT is 
concerned.  (But what happens if they have column names in common,
other than i?)  In which case you're right, join first then apply
the WHERE condition is presumably what's supposed to happen.

If that's the way it works, I think that an RTE describing the joined
table is the natural way to handle it.  Obviously this would not be
a primitive node; it would have to be some kind of structure of nodes.

> I was thinking of having a new qualification node to carry this info,

You would need a qual clause to carry the join condition (t1.i = t2.i
in your first example, i = t1.j in your second).  This would have to
dangle off a node that represents the specially joined tables, I think.

There's no such thing as a "qualification node"; qual clauses are just
expressions that happen to be in WHERE.  If the "specially joined
tables" node isn't in the RTE then I think we need to invent some new
place to put it.  The WHERE expression isn't a natural place for it.

> From what I understand Bruce to say, the planner/optimizer is allowed
> to try all kinds of permutations of plans, choosing the one with the
> lowest cost. But if the info for the join is carried in a
> qualification node, then the planner/optimizer must know that it can't
> reorder the query as freely as it does now.

Yes, the join order would be forced W.R.T. the outer-joined tables,
at least.

The other alternative we should consider is the notion that the parser
outputs are already a multilevel plan structure, where we'd have a whole
lower plan item representing the outer-join table result.  This might
end up being the same thing as above, since quite possibly the RTE would
be the natural place for the upper plan's link to the lower one.

We need to get Jan involved in this, since this sounds like the same
kind of stuff he's been saying is needed for rules.  In fact Jan
probably ought to be leading the discussion, not me...

> A hashjoin method may be possible for queries which are structured as
> a left outer join; other outer joins will need to use the mergejoin
> method.

I don't see why plain ol' nested loop couldn't be used too.  mergejoin
is not always better than nested loop, or even always feasible.  It
requires the availability of sort operators, for one thing.

> Also, some poorly-qualified outer joins reduce to inner joins,
> and perhaps the optimizer can be smart enough to realize this.

OK, now tell me about inner joins...
        regards, tom lane


Re: [HACKERS] Another reason to redesign querytree representation

From
Thomas Lockhart
Date:
> >   select * from t1 left join t2 using (i)
> >                 left join t1 on (i = t1.j);
> Is that actually a valid query?  Wouldn't you at least need to rename
> one or the other appearance of t1?  (Nitpick, probably, but maybe I
> am not understanding something...)

afaik it is a valid query. The first outer join combines t1 and t2,
and by definition the intermediate result loses its table-specific
labeling.

> > For a query like
> >   select * from t1 left join t2 using (i) where t2.j = 3;
> > istm that the outer join must be done before the t2 qualification is
> > applied, and that another ordering may produce the wrong result.
> It's not immediately obvious what the semantics of that ought to be...
> but I agree it makes a difference if you eliminate t2 rows before
> rather than after the join.
> This looks to me like the syntactic notion is that t1-left-join-t2
> is already a single table as far as the rest of the SELECT is
> concerned.  (But what happens if they have column names in common,
> other than i?)  In which case you're right, join first then apply
> the WHERE condition is presumably what's supposed to happen.
> If that's the way it works, I think that an RTE describing the joined
> table is the natural way to handle it.  Obviously this would not be
> a primitive node; it would have to be some kind of structure of nodes.

Your statements are all correct. Maybe defining an RTE which does not
refer to a single specific table is the way to go; is there anything
like that already? If not, then putting the equivalent down in the
qualifications might work.

> > I was thinking of having a new qualification node to carry this 
> > info,
> You would need a qual clause to carry the join condition (t1.i = t2.i
> in your first example, i = t1.j in your second).  This would have to
> dangle off a node that represents the specially joined tables, I 
> think.
> There's no such thing as a "qualification node"; qual clauses are just
> expressions that happen to be in WHERE.  If the "specially joined
> tables" node isn't in the RTE then I think we need to invent some new
> place to put it.  The WHERE expression isn't a natural place for it.

Right, I wasn't remembering the right terminology.

> > From what I understand Bruce to say, the planner/optimizer is 
> > allowed to try all kinds of permutations of plans, choosing the one 
> > with the lowest cost. But if the info for the join is carried in a
> > qualification node, then the planner/optimizer must know that it 
> > can't reorder the query as freely as it does now.
> Yes, the join order would be forced W.R.T. the outer-joined tables,
> at least.
> The other alternative we should consider is the notion that the parser
> outputs are already a multilevel plan structure, where we'd have a 
> whole lower plan item representing the outer-join table result.  This 
> might end up being the same thing as above, since quite possibly the 
> RTE would be the natural place for the upper plan's link to the lower 
> one.
> We need to get Jan involved in this, since this sounds like the same
> kind of stuff he's been saying is needed for rules.  In fact Jan
> probably ought to be leading the discussion, not me...
> > A hashjoin method may be possible for queries which are structured 
> > as a left outer join; other outer joins will need to use the 
> > mergejoin method.
> I don't see why plain ol' nested loop couldn't be used too.  mergejoin
> is not always better than nested loop, or even always feasible.  It
> requires the availability of sort operators, for one thing.

Right. I'm learning as I go. I did put some code into the mergejoin
routines to walk the tables properly for outer joins (it is marked by
#ifdef ENABLE_OUTER_JOINS). But the flags for outer joins are not yet
passed in, and the result tuples are not constructed (they need some
null fields added to the left- or right-side table).

> > Also, some poorly-qualified outer joins reduce to inner joins,
> > and perhaps the optimizer can be smart enough to realize this.
> OK, now tell me about inner joins...

As you know that is what Postgres already does. You can specify inner
joins using this newer join syntax:
 select * from t1 join t2 using (i);

which I just convert to normal query tree nodes in the parser so it's
equivalent to
 select * from t1, t2 where t1.i = t2.i;

I probably don't do a complete job, but it's been a while since I've
looked at it.
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California