Thread: More on GROUP BY

More on GROUP BY

From
jwieck@debis.com (Jan Wieck)
Date:
While  looking  at all these parsetrees I wonder why the hell
    the GroupClause contains a complete copy of the TLE  at  all?
    The  planner  depends on finding a corresponding entry in the
    targetlist which should contain the same expression. At least
    it needs an equal junk TLE. For the query

        SELECT a, b FROM t1 GROUP BY b + 1;

    the  parser  in  fact creates 3 TLE's where the last one is a
    junk result named "resjunk" for the "b +  1"  expression  and
    the GroupClause contains a totally equal TLE.

    Could someone explain that please?

    Wouldn't it be better to have another field (resgroupno e.g.)
    in the resdom  which  the  GroupClause  can  reference?  Then
    changing  the resno's or even replacing the entire expression
    wouldn't hurt because  make_subplanTargetList()  could  match
    them  this  way  and  the expressions for the subplans can be
    pulled out directly from the targetlist. And  it  would  save
    processing  the  group  clauses in the rewriting because they
    cannot contain Var nodes anymore and the entire list  can  be
    ignored.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] More on GROUP BY

From
Bruce Momjian
Date:
>     While  looking  at all these parsetrees I wonder why the hell
>     the GroupClause contains a complete copy of the TLE  at  all?
>     The  planner  depends on finding a corresponding entry in the
>     targetlist which should contain the same expression. At least
>     it needs an equal junk TLE. For the query
> 
>         SELECT a, b FROM t1 GROUP BY b + 1;
> 
>     the  parser  in  fact creates 3 TLE's where the last one is a
>     junk result named "resjunk" for the "b +  1"  expression  and
>     the GroupClause contains a totally equal TLE.
> 
>     Could someone explain that please?
> 
>     Wouldn't it be better to have another field (resgroupno e.g.)
>     in the resdom  which  the  GroupClause  can  reference?  Then
>     changing  the resno's or even replacing the entire expression
>     wouldn't hurt because  make_subplanTargetList()  could  match
>     them  this  way  and  the expressions for the subplans can be
>     pulled out directly from the targetlist. And  it  would  save
>     processing  the  group  clauses in the rewriting because they
>     cannot contain Var nodes anymore and the entire list  can  be
>     ignored.

I think I can comment on this.  Aggregates had the similar problem.  It
was so long ago, I don't remember the solution, but it was a pain to
keep the aggs up-to-date with the target list and varno changes.  If you
think a redesign will fix the problem, go ahead.

I think the old problem may have been that the old Aggreg's kept
pointers to matching target list entries, so there was the aggregate in
the target list, and another separate list of aggregates in the Query
structure.  I think I removed the second copy, and just generated it in
the executor, where it was needed.

Please see parser/parse_agg.c for a description of how count(*) is
handled differently than other aggregates.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] More on GROUP BY

From
Tom Lane
Date:
jwieck@debis.com (Jan Wieck) writes:
>     While  looking  at all these parsetrees I wonder why the hell
>     the GroupClause contains a complete copy of the TLE  at  all?
>     The  planner  depends on finding a corresponding entry in the
>     targetlist which should contain the same expression. At least
>     it needs an equal junk TLE. For the query
>         SELECT a, b FROM t1 GROUP BY b + 1;
>     the  parser  in  fact creates 3 TLE's where the last one is a
>     junk result named "resjunk" for the "b +  1"  expression  and
>     the GroupClause contains a totally equal TLE.

>     Could someone explain that please?

All true, but so what?  It wastes a few bytes of memory during
planning, I suppose...

>     Wouldn't it be better to have another field (resgroupno e.g.)
>     in the resdom  which  the  GroupClause  can  reference?  Then
>     changing  the resno's or even replacing the entire expression
>     wouldn't hurt because  make_subplanTargetList()  could  match
>     them  this  way  and  the expressions for the subplans can be
>     pulled out directly from the targetlist. And  it  would  save
>     processing  the  group  clauses in the rewriting because they
>     cannot contain Var nodes anymore and the entire list  can  be
>     ignored.

I think I like better the idea of leaving the representation alone,
but using equal() on the exprs to match groupclause items to targetlist
entries.  That way, manipulation of the targetlist can't accidentally
cause the grouplist to look like it contains something different than
what it should have.  It doesn't bother me that the planner can fail
if it is unable to match a group item to a targetlist item --- that's
a good crosscheck that nothing's gone wrong.  (But matching on just
the resno is unsafe, as you said before.)

I think it's true that using a TLE for each grouplist item is a waste of
space, and that representing the grouplist as simply a list of expr's
would be good enough.  But pulling out the TLE decoration seems like
it's not an appropriate use of time at this stage of the release cycle.
I'd say hold off till after 6.5, then fold it in with the parsetree
redesign that you keep muttering we need (I agree!).

BTW, you keep using the term "RTE" which I'm not familiar with ---
I assume it's just referring to the parse tree nodes?
        regards, tom lane


Re: [HACKERS] More on GROUP BY

From
Bruce Momjian
Date:
> >     Wouldn't it be better to have another field (resgroupno e.g.)
> >     in the resdom  which  the  GroupClause  can  reference?  Then
> >     changing  the resno's or even replacing the entire expression
> >     wouldn't hurt because  make_subplanTargetList()  could  match
> >     them  this  way  and  the expressions for the subplans can be
> >     pulled out directly from the targetlist. And  it  would  save
> >     processing  the  group  clauses in the rewriting because they
> >     cannot contain Var nodes anymore and the entire list  can  be
> >     ignored.
> 
> I think I like better the idea of leaving the representation alone,
> but using equal() on the exprs to match groupclause items to targetlist
> entries.  That way, manipulation of the targetlist can't accidentally
> cause the grouplist to look like it contains something different than
> what it should have.  It doesn't bother me that the planner can fail
> if it is unable to match a group item to a targetlist item --- that's
> a good crosscheck that nothing's gone wrong.  (But matching on just
> the resno is unsafe, as you said before.)

The real problem is that it is hard to keep all these items synchronized
as they pass around through the stages.  I looked at the aggregates, and
it looks like that has a separate copy too, so it may not be that bad. 
We may just be missing a pass that makes appropriate changes in GROUP
clauses, but I am not sure.


> I think it's true that using a TLE for each grouplist item is a waste of
> space, and that representing the grouplist as simply a list of expr's
> would be good enough.  But pulling out the TLE decoration seems like
> it's not an appropriate use of time at this stage of the release cycle.
> I'd say hold off till after 6.5, then fold it in with the parsetree
> redesign that you keep muttering we need (I agree!).

Basically, it is my fault that I am bringing up the issue so late.  If I
had done the Open Items list earlier, we would not be so close to the
final.

Jan is currently researching it, and we have the regression tests and
three weeks.  He usually does a fine job of fixing things.  Jan, find
out what you think is required to get this working, and if it is not too
bad, maybe we should go ahead.

What does the Aggreg do?  Does it has similar duplication?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] More on GROUP BY

From
jwieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> All true, but so what?  It wastes a few bytes of memory during
> planning, I suppose...

    Not  only  memory,  rewriting takes time too. If the GROUP BY
    clause has view columns, the rewriter has to change  all  the
    expressions  twice,  one time in the targetlist, another time
    in the group clause. Wasted efford if they  should  still  be
    equal after rewriting.

> I think it's true that using a TLE for each grouplist item is a waste of
> space, and that representing the grouplist as simply a list of expr's
> would be good enough.  But pulling out the TLE decoration seems like
> it's not an appropriate use of time at this stage of the release cycle.
> I'd say hold off till after 6.5, then fold it in with the parsetree
> redesign that you keep muttering we need (I agree!).

    I'm  sure  since  I  began to look at it that it's not such a
    good idea to make those changes at this stage.  But  in  fact
    some  of  them have to be done to make GROUP BY in views more
    robust.

>
> BTW, you keep using the term "RTE" which I'm not familiar with ---
> I assume it's just referring to the parse tree nodes?

    RTE == RangeTableEntry. The relations where the data  in  the
    querytree is coming from. The varno in Var nodes is the index
    into that table so the varno plus the varattno  identify  one
    relations attribute.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] More on GROUP BY

From
jwieck@debis.com (Jan Wieck)
Date:
Bruce Momjian wrote:

> The real problem is that it is hard to keep all these items synchronized
> as they pass around through the stages.  I looked at the aggregates, and
> it looks like that has a separate copy too, so it may not be that bad.
> We may just be missing a pass that makes appropriate changes in GROUP
> clauses, but I am not sure.

    Exactly  the probability of missing some changes somewhere is
    why  I  would  like  to  get  rid  of  the  field  entry   in
    GroupClause.  Having to keep duplicate items synced isn't the
    spirit of a relational database. It's like using triggers  to
    keep  two  tables in sync where a simple reference and a view
    would do a better job.

> > I think it's true that using a TLE for each grouplist item is a waste of
> > space, and that representing the grouplist as simply a list of expr's
> > would be good enough.  But pulling out the TLE decoration seems like
> > it's not an appropriate use of time at this stage of the release cycle.
> > I'd say hold off till after 6.5, then fold it in with the parsetree
> > redesign that you keep muttering we need (I agree!).
>
> Basically, it is my fault that I am bringing up the issue so late.  If I
> had done the Open Items list earlier, we would not be so close to the
> final.

    And my fault to spend too much time  playing  around  with  a
    raytracer.   Developers  should  develop,  publishers  should
    publish.

>
> Jan is currently researching it, and we have the regression tests and
> three weeks.  He usually does a fine job of fixing things.

    Thanks for the compliment :-)

>                                                            Jan, find
> out what you think is required to get this working, and if it is not too
> bad, maybe we should go ahead.
>
> What does the Aggreg do?  Does it has similar duplication?

    Not AFAICS. Aggregates can only appear in the  targetlist  by
    default.   and  the  nodes  below  them are the expression to
    aggregate over.  If an aggregate should appear in  the  WHERE
    clause  it  must  be placed into a proper subselect (the rule
    system already tries to do so if an aggregate column is  used
    in the WHERE of a view select, but fails sometimes).

    I'll go ahead now in little steps.

    1.  Get rid of the TLE copy in GroupClause.

    2.  Move the targetlist expansion into the rule system.

    3.  Rewrite the subquery creation for aggregates in the WHERE
        clause to take view grouping into account.

    4.  Allow qualifications against aggregates to  be  given  as
        "AggCol  op  Value"  by swapping the expression and using
        the negator operator (if one exists).

    As you said, we have three weeks. Let's see  what  one  Wieck
    can do in this time :-)


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] More on GROUP BY

From
jwieck@debis.com (Jan Wieck)
Date:
I wrote:

>     I'll go ahead now in little steps.
>
>     1.  Get rid of the TLE copy in GroupClause.

    Done. GroupClause now identifies the TLE by a number which is
    placed into the Resdom by parser/rewriter.

    New  initdb  required  because  of  modifications   in   node
    print/read functions.

>
>     2.  Move the targetlist expansion into the rule system.

    Not   required   anymore   AFAICS.  Instead  I  modified  the
    targetlist  preprocessing  in  the  planner  so   that   junk
    attributes  used  by  group  clauses  get  added again to the
    expanded list.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #