Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Eager aggregation, take 3
Date
Msg-id CAMbWs48qOOfBV6svQgbdzZ4HbTV18jsjyFD_b6ofJQfAt5n11w@mail.gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Mon, Oct 6, 2025 at 10:59 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Not a complete review, but a customary look:

Thanks for all the comments!  They've been very helpful.

> 1. setup_base_grouped_rels() by name and the header comment claim to
> operate on base relations, but the code seems to be coded to handle
> OTHER_MEMBER rels too.

Indeed.  I renamed it to setup_simple_grouped_rels() and updated the
related comments in v24.

> 2. All the calls to generate_grouped_paths() pass the grouped_rel
> RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep
> it that way rather than access the agg_info from the given grouped_rel
> from within the function?

Thanks.  Fixed by removing the agg_info parameter.

> 3. " * The information needed are provided by the RelAggInfo
> structure." This should use "is" rather than "are"

Yes.

> 4. standard_join_search(). I think it's worth getting rid of the
> duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check.
> How about setting that in a local variable rather than recalling
> bms_equal(). I don't believe the compiler will optimise the extra one
> away as it can't know set_cheapest() doesn't change the relids. Also,
> wouldn't it be better to check rel->grouped_rel != NULL first? Won't
> that be NULL in most cases, where as !bms_equal(rel->relids,
> root->all_query_rels) will be true in most cases? Likewise in
> generate_partitionwise_join_paths().

Good point.  Done that way in v24.

> 5. Wouldn't it be better to do 0002 first and get that into core so
> you don't have to do the hacky stuff in is_partial_agg_memory_risky()?

Agreed.  Done in v24.

> 6. Shouldn't this be using lappend()?
>
>  agg_clause_list = list_append_unique(agg_clause_list, ac_info);
>
> I don't understand why ac_info could already be in the list. You've
> just done: ac_info = makeNode(AggClauseInfo);

A query can specify the same Aggref expressions multiple times in the
target list.  Using lappend here can lead to duplicate partial Aggref
nodes in the targetlist of a grouped path, which is what I want to
avoid.

> 7. The following comment talks about "base" relations. I don't think
> it should be as the RelOptInfo can be an OTHER_MEMBER rel.
>
>  * build_simple_grouped_rel
>  *   Construct a new RelOptInfo representing a grouped version of the input
>  *   base relation.
>  */

Fixed in v24.


> 8. Normally we check the List is NIL instead of:
>
> if (list_length(group_clauses) == 0)

Right.  Updated in v24.

> 9. In get_expression_sortgroupref(), a comment claims "We ignore child
> members here.". I think that's outdated since ec_members no longer has
> child members.

I think that comment is used to explain why we only scan ec_members
here.  Similar comments can be found in many other places, such as in
equivclass.c:

  /*
   * Found our match.  Scan the other EC members and attempt to generate
   * joinclauses.  Ignore children here.
   */
  foreach(lc2, cur_ec->ec_members)
  {


> 10. I don't think this comment quite makes sense:
>
>  * "apply_at" tracks the lowest join level at which partial aggregation is
>  * applied.
>
> maybe "minimum set of rels to join before partial aggregation can be applied"?
>
> or at least swap "is" for "can be".
>
> My confusion comes from the fact you're stating "lowest join level",
> which seems to indicate that it could be applied after further
> relations have been joined, but then you're saying "is applied" to
> indicate that it can only be applied at that level.
>
> 11. The way you've written the header comments for typedef struct
> RelAggInfo seems weird.  I've only ever seen extra details in the
> header comment when the inline comments have been kept to a single
> line. You're spanning multiple lines, so why have the out of line
> comments in the header at all?
>
> 12. This just doesn't feel like the right name for this field:
>
> /* lowest level partial aggregation is applied at */
> Relids apply_at;
>
> I can't help think that it should be something like "agg_relids" or
> "required_relids".  I understand you're currently only applying the
> partial grouping when you get exactly the minimum set of relids in the
> join search, but if this can be made fast enough, I expect that could
> be changed in the future. If you do change it, then "apply_at" is a
> pretty confusing name.  Perhaps I've misunderstood here and if you did
> that, you'd need to create another RelAggInfo to represent that?

Hmm, RelAggInfo is a per-relation structure; each grouped relation has
a valid RelAggInfo.  The apply_at field represents the set of relids
where partial aggregation is applied within the paths of this grouped
relation.  If we ever change this approach and allow the planner to
explore all join levels for placing partial aggregation, the apply_at
field will become obsolete (cf. prior to v17 patches).

I've updated the comment for apply_at to clarify that it refers to the
relids at which partial aggregation is applied.

I've also updated the comments within RelAggInfo to use one-line
style.

I retained the name of this field though.

> 13. Parameter names mismatch between definition and declaration in:
>
> extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
> RelOptInfo *rel_plain);
> extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
> RelOptInfo *rel_plain);
>
> extern void generate_grouped_paths(PlannerInfo *root,
>    RelOptInfo *rel_grouped,
>    RelOptInfo *rel_plain,
>    RelAggInfo *agg_info);

Nice catch!  Fixed in v24.

> 14. Do all the regression tests need VERBOSE in EXPLAIN? It's making
> the output kinda huge. It might also be nice to wrap the long queries
> onto multiple lines to make them easier to read.

One of the challenges in this patch is generating the correct target
list for each grouped relation.  So I'm kind of inclined to retain
VERBOSE in EXPLAIN.  As I recall, the output target list in the test
cases saved me several times during development when I introduced
problematic code changes.

I wrapped the long queries in v24.

- Richard

Attachment

pgsql-hackers by date:

Previous
From: "Aya Iwata (Fujitsu)"
Date:
Subject: RE: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE
Next
From: Jakub Wartak
Date:
Subject: Re: The ability of postgres to determine loss of files of the main fork