Re: Eager aggregation, take 3 - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Eager aggregation, take 3 |
Date | |
Msg-id | CAMbWs48qOOfBV6svQgbdzZ4HbTV18jsjyFD_b6ofJQfAt5n11w@mail.gmail.com Whole thread Raw |
In response to | Re: Eager aggregation, take 3 (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On Mon, Oct 6, 2025 at 10:59 PM David Rowley <dgrowleyml@gmail.com> wrote: > Not a complete review, but a customary look: Thanks for all the comments! They've been very helpful. > 1. setup_base_grouped_rels() by name and the header comment claim to > operate on base relations, but the code seems to be coded to handle > OTHER_MEMBER rels too. Indeed. I renamed it to setup_simple_grouped_rels() and updated the related comments in v24. > 2. All the calls to generate_grouped_paths() pass the grouped_rel > RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep > it that way rather than access the agg_info from the given grouped_rel > from within the function? Thanks. Fixed by removing the agg_info parameter. > 3. " * The information needed are provided by the RelAggInfo > structure." This should use "is" rather than "are" Yes. > 4. standard_join_search(). I think it's worth getting rid of the > duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check. > How about setting that in a local variable rather than recalling > bms_equal(). I don't believe the compiler will optimise the extra one > away as it can't know set_cheapest() doesn't change the relids. Also, > wouldn't it be better to check rel->grouped_rel != NULL first? Won't > that be NULL in most cases, where as !bms_equal(rel->relids, > root->all_query_rels) will be true in most cases? Likewise in > generate_partitionwise_join_paths(). Good point. Done that way in v24. > 5. Wouldn't it be better to do 0002 first and get that into core so > you don't have to do the hacky stuff in is_partial_agg_memory_risky()? Agreed. Done in v24. > 6. Shouldn't this be using lappend()? > > agg_clause_list = list_append_unique(agg_clause_list, ac_info); > > I don't understand why ac_info could already be in the list. You've > just done: ac_info = makeNode(AggClauseInfo); A query can specify the same Aggref expressions multiple times in the target list. Using lappend here can lead to duplicate partial Aggref nodes in the targetlist of a grouped path, which is what I want to avoid. > 7. The following comment talks about "base" relations. I don't think > it should be as the RelOptInfo can be an OTHER_MEMBER rel. > > * build_simple_grouped_rel > * Construct a new RelOptInfo representing a grouped version of the input > * base relation. > */ Fixed in v24. > 8. Normally we check the List is NIL instead of: > > if (list_length(group_clauses) == 0) Right. Updated in v24. > 9. In get_expression_sortgroupref(), a comment claims "We ignore child > members here.". I think that's outdated since ec_members no longer has > child members. I think that comment is used to explain why we only scan ec_members here. Similar comments can be found in many other places, such as in equivclass.c: /* * Found our match. Scan the other EC members and attempt to generate * joinclauses. Ignore children here. */ foreach(lc2, cur_ec->ec_members) { > 10. I don't think this comment quite makes sense: > > * "apply_at" tracks the lowest join level at which partial aggregation is > * applied. > > maybe "minimum set of rels to join before partial aggregation can be applied"? > > or at least swap "is" for "can be". > > My confusion comes from the fact you're stating "lowest join level", > which seems to indicate that it could be applied after further > relations have been joined, but then you're saying "is applied" to > indicate that it can only be applied at that level. > > 11. The way you've written the header comments for typedef struct > RelAggInfo seems weird. I've only ever seen extra details in the > header comment when the inline comments have been kept to a single > line. You're spanning multiple lines, so why have the out of line > comments in the header at all? > > 12. This just doesn't feel like the right name for this field: > > /* lowest level partial aggregation is applied at */ > Relids apply_at; > > I can't help think that it should be something like "agg_relids" or > "required_relids". I understand you're currently only applying the > partial grouping when you get exactly the minimum set of relids in the > join search, but if this can be made fast enough, I expect that could > be changed in the future. If you do change it, then "apply_at" is a > pretty confusing name. Perhaps I've misunderstood here and if you did > that, you'd need to create another RelAggInfo to represent that? Hmm, RelAggInfo is a per-relation structure; each grouped relation has a valid RelAggInfo. The apply_at field represents the set of relids where partial aggregation is applied within the paths of this grouped relation. If we ever change this approach and allow the planner to explore all join levels for placing partial aggregation, the apply_at field will become obsolete (cf. prior to v17 patches). I've updated the comment for apply_at to clarify that it refers to the relids at which partial aggregation is applied. I've also updated the comments within RelAggInfo to use one-line style. I retained the name of this field though. > 13. Parameter names mismatch between definition and declaration in: > > extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, > RelOptInfo *rel_plain); > extern RelOptInfo *build_grouped_rel(PlannerInfo *root, > RelOptInfo *rel_plain); > > extern void generate_grouped_paths(PlannerInfo *root, > RelOptInfo *rel_grouped, > RelOptInfo *rel_plain, > RelAggInfo *agg_info); Nice catch! Fixed in v24. > 14. Do all the regression tests need VERBOSE in EXPLAIN? It's making > the output kinda huge. It might also be nice to wrap the long queries > onto multiple lines to make them easier to read. One of the challenges in this patch is generating the correct target list for each grouped relation. So I'm kind of inclined to retain VERBOSE in EXPLAIN. As I recall, the output target list in the test cases saved me several times during development when I introduced problematic code changes. I wrapped the long queries in v24. - Richard
Attachment
pgsql-hackers by date: