Re: Eager aggregation, take 3 - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Eager aggregation, take 3 |
Date | |
Msg-id | CAApHDvrxyGSLv3Sbg9fpmz6yYri_7M6SaKYnqQQv59uZfQdduA@mail.gmail.com Whole thread Raw |
In response to | Re: Eager aggregation, take 3 (Richard Guo <guofenglinux@gmail.com>) |
List | pgsql-hackers |
On Mon, 6 Oct 2025 at 13:59, Richard Guo <guofenglinux@gmail.com> wrote: > Barring any objections, I'll plan to push v23 in a couple of days. Not a complete review, but a customary look: 1. setup_base_grouped_rels() by name and the header comment claim to operate on base relations, but the code seems to be coded to handle OTHER_MEMBER rels too. Note that set_base_rel_pathlists() explicitly skips anything that's not RELOPT_BASEREL, so if you're not doing that, then you shouldn't use "base" in the function name. It's confusing. 2. All the calls to generate_grouped_paths() pass the grouped_rel RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep it that way rather than access the agg_info from the given grouped_rel from within the function? 3. " * The information needed are provided by the RelAggInfo structure." This should use "is" rather than "are" 4. standard_join_search(). I think it's worth getting rid of the duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check. How about setting that in a local variable rather than recalling bms_equal(). I don't believe the compiler will optimise the extra one away as it can't know set_cheapest() doesn't change the relids. Also, wouldn't it be better to check rel->grouped_rel != NULL first? Won't that be NULL in most cases, where as !bms_equal(rel->relids, root->all_query_rels) will be true in most cases? Likewise in generate_partitionwise_join_paths(). 5. Wouldn't it be better to do 0002 first and get that into core so you don't have to do the hacky stuff in is_partial_agg_memory_risky()? 6. Shouldn't this be using lappend()? agg_clause_list = list_append_unique(agg_clause_list, ac_info); I don't understand why ac_info could already be in the list. You've just done: ac_info = makeNode(AggClauseInfo); 7. The following comment talks about "base" relations. I don't think it should be as the RelOptInfo can be an OTHER_MEMBER rel. * build_simple_grouped_rel * Construct a new RelOptInfo representing a grouped version of the input * base relation. */ 8. Normally we check the List is NIL instead of: if (list_length(group_clauses) == 0) 9. In get_expression_sortgroupref(), a comment claims "We ignore child members here.". I think that's outdated since ec_members no longer has child members. 10. I don't think this comment quite makes sense: * "apply_at" tracks the lowest join level at which partial aggregation is * applied. maybe "minimum set of rels to join before partial aggregation can be applied"? or at least swap "is" for "can be". My confusion comes from the fact you're stating "lowest join level", which seems to indicate that it could be applied after further relations have been joined, but then you're saying "is applied" to indicate that it can only be applied at that level. 11. The way you've written the header comments for typedef struct RelAggInfo seems weird. I've only ever seen extra details in the header comment when the inline comments have been kept to a single line. You're spanning multiple lines, so why have the out of line comments in the header at all? 12. This just doesn't feel like the right name for this field: /* lowest level partial aggregation is applied at */ Relids apply_at; I can't help think that it should be something like "agg_relids" or "required_relids". I understand you're currently only applying the partial grouping when you get exactly the minimum set of relids in the join search, but if this can be made fast enough, I expect that could be changed in the future. If you do change it, then "apply_at" is a pretty confusing name. Perhaps I've misunderstood here and if you did that, you'd need to create another RelAggInfo to represent that? 13. Parameter names mismatch between definition and declaration in: extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain); extern RelOptInfo *build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain); extern void generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, RelOptInfo *rel_plain, RelAggInfo *agg_info); 14. Do all the regression tests need VERBOSE in EXPLAIN? It's making the output kinda huge. It might also be nice to wrap the long queries onto multiple lines to make them easier to read. David
pgsql-hackers by date: