Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [PoC] Reducing planning time when tables have many partitions |
Date | |
Msg-id | CAExHW5tVgXj4iEMb=8pRAKeO0P+DL-+gdhpWKA9Rd4upDNXK5A@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Reducing planning time when tables have many partitions (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: [PoC] Reducing planning time when tables have many partitions
|
List | pgsql-hackers |
On Tue, Apr 8, 2025 at 8:30 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 8 Apr 2025 at 04:54, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > - foreach(lc2, cur_ec->ec_members) > > + setup_eclass_member_iterator(&it, cur_ec, rel); > > + while ((cur_em = eclass_member_iterator_next(&it)) != NULL) > > { > > - EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2); > > - > > /* > > * Ignore child members unless they match the request. > > */ > > > > Can this step be executed in the iterator itself? > > Yes, all the filtering requested *could* be done within the iterator. > The problem with doing that is that there's not even nearly a single > pattern to what callers need. Here's a summary of all 7 users of it: > > 1. get_eclass_for_sort_expr: CHILD members must have em_relids EQUAL > to rel (TYPE 1) > 2. generate_join_implied_equalities_normal: ALL members must be a > SUBSET of join_relids (TYPE 2) > 3. generate_implied_equalities_for_column: ALL members must have > em_relids EQUAL to rel->relids (TYPE 3) > 4. find_em_for_rel: ALL members must be a SUBSET of rel->relids (TYPE 2) > 5. find_ec_member_matching_expr: CHILD members must have em_relids > SUBSET of relids (TYPE 4) > 6. find_computable_ec_member: CHILD members must have em_relids SUBSET > of relids (TYPE 4) > 7. match_pathkeys_to_index: ALL members must have em_relids EQUAL to > index->rel->relids (TYPE 3) > > So, 4 distinct requirements and only types 3 and 4 are repeated. > Certainly, we could code up the iterator to handle all those different > requirements, but would that code be easy to follow? Do you think 1 > iterator should handle all those requirements with a set of IF > statements? I think that would be more difficult to read than how the > patch has it now. > > An alternative that I did consider, and even put a comment on the 2nd > paragraph of the header comment for EquivalenceMemberIterator about, > is the possibility of having multiple iterators for different > purposes. That would save the spaghetti code of IF statements doing it > in 1 iterator and alleviate the performance overhead of the spaghetti > code too. The problem is that there's still not a common pattern > that's followed often enough. If I wrote 4 iterators, I'd need to give > them meaningful names. Names like > eclass_member_iterator_children_with_subset_of_next are a too long and > if I was writing new code that was to call some function like that, > I'd probably be happier just doing the filtering locally than trying > to find which of the 4 iterators is the one I want. Thanks for listing all the patterns. Creating four different iterators is going to affect functionality and might require duplicate code. But each of the patterns is using exactly one BMS operation on em_relids and relids being used as search criteria. That BMS operation/function pointer can be passed to the iterator. I have not looked into whether each of those BMS functions return boolean or not OR whether all the functions take arguments in the same order. But, those things can be fixed. However, given that the feature freeze deadline is so close, it's fine to do it later either during the beta phase or in PG 19. The speed up would be small enough to be noticeable in PG 18 given this and other improvements that have gone in. > > > * the iterator > > * will return members where the em_relids overlaps the child_relids specified > > * when calling setup_eclass_member_iterator(). The caller may wish to ensure > > * the em_relids is a subset of the relids they're searching for. > > > > em_relids should be subset of or equal to child_relids specified, > > otherwise it may be missed as explained above. We should rephrase the > > sentence as " The caller should ensure ...". "May wish to" indicates > > an optional thing, but it doesn't look optional to me. > > That comment does not know what the caller needs to do. That's up to > the caller. If every caller needed to do that, we might as well just > filter non-matching ones out in the iterator itself. > > I did rewrite that comment quite a bit in the attached version as I > still wasn't quite happy with the wording. I still didn't make any > assumptions about what the caller wants, however. I have tried to improve it further in the attached diff. Please accept the diff if you find it useful and if time permits. > > > @@ -725,7 +819,9 @@ get_eclass_for_sort_expr(PlannerInfo *root, > > newec = makeNode(EquivalenceClass); > > newec->ec_opfamilies = list_copy(opfamilies); > > newec->ec_collation = collation; > > + newec->ec_childmembers_size = 0; > > newec->ec_members = NIL; > > > > How about renaming ec_members to ec_parent_members to make it > > explicit. That might break extensions that use ec_members but that way > > they would know that the purpose of this list has changed and it will > > nudge them to use ec_parent_member of EC iterator appropriately. > > The reason I didn't rename that field is mainly because I'm fine with > the current name. In my view, I think of parent members as normal > members of the class and only the child members are 2nd class members. > You can see evidence of that line of thought in the patch with > add_eq_member() not being called add_parent_eq_member(). I think the distinction between parent and child is useful. So I will still suggest renaming the field but it can be done post-feature-freeze. Tom seems to be fine with that per his last email on the thread. If we do earlier in the beta cycle like in April itself, that will give enough time for the extension authors to adjust their code, if necessary. > > + /* > > + * We don't expect any EC child members to exist at this point. Ensure > > + * that's the case, otherwise we might be getting asked to do something > > + * this function hasn't been coded for. > > + */ > > + Assert(ec->ec_childmembers == NULL); > > + > > > > In find_em_for_rel_target() we have Assert(!em->em_is_child), but here > > it's different Assert for the same purpose. It would be good to be > > consistent in all such places either Asserting !em->em_is_child in the > > loop or Assert(ec->ec_childmembers) OR both as you have done in > > process_equivalence(). > > I think this might not be obvious when reading the patch, but I think > where the confusion might be starting is that > "Assert(!cur_em->em_is_child);" isn't asserting what it used to. In > unpatched master, there are a few places that do that to verify the > function is never called after child members have been added. The > patch changes the meaning of this Assert. The comment was updated in a > bid to try to make this more obvious. The Assert's new purpose is to > ensure no child members snuck into the ec_members list. For the > functions that fundamentally don't expect children to exist yet, I've > added an Assert(cur_ec->ec_childmembers == NULL) to ensure that > remains true. I believe I'm consistent with that and I believe that's > a worthwhile assert to make. > > Occasionally, there are Asserts to check there are no child members in > ec_members elsewhere in places where ec_childmembers might contain > children. I believe all the Asserts doing this were all converted from > an "if (ec->em_is_child) continue;" statement. I don't feel the need > to add this assert everywhere that we loop over ec_members. It does > not seem likely that we'd get an em_is_child member in ec_members as > there's only a single place where the ec_members list is adjusted. > I'd be more easily convinced we can just delete all those Asserts than > I could be to add more. Thanks for the clarification. That's very useful Attached diff also brings ec_childmembers_size closer to ec_childmembers - usual practice of keeping the array and its size together. Thanks for all your last minute work. I think this is good to go for PG 18. -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-hackers by date: