Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning |
Date | |
Msg-id | CAExHW5tObRq1hcwunx=7GhaJAjjdzx7gmH=h=hEanvRSJQeUcw@mail.gmail.com Whole thread Raw |
In response to | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
|
List | pgsql-hackers |
On Mon, Mar 24, 2025 at 7:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Ok, thanks for that analysis. I don't think there's anything about > the patch that makes it particularly less suitable for pwj=on. > I agree. > I read patch 0002 in detail last week and wrote a follow-up patch > (0003), mostly for cosmetic improvements, which I plan to squash into > 0002. For some reason v2-0003 didn't apply cleanly on my local branch. Possibly v2-0002 is a modified version of my 0002. In order to not disturb your patchset, I have attached my edits as a diff. If you find those useful, apply them to 0003 and then squash it into 0002. Here are some more cosmetic comments on 0003. A bit of discussion might be needed. Hence did not edit myself. -/* Hash table entry in ec_derives_hash. */ +/* Hash table entry used in ec_derives_hash. */ I think we don't need "used" here entry in hash table is good enough. But I am ok even if it stays that way. - add_derived_clauses(ec1, ec2->ec_derives_list); + /* Updates ec1's ec_derives_list and ec_derives_hash if present. */ + ec_add_derived_clauses(ec1, ec2->ec_derives_list); Suggestion in the attached diff. ec1->ec_relids = bms_join(ec1->ec_relids, ec2->ec_relids); ec1->ec_has_const |= ec2->ec_has_const; /* can't need to set has_volatile */ @@ -396,7 +400,7 @@ process_equivalence(PlannerInfo *root, /* just to avoid debugging confusion w/ dangling pointers: */ ec2->ec_members = NIL; ec2->ec_sources = NIL; - clear_ec_derived_clauses(ec2); + ec_clear_derived_clauses(ec2); I pondered about this naming convention when naming the functions. But it seems it's not used everywhere in this file OR I am not able to see the underlying naming rule if any. So I used a mixed naming. Let's keep your names though. I think they are better. @@ -1403,6 +1403,17 @@ typedef struct JoinDomain * entry: consider SELECT random() AS a, random() AS b ... ORDER BY b,a. * So we record the SortGroupRef of the originating sort clause. * + * Derived equality clauses between EC members are stored in ec_derives_list. "clauses between EC members" doesn't read correctly - "clauses equating EC members" seems better. We actually don't need to mention "between EC members" at all - it's not relevant to the "size of the list" which is the subject of this paragraph. What do you think? + * For small queries, this list is scanned directly during lookup. For larger + * queries -- e.g., with many partitions or joins -- a hash table + * (ec_derives_hash) is built for faster lookup. Both structures contain the + * same RestrictInfos and are maintained in parallel. If only the list exists, there is nothing to maintain in parallel. The sentence seems to indicate that both the hash table and the list are always present (and maintained in parallel). How about dropping "Both ... parallel" and modifying the previous sentence as " ... faster lookup when the list grows beyond a threshold" or something similar. The next sentence anyway mentions that both the structures are maintained. + We retain the list even + * when the hash is used to simplify serialization (e.g., in + * _outEquivalenceClass()) and support EquivalenceClass merging. Thanks for the review. -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-hackers by date: