Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning |
Date | |
Msg-id | CAExHW5sFgbPCh25fkb1cUmN0cVeEum9zfa0Vf2p8RcekS2_B1g@mail.gmail.com Whole thread Raw |
In response to | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
|
List | pgsql-hackers |
On Tue, Apr 1, 2025 at 1:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I think David’s suggestion to use 64 as the fixed initial size is > simpler and more predictable. Since list_length * 2 will always be >= > 64 anyway, unless we expect clause counts to grow significantly right > after the threshold, the fixed size avoids the need to reason about > sizing heuristics and keeps the logic clearer. > > It also lines up well with David’s point -- 64 strikes a good balance > for both memory usage and CPU efficiency. Unless there’s a strong > reason to favor dynamic sizing, I’d prefer to stick with the fixed 64. Consider following table and query (based on a similar table in join.sql, which exercises find_derived_clause_for_ec_member() code path. #create table fkest (x integer, x10 integer, x10b integer, x100 integer, unique(x, x10, x100), foreign key (x, x10b, x100) references fkest(x, x10, x100)); #select 'select * from fkest f1 join ' || string_agg(format('fkest f%s on (f1.x = f%s.x and f1.x10 = f%s.x10b and f1.x100 = f%s.x100)', i, i, i , i), ' join ') || ' where f1.x100 = 2' query from generate_series(2, 100) i; \gset #explain (summary) :query; This is a 100-way self-join between foreign key and referenced key and one of the foreign keys being set to a constant. This exercises the case of derived clauses with constant EM. When planning this query, all the 100 derived clauses containing the constant EM are created first and then they are searched one by one for every EM. Thus when the hash table is created in find_derived_clause_for_ec_member()->ec_search_derived_clause_for_ems(), the length of ec_derives_list is already 100, so the hash table will be expanded while it's being filled up the first time, if we use constant 64 as initial hash table size. This can be avoided if we use list_length() * 2. The pattern for create_join_clause() however is search then insert - so it will always create the hash table with initial size 64 right when the list length is 32. Both these cases are served well if we base the initial hash table size as a multiple of list_length(ec_derives_list). For the record, without the patch this query takes about 5800ms on average for planning on my laptop. With the patch the planning time reduces to about 5400ms - 6-7% of improvement if my approximate math is correct. > > As David suggested off-list, it seems better to have a static inline > function for the key canonicalization logic than to duplicate it in > the insert and lookup paths, as done in your 0004. Static inline function to fill the key in canonical form is a good idea. Thanks for the patch. > > * Replaces literal values for threshold and initial size with macros, > defined near the key and entry types. I don't see this in your attached patches. Am I missing something? It's still using list_length() for initial hash table size. But with my explanation above, I believe we will keep it that way. > > I wasn’t sure why you removed this comment: > > - * We do not attempt a second lookup with EMs swapped when using the hash > - * table; such clauses are inserted under both orderings at the time of > - * insertion. When I read it, I didn't understand why we mentioned a second lookup, so I dropped it. But now I see that the comment is in the context of two comparisons being done when using list. I have rephrased the comment a bit to make this comparison clear. > > 0003’s commit message includes a blurb I plan to paste into the main > patch’s message (with minor tweaks) when we squash the patches. Slight correction in the first sentence of that blurb message. Derived clauses are now stored in the ec_derives_hash using canonicalized keys: the EquivalenceMember with lower memory address is always placed in em1, and ... +/* + * fill_ec_derives_key + * Compute a canonical key for ec_derives_hash lookup or insertion. + * + * Derived clauses are looked up using a pair of EquivalenceMembers and a + * parent EquivalenceClass. To avoid storing or searching for both EM orderings, + * we canonicalize the key: + * + * - For clauses involving two non-constant EMs, we order the EMs by address + * and place the lower one first. + * - For clauses involving a constant EM, the caller must pass the non-constant + * EM as leftem; we then set em1 = NULL and em2 = leftem. + */ +static inline void +fill_ec_derives_key(ECDerivesKey *key, + EquivalenceMember *leftem, + EquivalenceMember *rightem, + EquivalenceClass *parent_ec) +{ + Assert(leftem); /* Always required for lookup or insertion */ + + /* + * Clauses with a constant EM are always stored and looked up using only + * the non-constant EM, with the other key slot set to NULL. + */ This comment seems to overlap with what's already there in the prologue. However "Clauses with a constant EM are always stored and looked up using the non-constant EM" is non-overlapping part. This part is also repeated in ec_add_clause_to_derives_hash(), which I think is a better place for this comment. Changed in the attached patch. */ Assert(!rinfo->left_em->em_is_const); + /* + * Clauses containing a constant are never considered redundant, so + * parent_ec is not set. + */ + Assert(!rinfo->parent_ec || !rinfo->right_em->em_is_const); These two Assertions are applicable to all the derived clauses but are added to a function which deals with only hash table. If an EC doesn't have enough derived clauses to create a hash table these assertions won't be checked. The reason they are here is because we are using these properties to create canonical hash table key. Should we move them to ec_add_derived_clause() for better coverage? I have also removed some comments repeated in the function prologue and function body. 0001, 0002 and 0003 are the same as your set. 0004 contains my changes in a separate patch so that it's easy to review those. -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-hackers by date: