Thread: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi All, Following up on [1] ... A restrictlist is a list of RestrictInfo nodes, each representing one clause, applicable to a set of joins. When partitionwise join is used as a strategy, the restrictlists for child join are obtained by translating the restrictlists for the parent in try_partitionwise_join(). That function is called once for every join order. E.g. when computing join ABC, it will be called for planning joins AB, BC, AC, (AB)C, A(BC), B(AC). Every time it is called it will translate the given parent restrictlist. This means that a RestrictInfo node applicable to given child relations will be translated as many times as the join orders in which those child relations appear in different joining relations. For example, consider a query "select * from A, B, C where A.a = B.a and B.a = C.a" where A, B and C are partitioned tables. A has partitions A1, A2, ... An. B has partitions B1, B2, ... Bn and C has partitions C1, C2, ... Cn. Partitions Ai, Bi and Ci are matching partitions respectively for all i's. The join ABC is computed as Append of (A1B1C1, A2B2C2, ... AnBnCn). The clause A.a = B.a is translated to A1.a = B1.a thrice, when computing A1B1, A1(B1C1) and B1(A1C1) respectively. Similarly other clauses are translated multiple times. Some extra translations also happen in reparameterize_path_by_child(). These translations consume memory which remains allocated till the statement finishes. A ResrtictInfo should be translated only once per parent-child pair, thus avoiding consuming extra memory. There are two patches attached 0001 - to measure memory consumption during planning. This is the same one as attached to [1]. 0002 - WIP patch to avoid repeated translations of RestrictInfo. The WIP patch avoids repeated translations by tracking the child for which a RestrictInfo is translated and reusing the same translation every time it is requested. In order to track the translations, RestrictInfo gets two new members. 1. parent_rinfo - In a child's RestrictInfo this points to the RestrictInfo applicable to the topmost parent in partition hierarchy. This is NULL in the topmost parent's RestrictInfo 2. child_rinfos - In a parent's RestrictInfo, this is a list that contains all the translated child RestrictInfos. In child RestrictInfos this is NULL. Every translated RestrictInfo is stored in the top parent's RestrictInfo child_rinfos. RestrictInfo::required_relids is used as a key to search a given translation. I have intercepted adjust_appendrel_attrs_mutator() to track translations as well as avoid multiple translations. It first looks for an existing translation when translating a RestrictInfo and creates a new one only when one doesn't exist already. Using this patch the memory consumption for the above query reduces as follows Number of partitions: 1000 Number of tables | without patch | with patch | % reduction | being joined | | | | -------------------------------------------------------------- 2 | 40.3 MiB | 37.7 MiB | 6.43% | 3 | 146.8 MiB | 133.0 MiB | 9.42% | 4 | 445.4 MiB | 389.5 MiB | 12.57% | 5 | 1563.2 MiB | 1243.2 MiB | 20.47% | The number of times a RestrictInfo requires to be translated increases exponentially with the number of tables joined. Thus we see more memory saved as the number of tables joined increases. When two tables are joined there's only a single join planned so no extra translations happen in try_partitionwise_join(). The memory saved in case of 2 joining tables comes from avoiding extra translations happening during reparameterization of paths (in reparameterize_path_by_child()). The attached patch is to show how much memory can be saved if we avoid extra translation. But I want to discuss the following things about the approach. 1. The patch uses RestrictInfo::required_relids as the key for searching child RelOptInfos. I am not sure which of the two viz. required_relids and clause_relids is a better key. required_relids seems to be a subset of clause_relids and from the description it looks like that's the set that decides the applicability of a clause in a join. But clause_relids is obtained from all the Vars that appear in the clause, so may be that's the one that matters for the translations. Can somebody guide me? 2. The patch adds two extra pointers per RestrictInfo. They will remain unused when partitionwise join is not used. Right now, I do not see any increase in memory consumed by planner because of those pointers even in case of unpartitioned tables; maybe they are absorbed in memory alignment. They may show up as extra memory in the future. I am wondering whether we can instead save and track translations in PlannerInfo as a hash table using <rinfo_serial, required_relids (or whatever is the answer to above question) of parent and child respectively> as key. That will just add one extra pointer in PlannerInfo when partitionwise join is not used. Please let me know your suggestions. 3. I have changed adjust_appendrel_attrs_mutator() to return a translated RestrictInfo if it already exists. IOW, it won't always return a deep copy of given RestrictInfo as it does today. This can be fixed by writing wrappers around adjust_appendrel_attrs() to translate RestrictInfo specifically. But maybe we don't always need deep copies. Are there any cases when we need translated deep copies of RestrictInfo? Those cases will require fixing callers of adjust_appendrel_attrs() instead of the mutator. 4. IIRC, when partitionwise join was implemented we had discussed creating child RestrictInfos using a login similar to build_joinrel_restrictlist(). That might be another way to build RestrictInfo only once and use it multiple times. But we felt that it was much harder problem to solve since it's not known which partitions from joining partitioned tables will match and will be joined till we enter try_partitionwise_join(), so the required RestrictInfos may not be available in RelOptInfo::joininfo. Let me know your thoughts on this. Comments/suggestions welcome. references [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com -- Best Wishes, Ashutosh Bapat
Attachment
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in partition hierarchy.
This is NULL in the topmost parent's RestrictInfo
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.
crash while planning the query below.
regression=# set enable_partitionwise_join to on;
SET
regression=# EXPLAIN (COSTS OFF)
SELECT * FROM prt1 t1, prt2 t2
WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
server closed the connection unexpectedly
Thanks
Richard
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Tue, Aug 8, 2023 at 4:08 PM Richard Guo <guofenglinux@gmail.com> wrote: > > > I haven't looked into the details but with 0002 patch I came across a > crash while planning the query below. > > regression=# set enable_partitionwise_join to on; > SET > regression=# EXPLAIN (COSTS OFF) > SELECT * FROM prt1 t1, prt2 t2 > WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0; > server closed the connection unexpectedly Thanks for the report. PFA the patches with this issue fixed. There is another issue seen when running partition_join.sql "ERROR: index key does not match expected index column". I will investigate that issue. Right now I am looking at the 4th point in my first email in this thread. [1]. I am trying to figure whether that approach would work. If that approach doesn't work what's the best to track the translations and also figure out answers to 1, 2, 3 there. Please let me know your opinions if any. -- Best Wishes, Ashutosh Bapat [1] https://www.postgresql.org/message-id/CAExHW5s=bCLMMq8n_bN6iU+Pjau0DS3z_6Dn6iLE69ESmsPMJQ@mail.gmail.com
Attachment
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > 0001 - to measure memory consumption during planning. This is the same > one as attached to [1]. I see you're recording the difference in the CurrentMemoryContext of palloc'd memory before and after planning. That won't really alert us to problems if the planner briefly allocates, say 12GBs of memory during, say the join search then quickly pfree's it again. unless it's an oversized chunk, aset.c won't free() any memory until MemoryContextReset(). Chunks just go onto a freelist for reuse later. So at the end of planning, the context may still have that 12GBs malloc'd, yet your new EXPLAIN ANALYZE property might end up just reporting a tiny fraction of that. I wonder if it would be more useful to just have ExplainOneQuery() create a new memory context and switch to that and just report the context's mem_allocated at the end. It's also slightly annoying that these planner-related summary outputs are linked to EXPLAIN ANALYZE. We could be showing them in EXPLAIN without ANALYZE. If we were to change that now, it might be a bit annoying for the regression tests as we'd need to go and add SUMMARY OFF in a load of places... drowley@amd3990x:~/pg_src/src/test/regress/sql$ git grep -i "costs off" | wc -l 1592 hmm, that would cause a bit of churn... :-( David
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Wed, Aug 9, 2023 at 10:12 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > 0001 - to measure memory consumption during planning. This is the same > > one as attached to [1]. > I have started a separate thread to discuss this patch. I am taking this discussion to that thread. -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
I spent some time on 4th point below but also looked at other points. Here's what I have found so far On Thu, Jul 27, 2023 at 7:35 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > 1. The patch uses RestrictInfo::required_relids as the key for > searching child RelOptInfos. I am not sure which of the two viz. > required_relids and clause_relids is a better key. required_relids > seems to be a subset of clause_relids and from the description it > looks like that's the set that decides the applicability of a clause > in a join. But clause_relids is obtained from all the Vars that appear > in the clause, so may be that's the one that matters for the > translations. Can somebody guide me? I was wrong that required_relids is subset of clause_relids. The first can contain OJ relids which the other can not. OJ relids do not have any children, so they won't be translated. So clause_relids seems to be a better key. I haven't made a decision yet. > > 2. The patch adds two extra pointers per RestrictInfo. They will > remain unused when partitionwise join is not used. Right now, I do not > see any increase in memory consumed by planner because of those > pointers even in case of unpartitioned tables; maybe they are absorbed > in memory alignment. They may show up as extra memory in the future. I > am wondering whether we can instead save and track translations in > PlannerInfo as a hash table using <rinfo_serial, required_relids (or > whatever is the answer to above question) of parent and child > respectively> as key. That will just add one extra pointer in > PlannerInfo when partitionwise join is not used. Please let me know > your suggestions. I will go ahead with a pointer in PlannerInfo for now. > > 3. I have changed adjust_appendrel_attrs_mutator() to return a > translated RestrictInfo if it already exists. IOW, it won't always > return a deep copy of given RestrictInfo as it does today. This can be > fixed by writing wrappers around adjust_appendrel_attrs() to translate > RestrictInfo specifically. But maybe we don't always need deep copies. > Are there any cases when we need translated deep copies of > RestrictInfo? Those cases will require fixing callers of > adjust_appendrel_attrs() instead of the mutator. I think it's better to handle the tracking logic outside adjust_appendrel_attrs. That will be some code churn but it will be cleaner and won't affect anything other that partitionwise joins. > > 4. IIRC, when partitionwise join was implemented we had discussed > creating child RestrictInfos using a login similar to > build_joinrel_restrictlist(). That might be another way to build > RestrictInfo only once and use it multiple times. But we felt that it > was much harder problem to solve since it's not known which partitions > from joining partitioned tables will match and will be joined till we > enter try_partitionwise_join(), so the required RestrictInfos may not > be available in RelOptInfo::joininfo. Let me know your thoughts on > this. Here's some lengthy description of why I feel translations are better compared to computing restrictlist and joininfo for a child join from joining relation's joininfo Consider a query "select * from p, q, r where p.c1 = q.c1 and q.c1 = r.c1 and p.c2 + q.c2 < r.c2 and p.c3 != q.c3 and q.c3 != r.c3". The query has following clauses 1. p.c1 = q.c1 2. q.c1 = r.c1 3. p.c2 + q.c2 < r.c2 4. p.c3 != q.c3 5. q.c3 != r.c3 The first two clauses are added to EC machinery and do not appear in joininfo. They appear in restrictlist when we construct clauses in restrictlist from ECs. Let's ignore them for now. Assume that each table p, q, r has partitions (p1, p2, ...), (q1, q2, ...) and (r1, r2, ... ) respectively. Each triplet (pi, qi,ri) forms the set of matching partitions from p, q, r respectively for all "i". Consider join, p1q1r1. We will generate relations p1, q1, r1, p1q1, p1r1, q1r1 and p1q1r1 while building the last join. Below is description of how these clauses would look in each of these relations and the list they appear in when computing that join. Please notice the numeric suffixes carefully. p1. joininfo: p1.c2 + q.c2 < r.c2, p1.c3 != q.c3 restrictlist: <> q1 joininfo: p.c2 + q1.c2 < r.c2, p.c3 != q1.c3, q1.c3 != r.c3 restrictlist: <> r1 joininfo: p.c2 + q.c2 < r1.c2, q.c3 != r1.c3 restrictlist: <> p1q1 joininfo: p1.c2 + q1.c2 < r.c2, q1.c3 != r.c3 restrictlist: p1.c3 != q1.c3 q1r1 joininfo: p.c2 + q1.c2 < r1.c2, p.c3 != q1.c3 restrictlist: q1.c3 != r1.c3 p1r1 joininfo: p1.c2 + q.c2 < r1.c2, p1.c3 != q.c3, q.c3 != r1.c3 restrictlist: <> p1q1r1 joininfo: <> restrictlist for (p1q1)r1: p1.c2 + q1.c2 < r1.c2, q1.c3 != r1.c3 restrictlist for (p1r1)q1: p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3, q1.c3 != r1.c3 restrictlist for p1(q1r1): p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3 If we translate the clauses when building join e.g. translate p1.c3 != q1.c3 when building p1q1 or p1q1r1, it would cause repeated translations. So the translations need to be saved in lower relations when we detect matching partitions and then use these translations. Something I have done in the attached patches. But the problem is the same clause reaches its final translation through different intermediate translations as the join search advances. E.g. the evolution of p.c2 + q.c2 < r.c2 to p1.c2 + q1.c2 < r1.c2 has three different intemediate translations at second level of join. Each of these intermediate translations conflict with each other and none of them can be saved in any of the second level joins as a candidate for the last stage translation. Extending the logic in the patches would make those more complicated. Another possibility is to avoid the same clause being translated multiple times when building the join using RestrictInfo::rinfo_serial. But simply that won't help avoiding repeated translations caused by different join orders. E.g. we won't be able to detect that p.c2 + q.c2 < r.c2 has been translated to p1.c2 + q1.c2 < r1.c2 already when we computed (p1r1)q1 or p1(q1r1) or (p1q1)r1 whichever was computed earlier. For that we need some tracking outside the join relations themselves like I did in my first patch. Coming back to the problem of generating child restrictlist clauses from equivalence classes, I think it's easier with some tweaks to pass child relids down to the minions when dealing with child joins. It seems to be working as is but I haven't tested it thoroughly. Obtaining child clauses from parent clauses by translation and tracking the translations is less complex and may be more efficient too. I will post a patch on those lines soon. -- Best Wishes, Ashutosh Bapat
Attachment
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi All, On Fri, Aug 11, 2023 at 6:24 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Obtaining child clauses from parent clauses by translation and > tracking the translations is less complex and may be more efficient > too. I will post a patch on those lines soon. > PFA patch set to add infrastructure to track RestrictInfo translations and reuse them. PlannerInfo gets a new member "struct HTAB *child_rinfo_hash" which is a hash table of hash tables keyed by RestrictInfo::rinfo_serial. Each element in the array is a hash table of RestrictInfos keyed by RestrictInfo::required_relids as explained in my previous email. When building child clauses when a. building child join rels or b. when reparameterizing paths, we first access the first level hash table using RestrictInfo::rinfo_serial of the parent and search the required translation by computing the child RestrictInfo::required_relids obtained by translating RestrictInfo::required_relids of the parent RestrictInfo. If the translation doesn't exist, we create one and add to the hash table. RestrictInfo::required_relids is same for a RestrictInfo and its commuted RestrictInfo. The order of operands is important for IndexClauses. Hence we track the commuted RestrictInfo in a new field RestrictInfo::comm_rinfo. RestrictInfo::is_commuted differentiates between a RestrictInfo and its commuted version. This is explained as a comment in the patch. This scheme has a minor benefit of saving memory when the same RestrictInfo is commuted multiple times. Hash table of hash table is used instead of an array of hash tables since a. not every rinfo_serial has a RestrictInfo associated with it b. not every RestrictInfo has translations, c. I don't think the exact size of this array is not known till the planning ends since we continue to create new clauses as we create new RelOptInfos. Of course, an array can be repalloc'ed and unused slots in the array may not waste a lot of memory. I am open to change hash table to an array which may be more efficient. With these set of patches, the memory consumption stands as below Number of tables | without patch | with patch | % reduction | being joined | | | | -------------------------------------------------------------- 2 | 40.8 MiB | 37.4 MiB | 8.46% | 3 | 151.6 MiB | 135.0 MiB | 10.96% | 4 | 464.0 MiB | 403.6 MiB | 12.00% | 5 | 1663.9 MiB | 1329.1 MiB | 20.12% | The patch set is thus 0001 - patch used to measure memory used during planning 0002 - Patch to free intermediate Relids computed by adjust_child_relids_multilevel(). I didn't test memory consumption for multi-level partitioning. But this is clear improvement. In that function we free the AppendInfos array which as many pointers long as the number of relids. So it doesn't make sense not to free the Relids which can be {largest relid}/8 bytes long at least. 0003 - patch to save and reuse commuted RestrictInfo. This patch by itself shows a small memory saving (3%) in the query below where the same clause is commuted twice. The query does not contain any partitioned tables. create table t2 (a int primary key, b int, c int); create index t2_a_b on t2(a, b); select * from t2 where 10 = a Memory used without patch: 13696 bytes Memory used with patch: 13264 bytes 0004 - Patch which implements the hash table of hash table described above and also code to avoid repeated RestrictInfo list translations. I will add this patchset to next commitfest. -- Best Wishes, Ashutosh Bapat
Attachment
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > The patch set is thus > 0001 - patch used to measure memory used during planning > > 0002 - Patch to free intermediate Relids computed by > adjust_child_relids_multilevel(). I didn't test memory consumption for > multi-level partitioning. But this is clear improvement. In that > function we free the AppendInfos array which as many pointers long as > the number of relids. So it doesn't make sense not to free the Relids > which can be {largest relid}/8 bytes long at least. > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by > itself shows a small memory saving (3%) in the query below where the > same clause is commuted twice. The query does not contain any > partitioned tables. > create table t2 (a int primary key, b int, c int); > create index t2_a_b on t2(a, b); > select * from t2 where 10 = a > Memory used without patch: 13696 bytes > Memory used with patch: 13264 bytes > > 0004 - Patch which implements the hash table of hash table described > above and also code to avoid repeated RestrictInfo list translations. > > I will add this patchset to next commitfest. > > -- > Best Wishes, > Ashutosh Bapat PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is the squashed version of the latest patch set at https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com. -- Best Wishes, Ashutosh Bapat
Attachment
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions.
First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging.
Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0].
[0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru
-- Regards, Alena Rybakina Postgres Professional
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Sorry, my first question was not clear, I mean: have you thought about creating a guc parameter to display memory planning information?Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions.
First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging.Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0].
[0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru
-- Regards, Alena Rybakina
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Fri, Nov 24, 2023 at 3:56 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote: > > On 24.11.2023 13:20, Alena Rybakina wrote: > > Hi! Thank you for your work on the subject, I think it's a really useful feature. > > I've reviewed your patch and I have a few questions. > > First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that thisis an important feature, but I think only for debugging. Not a GUC parameter but I have a proposal to use EXPLAIN for the same. [1] > > Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn'tyou generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0]. Not usually. But that's the only "key" we have to access a set of sematically same RestrictInfos. Relids is another key to access the exact RestrictInfo. A child RestrictInfo can not be used since there will many child RestrictInfos. Similar parent RestrictInfo can not be used since there will be multiple forms of the same RestrictInfo. [1] https://commitfest.postgresql.org/45/4492/ -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > The patch set is thus > > 0001 - patch used to measure memory used during planning > > > > 0002 - Patch to free intermediate Relids computed by > > adjust_child_relids_multilevel(). I didn't test memory consumption for > > multi-level partitioning. But this is clear improvement. In that > > function we free the AppendInfos array which as many pointers long as > > the number of relids. So it doesn't make sense not to free the Relids > > which can be {largest relid}/8 bytes long at least. > > > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by > > itself shows a small memory saving (3%) in the query below where the > > same clause is commuted twice. The query does not contain any > > partitioned tables. > > create table t2 (a int primary key, b int, c int); > > create index t2_a_b on t2(a, b); > > select * from t2 where 10 = a > > Memory used without patch: 13696 bytes > > Memory used with patch: 13264 bytes > > > > 0004 - Patch which implements the hash table of hash table described > > above and also code to avoid repeated RestrictInfo list translations. > > > > I will add this patchset to next commitfest. > > > > -- > > Best Wishes, > > Ashutosh Bapat > > PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is > the squashed version of the latest patch set at > https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com. CFBot shows that the patch does not apply anymore as in [1]: === Applying patches on top of PostgreSQL commit ID 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 === === applying patch ./0001-Report-memory-used-for-planning-a-query-in--20231031.patch ... patching file src/test/regress/expected/explain.out Hunk #5 FAILED at 290. Hunk #6 succeeded at 545 (offset 4 lines). 1 out of 6 hunks FAILED -- saving rejects to file src/test/regress/expected/explain.out.rej patching file src/tools/pgindent/typedefs.list Hunk #1 succeeded at 1562 (offset 18 lines). Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_4564.log Regards, Vignesh
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Sat, Jan 27, 2024 at 8:26 AM vignesh C <vignesh21@gmail.com> wrote: > > On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > The patch set is thus > > > 0001 - patch used to measure memory used during planning > > > > > > 0002 - Patch to free intermediate Relids computed by > > > adjust_child_relids_multilevel(). I didn't test memory consumption for > > > multi-level partitioning. But this is clear improvement. In that > > > function we free the AppendInfos array which as many pointers long as > > > the number of relids. So it doesn't make sense not to free the Relids > > > which can be {largest relid}/8 bytes long at least. > > > > > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by > > > itself shows a small memory saving (3%) in the query below where the > > > same clause is commuted twice. The query does not contain any > > > partitioned tables. > > > create table t2 (a int primary key, b int, c int); > > > create index t2_a_b on t2(a, b); > > > select * from t2 where 10 = a > > > Memory used without patch: 13696 bytes > > > Memory used with patch: 13264 bytes > > > > > > 0004 - Patch which implements the hash table of hash table described > > > above and also code to avoid repeated RestrictInfo list translations. > > > > > > I will add this patchset to next commitfest. > > > > > > -- > > > Best Wishes, > > > Ashutosh Bapat > > > > PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is > > the squashed version of the latest patch set at > > https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com. > > CFBot shows that the patch does not apply anymore as in [1]: > === Applying patches on top of PostgreSQL commit ID > 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 === > === applying patch > ./0001-Report-memory-used-for-planning-a-query-in--20231031.patch > ... > patching file src/test/regress/expected/explain.out > Hunk #5 FAILED at 290. > Hunk #6 succeeded at 545 (offset 4 lines). > 1 out of 6 hunks FAILED -- saving rejects to file > src/test/regress/expected/explain.out.rej > patching file src/tools/pgindent/typedefs.list > Hunk #1 succeeded at 1562 (offset 18 lines). > > Please post an updated version for the same. > > [1] - http://cfbot.cputube.org/patch_46_4564.log > > Regards, > Vignesh Thanks Vignesh for the notification. PFA rebased patches. 0001 in earlier patch-set is now removed. -- Best Wishes, Ashutosh Bapat
Attachment
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi, After taking a look at the patch optimizing SpecialJoinInfo allocations, I decided to take a quick look at this one too. I don't have any specific comments on the code yet, but it seems quite a bit more complex than the other patch ... it's introducing a HTAB into the optimizer, surely that has costs too. I started by doing the same test as with the other patch, comparing master to the two patches (applied independently and both). And I got about this (in MB): tables master sjinfo rinfo both ----------------------------------------------- 2 37 36 34 33 3 138 129 122 113 4 421 376 363 318 5 1495 1254 1172 931 Unlike the SpecialJoinInfo patch, I haven't seen any reduction in planning time for this one. The reduction in allocated memory is nice. I wonder what's allocating the remaining memory, and we'd have to do to reduce it further. However, this is a somewhat extreme example - it's joining 5 tables, each with 1000 partitions, using a partition-wise join. It reduces the amount of memory, but the planning time is still quite high (and essentially the same as without the patch). So it's not like it'd make them significantly more practical ... do we have any other ideas/plans how to improve that? AFAIK we don't expect this to improve "regular" cases with modest number of tables / partitions, etc. But could it make them slower in some case? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On 19/2/2024 06:05, Tomas Vondra wrote: > However, this is a somewhat extreme example - it's joining 5 tables, > each with 1000 partitions, using a partition-wise join. It reduces the > amount of memory, but the planning time is still quite high (and > essentially the same as without the patch). So it's not like it'd make > them significantly more practical ... do we have any other ideas/plans > how to improve that? The planner principle of cleaning up all allocated structures after the optimization stage simplifies development and code. But, if we want to achieve horizontal scalability on many partitions, we should introduce per-partition memory context and reset it in between. GEQO already has a short-lived memory context, making designing extensions a bit more challenging but nothing too painful. -- regards, Andrei Lepikhov Postgres Professional
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > After taking a look at the patch optimizing SpecialJoinInfo allocations, > I decided to take a quick look at this one too. I don't have any > specific comments on the code yet, but it seems quite a bit more complex > than the other patch ... it's introducing a HTAB into the optimizer, > surely that has costs too. Thanks for looking into this too. > > I started by doing the same test as with the other patch, comparing > master to the two patches (applied independently and both). And I got > about this (in MB): > > tables master sjinfo rinfo both > ----------------------------------------------- > 2 37 36 34 33 > 3 138 129 122 113 > 4 421 376 363 318 > 5 1495 1254 1172 931 > > Unlike the SpecialJoinInfo patch, I haven't seen any reduction in > planning time for this one. Yeah. That agreed with my observation as well. > > The reduction in allocated memory is nice. I wonder what's allocating > the remaining memory, and we'd have to do to reduce it further. Please see reply to SpecialJoinInfo thread. Other that the current patches, we require memory efficient Relids implementation. I have shared some ideas in the slides I shared in the other thread, but haven't spent time experimenting with any ideas there. > > However, this is a somewhat extreme example - it's joining 5 tables, > each with 1000 partitions, using a partition-wise join. It reduces the > amount of memory, but the planning time is still quite high (and > essentially the same as without the patch). So it's not like it'd make > them significantly more practical ... do we have any other ideas/plans > how to improve that? Yuya has been working on reducing planning time [1]. Has some significant improvements in that area based on my experiments. But those patches are complex and still WIP. > > AFAIK we don't expect this to improve "regular" cases with modest number > of tables / partitions, etc. But could it make them slower in some case? > AFAIR, my experiments did not show any degradation in regular cases with modest number of tables/partitions. The variation in planning time was with the usual planning time variations. [1] https://www.postgresql.org/message-id/flat/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN%3DD6rPHcao5gcXanY5w%40mail.gmail.com#112b3e104e0f9e39eb007abe075aae20 -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> After taking a look at the patch optimizing SpecialJoinInfo allocations,
> I decided to take a quick look at this one too. I don't have any
> specific comments on the code yet, but it seems quite a bit more complex
> than the other patch ... it's introducing a HTAB into the optimizer,
> surely that has costs too.
Thanks for looking into this too.
>
> I started by doing the same test as with the other patch, comparing
> master to the two patches (applied independently and both). And I got
> about this (in MB):
>
> tables master sjinfo rinfo both
> -----------------------------------------------
> 2 37 36 34 33
> 3 138 129 122 113
> 4 421 376 363 318
> 5 1495 1254 1172 931
>
> Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
> planning time for this one.
Yeah. That agreed with my observation as well.
>
> The reduction in allocated memory is nice. I wonder what's allocating
> the remaining memory, and we'd have to do to reduce it further.
Please see reply to SpecialJoinInfo thread. Other that the current
patches, we require memory efficient Relids implementation. I have
shared some ideas in the slides I shared in the other thread, but
haven't spent time experimenting with any ideas there.
>
> However, this is a somewhat extreme example - it's joining 5 tables,
> each with 1000 partitions, using a partition-wise join. It reduces the
> amount of memory, but the planning time is still quite high (and
> essentially the same as without the patch). So it's not like it'd make
> them significantly more practical ... do we have any other ideas/plans
> how to improve that?
Yuya has been working on reducing planning time [1]. Has some
significant improvements in that area based on my experiments. But
those patches are complex and still WIP.
>
> AFAIK we don't expect this to improve "regular" cases with modest number
> of tables / partitions, etc. But could it make them slower in some case?
>
AFAIR, my experiments did not show any degradation in regular cases
with modest number of tables/partitions. The variation in planning
time was with the usual planning time variations.
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Mon, Aug 19, 2024 at 6:43 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Sorry for the delay in my response. > > On Wed, May 29, 2024 at 9:34 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > Documenting some comments from todays' patch review session > > I forgot to mention back then that both of the suggestions below came > from Tom Lane. > > > 1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory. > > Done. It indeed saves memory without impacting planning time. > > > 2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead commutedversions may be tracked outside RestrictInfo > > After commit 767c598954bbf72e0535f667e2e0667765604b2a, > repameterization of parent paths for child relation happen at the time > of creating the plan. This reduces the number of child paths produced > by reparameterization and also reduces the number of RestrictInfos > that get translated during reparameterization. During > reparameterization commuted parent RestrictInfos are required to be > translated to child RestrictInfos. Before the commit, this led to > translating the same commuted parent RestrictInfo multiple times. > After the commit, only one path gets reparameterized for a given > parent and child pair. Hence we do not produce multiple copies of the > same commuted child RestrictInfo. Hence we don't need to keep track of > commuted child RestrictInfos anymore. Removed that portion of code > from the patches. > > I made detailed memory consumption measurements with this patch for > number of partitions changed from 0 (unpartitioned) to 1000 and for 2 > to 5-way joins. They are available in the spreadsheet at [1]. The raw > measurement data is in the first sheet named "pwj_mem_measurements raw > numbers". The averages over multiple runs are in second sheet named > "avg_numbers". Rest of the sheet represent the averages in more > consumable manner. Please note that the averages make sense only for > planning time since the memory consumption remains same with every > run. Also note that EXPLAIN now reports planning memory consumption in > kB. Any changes to memory consumption below 1kB are not reported and > hence not noticed. Here are some observations. > > 1. When partitionwise join is not enabled, no changes to planner's > memory consumption are observed. See sheet named "pwj disabled, > planning memory". > > 2. When partitionwise join is enabled, upto 17% (206MB) memory is > saved by the patch in case of 5-way self-join with 1000 partitions. > This is the maximum memory saving observed. The amount of memory saved > increases with the number of joins and number of partitions. See sheet > with name "pwj enabled, planning memory" > > 3. After commit 767c598954bbf72e0535f667e2e0667765604b2a, we do not > translate a parent RestrictInfo multiple times for the same > parent-child pair in case of a *2-way partitionwise join*. But we > still consume memory in saving the child RestrictInfo in the hash > table. Hence in case of 2-way join we see increased memory consumption > with the patch compared to master. The memory consumption increases by > 13kb, 23kB, 76kB and 146kB for 10, 100, 500, 1000 partitions > respectively. This increase is smaller compared to the overall memory > saving. In order to avoid this memory increase, we will need to avoid > using hash table for 2-way join. We will need to know whether there > will be more than one partitionwise join before translating the > RestrictInfos for the first partitionwise join. This is hard to > achieve in all the cases since the decision to use partitionwise join > happens at the time of creating paths for a given join relation, which > itself is computed on the fly. We may choose some heuristics which > take into account the number of partitioned tables in the query, their > partition schemes, and the quals in the query to decide whether or not > to track the translated child RestrictInfos. But that will make the > code more complex, but more importantly the heuristics may not be able > to keep up if we start using partitionwise join as an optimization > strategy for more cases (e.g. asymmetric partitionwise join [2]). The > attached patch looks like a good tradeoff to me. But other opinions > might vary. Suggestions are welcome. > > 4. There is no noticeable change in the planning time. I ran the same > experiment multiple times. The planning time variations from each > experiment do not show any noticeable pattern suggesting increase or > decrease in the planning time with the patch. > > A note about the code: I have added all the structures and functions > dealing with the RestrictInfo hash table at the end of restrictinfo.c. > I have not come across a C file in PostgreSQL code base where private > structures are defined in the middle the file; usually they are > defined at the beginning of the file. But I have chosen it that way > here since it makes it easy to document the hash table and the > functions at one place at the beginning of this code section. I am > open to suggestions which make the documentation easy while placing > the structures at the beginning of the file. > > [1] https://docs.google.com/spreadsheets/d/1CEjRWZ02vuR8fSwhYaNugewtX8f0kIm5pLoRA95f3s8/edit?usp=sharing > [2] https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com > Noticed that Tomas's email address has changed. Adding his new email address. -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
> On Thu, Oct 10, 2024 at 05:36:10PM GMT, Ashutosh Bapat wrote: > > 3. With just patch 0001 applied, planning time usually shows > degradation (column Q and X in planning time sheets) with or without > PWJ enabled. I first thought that it might be because of the increased > size of PlannerInfo. We had seen a similar phenomenon when adding a > new member to WindowAggState [2]. Hence I introduced patch 0002 which > moves two fields around to not increase the size of structure. But > that doesn't fix the regression in the planning time (columns R and > Y). Apart from increasing the PlannerInfo size and may be object file > size, 0002 does not have any other impact. But the regression seen > with just that patch is more than what we saw in [2]. More investigate > is required to decide whether this regression is real or not and if > real, the root cause. Looking at the numbers, it seems that this > regression is causing the planning time regression in rest of the > patches. If we fix regression by 0001, we should not see much > regression in rest of the patches. I am looking for some guidance in > investigating this regression. Hi, I've tried to reproduce some subset of those results, in case if I would be able to notice anything useful. Strangely enough, I wasn't able to get much boost in planning time e.g. with 4 first patches, 100 partitions and 5 joins -- the results you've posted are showing about 16% in that case, where I'm getting only a couple of percents. Probably I'm doing something differently, but it's turned out to be hard to reconstruct (based only on this thread) how did you exactly benchmark the patch -- could you maybe summarize the benchmark in a reproducible way? From what I understand you were testing againt an empty partitioned table. Here is what I was doing: create table t1p (c1 int) partition by list(c1); select format('create table %I partition of t1p for values in (%s)', 't1p' || i, i) from generate_series(1, 100) i; \gexec do $x$ declare i record; plan float[]; plan_line text; begin for i in select * from generate_series(1, 1000) i loop for plan_line in execute format($y$ explain analyze select * from t1p t1, t1p t2, t1p t3, t1p t4, t1p t5 where t2.c1 = t1.c1 and t3.c1 = t2.c1 and t4.c1 = t3.c1 and t5.c1 = t4.c1 $y$) loop if plan_line like '%Planning Time%' then plan := array_append(plan, substring(plan_line from '\d+.\d+')::float); end if; end loop; end loop; -- skip the first record as prewarming raise warning 'avg: %', (select avg(v) from unnest(plan[2:]) v); raise warning 'median: %', (select percentile_cont(0.5) within group(order by v) from unnest(plan[2:]) v); end; $x$; As a side note, may I ask to attach benchmark results as files, rather than a link to a google spreadsheet? It feels nice to have something static to work with.