Thread: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Hi All, In try_partitionwise_join() we create SpecialJoinInfo structures for child joins by translating SpecialJoinInfo structures applicable to the parent join. These SpecialJoinInfos are not used outside try_partitionwise_join() but we do not free memory allocated to those. In try_partitionwise_join() we create as many SpecialJoinInfos as the number of partitions. But try_partitionwise_join() itself is called as many times as the number of join orders for a given join. Thus the number of child SpecialJoinInfos that are created increases exponentially with the number of partitioned tables being joined. The attached patch (0002) fixes this issue by 1. declaring child SpecialJoinInfo as a local variable, thus allocating memory on the stack instead of CurrentMemoryContext. The memory on the stack is freed automatically. 2. Freeing the Relids in SpecialJoinInfo explicitly after SpecialJoinInfo has been used. We can not free the object trees in SpecialJoinInfo since those may be referenced in the paths. Similar to my previous emails [1], the memory consumption for given queries is measured using attached patch (0001). The table definitions and helper functions can be found in setup.sql and queries.sql. Following table shows the reduction in the memory using the attached patch (0002). Number of | without | | | Absolute | joining tables | patch | with patch | % diff | diff | -------------------------------------------------------------------- 2 | 40.9 MiB | 39.9 MiB | 2.27% | 925.7 KiB | 3 | 151.7 MiB | 142.6 MiB | 5.97% | 9.0 MiB | 4 | 464.0 MiB | 418.4 MiB | 9.83% | 45.6 MiB | 5 | 1663.9 MiB | 1419.4 MiB | 14.69% | 244.5 MiB | Since the memory consumed by SpecialJoinInfos is exponentially proportional to the number of tables being joined, we see that the patch is able to save more memory at higher number of tables joined. The attached patch (0002) frees members of individual SpecialJoinInfos. It might have some impact on the planning time. We may improve that by allocating the members in a separate memory context. The memory context will be allocated and deleted in each invocation of try_partitionwise_join(). I am assuming that deleting a memory context is more efficient than freeing bits of memory multiple times. But I have not tried that approach. Another approach would be to track the SpecialJoinInfo translations (similar to RestrictListInfo translations proposed in other email thread [2]) and avoid repeatedly translating the same SpecialJoinInfo. The approach will have similar disadvantages that it may increase size of SpecialJoinInfo structure even when planning the queries which do not need partitionwise join. So I haven't tried that approach yet. At this stage, I am looking for opinions on 1. whether it's worth reducing this memory 2. any suggestions/comments on which approach from above is more suitable or if there are other approaches References [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com [2] https://www.postgresql.org/message-id/CAExHW5s=bCLMMq8n_bN6iU+Pjau0DS3z_6Dn6iLE69ESmsPMJQ@mail.gmail.com -- Best Wishes, Ashutosh Bapat
Attachment
On Thu, Jul 27, 2023 at 10:10 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
The attached patch (0002) fixes this issue by
1. declaring child SpecialJoinInfo as a local variable, thus
allocating memory on the stack instead of CurrentMemoryContext. The
memory on the stack is freed automatically.
2. Freeing the Relids in SpecialJoinInfo explicitly after
SpecialJoinInfo has been used.
it's OK to construct and free the SpecialJoinInfo for a child join on
the fly. So the approach in 0002 looks reasonable to me. But there is
something that needs to be fixed in 0002.
+ bms_free(child_sjinfo->commute_above_l);
+ bms_free(child_sjinfo->commute_above_r);
+ bms_free(child_sjinfo->commute_below_l);
+ bms_free(child_sjinfo->commute_below_r);
These four members in SpecialJoinInfo only contain outer join relids.
They do not need to be translated. So they would reference the same
memory areas in child_sjinfo as in parent_sjinfo. We should not free
them, otherwise they would become dangling pointers in parent sjinfo.
Thanks
Richard
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
Hi Richard, On Fri, Jul 28, 2023 at 2:03 PM Richard Guo <guofenglinux@gmail.com> wrote: > > > It doesn't seem too expensive to translate SpecialJoinInfos, so I think > it's OK to construct and free the SpecialJoinInfo for a child join on > the fly. So the approach in 0002 looks reasonable to me. But there is > something that needs to be fixed in 0002. Thanks for the review. I am fine if going ahead with 0002 is enough. > > + bms_free(child_sjinfo->commute_above_l); > + bms_free(child_sjinfo->commute_above_r); > + bms_free(child_sjinfo->commute_below_l); > + bms_free(child_sjinfo->commute_below_r); > > These four members in SpecialJoinInfo only contain outer join relids. > They do not need to be translated. So they would reference the same > memory areas in child_sjinfo as in parent_sjinfo. We should not free > them, otherwise they would become dangling pointers in parent sjinfo. > Good catch. Saved my time. I would have caught this rather hard way when running regression. Thanks a lot. I think we should 1. add an assert to make sure that commute_above_* do not require any transations i.e. they do not contain any parent relids of the child rels. 2. Do not free those. 3. Add a comment about keeping the build and free functions in sync. I will work on those next. -- Best Wishes, Ashutosh Bapat
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Fri, Jul 28, 2023 at 7:28 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> + bms_free(child_sjinfo->commute_above_l);
> + bms_free(child_sjinfo->commute_above_r);
> + bms_free(child_sjinfo->commute_below_l);
> + bms_free(child_sjinfo->commute_below_r);
>
> These four members in SpecialJoinInfo only contain outer join relids.
> They do not need to be translated. So they would reference the same
> memory areas in child_sjinfo as in parent_sjinfo. We should not free
> them, otherwise they would become dangling pointers in parent sjinfo.
>
Attached patchset fixing those.
0001 - patch to report planning memory, with to explain.out regression output fix. We may consider committing this as well.
0002 - with your comment addressed above.
I think we should 1. add an assert to make sure that commute_above_*
do not require any transations i.e. they do not contain any parent
relids of the child rels.
Looking at the definitions of commute_above and commute_below relids and OJ relids, I don't think these assertions are necessary. So didn't add those.
2. Do not free those.
Done
3. Add a comment about
keeping the build and free functions in sync.
Done.
--
Best Wishes,
Ashutosh Bapat
Attachment
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Attached patchset fixing those. > 0001 - patch to report planning memory, with to explain.out regression output fix. We may consider committing this as well. > 0002 - with your comment addressed above. 0003 - Added this patch for handling SpecialJoinInfos for inner joins. These SpecialJoinInfos are created on the fly for parent joins. They can be created on the fly for child joins as well without requiring any translations. Thus they do not require any new memory. This patch is intended to be merged into 0002 finally. Added to the upcoming commitfest https://commitfest.postgresql.org/44/4500/ -- Best Wishes, Ashutosh Bapat
Attachment
Hi Ashutosh, On Wed, Aug 16, 2023 at 2:28 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > Attached patchset fixing those. > > 0001 - patch to report planning memory, with to explain.out regression output fix. We may consider committing this aswell. > > 0002 - with your comment addressed above. > > 0003 - Added this patch for handling SpecialJoinInfos for inner joins. > These SpecialJoinInfos are created on the fly for parent joins. They > can be created on the fly for child joins as well without requiring > any translations. Thus they do not require any new memory. This patch > is intended to be merged into 0002 finally. I read this thread and have been reading the latest patch. At first glance, it seems quite straightforward to me. I agree with Richard that pfree()'ing 4 bitmapsets may not be a lot of added overhead. I will study the patch a bit more. Just one comment on 0003: + /* + * Dummy SpecialJoinInfos do not have any translated fields and hence have + * nothing to free. + */ + if (child_sjinfo->jointype == JOIN_INNER) + return; Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I read this thread and have been reading the latest patch. At first > glance, it seems quite straightforward to me. I agree with Richard > that pfree()'ing 4 bitmapsets may not be a lot of added overhead. I > will study the patch a bit more. Thanks. > > Just one comment on 0003: > > + /* > + * Dummy SpecialJoinInfos do not have any translated fields and hence have > + * nothing to free. > + */ > + if (child_sjinfo->jointype == JOIN_INNER) > + return; > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? try_partitionwise_join() calls free_child_sjinfo_members() unconditionally i.e. it will be called even SpecialJoinInfos of INNER join. Above condition filters those out early. In fact if we convert into an Assert, in a production build the rest of the code will free Relids which are referenced somewhere else causing dangling pointers. -- Best Wishes, Ashutosh Bapat
On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Just one comment on 0003: > > > > + /* > > + * Dummy SpecialJoinInfos do not have any translated fields and hence have > > + * nothing to free. > > + */ > > + if (child_sjinfo->jointype == JOIN_INNER) > > + return; > > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? > > try_partitionwise_join() calls free_child_sjinfo_members() > unconditionally i.e. it will be called even SpecialJoinInfos of INNER > join. Above condition filters those out early. In fact if we convert > into an Assert, in a production build the rest of the code will free > Relids which are referenced somewhere else causing dangling pointers. You're right. I hadn't realized that the parent SpecialJoinInfo passed to try_partitionwise_join() might itself be a dummy. I guess I should've read the whole thing before commenting. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Thu, Sep 21, 2023 at 6:37 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > Just one comment on 0003: > > > > > > + /* > > > + * Dummy SpecialJoinInfos do not have any translated fields and hence have > > > + * nothing to free. > > > + */ > > > + if (child_sjinfo->jointype == JOIN_INNER) > > > + return; > > > > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? > > > > try_partitionwise_join() calls free_child_sjinfo_members() > > unconditionally i.e. it will be called even SpecialJoinInfos of INNER > > join. Above condition filters those out early. In fact if we convert > > into an Assert, in a production build the rest of the code will free > > Relids which are referenced somewhere else causing dangling pointers. > > You're right. I hadn't realized that the parent SpecialJoinInfo > passed to try_partitionwise_join() might itself be a dummy. I guess I > should've read the whole thing before commenting. Maybe there's something to improve in terms of readability, I don't know. -- Best Wishes, Ashutosh Bapat
Hi Ashutosh, On Thu, Sep 21, 2023 at 1:20 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Thu, Sep 21, 2023 at 6:37 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Just one comment on 0003: > > > > > > > > + /* > > > > + * Dummy SpecialJoinInfos do not have any translated fields and hence have > > > > + * nothing to free. > > > > + */ > > > > + if (child_sjinfo->jointype == JOIN_INNER) > > > > + return; > > > > > > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? > > > > > > try_partitionwise_join() calls free_child_sjinfo_members() > > > unconditionally i.e. it will be called even SpecialJoinInfos of INNER > > > join. Above condition filters those out early. In fact if we convert > > > into an Assert, in a production build the rest of the code will free > > > Relids which are referenced somewhere else causing dangling pointers. > > > > You're right. I hadn't realized that the parent SpecialJoinInfo > > passed to try_partitionwise_join() might itself be a dummy. I guess I > > should've read the whole thing before commenting. > > Maybe there's something to improve in terms of readability, I don't know. Here are some comments. Please merge 0003 into 0002. + /* + * But the list of operator OIDs and the list of expressions may be + * referenced somewhere else. Do not free those. + */ I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm not sure what the comment is referring to. Also, instead of "the list of expressions", it might be more useful to mention semi_rhs_expr, because that's the only one being copied. The comment for SpecialJoinInfo mentions that they are stored in PlannerInfo.join_info_list, but the child SpecialJoinInfos used by partitionwise joins are not, which I noticed has not been mentioned anywhere. Should we make a note of that in the SpecialJoinInfo's comment? Just out of curiosity, is their not being present in join_info_list problematic in some manner, such as missed optimization opportunities for child joins? I noticed there is a loop over join_info_list in add_paths_to_join_rel(), which does get called for child joinrels. I know this a bit off-topic for the thread, but thought to ask in case you've already thought about it. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Wed, Sep 27, 2023 at 2:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Here are some comments. Thanks for your review. > > Please merge 0003 into 0002. Done. > > + /* > + * But the list of operator OIDs and the list of expressions may be > + * referenced somewhere else. Do not free those. > + */ > > I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm > not sure what the comment is referring to. Also, instead of "the list > of expressions", it might be more useful to mention semi_rhs_expr, > because that's the only one being copied. list of OID is semi_operators. It's copied as is from parent SpecialJoinInfo. But the way it's mentioned in the comment is confusing. I have modified the prologue of function to provide a general guidance on what can be freed and what should not be. and then specific examples. Let me know if this is more clear. > > The comment for SpecialJoinInfo mentions that they are stored in > PlannerInfo.join_info_list, but the child SpecialJoinInfos used by > partitionwise joins are not, which I noticed has not been mentioned > anywhere. Should we make a note of that in the SpecialJoinInfo's > comment? Good point. Since SpecialJoinInfos for JOIN_INNER are mentioned there, it makes sense to mention transient child SpecialJoinInfos as well. Done. > > Just out of curiosity, is their not being present in join_info_list > problematic in some manner, such as missed optimization opportunities > for child joins? I noticed there is a loop over join_info_list in > add_paths_to_join_rel(), which does get called for child joinrels. I > know this a bit off-topic for the thread, but thought to ask in case > you've already thought about it. The function has a comment and code to take care of this at the very beginning /* * PlannerInfo doesn't contain the SpecialJoinInfos created for joins * between child relations, even if there is a SpecialJoinInfo node for * the join between the topmost parents. So, while calculating Relids set * representing the restriction, consider relids of topmost parent of * partitions. */ if (joinrel->reloptkind == RELOPT_OTHER_JOINREL) joinrelids = joinrel->top_parent_relids; else joinrelids = joinrel->relids; PFA latest patch set 0001 - same as previous patch set 0002 - 0002 and 0003 from previous patch set squashed together 0003 - changes addressing above comments. -- Best Wishes, Ashutosh Bapat
Attachment
On Wed, Sep 27, 2023 at 8:07 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Wed, Sep 27, 2023 at 2:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Just out of curiosity, is their not being present in join_info_list > > problematic in some manner, such as missed optimization opportunities > > for child joins? I noticed there is a loop over join_info_list in > > add_paths_to_join_rel(), which does get called for child joinrels. I > > know this a bit off-topic for the thread, but thought to ask in case > > you've already thought about it. > > The function has a comment and code to take care of this at the very beginning > /* > * PlannerInfo doesn't contain the SpecialJoinInfos created for joins > * between child relations, even if there is a SpecialJoinInfo node for > * the join between the topmost parents. So, while calculating Relids set > * representing the restriction, consider relids of topmost parent of > * partitions. > */ > if (joinrel->reloptkind == RELOPT_OTHER_JOINREL) > joinrelids = joinrel->top_parent_relids; > else > joinrelids = joinrel->relids; Ah, that's accounted for. Thanks. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
On Wed, Sep 27, 2023 at 8:07 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Wed, Sep 27, 2023 at 2:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > > + /* > > + * But the list of operator OIDs and the list of expressions may be > > + * referenced somewhere else. Do not free those. > > + */ > > > > I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm > > not sure what the comment is referring to. Also, instead of "the list > > of expressions", it might be more useful to mention semi_rhs_expr, > > because that's the only one being copied. > > list of OID is semi_operators. It's copied as is from parent > SpecialJoinInfo. But the way it's mentioned in the comment is > confusing. I have modified the prologue of function to provide a > general guidance on what can be freed and what should not be. and then > specific examples. Let me know if this is more clear. Thanks. I would've kept the notes about specific members inside the function. Also, no need to have a note for pointers that are not deep-copied to begin with, such as semi_operators. IOW, something like the following would have sufficed: @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo, /* * free_child_sjinfo_members * Free memory consumed by members of a child SpecialJoinInfo. + * + * Only members that are translated copies of their counterpart in the parent + * SpecialJoinInfo are freed here. However, members that could be referenced + * elsewhere are not freed. */ static void free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) bms_free(child_sjinfo->syn_lefthand); bms_free(child_sjinfo->syn_righthand); - /* - * But the list of operator OIDs and the list of expressions may be - * referenced somewhere else. Do not free those. - */ + /* semi_rhs_exprs may be referenced, so don't free. */ } -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > IOW, something > like the following would have sufficed: > > @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root, > SpecialJoinInfo *parent_sjinfo, > /* > * free_child_sjinfo_members > * Free memory consumed by members of a child SpecialJoinInfo. > + * > + * Only members that are translated copies of their counterpart in the parent > + * SpecialJoinInfo are freed here. However, members that could be referenced > + * elsewhere are not freed. > */ > static void > free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) > @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) > bms_free(child_sjinfo->syn_lefthand); > bms_free(child_sjinfo->syn_righthand); > > - /* > - * But the list of operator OIDs and the list of expressions may be > - * referenced somewhere else. Do not free those. > - */ > + /* semi_rhs_exprs may be referenced, so don't free. */ > } Works for me. PFA patchset with these changes. I have still left the changes addressing your comments as a separate patch for easier review. -- Best Wishes, Ashutosh Bapat
Attachment
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
Rebased patchset. No changes in actual patches. 0001 is squashed version of patches submitted at https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com. On Tue, Oct 3, 2023 at 6:42 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > > IOW, something > > like the following would have sufficed: > > > > @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root, > > SpecialJoinInfo *parent_sjinfo, > > /* > > * free_child_sjinfo_members > > * Free memory consumed by members of a child SpecialJoinInfo. > > + * > > + * Only members that are translated copies of their counterpart in the parent > > + * SpecialJoinInfo are freed here. However, members that could be referenced > > + * elsewhere are not freed. > > */ > > static void > > free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) > > @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) > > bms_free(child_sjinfo->syn_lefthand); > > bms_free(child_sjinfo->syn_righthand); > > > > - /* > > - * But the list of operator OIDs and the list of expressions may be > > - * referenced somewhere else. Do not free those. > > - */ > > + /* semi_rhs_exprs may be referenced, so don't free. */ > > } > > Works for me. PFA patchset with these changes. I have still left the > changes addressing your comments as a separate patch for easier > review. > > -- > Best Wishes, > Ashutosh Bapat -- Best Wishes, Ashutosh Bapat
On Wed, 4 Oct 2023 at 04:02, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > > IOW, something > > like the following would have sufficed: > > > > @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root, > > SpecialJoinInfo *parent_sjinfo, > > /* > > * free_child_sjinfo_members > > * Free memory consumed by members of a child SpecialJoinInfo. > > + * > > + * Only members that are translated copies of their counterpart in the parent > > + * SpecialJoinInfo are freed here. However, members that could be referenced > > + * elsewhere are not freed. > > */ > > static void > > free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) > > @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) > > bms_free(child_sjinfo->syn_lefthand); > > bms_free(child_sjinfo->syn_righthand); > > > > - /* > > - * But the list of operator OIDs and the list of expressions may be > > - * referenced somewhere else. Do not free those. > > - */ > > + /* semi_rhs_exprs may be referenced, so don't free. */ > > } > > Works for me. PFA patchset with these changes. I have still left the > changes addressing your comments as a separate patch for easier > review. CFBot shows that the patch does not apply anymore as in [1]: === Applying patches on top of PostgreSQL commit ID 55627ba2d334ce98e1f5916354c46472d414bda6 === === applying patch ./0001-Report-memory-used-for-planning-a-query-in--20231003.patch ... Hunk #1 succeeded at 89 (offset -3 lines). patching file src/test/regress/expected/explain.out Hunk #5 FAILED at 285. Hunk #6 succeeded at 540 (offset 4 lines). 1 out of 6 hunks FAILED -- saving rejects to file src/test/regress/expected/explain.out.rej Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_4500.log Regards, Vignesh
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Fri, Jan 26, 2024 at 8:42 PM vignesh C <vignesh21@gmail.com> wrote: > > On Wed, 4 Oct 2023 at 04:02, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > IOW, something > > > like the following would have sufficed: .. snip... > > > > Works for me. PFA patchset with these changes. I have still left the > > changes addressing your comments as a separate patch for easier > > review. > > CFBot shows that the patch does not apply anymore as in [1]: > > === Applying patches on top of PostgreSQL commit ID > 55627ba2d334ce98e1f5916354c46472d414bda6 === > === applying patch > ./0001-Report-memory-used-for-planning-a-query-in--20231003.patch > ... > Hunk #1 succeeded at 89 (offset -3 lines). > patching file src/test/regress/expected/explain.out > Hunk #5 FAILED at 285. > Hunk #6 succeeded at 540 (offset 4 lines). > 1 out of 6 hunks FAILED -- saving rejects to file > src/test/regress/expected/explain.out.rej > > Please post an updated version for the same. > > [1] - http://cfbot.cputube.org/patch_46_4500.log Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch addressing Amit's comments is still a separate patch for him to review. -- Best Wishes, Ashutosh Bapat
Attachment
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Andrei Lepikhov
Date:
On 30/1/2024 12:44, Ashutosh Bapat wrote: > Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch > addressing Amit's comments is still a separate patch for him to > review. Thanks for this improvement. Working with partitions, I frequently see peaks of memory consumption during planning. So, maybe one more case can be resolved here. Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do we really need it as a separate routine? It might be better to inline this code. Patch 0002 adds valuable comments, and I'm OK with that. Also, as I remember, some extensions, such as pg_hint_plan, call build_child_join_sjinfo. It is OK to break the interface with a major version. But what if they need child_sjinfo a bit longer and collect links to this structure? I don't think it is a real stopper, but it is worth additional analysis. -- regards, Andrei Lepikhov Postgres Professional
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > On 30/1/2024 12:44, Ashutosh Bapat wrote: > > Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch > > addressing Amit's comments is still a separate patch for him to > > review. > Thanks for this improvement. Working with partitions, I frequently see > peaks of memory consumption during planning. So, maybe one more case can > be resolved here. > Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do > we really need it as a separate routine? It might be better to inline > this code. try_partitionwise_join() is already 200 lines long. A separate function is better than adding more lines to try_partitionwise_join(). Also if someone wants to call build_child_join_sjinfo() outside try_partitionwise_join() may find free_child_sjinfo_members() handy. > Patch 0002 adds valuable comments, and I'm OK with that. > > Also, as I remember, some extensions, such as pg_hint_plan, call > build_child_join_sjinfo. It is OK to break the interface with a major > version. But what if they need child_sjinfo a bit longer and collect > links to this structure? I don't think it is a real stopper, but it is > worth additional analysis. If these extensions call build_child_join_sjinfo() and do not call free_child_sjinfo_members, they can keep their child sjinfo as long as they want. I didn't understand your concern. -- Best Wishes, Ashutosh Bapat
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Andrei Lepikhov
Date:
On 14/2/2024 13:32, Ashutosh Bapat wrote: > On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> >> On 30/1/2024 12:44, Ashutosh Bapat wrote: >>> Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch >>> addressing Amit's comments is still a separate patch for him to >>> review. >> Thanks for this improvement. Working with partitions, I frequently see >> peaks of memory consumption during planning. So, maybe one more case can >> be resolved here. >> Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do >> we really need it as a separate routine? It might be better to inline >> this code. > > try_partitionwise_join() is already 200 lines long. A separate > function is better than adding more lines to try_partitionwise_join(). > Also if someone wants to call build_child_join_sjinfo() outside > try_partitionwise_join() may find free_child_sjinfo_members() handy. Make sense, thanks. >> Patch 0002 adds valuable comments, and I'm OK with that. >> >> Also, as I remember, some extensions, such as pg_hint_plan, call >> build_child_join_sjinfo. It is OK to break the interface with a major >> version. But what if they need child_sjinfo a bit longer and collect >> links to this structure? I don't think it is a real stopper, but it is >> worth additional analysis. > > If these extensions call build_child_join_sjinfo() and do not call > free_child_sjinfo_members, they can keep their child sjinfo as long as > they want. I didn't understand your concern. Nothing special. This patch makes external code responsible for allocation of the structure and it adds more paths to do something wrong. Current code looks good. -- regards, Andrei Lepikhov Postgres Professional
Hi, I took a quick look at this patch today. I certainly agree with the intent to reduce the amount of memory during planning, assuming it's not overly disruptive. And I think this patch is fairly localized and looks sensible. That being said I'm a big fan of using a local variable on stack and filling it. I'd probably go with the usual palloc/pfree, because that makes it much easier to use - the callers would not be responsible for allocating the SpecialJoinInfo struct. Sure, it's a little bit of overhead, but with the AllocSet caching I doubt it's measurable. I did put this through check-world on amd64/arm64, with valgrind, without any issue. I also tried the scripts shared by Ashutosh in his initial message (with some minor fixes, adding MEMORY to explain etc). The results with the 20240130 patches are like this: tables master patched ----------------------------- 2 40.8 39.9 3 151.7 142.6 4 464.0 418.5 5 1663.9 1419.5 That's certainly a nice improvement, and it even reduces the amount of time for planning (the 5-table join goes from 18s to 17s on my laptop). That's nice, although 17 seconds for planning is not ... great. That being said, the amount of remaining memory needed by planning is still pretty high - we save ~240MB for a join of 5 tables, but we still need ~1.4GB. Yes, this is a bit extreme example, and it probably is not very common to join 5 tables with 1000 partitions each ... Do we know what are the other places consuming the 1.4GB of memory? Considering my recent thread about scalability, where malloc() turned out to be one of the main culprits, I wonder if maybe there's a lot to gain by reducing the memory usage ... Our attitude to memory usage is that it doesn't really matter if we keep it allocated for a bit, because we'll free it shortly. And that may be true for "modest" memory usage, but with 1.4GB that doesn't seem great, and the malloc overhead can be pretty bad. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Sun, Feb 18, 2024 at 10:55 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > I took a quick look at this patch today. I certainly agree with the > intent to reduce the amount of memory during planning, assuming it's not > overly disruptive. And I think this patch is fairly localized and looks > sensible. Thanks for looking at the patch-set. > > That being said I'm a big fan of using a local variable on stack and > filling it. I'd probably go with the usual palloc/pfree, because that > makes it much easier to use - the callers would not be responsible for > allocating the SpecialJoinInfo struct. Sure, it's a little bit of > overhead, but with the AllocSet caching I doubt it's measurable. You are suggesting that instead of declaring a local variable of type SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return SpecialJoinInfo which will be freed by free_child_sjinfo() (free_child_sjinfo_members in the patch). I am fine with that. > > I did put this through check-world on amd64/arm64, with valgrind, > without any issue. I also tried the scripts shared by Ashutosh in his > initial message (with some minor fixes, adding MEMORY to explain etc). > > The results with the 20240130 patches are like this: > > tables master patched > ----------------------------- > 2 40.8 39.9 > 3 151.7 142.6 > 4 464.0 418.5 > 5 1663.9 1419.5 > > That's certainly a nice improvement, and it even reduces the amount of > time for planning (the 5-table join goes from 18s to 17s on my laptop). > That's nice, although 17 seconds for planning is not ... great. > > That being said, the amount of remaining memory needed by planning is > still pretty high - we save ~240MB for a join of 5 tables, but we still > need ~1.4GB. Yes, this is a bit extreme example, and it probably is not > very common to join 5 tables with 1000 partitions each ... > Yes. > Do we know what are the other places consuming the 1.4GB of memory? Please find the breakdown at [1]. Investigating further, I found that memory consumed by Bitmapsets is not explicitly visible through that breakdown. But Bitmapsets consume a lot of memory in this case. I shared google slides with raw numbers in [2]. The Gslides can be found at https://docs.google.com/presentation/d/1S9BiAADhX-Fv9tDbx5R5Izq4blAofhZMhHcO1c-wzfI/edit#slide=id.p. According to that measurement, Bitmapset objects created while planning a 5-way join between partitioned tables with 1000 partitions each consumed 769.795 MiB as compared to 19.728 KiB consumed by Bitmapsets when planning a 5-way non-partitioned join. See slide 3 for detailed numbers. > Considering my recent thread about scalability, where malloc() turned > out to be one of the main culprits, I wonder if maybe there's a lot to > gain by reducing the memory usage ... Our attitude to memory usage is > that it doesn't really matter if we keep it allocated for a bit, because > we'll free it shortly. And that may be true for "modest" memory usage, > but with 1.4GB that doesn't seem great, and the malloc overhead can be > pretty bad. > That probably explains why we see some modest speedups because of my memory saving patches. Thanks for sharing it. [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com [2] https://www.postgresql.org/message-id/CAExHW5t0cPNjJRPRtt%3D%2Bc5SiTeBPHvx%3DSd2n8EO%2B7XdVuE8_YQ%40mail.gmail.com -- Best Wishes, Ashutosh Bapat
Hi Ashutosh, On Mon, Feb 19, 2024 at 10:01 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Sun, Feb 18, 2024 at 10:55 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: > > > > Hi, > > > > I took a quick look at this patch today. I certainly agree with the > > intent to reduce the amount of memory during planning, assuming it's not > > overly disruptive. And I think this patch is fairly localized and looks > > sensible. > > Thanks for looking at the patch-set. > > > > That being said I'm a big fan of using a local variable on stack and > > filling it. I'd probably go with the usual palloc/pfree, because that > > makes it much easier to use - the callers would not be responsible for > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of > > overhead, but with the AllocSet caching I doubt it's measurable. > > You are suggesting that instead of declaring a local variable of type > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return > SpecialJoinInfo which will be freed by free_child_sjinfo() > (free_child_sjinfo_members in the patch). I am fine with that. Agree with Tomas about using makeNode() / pfree(). Having the pfree() kind of makes it extra-evident that those SpecialJoinInfos are transitory. > > I did put this through check-world on amd64/arm64, with valgrind, > > without any issue. I also tried the scripts shared by Ashutosh in his > > initial message (with some minor fixes, adding MEMORY to explain etc). > > > > The results with the 20240130 patches are like this: > > > > tables master patched > > ----------------------------- > > 2 40.8 39.9 > > 3 151.7 142.6 > > 4 464.0 418.5 > > 5 1663.9 1419.5 Could you please post the numbers with the palloc() / pfree() version? -- Thanks, Amit Langote
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
Hi Amit,
On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.
Agree with Tomas about using makeNode() / pfree(). Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.
Attached patch-set
0001 - original patch as is
0002 - addresses your first set of comments
0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo structure.
I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with those.
> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> > tables master patched
> > -----------------------------
> > 2 40.8 39.9
> > 3 151.7 142.6
> > 4 464.0 418.5
> > 5 1663.9 1419.5
Could you please post the numbers with the palloc() / pfree() version?
Here are they
tables | master | patched
--------+---------+---------
2 | 29 MB | 28 MB
3 | 102 MB | 93 MB
4 | 307 MB | 263 MB
5 | 1076 MB | 843 MB
--------+---------+---------
2 | 29 MB | 28 MB
3 | 102 MB | 93 MB
4 | 307 MB | 263 MB
5 | 1076 MB | 843 MB
The numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memory that time is different from the one that we committed. So there's a slight difference in the way we measure memory as well.
--
Best Wishes,
Ashutosh Bapat
Attachment
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi Amit,On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.
Agree with Tomas about using makeNode() / pfree(). Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.Attached patch-set0001 - original patch as is0002 - addresses your first set of comments0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo structure.I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with those.
Thanks for the new patches.
> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> > tables master patched
> > -----------------------------
> > 2 40.8 39.9
> > 3 151.7 142.6
> > 4 464.0 418.5
> > 5 1663.9 1419.5
Could you please post the numbers with the palloc() / pfree() version?Here are theytables | master | patched
--------+---------+---------
2 | 29 MB | 28 MB
3 | 102 MB | 93 MB
4 | 307 MB | 263 MB
5 | 1076 MB | 843 MBThe numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memory that time is different from the one that we committed. So there's a slight difference in the way we measure memory as well.
Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Mon, Mar 18, 2024 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:Hi Amit,On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.
Agree with Tomas about using makeNode() / pfree(). Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.Attached patch-set0001 - original patch as is0002 - addresses your first set of comments0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo structure.I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with those.Thanks for the new patches.> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> > tables master patched
> > -----------------------------
> > 2 40.8 39.9
> > 3 151.7 142.6
> > 4 464.0 418.5
> > 5 1663.9 1419.5
Could you please post the numbers with the palloc() / pfree() version?Here are theytables | master | patched
--------+---------+---------
2 | 29 MB | 28 MB
3 | 102 MB | 93 MB
4 | 307 MB | 263 MB
5 | 1076 MB | 843 MBThe numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memory that time is different from the one that we committed. So there's a slight difference in the way we measure memory as well.Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.
No problem. Here you go
tables | master | patched | perc_change
--------+----------+----------+-------------
2 | 477.87 | 492.32 | -3.02
3 | 1970.83 | 1989.52 | -0.95
4 | 6305.01 | 6268.81 | 0.57
5 | 19261.56 | 18684.86 | 2.99
+ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases. But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If you want, I can repeat the experiment.
--
Best Wishes,
Ashutosh Bapat
On Mon, Mar 18, 2024 at 8:57 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Mon, Mar 18, 2024 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote: >> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: >>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote: >>>> Could you please post the numbers with the palloc() / pfree() version? >>> >>> Here are they >>> tables | master | patched >>> --------+---------+--------- >>> 2 | 29 MB | 28 MB >>> 3 | 102 MB | 93 MB >>> 4 | 307 MB | 263 MB >>> 5 | 1076 MB | 843 MB >>> >>> The numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memorythat time is different from the one that we committed. So there's a slight difference in the way we measure memoryas well. >> >> >> Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, tosee if the palloc / frees add noticeable overhead. > > No problem. Here you go > > tables | master | patched | perc_change > --------+----------+----------+------------- > 2 | 477.87 | 492.32 | -3.02 > 3 | 1970.83 | 1989.52 | -0.95 > 4 | 6305.01 | 6268.81 | 0.57 > 5 | 19261.56 | 18684.86 | 2.99 > > +ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases.But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If youwant, I can repeat the experiment. Thanks. I also wanted to see cpu time difference between the two approaches of SpecialJoinInfo allocation -- on stack and on heap. So comparing times with the previous version of the patch using stack allocation and the new one with palloc. I'm hoping that the new approach is only worse in the noise range. -- Thanks, Amit Langote
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Mon, Mar 18, 2024 at 5:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.
>
> No problem. Here you go
>
> tables | master | patched | perc_change
> --------+----------+----------+-------------
> 2 | 477.87 | 492.32 | -3.02
> 3 | 1970.83 | 1989.52 | -0.95
> 4 | 6305.01 | 6268.81 | 0.57
> 5 | 19261.56 | 18684.86 | 2.99
>
> +ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases. But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If you want, I can repeat the experiment.
Thanks. I also wanted to see cpu time difference between the two
approaches of SpecialJoinInfo allocation -- on stack and on heap. So
comparing times with the previous version of the patch using stack
allocation and the new one with palloc. I'm hoping that the new
approach is only worse in the noise range.
Ah, sorry, I didn't read it carefully. Alvaro made me realise that I have been gathering numbers using assert enabled builds, so they are not that reliable. Here they are with non-assert enabled builds.
planning time (in ms) reported by EXPLAIN.
tables | master | stack_alloc | perc_change_1 | palloc | perc_change_2 | total_perc_change
--------+----------+-------------+---------------+----------+---------------+-------------------
2 | 338.1 | 333.92 | 1.24 | 332.16 | 0.53 | 1.76
3 | 1507.93 | 1475.76 | 2.13 | 1472.79 | 0.20 | 2.33
4 | 5099.45 | 4980.35 | 2.34 | 4947.3 | 0.66 | 2.98
5 | 15442.64 | 15531.94 | -0.58 | 15393.41 | 0.89 | 0.32
--------+----------+-------------+---------------+----------+---------------+-------------------
2 | 338.1 | 333.92 | 1.24 | 332.16 | 0.53 | 1.76
3 | 1507.93 | 1475.76 | 2.13 | 1472.79 | 0.20 | 2.33
4 | 5099.45 | 4980.35 | 2.34 | 4947.3 | 0.66 | 2.98
5 | 15442.64 | 15531.94 | -0.58 | 15393.41 | 0.89 | 0.32
stack_alloc = timings with SpecialJoinInfo on stack
perc_change_1 = percentage change of above wrt master
palloc = timings with palloc'ed SpecialJoinInfo
perc_change_2 = percentage change of above wrt timings with stack_alloc
total_perc_change = percentage change between master and all patches
total change is within noise. Timing with palloc is better than that with SpecialJoinInfo on stack but only marginally. I don't think that means palloc based allocation is better or worse than stack based.
You requested CPU time difference between stack based SpecialJoinInfo and palloc'ed SpecialJoinInfo. Here it is
tables | stack_alloc | palloc | perc_change
--------+-------------+----------+-------------
2 | 0.438204 | 0.438986 | -0.18
3 | 1.224672 | 1.238781 | -1.15
4 | 3.511317 | 3.663334 | -4.33
5 | 9.283856 | 9.340516 | -0.61
--------+-------------+----------+-------------
2 | 0.438204 | 0.438986 | -0.18
3 | 1.224672 | 1.238781 | -1.15
4 | 3.511317 | 3.663334 | -4.33
5 | 9.283856 | 9.340516 | -0.61
Yes, there's a consistent degradation albeit within noise levels.
The memory measurements
tables | master | with_patch
--------+--------+------------
2 | 26 MB | 26 MB
3 | 93 MB | 84 MB
4 | 277 MB | 235 MB
5 | 954 MB | 724 MB
--------+--------+------------
2 | 26 MB | 26 MB
3 | 93 MB | 84 MB
4 | 277 MB | 235 MB
5 | 954 MB | 724 MB
The first row shows the same value because of rounding. The actual values there are 27740432 and 26854704 resp.
Please let me know if you need anything.
Best Wishes,
Ashutosh Bapat
Hi Ashutosh, On Tue, Mar 19, 2024 at 12:47 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Mon, Mar 18, 2024 at 5:40 PM Amit Langote <amitlangote09@gmail.com> wrote: >> >> >> >> Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically,to see if the palloc / frees add noticeable overhead. >> > >> > No problem. Here you go >> > >> > tables | master | patched | perc_change >> > --------+----------+----------+------------- >> > 2 | 477.87 | 492.32 | -3.02 >> > 3 | 1970.83 | 1989.52 | -0.95 >> > 4 | 6305.01 | 6268.81 | 0.57 >> > 5 | 19261.56 | 18684.86 | 2.99 >> > >> > +ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases.But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If youwant, I can repeat the experiment. >> >> Thanks. I also wanted to see cpu time difference between the two >> approaches of SpecialJoinInfo allocation -- on stack and on heap. So >> comparing times with the previous version of the patch using stack >> allocation and the new one with palloc. I'm hoping that the new >> approach is only worse in the noise range. > > > Ah, sorry, I didn't read it carefully. Alvaro made me realise that I have been gathering numbers using assert enabled builds,so they are not that reliable. Here they are with non-assert enabled builds. > > planning time (in ms) reported by EXPLAIN. > tables | master | stack_alloc | perc_change_1 | palloc | perc_change_2 | total_perc_change > --------+----------+-------------+---------------+----------+---------------+------------------- > 2 | 338.1 | 333.92 | 1.24 | 332.16 | 0.53 | 1.76 > 3 | 1507.93 | 1475.76 | 2.13 | 1472.79 | 0.20 | 2.33 > 4 | 5099.45 | 4980.35 | 2.34 | 4947.3 | 0.66 | 2.98 > 5 | 15442.64 | 15531.94 | -0.58 | 15393.41 | 0.89 | 0.32 > > stack_alloc = timings with SpecialJoinInfo on stack > perc_change_1 = percentage change of above wrt master > palloc = timings with palloc'ed SpecialJoinInfo > perc_change_2 = percentage change of above wrt timings with stack_alloc > total_perc_change = percentage change between master and all patches > > total change is within noise. Timing with palloc is better than that with SpecialJoinInfo on stack but only marginally.I don't think that means palloc based allocation is better or worse than stack based. > > You requested CPU time difference between stack based SpecialJoinInfo and palloc'ed SpecialJoinInfo. Here it is > tables | stack_alloc | palloc | perc_change > --------+-------------+----------+------------- > 2 | 0.438204 | 0.438986 | -0.18 > 3 | 1.224672 | 1.238781 | -1.15 > 4 | 3.511317 | 3.663334 | -4.33 > 5 | 9.283856 | 9.340516 | -0.61 > > Yes, there's a consistent degradation albeit within noise levels. > > The memory measurements > tables | master | with_patch > --------+--------+------------ > 2 | 26 MB | 26 MB > 3 | 93 MB | 84 MB > 4 | 277 MB | 235 MB > 5 | 954 MB | 724 MB > > The first row shows the same value because of rounding. The actual values there are 27740432 and 26854704 resp. > > Please let me know if you need anything. Thanks for repeating the benchmark. So I don't see a problem with keeping the existing palloc() way of allocating the SpecialJoinInfos(). We're adding a few cycles by adding free_child_join_sjinfo() (see my delta patch about the rename), but the reduction in memory consumption due to it, which is our goal here, far outweighs what looks like a minor CPU slowdown. I have squashed 0002, 0003 into 0001, done some changes of my own, and attached the delta patch as 0002. I've listed the changes in the commit message. Let me know if anything seems amiss. I'd like to commit 0001 + 0002 + any other changes based on your comments (if any) sometime early next week. -- Thanks, Amit Langote
Attachment
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Fri, Mar 22, 2024 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Please let me know if you need anything.
Thanks for repeating the benchmark. So I don't see a problem with
keeping the existing palloc() way of allocating the
SpecialJoinInfos(). We're adding a few cycles by adding
free_child_join_sjinfo() (see my delta patch about the rename), but
the reduction in memory consumption due to it, which is our goal here,
far outweighs what looks like a minor CPU slowdown.
I have squashed 0002, 0003 into 0001, done some changes of my own, and
attached the delta patch as 0002. I've listed the changes in the
commit message. Let me know if anything seems amiss.
Thanks Amit for your changes.
> * Revert (IMO) unnecessary modifications of build_child_join_sjinfo().
> For example, hunks to rename sjinfo to child_sjinfo seem unnecessary,
> because it's clear from the context that they are "child" sjinfos.
>
Ok.
> * Rename free_child_sjinfo_members() to free_child_join_sjinfo() to
> be symmetric with build_child_join_sjinfo(). Note that the function
> is charged with freeing the sjinfo itself too. Like
> build_child_join_sjinfo(), name the argument sjinfo, not
> child_sjinfo.
Yes. It should have been done in my patch.
>
> * Don't set the child sjinfo pointer to NULL after freeing it. While
> a good convention in general, it's not really common in the planner
> code, so doing it here seems a bit overkill
That function is the last line in the block where pointer variable is declared.
As of now there is no danger of the dangling pointer being used.
>
> * Rename make_dummy_sjinfo() to init_dummy_sjinfo() because the
> function is not really making it per se, only initializing fields
> in the SpecialJoinInfo struct made available by the caller.
Right.
>
> * Make init_dummy_sjinfo() take Relids instead of RelOptInfos because
> that's what's needed. Also allows to reduce changes to
> build_child_join_sjinfo()'s signature.
I remember, in one of the versions it was Relids and then I changed to
RelOptInfos for some reason. The latest version seems to use just Relids. So
this looks better.
>
> * Update the reason mentioned in a comment in free_child_join_sjinfo()
> about why semi_rhs_expr is not freed -- it may be freed, but there's
> no way to "deep" free it, so we leave it alone.
>
> * Update a comment somewhere about why SpecialJoinInfo can be
> "transient" sometimes.
With the later code, semi_rhs_expr is indeed free'able. It wasn't so when I
wrote the patches. Thanks for updating the comment. I wish we would have
freeObject(). Nothing that can be done for this patch.
> For example, hunks to rename sjinfo to child_sjinfo seem unnecessary,
> because it's clear from the context that they are "child" sjinfos.
>
Ok.
> * Rename free_child_sjinfo_members() to free_child_join_sjinfo() to
> be symmetric with build_child_join_sjinfo(). Note that the function
> is charged with freeing the sjinfo itself too. Like
> build_child_join_sjinfo(), name the argument sjinfo, not
> child_sjinfo.
Yes. It should have been done in my patch.
>
> * Don't set the child sjinfo pointer to NULL after freeing it. While
> a good convention in general, it's not really common in the planner
> code, so doing it here seems a bit overkill
That function is the last line in the block where pointer variable is declared.
As of now there is no danger of the dangling pointer being used.
>
> * Rename make_dummy_sjinfo() to init_dummy_sjinfo() because the
> function is not really making it per se, only initializing fields
> in the SpecialJoinInfo struct made available by the caller.
Right.
>
> * Make init_dummy_sjinfo() take Relids instead of RelOptInfos because
> that's what's needed. Also allows to reduce changes to
> build_child_join_sjinfo()'s signature.
I remember, in one of the versions it was Relids and then I changed to
RelOptInfos for some reason. The latest version seems to use just Relids. So
this looks better.
>
> * Update the reason mentioned in a comment in free_child_join_sjinfo()
> about why semi_rhs_expr is not freed -- it may be freed, but there's
> no way to "deep" free it, so we leave it alone.
>
> * Update a comment somewhere about why SpecialJoinInfo can be
> "transient" sometimes.
With the later code, semi_rhs_expr is indeed free'able. It wasn't so when I
wrote the patches. Thanks for updating the comment. I wish we would have
freeObject(). Nothing that can be done for this patch.
Attached patches
Squashed your 0001 and 0002 into 0001. I have also reworded the commit message to reflect the latest code changes.
0002 has some minor edits. Please feel free to include or reject when committing the patch.
Best Wishes,
Ashutosh Bapat
Attachment
On Fri, Mar 22, 2024 at 7:48 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > With the later code, semi_rhs_expr is indeed free'able. It wasn't so when I > wrote the patches. Thanks for updating the comment. I wish we would have > freeObject(). Nothing that can be done for this patch. Yes. > Attached patches > Squashed your 0001 and 0002 into 0001. I have also reworded the commit message to reflect the latest code changes. > 0002 has some minor edits. Please feel free to include or reject when committing the patch. I've pushed this now. When updating the commit message, I realized that you had made the right call to divide the the changes around not translating the dummy SpecialJoinInfos into a separate patch. So, I pushed it like that. Thanks for working on this. -- Thanks, Amit Langote
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Mon, Mar 25, 2024 at 2:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Attached patches
> Squashed your 0001 and 0002 into 0001. I have also reworded the commit message to reflect the latest code changes.
> 0002 has some minor edits. Please feel free to include or reject when committing the patch.
I've pushed this now.
Thanks a lot.
When updating the commit message, I realized that you had made the
right call to divide the the changes around not translating the dummy
SpecialJoinInfos into a separate patch. So, I pushed it like that.
Ok. Thanks for separating the changes.
--
Best Wishes,
Ashutosh Bapat
On Mon, Mar 25, 2024 at 5:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
I've pushed this now.
When updating the commit message, I realized that you had made the
right call to divide the the changes around not translating the dummy
SpecialJoinInfos into a separate patch. So, I pushed it like that.
Sorry I'm late for the party. I noticed that in commit 6190d828cd the
comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
'rel2', which should be addressed. Also, I think we can make function
consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
bit.
Attached is a trivial patch for the fixes.
Thanks
Richard
comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
'rel2', which should be addressed. Also, I think we can make function
consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
bit.
Attached is a trivial patch for the fixes.
Thanks
Richard
Attachment
On Mon, Mar 25, 2024 at 7:25 PM Richard Guo <guofenglinux@gmail.com> wrote: > On Mon, Mar 25, 2024 at 5:17 PM Amit Langote <amitlangote09@gmail.com> wrote: >> I've pushed this now. >> >> When updating the commit message, I realized that you had made the >> right call to divide the the changes around not translating the dummy >> SpecialJoinInfos into a separate patch. So, I pushed it like that. > > > Sorry I'm late for the party. I noticed that in commit 6190d828cd the > comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and > 'rel2', which should be addressed. Thanks for catching that. Oops. > Also, I think we can make function > consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a > bit. Indeed. > Attached is a trivial patch for the fixes. Will apply shortly. -- Thanks, Amit Langote
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
From
Ashutosh Bapat
Date:
On Mon, Mar 25, 2024 at 4:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Mar 25, 2024 at 7:25 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Mar 25, 2024 at 5:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> I've pushed this now.
>>
>> When updating the commit message, I realized that you had made the
>> right call to divide the the changes around not translating the dummy
>> SpecialJoinInfos into a separate patch. So, I pushed it like that.
>
>
> Sorry I'm late for the party. I noticed that in commit 6190d828cd the
> comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
> 'rel2', which should be addressed.
Thanks for catching that. Oops.
Good catch.
> Also, I think we can make function
> consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
> bit.
Indeed.
Best Wishes,
Ashutosh Bapat