Thread: Memory consumed by child SpecialJoinInfo in partitionwise join planning

Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
Hi All,

In try_partitionwise_join() we create SpecialJoinInfo structures for
child joins by translating SpecialJoinInfo structures applicable to
the parent join. These SpecialJoinInfos are not used outside
try_partitionwise_join() but we do not free memory allocated to those.
In try_partitionwise_join() we create as many SpecialJoinInfos as the
number of partitions. But try_partitionwise_join() itself is called as
many times as the number of join orders for a given join. Thus the
number of child SpecialJoinInfos that are created increases
exponentially with the number of partitioned tables being joined.

The attached patch (0002) fixes this issue by
1. declaring child SpecialJoinInfo as a local variable, thus
allocating memory on the stack instead of CurrentMemoryContext. The
memory on the stack is freed automatically.
2. Freeing the Relids in SpecialJoinInfo explicitly after
SpecialJoinInfo has been used.

We can not free the object trees in SpecialJoinInfo since those may be
referenced in the paths.

Similar to my previous emails [1], the memory consumption for given
queries is measured using attached patch (0001). The table definitions
and helper functions can be found in setup.sql and queries.sql.
Following table shows the reduction in the memory using the attached
patch (0002).

Number of      | without     |             |         | Absolute    |
joining tables | patch       | with patch  | % diff  | diff        |
--------------------------------------------------------------------
             2 |    40.9 MiB |    39.9 MiB |   2.27% |   925.7 KiB |
             3 |   151.7 MiB |   142.6 MiB |   5.97% |     9.0 MiB |
             4 |   464.0 MiB |   418.4 MiB |   9.83% |    45.6 MiB |
             5 |  1663.9 MiB |  1419.4 MiB |  14.69% |   244.5 MiB |

Since the memory consumed by SpecialJoinInfos is exponentially
proportional to the number of tables being joined, we see that the
patch is able to save more memory at higher number of tables joined.

The attached patch (0002) frees members of individual
SpecialJoinInfos. It might have some impact on the planning time. We
may improve that by allocating the members in a separate memory
context. The memory context will be allocated and deleted in each
invocation of try_partitionwise_join(). I am assuming that deleting a
memory context is more efficient than freeing bits of memory multiple
times. But I have not tried that approach.

Another approach would be to track the SpecialJoinInfo translations
(similar to RestrictListInfo translations proposed in other email
thread [2]) and avoid repeatedly translating the same SpecialJoinInfo.
The approach will have similar disadvantages that it may increase size
of SpecialJoinInfo structure even when planning the queries which do
not need partitionwise join. So I haven't tried that approach yet.

At this stage, I am looking for opinions on
1. whether it's worth reducing this memory
2. any suggestions/comments on which approach from above is more
suitable or if there are other approaches

References
[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAExHW5s=bCLMMq8n_bN6iU+Pjau0DS3z_6Dn6iLE69ESmsPMJQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment

On Thu, Jul 27, 2023 at 10:10 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
The attached patch (0002) fixes this issue by
1. declaring child SpecialJoinInfo as a local variable, thus
allocating memory on the stack instead of CurrentMemoryContext. The
memory on the stack is freed automatically.
2. Freeing the Relids in SpecialJoinInfo explicitly after
SpecialJoinInfo has been used.

It doesn't seem too expensive to translate SpecialJoinInfos, so I think
it's OK to construct and free the SpecialJoinInfo for a child join on
the fly.  So the approach in 0002 looks reasonable to me.  But there is
something that needs to be fixed in 0002.

+   bms_free(child_sjinfo->commute_above_l);
+   bms_free(child_sjinfo->commute_above_r);
+   bms_free(child_sjinfo->commute_below_l);
+   bms_free(child_sjinfo->commute_below_r);

These four members in SpecialJoinInfo only contain outer join relids.
They do not need to be translated.  So they would reference the same
memory areas in child_sjinfo as in parent_sjinfo.  We should not free
them, otherwise they would become dangling pointers in parent sjinfo.

Thanks
Richard 

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
Hi Richard,

On Fri, Jul 28, 2023 at 2:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> It doesn't seem too expensive to translate SpecialJoinInfos, so I think
> it's OK to construct and free the SpecialJoinInfo for a child join on
> the fly.  So the approach in 0002 looks reasonable to me.  But there is
> something that needs to be fixed in 0002.

Thanks for the review. I am fine if going ahead with 0002 is enough.

>
> +   bms_free(child_sjinfo->commute_above_l);
> +   bms_free(child_sjinfo->commute_above_r);
> +   bms_free(child_sjinfo->commute_below_l);
> +   bms_free(child_sjinfo->commute_below_r);
>
> These four members in SpecialJoinInfo only contain outer join relids.
> They do not need to be translated.  So they would reference the same
> memory areas in child_sjinfo as in parent_sjinfo.  We should not free
> them, otherwise they would become dangling pointers in parent sjinfo.
>

Good catch. Saved my time. I would have caught this rather hard way
when running regression. Thanks a lot.

I think we should 1. add an assert to make sure that commute_above_*
do not require any transations i.e. they do not contain any parent
relids of the child rels. 2. Do not free those. 3. Add a comment about
keeping the build and free functions in sync. I will work on those
next.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:


On Fri, Jul 28, 2023 at 7:28 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> +   bms_free(child_sjinfo->commute_above_l);
> +   bms_free(child_sjinfo->commute_above_r);
> +   bms_free(child_sjinfo->commute_below_l);
> +   bms_free(child_sjinfo->commute_below_r);
>
> These four members in SpecialJoinInfo only contain outer join relids.
> They do not need to be translated.  So they would reference the same
> memory areas in child_sjinfo as in parent_sjinfo.  We should not free
> them, otherwise they would become dangling pointers in parent sjinfo.
>


Attached patchset fixing those.
0001 - patch to report planning memory, with to explain.out regression output fix. We may consider committing this as well.
0002 - with your comment addressed above.
 

I think we should 1. add an assert to make sure that commute_above_*
do not require any transations i.e. they do not contain any parent
relids of the child rels.

Looking at the definitions of commute_above and commute_below relids and OJ relids, I don't think these assertions are necessary. So didn't add those.
 
2. Do not free those.

Done
 
3. Add a comment about
keeping the build and free functions in sync.

Done.

--
Best Wishes,
Ashutosh Bapat
Attachment

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Attached patchset fixing those.
> 0001 - patch to report planning memory, with to explain.out regression output fix. We may consider committing this as
well.
> 0002 - with your comment addressed above.

0003 - Added this patch for handling SpecialJoinInfos for inner joins.
These SpecialJoinInfos are created on the fly for parent joins. They
can be created on the fly for child joins as well without requiring
any translations. Thus they do not require any new memory. This patch
is intended to be merged into 0002 finally.

Added to the upcoming commitfest https://commitfest.postgresql.org/44/4500/


--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Amit Langote
Date:
Hi Ashutosh,

On Wed, Aug 16, 2023 at 2:28 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Attached patchset fixing those.
> > 0001 - patch to report planning memory, with to explain.out regression output fix. We may consider committing this
aswell. 
> > 0002 - with your comment addressed above.
>
> 0003 - Added this patch for handling SpecialJoinInfos for inner joins.
> These SpecialJoinInfos are created on the fly for parent joins. They
> can be created on the fly for child joins as well without requiring
> any translations. Thus they do not require any new memory. This patch
> is intended to be merged into 0002 finally.

I read this thread and have been reading the latest patch.  At first
glance, it seems quite straightforward to me.  I agree with Richard
that pfree()'ing 4 bitmapsets may not be a lot of added overhead.  I
will study the patch a bit more.

Just one comment on 0003:

+   /*
+    * Dummy SpecialJoinInfos do not have any translated fields and hence have
+    * nothing to free.
+    */
+   if (child_sjinfo->jointype == JOIN_INNER)
+       return;

Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> I read this thread and have been reading the latest patch.  At first
> glance, it seems quite straightforward to me.  I agree with Richard
> that pfree()'ing 4 bitmapsets may not be a lot of added overhead.  I
> will study the patch a bit more.

Thanks.

>
> Just one comment on 0003:
>
> +   /*
> +    * Dummy SpecialJoinInfos do not have any translated fields and hence have
> +    * nothing to free.
> +    */
> +   if (child_sjinfo->jointype == JOIN_INNER)
> +       return;
>
> Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?

try_partitionwise_join() calls free_child_sjinfo_members()
unconditionally i.e. it will be called even SpecialJoinInfos of INNER
join. Above condition filters those out early. In fact if we convert
into an Assert, in a production build the rest of the code will free
Relids which are referenced somewhere else causing dangling pointers.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Amit Langote
Date:
On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Just one comment on 0003:
> >
> > +   /*
> > +    * Dummy SpecialJoinInfos do not have any translated fields and hence have
> > +    * nothing to free.
> > +    */
> > +   if (child_sjinfo->jointype == JOIN_INNER)
> > +       return;
> >
> > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?
>
> try_partitionwise_join() calls free_child_sjinfo_members()
> unconditionally i.e. it will be called even SpecialJoinInfos of INNER
> join. Above condition filters those out early. In fact if we convert
> into an Assert, in a production build the rest of the code will free
> Relids which are referenced somewhere else causing dangling pointers.

You're right.  I hadn't realized that the parent SpecialJoinInfo
passed to try_partitionwise_join() might itself be a dummy.  I guess I
should've read the whole thing before commenting.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Thu, Sep 21, 2023 at 6:37 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Just one comment on 0003:
> > >
> > > +   /*
> > > +    * Dummy SpecialJoinInfos do not have any translated fields and hence have
> > > +    * nothing to free.
> > > +    */
> > > +   if (child_sjinfo->jointype == JOIN_INNER)
> > > +       return;
> > >
> > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?
> >
> > try_partitionwise_join() calls free_child_sjinfo_members()
> > unconditionally i.e. it will be called even SpecialJoinInfos of INNER
> > join. Above condition filters those out early. In fact if we convert
> > into an Assert, in a production build the rest of the code will free
> > Relids which are referenced somewhere else causing dangling pointers.
>
> You're right.  I hadn't realized that the parent SpecialJoinInfo
> passed to try_partitionwise_join() might itself be a dummy.  I guess I
> should've read the whole thing before commenting.

Maybe there's something to improve in terms of readability, I don't know.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Amit Langote
Date:
Hi Ashutosh,

On Thu, Sep 21, 2023 at 1:20 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Thu, Sep 21, 2023 at 6:37 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Just one comment on 0003:
> > > >
> > > > +   /*
> > > > +    * Dummy SpecialJoinInfos do not have any translated fields and hence have
> > > > +    * nothing to free.
> > > > +    */
> > > > +   if (child_sjinfo->jointype == JOIN_INNER)
> > > > +       return;
> > > >
> > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?
> > >
> > > try_partitionwise_join() calls free_child_sjinfo_members()
> > > unconditionally i.e. it will be called even SpecialJoinInfos of INNER
> > > join. Above condition filters those out early. In fact if we convert
> > > into an Assert, in a production build the rest of the code will free
> > > Relids which are referenced somewhere else causing dangling pointers.
> >
> > You're right.  I hadn't realized that the parent SpecialJoinInfo
> > passed to try_partitionwise_join() might itself be a dummy.  I guess I
> > should've read the whole thing before commenting.
>
> Maybe there's something to improve in terms of readability, I don't know.

Here are some comments.

Please merge 0003 into 0002.

+   /*
+    * But the list of operator OIDs and the list of expressions may be
+    * referenced somewhere else. Do not free those.
+    */

I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm
not sure what the comment is referring to.  Also, instead of "the list
of expressions", it might be more useful to mention semi_rhs_expr,
because that's the only one being copied.

The comment for SpecialJoinInfo mentions that they are stored in
PlannerInfo.join_info_list, but the child SpecialJoinInfos used by
partitionwise joins are not, which I noticed has not been mentioned
anywhere.  Should we make a note of that in the SpecialJoinInfo's
comment?

Just out of curiosity, is their not being present in join_info_list
problematic in some manner, such as missed optimization opportunities
for child joins?  I noticed there is a loop over join_info_list in
add_paths_to_join_rel(), which does get called for child joinrels.  I
know this a bit off-topic for the thread, but thought to ask in case
you've already thought about it.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Wed, Sep 27, 2023 at 2:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Here are some comments.

Thanks for your review.

>
> Please merge 0003 into 0002.

Done.

>
> +   /*
> +    * But the list of operator OIDs and the list of expressions may be
> +    * referenced somewhere else. Do not free those.
> +    */
>
> I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm
> not sure what the comment is referring to.  Also, instead of "the list
> of expressions", it might be more useful to mention semi_rhs_expr,
> because that's the only one being copied.

list of OID is semi_operators. It's copied as is from parent
SpecialJoinInfo. But the way it's mentioned in the comment is
confusing. I have modified the prologue of function to provide a
general guidance on what can be freed and what should not be. and then
specific examples. Let me know if this is more clear.

>
> The comment for SpecialJoinInfo mentions that they are stored in
> PlannerInfo.join_info_list, but the child SpecialJoinInfos used by
> partitionwise joins are not, which I noticed has not been mentioned
> anywhere.  Should we make a note of that in the SpecialJoinInfo's
> comment?

Good point. Since SpecialJoinInfos for JOIN_INNER are mentioned there,
it makes sense to mention transient child SpecialJoinInfos as well.
Done.

>
> Just out of curiosity, is their not being present in join_info_list
> problematic in some manner, such as missed optimization opportunities
> for child joins?  I noticed there is a loop over join_info_list in
> add_paths_to_join_rel(), which does get called for child joinrels.  I
> know this a bit off-topic for the thread, but thought to ask in case
> you've already thought about it.

The function has a comment and code to take care of this at the very beginning
/*
* PlannerInfo doesn't contain the SpecialJoinInfos created for joins
* between child relations, even if there is a SpecialJoinInfo node for
* the join between the topmost parents. So, while calculating Relids set
* representing the restriction, consider relids of topmost parent of
* partitions.
*/
if (joinrel->reloptkind == RELOPT_OTHER_JOINREL)
joinrelids = joinrel->top_parent_relids;
else
joinrelids = joinrel->relids;

PFA latest patch set
0001 - same as previous patch set
0002 - 0002 and 0003 from previous patch set squashed together
0003 - changes addressing above comments.

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Amit Langote
Date:
On Wed, Sep 27, 2023 at 8:07 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Sep 27, 2023 at 2:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Just out of curiosity, is their not being present in join_info_list
> > problematic in some manner, such as missed optimization opportunities
> > for child joins?  I noticed there is a loop over join_info_list in
> > add_paths_to_join_rel(), which does get called for child joinrels.  I
> > know this a bit off-topic for the thread, but thought to ask in case
> > you've already thought about it.
>
> The function has a comment and code to take care of this at the very beginning
> /*
> * PlannerInfo doesn't contain the SpecialJoinInfos created for joins
> * between child relations, even if there is a SpecialJoinInfo node for
> * the join between the topmost parents. So, while calculating Relids set
> * representing the restriction, consider relids of topmost parent of
> * partitions.
> */
> if (joinrel->reloptkind == RELOPT_OTHER_JOINREL)
> joinrelids = joinrel->top_parent_relids;
> else
> joinrelids = joinrel->relids;

Ah, that's accounted for.  Thanks.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Amit Langote
Date:
On Wed, Sep 27, 2023 at 8:07 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Sep 27, 2023 at 2:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > +   /*
> > +    * But the list of operator OIDs and the list of expressions may be
> > +    * referenced somewhere else. Do not free those.
> > +    */
> >
> > I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm
> > not sure what the comment is referring to.  Also, instead of "the list
> > of expressions", it might be more useful to mention semi_rhs_expr,
> > because that's the only one being copied.
>
> list of OID is semi_operators. It's copied as is from parent
> SpecialJoinInfo. But the way it's mentioned in the comment is
> confusing. I have modified the prologue of function to provide a
> general guidance on what can be freed and what should not be. and then
> specific examples. Let me know if this is more clear.

Thanks.  I would've kept the notes about specific members inside the
function.  Also, no need to have a note for pointers that are not
deep-copied to begin with, such as semi_operators.  IOW, something
like the following would have sufficed:

@@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root,
SpecialJoinInfo *parent_sjinfo,
 /*
  * free_child_sjinfo_members
  *     Free memory consumed by members of a child SpecialJoinInfo.
+ *
+ * Only members that are translated copies of their counterpart in the parent
+ * SpecialJoinInfo are freed here.  However, members that could be referenced
+ * elsewhere are not freed.
  */
 static void
 free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
@@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
    bms_free(child_sjinfo->syn_lefthand);
    bms_free(child_sjinfo->syn_righthand);

-   /*
-    * But the list of operator OIDs and the list of expressions may be
-    * referenced somewhere else. Do not free those.
-    */
+   /* semi_rhs_exprs may be referenced, so don't free. */
 }

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> IOW, something
> like the following would have sufficed:
>
> @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root,
> SpecialJoinInfo *parent_sjinfo,
>  /*
>   * free_child_sjinfo_members
>   *     Free memory consumed by members of a child SpecialJoinInfo.
> + *
> + * Only members that are translated copies of their counterpart in the parent
> + * SpecialJoinInfo are freed here.  However, members that could be referenced
> + * elsewhere are not freed.
>   */
>  static void
>  free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
> @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
>     bms_free(child_sjinfo->syn_lefthand);
>     bms_free(child_sjinfo->syn_righthand);
>
> -   /*
> -    * But the list of operator OIDs and the list of expressions may be
> -    * referenced somewhere else. Do not free those.
> -    */
> +   /* semi_rhs_exprs may be referenced, so don't free. */
>  }

Works for me. PFA patchset with these changes. I have still left the
changes addressing your comments as a separate patch for easier
review.

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
Rebased patchset. No changes in actual patches. 0001 is squashed
version of patches submitted at
https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.

On Tue, Oct 3, 2023 at 6:42 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > IOW, something
> > like the following would have sufficed:
> >
> > @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root,
> > SpecialJoinInfo *parent_sjinfo,
> >  /*
> >   * free_child_sjinfo_members
> >   *     Free memory consumed by members of a child SpecialJoinInfo.
> > + *
> > + * Only members that are translated copies of their counterpart in the parent
> > + * SpecialJoinInfo are freed here.  However, members that could be referenced
> > + * elsewhere are not freed.
> >   */
> >  static void
> >  free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
> > @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
> >     bms_free(child_sjinfo->syn_lefthand);
> >     bms_free(child_sjinfo->syn_righthand);
> >
> > -   /*
> > -    * But the list of operator OIDs and the list of expressions may be
> > -    * referenced somewhere else. Do not free those.
> > -    */
> > +   /* semi_rhs_exprs may be referenced, so don't free. */
> >  }
>
> Works for me. PFA patchset with these changes. I have still left the
> changes addressing your comments as a separate patch for easier
> review.
>
> --
> Best Wishes,
> Ashutosh Bapat



--
Best Wishes,
Ashutosh Bapat



On Wed, 4 Oct 2023 at 04:02, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > IOW, something
> > like the following would have sufficed:
> >
> > @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root,
> > SpecialJoinInfo *parent_sjinfo,
> >  /*
> >   * free_child_sjinfo_members
> >   *     Free memory consumed by members of a child SpecialJoinInfo.
> > + *
> > + * Only members that are translated copies of their counterpart in the parent
> > + * SpecialJoinInfo are freed here.  However, members that could be referenced
> > + * elsewhere are not freed.
> >   */
> >  static void
> >  free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
> > @@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
> >     bms_free(child_sjinfo->syn_lefthand);
> >     bms_free(child_sjinfo->syn_righthand);
> >
> > -   /*
> > -    * But the list of operator OIDs and the list of expressions may be
> > -    * referenced somewhere else. Do not free those.
> > -    */
> > +   /* semi_rhs_exprs may be referenced, so don't free. */
> >  }
>
> Works for me. PFA patchset with these changes. I have still left the
> changes addressing your comments as a separate patch for easier
> review.

CFBot shows that the patch does not apply anymore as in [1]:

=== Applying patches on top of PostgreSQL commit ID
55627ba2d334ce98e1f5916354c46472d414bda6 ===
=== applying patch
./0001-Report-memory-used-for-planning-a-query-in--20231003.patch
...
Hunk #1 succeeded at 89 (offset -3 lines).
patching file src/test/regress/expected/explain.out
Hunk #5 FAILED at 285.
Hunk #6 succeeded at 540 (offset 4 lines).
1 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/explain.out.rej

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4500.log

Regards,
Vignesh



On Fri, Jan 26, 2024 at 8:42 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 4 Oct 2023 at 04:02, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Fri, Sep 29, 2023 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > IOW, something
> > > like the following would have sufficed:
.. snip...
> >
> > Works for me. PFA patchset with these changes. I have still left the
> > changes addressing your comments as a separate patch for easier
> > review.
>
> CFBot shows that the patch does not apply anymore as in [1]:
>
> === Applying patches on top of PostgreSQL commit ID
> 55627ba2d334ce98e1f5916354c46472d414bda6 ===
> === applying patch
> ./0001-Report-memory-used-for-planning-a-query-in--20231003.patch
> ...
> Hunk #1 succeeded at 89 (offset -3 lines).
> patching file src/test/regress/expected/explain.out
> Hunk #5 FAILED at 285.
> Hunk #6 succeeded at 540 (offset 4 lines).
> 1 out of 6 hunks FAILED -- saving rejects to file
> src/test/regress/expected/explain.out.rej
>
> Please post an updated version for the same.
>
> [1] - http://cfbot.cputube.org/patch_46_4500.log

Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
addressing Amit's comments is still a separate patch for him to
review.

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Andrei Lepikhov
Date:
On 30/1/2024 12:44, Ashutosh Bapat wrote:
> Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
> addressing Amit's comments is still a separate patch for him to
> review.
Thanks for this improvement. Working with partitions, I frequently see 
peaks of memory consumption during planning. So, maybe one more case can 
be resolved here.
Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do 
we really need it as a separate routine? It might be better to inline 
this code.
Patch 0002 adds valuable comments, and I'm OK with that.

Also, as I remember, some extensions, such as pg_hint_plan, call 
build_child_join_sjinfo. It is OK to break the interface with a major 
version. But what if they need child_sjinfo a bit longer and collect 
links to this structure? I don't think it is a real stopper, but it is 
worth additional analysis.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
>
> On 30/1/2024 12:44, Ashutosh Bapat wrote:
> > Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
> > addressing Amit's comments is still a separate patch for him to
> > review.
> Thanks for this improvement. Working with partitions, I frequently see
> peaks of memory consumption during planning. So, maybe one more case can
> be resolved here.
> Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do
> we really need it as a separate routine? It might be better to inline
> this code.

try_partitionwise_join() is already 200 lines long. A separate
function is better than adding more lines to try_partitionwise_join().
Also if someone wants to call build_child_join_sjinfo() outside
try_partitionwise_join() may find free_child_sjinfo_members() handy.

> Patch 0002 adds valuable comments, and I'm OK with that.
>
> Also, as I remember, some extensions, such as pg_hint_plan, call
> build_child_join_sjinfo. It is OK to break the interface with a major
> version. But what if they need child_sjinfo a bit longer and collect
> links to this structure? I don't think it is a real stopper, but it is
> worth additional analysis.

If these extensions call build_child_join_sjinfo() and do not call
free_child_sjinfo_members, they can keep their child sjinfo as long as
they want. I didn't understand your concern.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Andrei Lepikhov
Date:
On 14/2/2024 13:32, Ashutosh Bapat wrote:
> On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>>
>> On 30/1/2024 12:44, Ashutosh Bapat wrote:
>>> Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
>>> addressing Amit's comments is still a separate patch for him to
>>> review.
>> Thanks for this improvement. Working with partitions, I frequently see
>> peaks of memory consumption during planning. So, maybe one more case can
>> be resolved here.
>> Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do
>> we really need it as a separate routine? It might be better to inline
>> this code.
> 
> try_partitionwise_join() is already 200 lines long. A separate
> function is better than adding more lines to try_partitionwise_join().
> Also if someone wants to call build_child_join_sjinfo() outside
> try_partitionwise_join() may find free_child_sjinfo_members() handy.
Make sense, thanks.
>> Patch 0002 adds valuable comments, and I'm OK with that.
>>
>> Also, as I remember, some extensions, such as pg_hint_plan, call
>> build_child_join_sjinfo. It is OK to break the interface with a major
>> version. But what if they need child_sjinfo a bit longer and collect
>> links to this structure? I don't think it is a real stopper, but it is
>> worth additional analysis.
> 
> If these extensions call build_child_join_sjinfo() and do not call
> free_child_sjinfo_members, they can keep their child sjinfo as long as
> they want. I didn't understand your concern.
Nothing special. This patch makes external code responsible for 
allocation of the structure and it adds more paths to do something 
wrong. Current code looks good.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Hi,

I took a quick look at this patch today. I certainly agree with the
intent to reduce the amount of memory during planning, assuming it's not
overly disruptive. And I think this patch is fairly localized and looks
sensible.

That being said I'm a big fan of using a local variable on stack and
filling it. I'd probably go with the usual palloc/pfree, because that
makes it much easier to use - the callers would not be responsible for
allocating the SpecialJoinInfo struct. Sure, it's a little bit of
overhead, but with the AllocSet caching I doubt it's measurable.

I did put this through check-world on amd64/arm64, with valgrind,
without any issue. I also tried the scripts shared by Ashutosh in his
initial message (with some minor fixes, adding MEMORY to explain etc).

The results with the 20240130 patches are like this:

   tables    master    patched
  -----------------------------
        2      40.8       39.9
        3     151.7      142.6
        4     464.0      418.5
        5    1663.9     1419.5

That's certainly a nice improvement, and it even reduces the amount of
time for planning (the 5-table join goes from 18s to 17s on my laptop).
That's nice, although 17 seconds for planning is not ... great.

That being said, the amount of remaining memory needed by planning is
still pretty high - we save ~240MB for a join of 5 tables, but we still
need ~1.4GB. Yes, this is a bit extreme example, and it probably is not
very common to join 5 tables with 1000 partitions each ...

Do we know what are the other places consuming the 1.4GB of memory?
Considering my recent thread about scalability, where malloc() turned
out to be one of the main culprits, I wonder if maybe there's a lot to
gain by reducing the memory usage ... Our attitude to memory usage is
that it doesn't really matter if we keep it allocated for a bit, because
we'll free it shortly. And that may be true for "modest" memory usage,
but with 1.4GB that doesn't seem great, and the malloc overhead can be
pretty bad.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

From
Ashutosh Bapat
Date:
On Sun, Feb 18, 2024 at 10:55 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I took a quick look at this patch today. I certainly agree with the
> intent to reduce the amount of memory during planning, assuming it's not
> overly disruptive. And I think this patch is fairly localized and looks
> sensible.

Thanks for looking at the patch-set.

>
> That being said I'm a big fan of using a local variable on stack and
> filling it. I'd probably go with the usual palloc/pfree, because that
> makes it much easier to use - the callers would not be responsible for
> allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> overhead, but with the AllocSet caching I doubt it's measurable.

You are suggesting that instead of declaring a local variable of type
SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
SpecialJoinInfo which will be freed by free_child_sjinfo()
(free_child_sjinfo_members in the patch). I am fine with that.

>
> I did put this through check-world on amd64/arm64, with valgrind,
> without any issue. I also tried the scripts shared by Ashutosh in his
> initial message (with some minor fixes, adding MEMORY to explain etc).
>
> The results with the 20240130 patches are like this:
>
>    tables    master    patched
>   -----------------------------
>         2      40.8       39.9
>         3     151.7      142.6
>         4     464.0      418.5
>         5    1663.9     1419.5
>
> That's certainly a nice improvement, and it even reduces the amount of
> time for planning (the 5-table join goes from 18s to 17s on my laptop).
> That's nice, although 17 seconds for planning is not ... great.
>
> That being said, the amount of remaining memory needed by planning is
> still pretty high - we save ~240MB for a join of 5 tables, but we still
> need ~1.4GB. Yes, this is a bit extreme example, and it probably is not
> very common to join 5 tables with 1000 partitions each ...
>

Yes.

> Do we know what are the other places consuming the 1.4GB of memory?

Please find the breakdown at [1]. Investigating further, I found that
memory consumed by Bitmapsets is not explicitly visible through that
breakdown. But Bitmapsets consume a lot of memory in this case. I
shared google slides with raw numbers in [2]. The Gslides can be found
at https://docs.google.com/presentation/d/1S9BiAADhX-Fv9tDbx5R5Izq4blAofhZMhHcO1c-wzfI/edit#slide=id.p.
According to that measurement, Bitmapset objects created while
planning a 5-way join between partitioned tables with 1000 partitions
each consumed 769.795 MiB as compared to 19.728 KiB consumed by
Bitmapsets when planning a 5-way non-partitioned join. See slide 3 for
detailed numbers.

> Considering my recent thread about scalability, where malloc() turned
> out to be one of the main culprits, I wonder if maybe there's a lot to
> gain by reducing the memory usage ... Our attitude to memory usage is
> that it doesn't really matter if we keep it allocated for a bit, because
> we'll free it shortly. And that may be true for "modest" memory usage,
> but with 1.4GB that doesn't seem great, and the malloc overhead can be
> pretty bad.
>

That probably explains why we see some modest speedups because of my
memory saving patches. Thanks for sharing it.

[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAExHW5t0cPNjJRPRtt%3D%2Bc5SiTeBPHvx%3DSd2n8EO%2B7XdVuE8_YQ%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat



Hi Ashutosh,

On Mon, Feb 19, 2024 at 10:01 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Sun, Feb 18, 2024 at 10:55 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > I took a quick look at this patch today. I certainly agree with the
> > intent to reduce the amount of memory during planning, assuming it's not
> > overly disruptive. And I think this patch is fairly localized and looks
> > sensible.
>
> Thanks for looking at the patch-set.
> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.

Agree with Tomas about using makeNode() / pfree().  Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.

> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> >    tables    master    patched
> >   -----------------------------
> >         2      40.8       39.9
> >         3     151.7      142.6
> >         4     464.0      418.5
> >         5    1663.9     1419.5

Could you please post the numbers with the palloc() / pfree() version?

--
Thanks, Amit Langote



Hi Amit,


On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.

Agree with Tomas about using makeNode() / pfree().  Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.

Attached patch-set

0001 - original patch as is
0002 - addresses your first set of comments
0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo structure.

I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with those.
 

> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> >    tables    master    patched
> >   -----------------------------
> >         2      40.8       39.9
> >         3     151.7      142.6
> >         4     464.0      418.5
> >         5    1663.9     1419.5

Could you please post the numbers with the palloc() / pfree() version?


Here are they
 tables | master  | patched
--------+---------+---------
      2 | 29 MB   | 28 MB
      3 | 102 MB  | 93 MB
      4 | 307 MB  | 263 MB
      5 | 1076 MB | 843 MB

The numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memory that time is different from the one that we committed. So there's a slight difference in the way we measure memory as well.

--
Best Wishes,
Ashutosh Bapat
Attachment
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi Amit,


On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.

Agree with Tomas about using makeNode() / pfree().  Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.

Attached patch-set

0001 - original patch as is
0002 - addresses your first set of comments
0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo structure.

I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with those.

Thanks for the new patches.

> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> >    tables    master    patched
> >   -----------------------------
> >         2      40.8       39.9
> >         3     151.7      142.6
> >         4     464.0      418.5
> >         5    1663.9     1419.5

Could you please post the numbers with the palloc() / pfree() version?


Here are they
 tables | master  | patched
--------+---------+---------
      2 | 29 MB   | 28 MB
      3 | 102 MB  | 93 MB
      4 | 307 MB  | 263 MB
      5 | 1076 MB | 843 MB

The numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memory that time is different from the one that we committed. So there's a slight difference in the way we measure memory as well.

Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.


On Mon, Mar 18, 2024 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi Amit,


On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.

Agree with Tomas about using makeNode() / pfree().  Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.

Attached patch-set

0001 - original patch as is
0002 - addresses your first set of comments
0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo structure.

I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with those.

Thanks for the new patches.

> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> >    tables    master    patched
> >   -----------------------------
> >         2      40.8       39.9
> >         3     151.7      142.6
> >         4     464.0      418.5
> >         5    1663.9     1419.5

Could you please post the numbers with the palloc() / pfree() version?


Here are they
 tables | master  | patched
--------+---------+---------
      2 | 29 MB   | 28 MB
      3 | 102 MB  | 93 MB
      4 | 307 MB  | 263 MB
      5 | 1076 MB | 843 MB

The numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure memory that time is different from the one that we committed. So there's a slight difference in the way we measure memory as well.

Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.

No problem. Here you go

 tables |  master  | patched  | perc_change
--------+----------+----------+-------------
      2 |   477.87 |   492.32 |       -3.02
      3 |  1970.83 |  1989.52 |       -0.95
      4 |  6305.01 |  6268.81 |        0.57
      5 | 19261.56 | 18684.86 |        2.99

+ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases. But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If you want, I can repeat the experiment.

--
Best Wishes,
Ashutosh Bapat
On Mon, Mar 18, 2024 at 8:57 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Mon, Mar 18, 2024 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
>>>> Could you please post the numbers with the palloc() / pfree() version?
>>>
>>> Here are they
>>>  tables | master  | patched
>>> --------+---------+---------
>>>       2 | 29 MB   | 28 MB
>>>       3 | 102 MB  | 93 MB
>>>       4 | 307 MB  | 263 MB
>>>       5 | 1076 MB | 843 MB
>>>
>>> The numbers look slightly different from my earlier numbers. But they were quite old. The patch used to measure
memorythat time is different from the one that we committed. So there's a slight difference in the way we measure
memoryas well. 
>>
>>
>> Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically,
tosee if the palloc / frees add noticeable overhead. 
>
> No problem. Here you go
>
>  tables |  master  | patched  | perc_change
> --------+----------+----------+-------------
>       2 |   477.87 |   492.32 |       -3.02
>       3 |  1970.83 |  1989.52 |       -0.95
>       4 |  6305.01 |  6268.81 |        0.57
>       5 | 19261.56 | 18684.86 |        2.99
>
> +ve change indicates reduced planning time. It seems that the planning time improves as the number of tables
increases.But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If
youwant, I can repeat the experiment. 

Thanks.  I also wanted to see cpu time difference between the two
approaches of SpecialJoinInfo allocation -- on stack and on heap.  So
comparing times with the previous version of the patch using stack
allocation and the new one with palloc.  I'm hoping that the new
approach is only worse in the noise range.

--
Thanks, Amit Langote





On Mon, Mar 18, 2024 at 5:40 PM Amit Langote <amitlangote09@gmail.com> wrote:

>>
>> Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.
>
> No problem. Here you go
>
>  tables |  master  | patched  | perc_change
> --------+----------+----------+-------------
>       2 |   477.87 |   492.32 |       -3.02
>       3 |  1970.83 |  1989.52 |       -0.95
>       4 |  6305.01 |  6268.81 |        0.57
>       5 | 19261.56 | 18684.86 |        2.99
>
> +ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases. But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If you want, I can repeat the experiment.

Thanks.  I also wanted to see cpu time difference between the two
approaches of SpecialJoinInfo allocation -- on stack and on heap.  So
comparing times with the previous version of the patch using stack
allocation and the new one with palloc.  I'm hoping that the new
approach is only worse in the noise range.

Ah, sorry, I didn't read it carefully. Alvaro made me realise that I have been gathering numbers using assert enabled builds, so they are not that reliable. Here they are with non-assert enabled builds.

planning time (in ms) reported by EXPLAIN.
 tables |  master  | stack_alloc | perc_change_1 |  palloc  | perc_change_2 | total_perc_change
--------+----------+-------------+---------------+----------+---------------+-------------------
      2 |    338.1 |      333.92 |          1.24 |   332.16 |          0.53 |              1.76
      3 |  1507.93 |     1475.76 |          2.13 |  1472.79 |          0.20 |              2.33
      4 |  5099.45 |     4980.35 |          2.34 |   4947.3 |          0.66 |              2.98
      5 | 15442.64 |    15531.94 |         -0.58 | 15393.41 |          0.89 |              0.32

stack_alloc = timings with SpecialJoinInfo on stack
perc_change_1 = percentage change of above wrt master
palloc = timings with palloc'ed SpecialJoinInfo
perc_change_2 = percentage change of above wrt timings with stack_alloc
total_perc_change = percentage change between master and all patches

total change is within noise. Timing with palloc is better than that with SpecialJoinInfo on stack but only marginally. I don't think that means palloc based allocation is better or worse than stack based.

You requested CPU time difference between stack based SpecialJoinInfo and palloc'ed SpecialJoinInfo. Here it is
tables | stack_alloc |  palloc  | perc_change
--------+-------------+----------+-------------
      2 |    0.438204 | 0.438986 |       -0.18
      3 |    1.224672 | 1.238781 |       -1.15
      4 |    3.511317 | 3.663334 |       -4.33
      5 |    9.283856 | 9.340516 |       -0.61

Yes, there's a consistent degradation albeit within noise levels.

The memory measurements
 tables | master | with_patch
--------+--------+------------
      2 | 26 MB  | 26 MB
      3 | 93 MB  | 84 MB
      4 | 277 MB | 235 MB
      5 | 954 MB | 724 MB

The first row shows the same value because of rounding. The actual values there are 27740432 and 26854704 resp.

Please let me know if you need anything.

--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,

On Tue, Mar 19, 2024 at 12:47 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Mon, Mar 18, 2024 at 5:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> >>
>> >> Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches.
Specifically,to see if the palloc / frees add noticeable overhead. 
>> >
>> > No problem. Here you go
>> >
>> >  tables |  master  | patched  | perc_change
>> > --------+----------+----------+-------------
>> >       2 |   477.87 |   492.32 |       -3.02
>> >       3 |  1970.83 |  1989.52 |       -0.95
>> >       4 |  6305.01 |  6268.81 |        0.57
>> >       5 | 19261.56 | 18684.86 |        2.99
>> >
>> > +ve change indicates reduced planning time. It seems that the planning time improves as the number of tables
increases.But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If
youwant, I can repeat the experiment. 
>>
>> Thanks.  I also wanted to see cpu time difference between the two
>> approaches of SpecialJoinInfo allocation -- on stack and on heap.  So
>> comparing times with the previous version of the patch using stack
>> allocation and the new one with palloc.  I'm hoping that the new
>> approach is only worse in the noise range.
>
>
> Ah, sorry, I didn't read it carefully. Alvaro made me realise that I have been gathering numbers using assert enabled
builds,so they are not that reliable. Here they are with non-assert enabled builds. 
>
> planning time (in ms) reported by EXPLAIN.
>  tables |  master  | stack_alloc | perc_change_1 |  palloc  | perc_change_2 | total_perc_change
> --------+----------+-------------+---------------+----------+---------------+-------------------
>       2 |    338.1 |      333.92 |          1.24 |   332.16 |          0.53 |              1.76
>       3 |  1507.93 |     1475.76 |          2.13 |  1472.79 |          0.20 |              2.33
>       4 |  5099.45 |     4980.35 |          2.34 |   4947.3 |          0.66 |              2.98
>       5 | 15442.64 |    15531.94 |         -0.58 | 15393.41 |          0.89 |              0.32
>
> stack_alloc = timings with SpecialJoinInfo on stack
> perc_change_1 = percentage change of above wrt master
> palloc = timings with palloc'ed SpecialJoinInfo
> perc_change_2 = percentage change of above wrt timings with stack_alloc
> total_perc_change = percentage change between master and all patches
>
> total change is within noise. Timing with palloc is better than that with SpecialJoinInfo on stack but only
marginally.I don't think that means palloc based allocation is better or worse than stack based. 
>
> You requested CPU time difference between stack based SpecialJoinInfo and palloc'ed SpecialJoinInfo. Here it is
> tables | stack_alloc |  palloc  | perc_change
> --------+-------------+----------+-------------
>       2 |    0.438204 | 0.438986 |       -0.18
>       3 |    1.224672 | 1.238781 |       -1.15
>       4 |    3.511317 | 3.663334 |       -4.33
>       5 |    9.283856 | 9.340516 |       -0.61
>
> Yes, there's a consistent degradation albeit within noise levels.
>
> The memory measurements
>  tables | master | with_patch
> --------+--------+------------
>       2 | 26 MB  | 26 MB
>       3 | 93 MB  | 84 MB
>       4 | 277 MB | 235 MB
>       5 | 954 MB | 724 MB
>
> The first row shows the same value because of rounding. The actual values there are 27740432 and 26854704 resp.
>
> Please let me know if you need anything.

Thanks for repeating the benchmark.  So I don't see a problem with
keeping the existing palloc() way of allocating the
SpecialJoinInfos().  We're adding a few cycles by adding
free_child_join_sjinfo() (see my delta patch about the rename), but
the reduction in memory consumption due to it, which is our goal here,
far outweighs what looks like a minor CPU slowdown.

I have squashed 0002, 0003 into 0001, done some changes of my own, and
attached the delta patch as 0002.  I've listed the changes in the
commit message.  Let me know if anything seems amiss.

I'd like to commit 0001 + 0002 + any other changes based on your
comments (if any) sometime early next week.


--
Thanks, Amit Langote

Attachment


On Fri, Mar 22, 2024 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Please let me know if you need anything.

Thanks for repeating the benchmark.  So I don't see a problem with
keeping the existing palloc() way of allocating the
SpecialJoinInfos().  We're adding a few cycles by adding
free_child_join_sjinfo() (see my delta patch about the rename), but
the reduction in memory consumption due to it, which is our goal here,
far outweighs what looks like a minor CPU slowdown.

I have squashed 0002, 0003 into 0001, done some changes of my own, and
attached the delta patch as 0002.  I've listed the changes in the
commit message.  Let me know if anything seems amiss.

Thanks Amit for your changes.

> * Revert (IMO) unnecessary modifications of build_child_join_sjinfo().
> For example, hunks to rename sjinfo to child_sjinfo seem unnecessary,
> because it's clear from the context that they are "child" sjinfos.
>

Ok.

> * Rename free_child_sjinfo_members() to free_child_join_sjinfo() to
> be symmetric with build_child_join_sjinfo().  Note that the function
> is charged with freeing the sjinfo itself too.  Like
> build_child_join_sjinfo(), name the argument sjinfo, not
> child_sjinfo.

Yes. It should have been done in my patch.

>
> * Don't set the child sjinfo pointer to NULL after freeing it.  While
> a good convention in general, it's not really common in the planner
> code, so doing it here seems a bit overkill

That function is the last line in the block where pointer variable is declared.
As of now there is no danger of the dangling pointer being used.

>
> * Rename make_dummy_sjinfo() to init_dummy_sjinfo() because the
> function is not really making it per se, only initializing fields
> in the SpecialJoinInfo struct made available by the caller.

Right.

>
> * Make init_dummy_sjinfo() take Relids instead of RelOptInfos because
> that's what's needed.  Also allows to reduce changes to
> build_child_join_sjinfo()'s signature.

I remember, in one of the versions it was Relids and then I changed to
RelOptInfos for some reason. The latest version seems to use just Relids. So
this looks better.

>
> * Update the reason mentioned in a comment in free_child_join_sjinfo()
> about why semi_rhs_expr is not freed -- it may be freed, but there's
> no way to "deep" free it, so we leave it alone.
>
> * Update a comment somewhere about why SpecialJoinInfo can be
> "transient" sometimes.

With the later code, semi_rhs_expr is indeed free'able. It wasn't so when I
wrote the patches. Thanks for updating the comment. I wish we would have
freeObject(). Nothing that can be done for this patch.

Attached patches
Squashed your 0001 and 0002 into 0001. I have also reworded the commit message to reflect the latest code changes.
0002 has some minor edits. Please feel free to include or reject when committing the patch.
 
--
Best Wishes,
Ashutosh Bapat
Attachment
On Fri, Mar 22, 2024 at 7:48 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> With the later code, semi_rhs_expr is indeed free'able. It wasn't so when I
> wrote the patches. Thanks for updating the comment. I wish we would have
> freeObject(). Nothing that can be done for this patch.

Yes.

> Attached patches
> Squashed your 0001 and 0002 into 0001. I have also reworded the commit message to reflect the latest code changes.
> 0002 has some minor edits. Please feel free to include or reject when committing the patch.

I've pushed this now.

When updating the commit message, I realized that you had made the
right call to divide the the changes around not translating the dummy
SpecialJoinInfos into a separate patch.  So, I pushed it like that.

Thanks for working on this.

--
Thanks, Amit Langote





On Mon, Mar 25, 2024 at 2:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Attached patches
> Squashed your 0001 and 0002 into 0001. I have also reworded the commit message to reflect the latest code changes.
> 0002 has some minor edits. Please feel free to include or reject when committing the patch.

I've pushed this now.

Thanks a lot.
 

When updating the commit message, I realized that you had made the
right call to divide the the changes around not translating the dummy
SpecialJoinInfos into a separate patch.  So, I pushed it like that.


Ok. Thanks for separating the changes.

--
Best Wishes,
Ashutosh Bapat

On Mon, Mar 25, 2024 at 5:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
I've pushed this now.

When updating the commit message, I realized that you had made the
right call to divide the the changes around not translating the dummy
SpecialJoinInfos into a separate patch.  So, I pushed it like that.

Sorry I'm late for the party.  I noticed that in commit 6190d828cd the
comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
'rel2', which should be addressed.  Also, I think we can make function
consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
bit.

Attached is a trivial patch for the fixes.

Thanks
Richard
Attachment
On Mon, Mar 25, 2024 at 7:25 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Mar 25, 2024 at 5:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> I've pushed this now.
>>
>> When updating the commit message, I realized that you had made the
>> right call to divide the the changes around not translating the dummy
>> SpecialJoinInfos into a separate patch.  So, I pushed it like that.
>
>
> Sorry I'm late for the party.  I noticed that in commit 6190d828cd the
> comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
> 'rel2', which should be addressed.

Thanks for catching that.  Oops.

>  Also, I think we can make function
> consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
> bit.

Indeed.

> Attached is a trivial patch for the fixes.

Will apply shortly.

--
Thanks, Amit Langote





On Mon, Mar 25, 2024 at 4:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Mar 25, 2024 at 7:25 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Mar 25, 2024 at 5:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> I've pushed this now.
>>
>> When updating the commit message, I realized that you had made the
>> right call to divide the the changes around not translating the dummy
>> SpecialJoinInfos into a separate patch.  So, I pushed it like that.
>
>
> Sorry I'm late for the party.  I noticed that in commit 6190d828cd the
> comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
> 'rel2', which should be addressed.

Thanks for catching that.  Oops.

Good catch.
 

>  Also, I think we can make function
> consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
> bit.

Indeed.

Thanks for fixing it.

--
Best Wishes,
Ashutosh Bapat