Thread: Memory consumed by paths during partitionwise join planning

Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
Hi All,

If add_path() and add_partial_path() do not find a new path to be
superior to any existing paths, they free the new path. They free an
existing path if it is found to be inferior to the new path. But not
all paths surviving in a RelOptInfo are used to create paths for
relations which use it as input. Further add_path and add_partial_path
do not free the whole path tree but just the topmost pathnode.  The
subpath nodes are not freed because they may be referenced by other
paths. The subpaths continue to occupy memory even if they are not
used anywhere. As we build the relation tree upwards (base, join,
upper relations) more and more such paths are accumulated and continue
to consume memory till the statement ends. With partitionwise join
involving partitioned tables with thousands of partitions this memory
consumption increases proportional to the number of partitions.

Attached WIP patches address this issue by adding infrastructure to
track references to paths and free unreferenced paths once they can be
deemed as useless. A new member Path::ref_count in a pathnode tracks
how many other objects, like pathlists in RelOptInfo and other paths,
reference that pathnode. As the path nodes are referenced they are
"linked" using link_path() to referencing objects. When the path nodes
are no longer referenced they are "unlinked" using unlink_path() from
the referencing objects. Path nodes are freed using free_path(). The
function unlinks the sub path nodes so that they can be freed when
their reference count drops to 0. The paths whose references reach 0
during unlinking are freed automatically using free_path(). Patch 0002
adds this infrastructure. 0003 and 0004 use these functions in example
cases.

With these patches the memory consumption numbers look like below.
Experiment
----------
Memory consumption is measured using a method similar to the one
described in [1]. The changes to EXPLAIN ANALYZE to report planning
memory are in 0001. Memory consumed when planning a self-join query is
measured. The queries involve partitioned and unpartitioned tables
respectively. The partitioned table has 1000 partitions in it. The
table definitions and helper function can be found in setup.sql and
the queries can be found in queries.sql. This is the simplest setup.
Higher savings may be seen with more complex setups involving indexes,
SQL operators and clauses.

Table 1: Join between unpartitioned tables
Number of tables | without patch  | with patch | % reduction |
being joined     |                |            |             |
--------------------------------------------------------------
               2 |      29.0 KiB  |   28.9 KiB |       0.66% |
               3 |      79.1 KiB  |   76.7 KiB |       3.03% |
               4 |     208.6 KiB  |  198.2 KiB |       4.97% |
               5 |     561.6 KiB  |  526.3 KiB |       6.28% |

Table 2: join between partitioned tables with partitionwise join
enabled (enable_partitionwise_join = true).
Number of tables | without patch  | with patch | % reduction |
being joined     |                |            |             |
----------------------------------------------------------------
               2 |      40.3 MiB  |   40.0 MiB |       0.70% |
               3 |     146.9 MiB  |  143.1 MiB |       2.55% |
               4 |     445.4 MiB  |  430.4 MiB |       3.38% |
               5 |    1563.3 MiB  | 1517.6 MiB |       2.92% |

The patch is not complete because of following issues:

a. 0003 and 0004 patches do not cover all the path nodes. I have
covered only those which I encountered in the queries I ran. If we
accept this proposal, I will work on covering all the path nodes.

b. It does not cover the entire RelOptInfo tree that the planner
creates. The paths in a lower level RelOptInfo can be deemed as
useless only when path creation for all the immediate upper
RelOptInfos is complete. Thus we need access to both upper and lower
level RelOptInfos at the same time. The RelOptInfos created while
planning join are available in standard_join_search(). Thus it's
possible to free unused paths listed in all the RelOptInfos except the
topmost RelOptInfo in this function as done in the patch. But the
topmost RelOptInfo acts as an input to upper RelOptInfo in
grouping_planner() where we don't have access to the RelOptInfos at
lower levels in the join tree. Hence we can't free the unused paths
from topmost RelOptInfo and cascade that effect down the RelOptInfo
tree. This is the reason why we don't see memory reduction in case of
2-way join. This is also the reason why the numbers above are lower
than the actual memory that can be saved. If we decide to move ahead
with this approach, I will work on this too.

c. The patch does not call link_path and unlink_path in all the
required places. We will need some work to identify such places, to
build infrastructure and methods to identify such places in future.

Another approach to fixing this would be to use separate memory
context for creating path nodes and then deleting the entire memory
context at the end of planning once the plan is created. We will need
to reset path lists as well as cheapest_path members in RelOptInfos as
well. This will possibly free up more memory and might be faster. But
I have not tried it. The approach taken in the patch has an advantage
over this one i.e. the paths can be freed at any stage in the planning
using free_unused_paths_from_rel() implemented in the patch. Thus we
can monitor the memory consumption and trigger garbage collection when
it crosses a certain threashold. Or we may implement both the
approaches to clean every bit of paths at the end of planning while
garbage collecting pathnodes when memory consumption goes beyond
threashold. The reference mechanism may have other usages as well.

Suggestions/comments welcome.

References
1. https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by paths during partitionwise join planning

From
David Rowley
Date:
On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> Table 1: Join between unpartitioned tables
> Number of tables | without patch  | with patch | % reduction |
> being joined     |                |            |             |
> --------------------------------------------------------------
>                2 |      29.0 KiB  |   28.9 KiB |       0.66% |
>                3 |      79.1 KiB  |   76.7 KiB |       3.03% |
>                4 |     208.6 KiB  |  198.2 KiB |       4.97% |
>                5 |     561.6 KiB  |  526.3 KiB |       6.28% |

I have mostly just random thoughts and some questions...

Have you done any analysis of the node types that are consuming all
that memory? Are Path and subclasses of Path really that dominant in
this?  The memory savings aren't that impressive and it makes me
wonder if this is worth the trouble.

What's the goal of the memory reduction work?  If it's for
performance, does this increase performance?  Tracking refcounts of
Paths isn't free.

Why do your refcounts start at 1?  This seems weird:

+ /* Free the path if none references it. */
+ if (path->ref_count == 1)

Does ref_count really need to be an int?  There's a 16-bit hole you
could use between param_info and parallel_aware.  I doubt you'd need
32 bits of space for this.

I agree that it would be nice to get rid of the "if (!IsA(old_path,
IndexPath))" hack in add_path() and it seems like this idea could
allow that. However, I think if we were to have something like this
then you'd want all the logic to be neatly contained inside pathnode.c
without adding any burden in other areas to track or check Path
refcounts.

I imagine the patch would be neater if you added some kind of Path
traversal function akin to expression_tree_walker() that allows you to
visit each subpath recursively and run a callback function on each.

David



Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Wed, Nov 29, 2023 at 1:10 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > Table 1: Join between unpartitioned tables
> > Number of tables | without patch  | with patch | % reduction |
> > being joined     |                |            |             |
> > --------------------------------------------------------------
> >                2 |      29.0 KiB  |   28.9 KiB |       0.66% |
> >                3 |      79.1 KiB  |   76.7 KiB |       3.03% |
> >                4 |     208.6 KiB  |  198.2 KiB |       4.97% |
> >                5 |     561.6 KiB  |  526.3 KiB |       6.28% |
>
> I have mostly just random thoughts and some questions...
>
> Have you done any analysis of the node types that are consuming all
> that memory? Are Path and subclasses of Path really that dominant in
> this? The memory savings aren't that impressive and it makes me
> wonder if this is worth the trouble.

This thread and patch targets saving memory consumed by Path nodes
i.e. Path and its subclasses. Breakdown of memory consumed by various
nodes can be found in [1] for partitioned table queries.

>
> What's the goal of the memory reduction work?  If it's for
> performance, does this increase performance?  Tracking refcounts of
> Paths isn't free.

This memory reduction work is part of work to reduce planner's memory
consumption while planning queries involving partitioned tables with a
large number (thousands) of partitions [1]. The second table (copies
below) in my first email in this thread gives the memory saved by
freeing unused path nodes when planning queries involving partitioned
tables with a thousand partitions each.

Table 2: join between partitioned tables with partitionwise join
enabled (enable_partitionwise_join = true).
Number of tables | without patch  | with patch | % reduction |
being joined     |                |            |             |
------------------------------
----------------------------------
               2 |      40.3 MiB  |   40.0 MiB |       0.70% |
               3 |     146.9 MiB  |  143.1 MiB |       2.55% |
               4 |     445.4 MiB  |  430.4 MiB |       3.38% |
               5 |    1563.3 MiB  | 1517.6 MiB |       2.92% |

%wise the memory savings are not great but because of sheer amount of
memory used, the savings are in MBs. Also I don't think I managed to
free all the unused paths for the reasons listed in my original mail.
But I was doubtful whether the work is worth it. So I halted my work.
I see you think that it's not worth it. So one vote against it.


>
> Why do your refcounts start at 1?  This seems weird:
>
> + /* Free the path if none references it. */
> + if (path->ref_count == 1)

Refcounts start from 0. This code is from free_unused_paths_from_rel()
which frees paths in the rel->pathlist. At this stage a path is
referenced from rel->pathlist, hence the count will >= 1 and we should
be freeing paths with refcount > 1 i.e. referenced elsewhere.

>
> Does ref_count really need to be an int?  There's a 16-bit hole you
> could use between param_info and parallel_aware.  I doubt you'd need
> 32 bits of space for this.

16 bit space allows the refcount to go upto 65535 (or twice of that).
This is somewhere between 18C9 and 19C9. If a (the cheapest) path in a
given relation in 19 relation query is referenced in all the joins
that that relation is part of, this refcount will be exhausted. If we
aren't dealing with queries involving those many relations already, we
will soon. Maybe we should add code to protect from overflows.

>
> I agree that it would be nice to get rid of the "if (!IsA(old_path,
> IndexPath))" hack in add_path() and it seems like this idea could
> allow that. However, I think if we were to have something like this
> then you'd want all the logic to be neatly contained inside pathnode.c
> without adding any burden in other areas to track or check Path
> refcounts.

The code is in following files
 src/backend/optimizer/path/allpaths.c:
The definitions of free_unused_paths_from_rel() and
free_unused_paths() can be moved to pathnode.c

 src/backend/optimizer/util/pathnode.c
 src/include/nodes/pathnodes.h
 src/include/optimizer/pathnode.h
Those three together I consider as pathnode.c. But let me know if you
feel otherwise.

src/backend/optimizer/path/joinpath.c
this is the only place where other than pathnode.c where we use
link_path(). That's required to stop add_path from freeing a
materialized path that is tried multiple times.We can't add_path that
path to rel since it will get kicked out by the path that it's
materialising for. But I think it's still path creation code so should
be ok. Let me know if you feel otherwise.

There may be other places but I can find those out and consider them
case by case basis.

>
> I imagine the patch would be neater if you added some kind of Path
> traversal function akin to expression_tree_walker() that allows you to
> visit each subpath recursively and run a callback function on each.

Yes. Agreed.

I am fine to work on this further if the community thinks it is
acceptable. It seems from your point of view the worth of this work is
as follows
a. memory savings - not worth it
b. getting rid of !IsA(old_path, IndexPath) - worth it
c. problem of same path being added to multiple relations - not worth
it since you have other solution

Can't judge whether that means it's acceptable or not.

[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
David Rowley
Date:
On Fri, 1 Dec 2023 at 19:59, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> I am fine to work on this further if the community thinks it is
> acceptable. It seems from your point of view the worth of this work is
> as follows
> a. memory savings - not worth it
> b. getting rid of !IsA(old_path, IndexPath) - worth it
> c. problem of same path being added to multiple relations - not worth
> it since you have other solution
>
> Can't judge whether that means it's acceptable or not.

I think it's worth trying to get this working and we can run some
performance tests to see if it's cheap enough to use.

I've not had enough time to fully process your patches, but on a quick
look, I don't quite understand why link_path() does not have to
recursively increment ref_counts in each of its subpaths. It seems
you're doing the opposite in free_path(), so it seems like this would
break for cases like a SortPath on say, a LimitPath over a SeqScan.
When you call create_sort_path you'll only increment the refcount on
the LimitPath. The SeqScan path then has a refcount 1 lower than it
should. If something calls unlink_path on the SeqScan path that could
put the refcount to 0 and break the SortPath by freeing a Path
indirectly referenced by the sort.

Recursively incrementing the refcounts would slow the patch down a
bit, so we'd need to test the performance of this.  I think at the
rate we create and free join paths in complex join problems we could
end up bumping refcounts quite often.

Another way to fix the pfree issues was to add infrastructure to
shallow copy paths.  We could shallow copy paths whenever we borrow a
Path from another RelOptInfo and then just reset the parent to the new
RelOptInfo.  That unfortunately makes your memory issue a bit worse.
We discussed this a bit in [1]. It also seems there was some
discussion about wrapping a Path up in a ProjectionPath in there. That
unfortunately also takes us in the opposite direction in terms of your
memory reduction goals.

Maybe we can try to move forward with your refcount idea and see how
the performance looks.  If that's intolerable then that might help us
decide on the next best alternative solution.

David

[1] https://postgr.es/m/CAApHDvo8tiB5xbJa94f4Mo8Of2fPJ2zG+zQQQTGr-ThjSsymQQ@mail.gmail.com



Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 1 Dec 2023 at 19:59, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > I am fine to work on this further if the community thinks it is
> > acceptable. It seems from your point of view the worth of this work is
> > as follows
> > a. memory savings - not worth it
> > b. getting rid of !IsA(old_path, IndexPath) - worth it
> > c. problem of same path being added to multiple relations - not worth
> > it since you have other solution
> >
> > Can't judge whether that means it's acceptable or not.
>
> I think it's worth trying to get this working and we can run some
> performance tests to see if it's cheap enough to use.
>
> I've not had enough time to fully process your patches, but on a quick
> look, I don't quite understand why link_path() does not have to
> recursively increment ref_counts in each of its subpaths. It seems
> you're doing the opposite in free_path(), so it seems like this would
> break for cases like a SortPath on say, a LimitPath over a SeqScan.
> When you call create_sort_path you'll only increment the refcount on
> the LimitPath. The SeqScan path then has a refcount 1 lower than it
> should. If something calls unlink_path on the SeqScan path that could
> put the refcount to 0 and break the SortPath by freeing a Path
> indirectly referenced by the sort.

Let me explain how linking and unlinking works. You may skip over it
if you are comfortable with this logic already.
refcounts count how many other paths or objects are referencing a
given path. E.g. we have three path chains as follows
1. joinpath_1->joinpath_2->seqpath_1,
2. joinpath_3->joinpath_4->seqpath_1,
3. joinpath_5->joinpath_2->seqpath_1

Please note that this is not full path tree/forest. It is difficult to
describe the whole path forest in text format. But this should be
sufficient to get the gist of how linking and unlinking works.

Let's say all these paths are listed in pathlists of respective
RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
part of the topmost RelOptInfo. Then the refcounts will be as follows
joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo)
joinpath_4 - 2 (joinpath_3, RelOptInfo)
seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)

If joinpath_3 is chosen finally to be converted into a plan,
joinpath_1, joinpath_5 will be unlinked, their refcount drops to 0 and
they will be freed.
unlinking and freeing them reduces refcount of joinpath 2 to 1. When
we will cleanse the pathlist of corresponding RelOptInfo, joinpath_2's
refcount will reduce to 0 and it will be freed.
freeing joinpath_2 will reduce refcount of seqpath to 2 (joinpath_4
and RelOptInfo).
joinpath_3 and joinpath_4 will have their refcounts unchanged.

This will leave the paths joinpath_3, joinpath_4 and seqpath_1 to be
converted to plans. Rest of the paths will be freed.

If we increment the refcount all the way down the path forest, they
won't really be refcounts anymore since they will count indirect
references as well. Then unlink_path also has to traverse the entire
tree resetting refcounts. Both of those are unnecessary. Please note
that it's only free_path which calls unlink_path on the referenced
paths not unlink_path itself.

In your example, the path chain looks like this
sort_path->limit_path->seqpath with refcounts 1, 1, 2 respectively. I
am assuming that seqpath is referenced by pathlist of its RelOptInfo
and limitpath. sortpath is referenced by its RelOptInfo or is chosen
as the final path. limitpath is not referenced in a RelOptInfo but is
referenced in sortpath.

Now the only things that can call unlink_path on seqpath are free_path
on limitpath OR pathlist cleaning of its RelOptInfo. In both the cases
the refcount will correctly report the number of objects referencing
it. So I don't see a problem here. If you can provide me a query that
does this, I will test it.


> Recursively incrementing the refcounts would slow the patch down a
> bit, so we'd need to test the performance of this.  I think at the
> rate we create and free join paths in complex join problems we could
> end up bumping refcounts quite often.

This won't be necessary as explained above.

>
> Another way to fix the pfree issues was to add infrastructure to
> shallow copy paths.  We could shallow copy paths whenever we borrow a
> Path from another RelOptInfo and then just reset the parent to the new
> RelOptInfo.  That unfortunately makes your memory issue a bit worse.
> We discussed this a bit in [1]. It also seems there was some
> discussion about wrapping a Path up in a ProjectionPath in there. That
> unfortunately also takes us in the opposite direction in terms of your
> memory reduction goals.

Yeah.

>
> Maybe we can try to move forward with your refcount idea and see how
> the performance looks.  If that's intolerable then that might help us
> decide on the next best alternative solution.

I will report performance numbers (planning time) with my current
patchset which has limitation listed in [1]. If performance gets
worse, we will abandon this idea. If not, I will work on completing
the patch. Does that work for you?

[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
David Rowley
Date:
On Fri, 8 Dec 2023 at 18:02, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> given path. E.g. we have three path chains as follows
> 1. joinpath_1->joinpath_2->seqpath_1,
> 2. joinpath_3->joinpath_4->seqpath_1,
> 3. joinpath_5->joinpath_2->seqpath_1
>
> Please note that this is not full path tree/forest. It is difficult to
> describe the whole path forest in text format. But this should be
> sufficient to get the gist of how linking and unlinking works.
>
> Let's say all these paths are listed in pathlists of respective
> RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
> part of the topmost RelOptInfo. Then the refcounts will be as follows
> joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
> joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo)
> joinpath_4 - 2 (joinpath_3, RelOptInfo)
> seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)

I think this likely works fine providing you always unlink paths from
the root of the path tree.  When you start unlinking from non-root
paths I think you could start to see problems unless you increment
refcounts recursively.

The problem I had when working on improving the union planner was when
building the final paths for a union child.

Let's say I have the query:  SELECT a,b FROM t1 UNION SELECT x,y FROM t2;

When planning the first union child, I'd like to provide the union
planner with a sorted path and also the cheapest scan path on t1 to
allow it to Hash Aggregate on the cheapest path.

Let's say the planner produces the following paths on t1.

SeqScan_t1

When I create the final union child paths I want to try and produce a
few sorted paths to allow MergeAppend to work. I'll loop over each
path in t1's pathlist.

SeqScan_t1 isn't sorted, so I'll make Sort1->SeqScan_t1 add add_path
that to final_rel.

Now, I also want to allow cheap Hash Aggregates to implement the
UNION, so I'll include SeqScan_t1 without sorting it.

If I now do add_path(final_rel, SeqScan_t1), add_path() could see that
the SeqScan cost is fuzzily the same as the Sort->SeqScan_t1 and since
the sort version has better PathKeys, add_path will unlink SeqScan_t1.

Now, I don't think your scheme for refcounting paths is broken by this
because you'll increment the refcount of the SeqScan_t1 when the Sort
uses it as its subpath, but if instead of a Sort->SeqScan that was
IncrementalSort->Sort->SeqScan, then suddenly you're not incrementing
the refcount for the SeqScan when you create the Incremental Sort due
to the Sort being between the two.

So, if I now do add_path(final_rel, Sort2->Incremental_Sort->SeqScan_t1)

Sort1 is rejected due to cost and we unlink it. Sort1 refcount gets to
0 and we free the path and the SeqScan_t1's refcount goes down by 1
but does not reach 0.

We then add_path(final_rel, SeqScan_t1) but this path is fuzzily the
same cost as Sort2 so we reject SeqScan_t1 due to Sort2 having better
pathkeys and we unlink SeqScan_t1.  Refcount on SeqScan_t1 gets to 0
and we free the path. We need SeqScan_t1 for the incremental sort.

Perhaps in reality here, since SeqScan_t1 exists in the base rel's
pathlist we'd still have a refcount from that, but I assume at some
point you want to start freeing up paths from RelOptInfos that we're
done with, in which case the SeqScan_t1's refcount would reach 0 at
that point and we'd crash during create plan of the
Sort2->Incremental_Sort->SeqScan_t1 plan.

Have I misunderstood something here?  As far as I see it with your
non-recursive refcounts, it's only safe to free from the root of a
pathtree and isn't safe when unlinking paths that are the subpath of
another path unless the unlinking is done from the root.

David



Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Fri, Dec 8, 2023 at 1:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 8 Dec 2023 at 18:02, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > given path. E.g. we have three path chains as follows
> > 1. joinpath_1->joinpath_2->seqpath_1,
> > 2. joinpath_3->joinpath_4->seqpath_1,
> > 3. joinpath_5->joinpath_2->seqpath_1
> >
> > Please note that this is not full path tree/forest. It is difficult to
> > describe the whole path forest in text format. But this should be
> > sufficient to get the gist of how linking and unlinking works.
> >
> > Let's say all these paths are listed in pathlists of respective
> > RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
> > part of the topmost RelOptInfo. Then the refcounts will be as follows
> > joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
> > joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo)
> > joinpath_4 - 2 (joinpath_3, RelOptInfo)
> > seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)
>
> I think this likely works fine providing you always unlink paths from
> the root of the path tree.  When you start unlinking from non-root
> paths I think you could start to see problems unless you increment
> refcounts recursively.
>
> The problem I had when working on improving the union planner was when
> building the final paths for a union child.
>
> Let's say I have the query:  SELECT a,b FROM t1 UNION SELECT x,y FROM t2;
>
> When planning the first union child, I'd like to provide the union
> planner with a sorted path and also the cheapest scan path on t1 to
> allow it to Hash Aggregate on the cheapest path.
>
> Let's say the planner produces the following paths on t1.
>
> SeqScan_t1
>
> When I create the final union child paths I want to try and produce a
> few sorted paths to allow MergeAppend to work. I'll loop over each
> path in t1's pathlist.
>
> SeqScan_t1 isn't sorted, so I'll make Sort1->SeqScan_t1 add add_path
> that to final_rel.
>
> Now, I also want to allow cheap Hash Aggregates to implement the
> UNION, so I'll include SeqScan_t1 without sorting it.
>
> If I now do add_path(final_rel, SeqScan_t1), add_path() could see that
> the SeqScan cost is fuzzily the same as the Sort->SeqScan_t1 and since
> the sort version has better PathKeys, add_path will unlink SeqScan_t1.
>
> Now, I don't think your scheme for refcounting paths is broken by this
> because you'll increment the refcount of the SeqScan_t1 when the Sort
> uses it as its subpath, but if instead of a Sort->SeqScan that was
> IncrementalSort->Sort->SeqScan, then suddenly you're not incrementing
> the refcount for the SeqScan when you create the Incremental Sort due
> to the Sort being between the two.
>
> So, if I now do add_path(final_rel, Sort2->Incremental_Sort->SeqScan_t1)
>
> Sort1 is rejected due to cost and we unlink it. Sort1 refcount gets to
> 0 and we free the path and the SeqScan_t1's refcount goes down by 1
> but does not reach 0.

What is Sort1? Sort1->Seqscan_t1 OR incremental_sort->Sort->seqscan_t1?

I am getting lost here. But I think you have misunderstood the code so
this question is irrelevant. See next.

>
> We then add_path(final_rel, SeqScan_t1) but this path is fuzzily the
> same cost as Sort2 so we reject SeqScan_t1 due to Sort2 having better
> pathkeys and we unlink SeqScan_t1.  Refcount on SeqScan_t1 gets to 0
> and we free the path. We need SeqScan_t1 for the incremental sort.

the path being presented to add_path is never passed to unlink_path().
If it's suboptimal to any existing path, it is passed to free_path()
which in its current form will throw an error. But that can fixed.
free_path can just ignore a path with refcount > 0, if we want to
present the same path to add_path multiple times.

>
> Perhaps in reality here, since SeqScan_t1 exists in the base rel's
> pathlist we'd still have a refcount from that, but I assume at some
> point you want to start freeing up paths from RelOptInfos that we're
> done with, in which case the SeqScan_t1's refcount would reach 0 at
> that point and we'd crash during create plan of the
> Sort2->Incremental_Sort->SeqScan_t1 plan.
>
> Have I misunderstood something here?  As far as I see it with your
> non-recursive refcounts, it's only safe to free from the root of a
> pathtree and isn't safe when unlinking paths that are the subpath of
> another path unless the unlinking is done from the root.
>

As long as we call link_path every time we reference a path, we could
start freeing paths from anywhere. The reference count takes care of
not freeing referenced paths. Of course, I will need to fix
free_path() as above.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Maybe we can try to move forward with your refcount idea and see how
> the performance looks.  If that's intolerable then that might help us
> decide on the next best alternative solution.
>

Here are performance numbers

setup

create table t1 (a integer primary key, b integer);
create table t1_parted (a integer primary key, b integer) partition by range(a);
create 1000 partitions of t1

query (a five way self-join)
select * from t1 a, t1 b, t1 c, t1 d, t1 e where a.a = b.a and b.a =
c.a and c.a = d.a and d.a = e.a -- unpartitioned table case
select * from t1_parted a, t1_parted b, t1_parted c, t1_parted d,
t1_parted e where a.a = b.a and b.a = c.a and c.a =     d.a and d.a =
e.a; -- partitioned table case

The numbers with patches attached to [1] with limitations listed in
the email are thus

Ran each query 10 times through EXPLAIN (SUMMARY) and averaged
planning time with and without patch.
unpartitioned case
without patch: 0.25
with patch: 0.19
this improvement is probably misleading. The planning time for this
query change a lot.

partitioned case (without partitionwise join)
without patch: 14580.65
with patch: 14625.12
% degradation: 0.3%

partitioned case (with partitionwise join)
without patch: 23273.69
with patch: 23328.52
 % degradation: 0.2%

That looks pretty small considering the benefits. What do you think?

[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
Forgot to mention,

On Thu, Dec 14, 2023 at 5:34 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > Maybe we can try to move forward with your refcount idea and see how
> > the performance looks.  If that's intolerable then that might help us
> > decide on the next best alternative solution.
> >
>
> Here are performance numbers
>
> setup
>
> create table t1 (a integer primary key, b integer);
> create table t1_parted (a integer primary key, b integer) partition by range(a);
> create 1000 partitions of t1
>
> query (a five way self-join)
> select * from t1 a, t1 b, t1 c, t1 d, t1 e where a.a = b.a and b.a =
> c.a and c.a = d.a and d.a = e.a -- unpartitioned table case
> select * from t1_parted a, t1_parted b, t1_parted c, t1_parted d,
> t1_parted e where a.a = b.a and b.a = c.a and c.a =     d.a and d.a =
> e.a; -- partitioned table case
>
> The numbers with patches attached to [1] with limitations listed in
> the email are thus
>
> Ran each query 10 times through EXPLAIN (SUMMARY) and averaged
> planning time with and without patch.
> unpartitioned case
> without patch: 0.25
> with patch: 0.19
> this improvement is probably misleading. The planning time for this
> query change a lot.
>
> partitioned case (without partitionwise join)
> without patch: 14580.65
> with patch: 14625.12
> % degradation: 0.3%
>
> partitioned case (with partitionwise join)
> without patch: 23273.69
> with patch: 23328.52
>  % degradation: 0.2%
>
> That looks pretty small considering the benefits. What do you think?
>
> [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

If you want to experiment, please use attached patches. There's a fix
for segfault during initdb in them. The patches are still raw.

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> >
> > That looks pretty small considering the benefits. What do you think?
> >
> > [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
>
> If you want to experiment, please use attached patches. There's a fix
> for segfault during initdb in them. The patches are still raw.

First patch is no longer required. Here's rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: Memory consumed by paths during partitionwise join planning

From
Andrei Lepikhov
Date:
On 6/2/2024 19:51, Ashutosh Bapat wrote:
> On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
> The patches are raw. make check has some crashes that I need to fix. I
> am waiting to hear whether this is useful and whether the design is on
> the right track.
Let me write words of opinion on that feature.
I generally like the idea of a refcount field. We had problems 
generating alternative paths in extensions and designing the Asymmetric 
Join feature [1] when we proposed an alternative path one level below 
and called the add_path() routine. We had lost the path in the path list 
referenced from the upper RelOptInfo. This approach allows us to make 
more handy solutions instead of hacking with a copy/restore pathlist.
But I'm not sure about freeing unreferenced paths. I would have to see 
alternatives in the pathlist.
About partitioning. As I discovered planning issues connected to 
partitions, the painful problem is a rule, according to which we are 
trying to use all nomenclature of possible paths for each partition. 
With indexes, it quickly increases optimization work. IMO, this can help 
a 'symmetrical' approach, which could restrict the scope of possible 
pathways for upcoming partitions if we filter some paths in a set of 
previously planned partitions.
Also, I am glad to see a positive opinion about the path_walker() 
routine. Somewhere else, for example, in [2], it seems we need it too.

[1] 
https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAMbWs496%2BN%3DUAjOc%3DrcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw%40mail.gmail.com
-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Thu, Feb 15, 2024 at 9:41 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
>
> On 6/2/2024 19:51, Ashutosh Bapat wrote:
> > On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
> > The patches are raw. make check has some crashes that I need to fix. I
> > am waiting to hear whether this is useful and whether the design is on
> > the right track.
> Let me write words of opinion on that feature.
> I generally like the idea of a refcount field. We had problems
> generating alternative paths in extensions and designing the Asymmetric
> Join feature [1] when we proposed an alternative path one level below
> and called the add_path() routine. We had lost the path in the path list
> referenced from the upper RelOptInfo. This approach allows us to make
> more handy solutions instead of hacking with a copy/restore pathlist.

Thanks for your comments. I am glad that you find this useful.

> But I'm not sure about freeing unreferenced paths. I would have to see
> alternatives in the pathlist.

I didn't understand this. Can you please elaborate? A path in any
pathlist is referenced. An unreferenced path should not be in any
pathlist.

> About partitioning. As I discovered planning issues connected to
> partitions, the painful problem is a rule, according to which we are
> trying to use all nomenclature of possible paths for each partition.
> With indexes, it quickly increases optimization work. IMO, this can help
> a 'symmetrical' approach, which could restrict the scope of possible
> pathways for upcoming partitions if we filter some paths in a set of
> previously planned partitions.

filter or free?

> Also, I am glad to see a positive opinion about the path_walker()
> routine. Somewhere else, for example, in [2], it seems we need it too.
>

I agree that we should introduce path_walker() yes.

I am waiting for David's opinion about the performance numbers shared
in [1] before working further on the patches and bring them out of WIP
state.

[1] postgr.es/m/CAExHW5uDkMQL8SicV3_=AYcsWwMTNR8GzJo310J7sv7TMLoL6Q@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
Andrei Lepikhov
Date:
On 15/2/2024 19:06, Ashutosh Bapat wrote:
> On Thu, Feb 15, 2024 at 9:41 AM Andrei Lepikhov
>> But I'm not sure about freeing unreferenced paths. I would have to see
>> alternatives in the pathlist.
> 
> I didn't understand this. Can you please elaborate? A path in any
> pathlist is referenced. An unreferenced path should not be in any
> pathlist.
I mean that at some point, an extension can reconsider the path tree 
after building the top node of this path. I vaguely recall that we 
already have (or had) kind of such optimization in the core where part 
of the plan changes after it has been built.
Live example: right now, I am working on the code like MSSQL has - a 
combination of NestLoop and HashJoin paths and switching between them in 
real-time. It requires both paths in the path list at the moment when 
extensions are coming. Even if one of them isn't referenced from the 
upper pathlist, it may still be helpful for the extension.

>> About partitioning. As I discovered planning issues connected to
>> partitions, the painful problem is a rule, according to which we are
>> trying to use all nomenclature of possible paths for each partition.
>> With indexes, it quickly increases optimization work. IMO, this can help
>> a 'symmetrical' approach, which could restrict the scope of possible
>> pathways for upcoming partitions if we filter some paths in a set of
>> previously planned partitions.
> 
> filter or free?
Filter.
I meant that Postres tries to apply IndexScan, BitmapScan, 
IndexOnlyScan, and other strategies, passing throughout the partition 
indexes. The optimizer spends a lot of time doing that. So, why not 
introduce a symmetrical strategy and give away from the search some 
indexes of types of scan based on the pathifying experience of previous 
partitions of the same table: if you have dozens of partitions, Is it 
beneficial for the system to find a bit more optimal IndexScan on one 
partition having SeqScans on 999 other?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Live example: right now, I am working on the code like MSSQL has - a
> combination of NestLoop and HashJoin paths and switching between them in
> real-time. It requires both paths in the path list at the moment when
> extensions are coming. Even if one of them isn't referenced from the
> upper pathlist, it may still be helpful for the extension.

There is no guarantee that every path presented to add_path will be
preserved. Suboptimal paths are freed as and when add_path discovers
that they are suboptimal. So I don't think an extension can rely on
existence of a path. But having a refcount makes it easy to preserve
the required paths by referencing them.

>
> >> About partitioning. As I discovered planning issues connected to
> >> partitions, the painful problem is a rule, according to which we are
> >> trying to use all nomenclature of possible paths for each partition.
> >> With indexes, it quickly increases optimization work. IMO, this can help
> >> a 'symmetrical' approach, which could restrict the scope of possible
> >> pathways for upcoming partitions if we filter some paths in a set of
> >> previously planned partitions.
> >
> > filter or free?
> Filter.
> I meant that Postres tries to apply IndexScan, BitmapScan,
> IndexOnlyScan, and other strategies, passing throughout the partition
> indexes. The optimizer spends a lot of time doing that. So, why not
> introduce a symmetrical strategy and give away from the search some
> indexes of types of scan based on the pathifying experience of previous
> partitions of the same table: if you have dozens of partitions, Is it
> beneficial for the system to find a bit more optimal IndexScan on one
> partition having SeqScans on 999 other?
>
IIUC, you are suggesting that instead of planning each
partition/partitionwise join, we only create paths with the strategies
which were found to be optimal with previous partitions. That's a good
heuristic but it won't work if partition properties - statistics,
indexes etc. differ between groups of partitions.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
Andrei Lepikhov
Date:
On 19/2/2024 19:25, Ashutosh Bapat wrote:
> On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Live example: right now, I am working on the code like MSSQL has - a
>> combination of NestLoop and HashJoin paths and switching between them in
>> real-time. It requires both paths in the path list at the moment when
>> extensions are coming. Even if one of them isn't referenced from the
>> upper pathlist, it may still be helpful for the extension.
> 
> There is no guarantee that every path presented to add_path will be
> preserved. Suboptimal paths are freed as and when add_path discovers
> that they are suboptimal. So I don't think an extension can rely on
> existence of a path. But having a refcount makes it easy to preserve
> the required paths by referencing them.
I don't insist, just provide my use case. It would be ideal if you would 
provide some external routines for extensions that allow for sticking 
the path in pathlist even when it has terrible cost estimation.
> 
>>
>>>> About partitioning. As I discovered planning issues connected to
>>>> partitions, the painful problem is a rule, according to which we are
>>>> trying to use all nomenclature of possible paths for each partition.
>>>> With indexes, it quickly increases optimization work. IMO, this can help
>>>> a 'symmetrical' approach, which could restrict the scope of possible
>>>> pathways for upcoming partitions if we filter some paths in a set of
>>>> previously planned partitions.
>>>
>>> filter or free?
>> Filter.
>> I meant that Postres tries to apply IndexScan, BitmapScan,
>> IndexOnlyScan, and other strategies, passing throughout the partition
>> indexes. The optimizer spends a lot of time doing that. So, why not
>> introduce a symmetrical strategy and give away from the search some
>> indexes of types of scan based on the pathifying experience of previous
>> partitions of the same table: if you have dozens of partitions, Is it
>> beneficial for the system to find a bit more optimal IndexScan on one
>> partition having SeqScans on 999 other?
>>
> IIUC, you are suggesting that instead of planning each
> partition/partitionwise join, we only create paths with the strategies
> which were found to be optimal with previous partitions. That's a good
> heuristic but it won't work if partition properties - statistics,
> indexes etc. differ between groups of partitions.
Sure, but the "Symmetry" strategy assumes that on the scope of a 
thousand partitions, especially with parallel append involved, it 
doesn't cause sensible performance degradation if we find a bit 
suboptimal path in a small subset of partitions. Does it make sense?
As I see, when people use 10-100 partitions for the table, they usually 
strive to keep indexes symmetrical for all partitions.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Tue, Feb 20, 2024 at 8:19 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
>
> On 19/2/2024 19:25, Ashutosh Bapat wrote:
> > On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru> wrote:
> >> Live example: right now, I am working on the code like MSSQL has - a
> >> combination of NestLoop and HashJoin paths and switching between them in
> >> real-time. It requires both paths in the path list at the moment when
> >> extensions are coming. Even if one of them isn't referenced from the
> >> upper pathlist, it may still be helpful for the extension.
> >
> > There is no guarantee that every path presented to add_path will be
> > preserved. Suboptimal paths are freed as and when add_path discovers
> > that they are suboptimal. So I don't think an extension can rely on
> > existence of a path. But having a refcount makes it easy to preserve
> > the required paths by referencing them.
> I don't insist, just provide my use case. It would be ideal if you would
> provide some external routines for extensions that allow for sticking
> the path in pathlist even when it has terrible cost estimation.

With refcounts you can reference it and store it somewhere other than
pathlist. The path won't be lost until it is dereferrenced.
RelOptInfo::Pathlist is for optimal paths.

> >
> >>
> >>
> > IIUC, you are suggesting that instead of planning each
> > partition/partitionwise join, we only create paths with the strategies
> > which were found to be optimal with previous partitions. That's a good
> > heuristic but it won't work if partition properties - statistics,
> > indexes etc. differ between groups of partitions.
> Sure, but the "Symmetry" strategy assumes that on the scope of a
> thousand partitions, especially with parallel append involved, it
> doesn't cause sensible performance degradation if we find a bit
> suboptimal path in a small subset of partitions. Does it make sense?
> As I see, when people use 10-100 partitions for the table, they usually
> strive to keep indexes symmetrical for all partitions.
>

I agree that we need something like that. In order to do that, we need
machinery to prove that all partitions have similar properties. Once
that is proved, we can skip creating paths for similar partitions. But
that's out of scope of this work and complements it.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
Andrei Lepikhov
Date:
On 6/2/2024 13:51, Ashutosh Bapat wrote:
> On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> 
>>>
>>> That looks pretty small considering the benefits. What do you think?
>>>
>>> [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
>>
>> If you want to experiment, please use attached patches. There's a fix
>> for segfault during initdb in them. The patches are still raw.
> 
> First patch is no longer required. Here's rebased set
> 
> The patches are raw. make check has some crashes that I need to fix. I
> am waiting to hear whether this is useful and whether the design is on
> the right track.
I think this work is promising, especially in the scope of partitioning.
I've analysed how it works by basing these patches on the current 
master. You propose freeing unused paths after the end of the standard 
path search.
In my view, upper paths generally consume less memory than join and scan 
paths. This is primarily due to the limited options provided by Postgres 
so far.
At the same time, this technique (while highly useful in general) adds 
fragility and increases complexity: a developer needs to remember to 
link the path using the pointer in different places of the code.
So, maybe go the alternative way? Invent a subquery memory context and 
store all the path allocations there. It can be freed after setrefs 
finishes this subquery planning without pulling up this subquery.

-- 
regards, Andrei Lepikhov




Re: Memory consumed by paths during partitionwise join planning

From
Ashutosh Bapat
Date:
On Thu, Sep 19, 2024 at 4:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 6/2/2024 13:51, Ashutosh Bapat wrote:
> > On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >>>
> >>> That looks pretty small considering the benefits. What do you think?
> >>>
> >>> [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
> >>
> >> If you want to experiment, please use attached patches. There's a fix
> >> for segfault during initdb in them. The patches are still raw.
> >
> > First patch is no longer required. Here's rebased set
> >
> > The patches are raw. make check has some crashes that I need to fix. I
> > am waiting to hear whether this is useful and whether the design is on
> > the right track.
> I think this work is promising, especially in the scope of partitioning.
> I've analysed how it works by basing these patches on the current
> master. You propose freeing unused paths after the end of the standard
> path search.
> In my view, upper paths generally consume less memory than join and scan
> paths. This is primarily due to the limited options provided by Postgres
> so far.
> At the same time, this technique (while highly useful in general) adds
> fragility and increases complexity: a developer needs to remember to
> link the path using the pointer in different places of the code.
> So, maybe go the alternative way? Invent a subquery memory context and
> store all the path allocations there. It can be freed after setrefs
> finishes this subquery planning without pulling up this subquery.
>

Using memory context for subquery won't help with partitioning right?
If the outermost query has a partitioned table with thousands of
partitions, it will still accumulate those paths till the very end of
planning. We could instead use memory context/s to store all the paths
created, then copy the optimal paths into a new memory context at
strategic points and blow up the old memory context. And repeat this
till we choose the final path and create a plan out of it; at that
point we could blow up the memory context containing remaining paths
as well. That will free the paths as soon as they are rendered
useless. I discussed this idea with Alvaro offline. We thought that
this approach needs some code to copy paths and then copying paths
recursively has some overhead of itself. The current work of adding a
reference count, OTOH has potential to bring discipline into the way
we handle paths. We need to avoid risks posed by dangling pointers.
For which Alvaro suggested looking at the way we manage snapshots. But
I didn't get time to explore that idea yet.

--
Best Wishes,
Ashutosh Bapat



Re: Memory consumed by paths during partitionwise join planning

From
Andrei Lepikhov
Date:
On 19/9/2024 13:12, Ashutosh Bapat wrote:
> On Thu, Sep 19, 2024 at 4:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> At the same time, this technique (while highly useful in general) adds
>> fragility and increases complexity: a developer needs to remember to
>> link the path using the pointer in different places of the code.
>> So, maybe go the alternative way? Invent a subquery memory context and
>> store all the path allocations there. It can be freed after setrefs
>> finishes this subquery planning without pulling up this subquery.
>>
> 
> Using memory context for subquery won't help with partitioning right?
> If the outermost query has a partitioned table with thousands of
> partitions, it will still accumulate those paths till the very end of
> planning.
I got it. Just haven't had huge tables in the outer before.

> We could instead use memory context/s to store all the paths
> created, then copy the optimal paths into a new memory context at
> strategic points and blow up the old memory context. And repeat this
> till we choose the final path and create a plan out of it; at that
> point we could blow up the memory context containing remaining paths
> as well. That will free the paths as soon as they are rendered
> useless.
I think any scalable solution should be based on a per-partition 
cleanup. For starters, why not adopt Tom's patch [1] for selectivity 
estimations? We will see the profit in the case of long lists of clauses.

> I discussed this idea with Alvaro offline. We thought that
> this approach needs some code to copy paths and then copying paths
> recursively has some overhead of itself.
It needs path_tree_walker at first. We discussed it before but failed. 
Maybe design it beforehand and use it in re-parameterising code?

> The current work of adding a
> reference count, OTOH has potential to bring discipline into the way
> we handle paths. We need to avoid risks posed by dangling pointers.
Both hands up for having pointer counters: It is painful all the time in 
extensions to invent an approach to safely removing a path you want to 
replace with a custom one. I just want to say it looks too dangerous 
compared to the value of a possible positive outcome.

> For which Alvaro suggested looking at the way we manage snapshots. But
> I didn't get time to explore that idea yet.
Unfortunately, I can't understand this idea without an explanation.

[1] Optimize planner memory consumption for huge arrays
https://www.postgresql.org/message-id/1367418.1708816059@sss.pgh.pa.us

-- 
regards, Andrei Lepikhov