Re: Memory consumed by paths during partitionwise join planning - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Memory consumed by paths during partitionwise join planning |
Date | |
Msg-id | CAExHW5trvsPRo+JCjBzR=OhSqcktUWZ8nt71KBnL1JD8y1ubcA@mail.gmail.com Whole thread Raw |
In response to | Re: Memory consumed by paths during partitionwise join planning (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On Fri, Dec 8, 2023 at 1:02 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 8 Dec 2023 at 18:02, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > given path. E.g. we have three path chains as follows > > 1. joinpath_1->joinpath_2->seqpath_1, > > 2. joinpath_3->joinpath_4->seqpath_1, > > 3. joinpath_5->joinpath_2->seqpath_1 > > > > Please note that this is not full path tree/forest. It is difficult to > > describe the whole path forest in text format. But this should be > > sufficient to get the gist of how linking and unlinking works. > > > > Let's say all these paths are listed in pathlists of respective > > RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all > > part of the topmost RelOptInfo. Then the refcounts will be as follows > > joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo) > > joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo) > > joinpath_4 - 2 (joinpath_3, RelOptInfo) > > seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo) > > I think this likely works fine providing you always unlink paths from > the root of the path tree. When you start unlinking from non-root > paths I think you could start to see problems unless you increment > refcounts recursively. > > The problem I had when working on improving the union planner was when > building the final paths for a union child. > > Let's say I have the query: SELECT a,b FROM t1 UNION SELECT x,y FROM t2; > > When planning the first union child, I'd like to provide the union > planner with a sorted path and also the cheapest scan path on t1 to > allow it to Hash Aggregate on the cheapest path. > > Let's say the planner produces the following paths on t1. > > SeqScan_t1 > > When I create the final union child paths I want to try and produce a > few sorted paths to allow MergeAppend to work. I'll loop over each > path in t1's pathlist. > > SeqScan_t1 isn't sorted, so I'll make Sort1->SeqScan_t1 add add_path > that to final_rel. > > Now, I also want to allow cheap Hash Aggregates to implement the > UNION, so I'll include SeqScan_t1 without sorting it. > > If I now do add_path(final_rel, SeqScan_t1), add_path() could see that > the SeqScan cost is fuzzily the same as the Sort->SeqScan_t1 and since > the sort version has better PathKeys, add_path will unlink SeqScan_t1. > > Now, I don't think your scheme for refcounting paths is broken by this > because you'll increment the refcount of the SeqScan_t1 when the Sort > uses it as its subpath, but if instead of a Sort->SeqScan that was > IncrementalSort->Sort->SeqScan, then suddenly you're not incrementing > the refcount for the SeqScan when you create the Incremental Sort due > to the Sort being between the two. > > So, if I now do add_path(final_rel, Sort2->Incremental_Sort->SeqScan_t1) > > Sort1 is rejected due to cost and we unlink it. Sort1 refcount gets to > 0 and we free the path and the SeqScan_t1's refcount goes down by 1 > but does not reach 0. What is Sort1? Sort1->Seqscan_t1 OR incremental_sort->Sort->seqscan_t1? I am getting lost here. But I think you have misunderstood the code so this question is irrelevant. See next. > > We then add_path(final_rel, SeqScan_t1) but this path is fuzzily the > same cost as Sort2 so we reject SeqScan_t1 due to Sort2 having better > pathkeys and we unlink SeqScan_t1. Refcount on SeqScan_t1 gets to 0 > and we free the path. We need SeqScan_t1 for the incremental sort. the path being presented to add_path is never passed to unlink_path(). If it's suboptimal to any existing path, it is passed to free_path() which in its current form will throw an error. But that can fixed. free_path can just ignore a path with refcount > 0, if we want to present the same path to add_path multiple times. > > Perhaps in reality here, since SeqScan_t1 exists in the base rel's > pathlist we'd still have a refcount from that, but I assume at some > point you want to start freeing up paths from RelOptInfos that we're > done with, in which case the SeqScan_t1's refcount would reach 0 at > that point and we'd crash during create plan of the > Sort2->Incremental_Sort->SeqScan_t1 plan. > > Have I misunderstood something here? As far as I see it with your > non-recursive refcounts, it's only safe to free from the root of a > pathtree and isn't safe when unlinking paths that are the subpath of > another path unless the unlinking is done from the root. > As long as we call link_path every time we reference a path, we could start freeing paths from anywhere. The reference count takes care of not freeing referenced paths. Of course, I will need to fix free_path() as above. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: