Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAFjFpRdkfsRzdAuki1Z6WjoZvDVKq1VjXJFYRFWiDz8qASV3tA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
>>
>> There are some differences in what geqo does and what partition-wise
>> needs to do. geqo tries many joining orders each one in a separate
>> temporary context. The way geqo slices the work, every slice produces
>> a full plan. For partition-wise join I do not see a way to slice the
>> work such that the whole path and corresponding RelOptInfos come from
>> the same slice. So, we can't use the same method as GEQO.
>
> What I was thinking about was the use of this technique for getting
> rid of joinrels:
>
>     root->join_rel_list = list_truncate(root->join_rel_list,
>                                         savelength);
>     root->join_rel_hash = savehash;
>
> makePathNode() serves to segregate paths into a separate memory
> context that can then be destroyed, but as you point out, the path
> lists are still hanging around, and so are the RelOptInfo nodes.  It
> seems to me we could do a lot better using this technique.  Suppose we
> jigger things so that the List objects created by add_path go into
> path_cxt, and so that RelOptInfo nodes also go into path_cxt.  Then
> when we blow up path_cxt we won't have dangling pointers in the
> RelOptInfo objects any more because the RelOptInfos themselves will be
> gone.  The only problem is that the join_rel_list (and join_rel_hash
> if it exists) will be corrupt, but we can fix that using the technique
> demonstrated above.
>
> Of course, that supposes that 0009 can manage to postpone creating
> non-sampled child joinrels until create_partition_join_plan(), which
> it currently doesn't.

Right. We need the child-join's RelOptInfos to estimate sizes, so that
we could sample the largest ones. So postponing it looks difficult.

> In fact, unless I'm missing something, 0009
> hasn't been even slightly adapted to take advantage of the
> infrastructure in 0001; it doesn't seem to reset the path_cxt or
> anything.  That seems like a fairly major omission.

The path_cxt reset introduced by 0001 recycles memory used by all the
paths, including paths created for the children. But that happens only
after all the planning has completed. I thought that's what we
discussed to be done. We could create a separate path context for
every top-level child-join. That will require either copying the
cheapest path-tree into root->glob->path_cxt memory context OR will
require it to be converted to a plan immediately. The first will
require spending CPU cycles and memory in copying path-tree. The later
requires almost all the create_*_append_plan() code to be duplicated
in create_partition_join_plan() which is ugly. In an earlier version
of this patch I had that code, which I got rid of in the latest set of
patches. Between those two the first looks better.

>
> Incidentally, I committed 0002, 0003, and 0005 as a single commit with
> a few tweaks; I think you may need to do a bit of rebasing.

Thanks. I will have fewer patches to rebase now :).

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Microvacuum support for Hash Index
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables