Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAFjFpRfS9fW1+9vcbqyPq0WcqxoGN1uPUHGGq_x2eTUorOHu7w@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Mar 16, 2017 at 12:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 8:49 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Of course, that supposes that 0009 can manage to postpone creating
>>> non-sampled child joinrels until create_partition_join_plan(), which
>>> it currently doesn't.
>>
>> Right. We need the child-join's RelOptInfos to estimate sizes, so that
>> we could sample the largest ones. So postponing it looks difficult.
>
> You have a point.
>
>>> In fact, unless I'm missing something, 0009
>>> hasn't been even slightly adapted to take advantage of the
>>> infrastructure in 0001; it doesn't seem to reset the path_cxt or
>>> anything.  That seems like a fairly major omission.
>>
>> The path_cxt reset introduced by 0001 recycles memory used by all the
>> paths, including paths created for the children. But that happens only
>> after all the planning has completed. I thought that's what we
>> discussed to be done. We could create a separate path context for
>> every top-level child-join.
>
> I don't think we need to create a new context for each top-level
> child-join, but I think we should create a context to be used across
> all top-level child-joins and then reset it after planning each one.

Sorry, that's what I meant by creating a new context for each
top-level child-join. So, we need to copy the required path tree
before resetting the context. I am fine doing that but read on.

> I thought the whole point here was that NOT doing that caused the
> memory usage for partitionwise join to get out of control.  Am I
> confused?

We took a few steps to reduce the memory footprint of partition-wise
join in [1] and [2]. According to the numbers reported in [1] and then
in [2], if the total memory consumed by a planner is 44MB (memory
consumed by paths 150K) for a 5-way non-parition-wise join between
tables with 1000 partitions, partition-wise join consumed 192MB which
is 4.4 times the non-partitino-wise case. The earlier implementation
of blowing away a memory context after each top-level child-join, just
got rid of the paths created for that child-join. The total memory
consumed by paths created for all the child-joins was about 150MB.
Remember that we can not get rid of memory consumed by expressions,
RelOptInfos, RestrictInfos etc. since their pointers will be copied
into the plan nodes.

With changes in 0001, what happens is we accumulate 150MB till the end
of the planning and get rid of it after we have created a plan. So,
till the plan is created we are consuming approx. 192MB + 150MB =
342MB memory and are getting rid of 150MB memory after we have created
the plan. I am not sure whether consuming extra 150MB or for that
matter 342MB in a setup with a thousand partitions is "going out of
control". (342MB is approx. 7.8 time 44MB; not 1000 times, and not
even 10 times). But if you think that we should throw away unused
paths after planning every top-level child-join I am fine with it.

[1] https://www.postgresql.org/message-id/CAFjFpRcZ_M3-JxoiDkdoPS%2B-9Cok4ux9Si%2B4drcRL-62af%3DjWw@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFjFpRe66z%2Bw9%2BdnAkWGiaB1CU2CUQsLGsqzHzYBoA%3DKJFf%2BPQ%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Amit Khandekar
Date:
Subject: Re: [HACKERS] Parallel Append implementation
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables