Re: Creating foreign key on partitioned table is too slow - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Creating foreign key on partitioned table is too slow
Date
Msg-id CA+HiwqFucUh7hYkfZ6x1MVcs_R24eUfNVuRwdE_FwuwK8XpSZg@mail.gmail.com
Whole thread Raw
In response to Re: Creating foreign key on partitioned table is too slow  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hello,

On Fri, Oct 25, 2019 at 7:18 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On Thu, Oct 24, 2019 at 03:48:57PM -0300, Alvaro Herrera wrote:
> >On 2019-Oct-23, kato-sho@fujitsu.com wrote:
> >
> >> Hello
> >>
> >> To benchmark with tpcb model, I tried to create a foreign key in the partitioned history table, but backend
processkilled by OOM.
 
> >> the number of partitions is 8192. I tried in master(commit: ad4b7aeb84).
> >>
> >> I did the same thing in another server which has 200GB memory, but creating foreign key did not end in 24 hours.
> >
> >Thanks for reporting.

Thank you Kato-san.

>  It sounds like there must be a memory leak here.
> >I am fairly pressed for time at present so I won't be able to
> >investigate this until, at least, mid November.
>
> I've briefly looked into this, and I think the main memory leak is in
> RelationBuildPartitionDesc. It gets called with PortalContext, it
> allocates a lot of memory building the descriptor, copies it into
> CacheContext but does not even try to free anything. So we end up with
> something like this:
...
> The attached patch trivially fixes that by adding a memory context
> tracking all the temporary data, and then just deletes it as a whole at
> the end of the function. This significantly reduces the memory usage for
> me, not sure it's 100% correct.

Thank you Tomas.  I think we have considered this temporary context
fix a number of times before, but it got stalled for one reason or
another ([1] comes to mind as the last thread where this came up).

Another angle to look at this is that our design where PartitionDesc
is rebuilt on relcache reload of the parent relation is not a great
one after all. It seems that we're rightly (?) invalidating the
parent's relcache 8192 times in this case, because its cacheable
foreign key descriptor changes on processing each partition, but
PartitionDesc itself doesn't change.  Having to pointlessly rebuild it
8192 times seems really wasteful.

I recall a discussion where it was proposed to build PartitionDesc
only when needed as opposed on every relcache reload of the parent
relation.  Attached PoC-at-best patch that does that seems to go
through without OOM and passes make check-world.  I think this should
have a very minor impact on select queries.

But...

> FWIW, even with this fix it still takes an awful lot to create the
> foreign key, because the CPU is stuck doing this
>
>     60.78%    60.78%  postgres  postgres            [.] bms_equal
>     32.58%    32.58%  postgres  postgres            [.] get_eclass_for_sort_expr
>      3.83%     3.83%  postgres  postgres            [.] add_child_rel_equivalences
>      0.23%     0.00%  postgres  [unknown]           [.] 0x0000000000000005
>      0.22%     0.00%  postgres  [unknown]           [.] 0000000000000000
>      0.18%     0.18%  postgres  postgres            [.] AllocSetCheck

...we have many problems to solve here. :-(

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw%40mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Refactor parse analysis of EXECUTE command
Next
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Block level parallel vacuum