Re: [report] memory leaks in COPY FROM on partitioned table - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [report] memory leaks in COPY FROM on partitioned table
Date
Msg-id 9deb7dc5-4b50-f1d7-3e73-2ff69a4f5122@lab.ntt.co.jp
Whole thread Raw
In response to Re: [report] memory leaks in COPY FROM on partitioned table  (Kohei KaiGai <kaigai@heterodb.com>)
Responses Re: [report] memory leaks in COPY FROM on partitioned table  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi Kaigai-san,

Thanks for the report and the patch.

On 2018/07/24 11:43, Kohei KaiGai wrote:
> Further investigation I did:
> 
> CopyFrom() calls ExecFindPartition() to identify the destination child
> table of partitioned table.
> Then, it internally calls get_partition_for_tuple() to get partition
> index according to the key value.
> This invocation is not under the per-tuple context.
> 
> In case of hash-partitioning, get_partition_for_tuple() calls
> hash-function of key data type; which is hash_numeric in my case.
> The hash_numeric fetches the key value using PG_GETARG_NUMERIC(0). It
> internally calls pg_detoast_datum() which may allocate new memory if
> varlena datum is not uncompressed long (32bit) format.
> 
> Once this patch attached, PostgreSQL backend process has been working
> with about 130MB memory consumption for 20min right now (not finished
> yet...)
> Before the patch applied, its memory consumption grows up about
> 10BM/sec, then terminated a few hours later.
> 
> P.S,
> I think do_convert_tuple() in ExecFindPartition() and
> ConvertPartitionTupleSlot() may also allocate memory out of the
> per-tuple context, however, I could not confirmed yet, because my test
> case has TupleConversionMap == NULL.

Your patch takes care of allocation happening inside
get_partition_for_tuple, but as you mention there might be others in its
caller ExecFindPartition.  So, I think we should switch to the per-tuple
context in ExecFindPartition.

When I tried to do that, I discovered that we have to be careful about
releasing some of the memory that's allocated in ExecFindPartition
ourselves instead of relying on the reset of per-tuple context to take
care of it.  That's because some of the structures that ExecFindPartition
assigns the allocated memory to are cleaned up at the end of the query, by
when it's too late to try to release per-tuple memory.  So, the patch I
ended up with is slightly bigger than simply adding a
MemoryContextSwitchTo() call at the beginning of ExecFindPartition.
Please find it attached.

Thanks,
Amit

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Finding database for pg_upgrade missing library
Next
From: Masahiko Sawada
Date:
Subject: Re: Indicate anti-wraparound autovacuum in log_autovacuum_min_duration