Home > mailing lists

Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: COPY FROM WHEN condition
Date	March 28, 2019 12:15:51
Msg-id	20190328121551.hn7r3e2cnazjl2z2@alap3.anarazel.de Whole thread Raw
In response to	Re: COPY FROM WHEN condition (David Rowley <david.rowley@2ndquadrant.com>)
Responses	Re: COPY FROM WHEN condition
List	pgsql-hackers

Tree view

Hi,

On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> I had a look at this and performance has improved again, thanks.
> However, I'm not sure if the patch is exactly what we need, let me
> explain.

I'm not entirely sure either, I just haven't really seen an alternative
that's convincing.

> When I was working on 0d5f05cde01, I think my original idea was just
> to use the bufferedTuples[] array and always multi insert into
> partitioned tables unless something like on insert triggers stopped
> that.  The reason I ended up with all this CIM_MULTI and
> CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
> when the target partition changed on each tuple.  At the time I
> wondered if it might be worth trying to have multiple buffers or to
> partition the existing buffer and store tuples belonging to different
> partitions, then just flush them when they're full.  Looking at the
> patch it seems that you have added an array of slots per partition, so
> does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
> code redundant? ... we could just flush each partition's buffer when
> it gets full.  There's not much of a need to flush the buffer when the
> partition changes since we're using a different buffer for the next
> partition anyway.

Well, there is in a way - that'd lead to pretty significant memory
usage if the tuples are a bit wider.  I think there's a separate
optimization, where we keep track of the partitions with unflushed
changes and track the buffer size across all partitions. But that seems
like a followup patch.

> I also wonder about memory usage here. If we're copying to a
> partitioned table with 10k partitions and we get over
> MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
> quite a lot of slots.

We do - but they're not actually that large. In comparison to the
catcache, relcache, partition routing etc information per partition it's
not a huge amount of memory.

I'd earlier just destroyed the slots on partition change, but that's not
exactly free either...

Greetings,

Andres Freund

pgsql-hackers by date:

From: Adrien NAYRAT
Date: 28 March 2019, 12:14:39
Subject: Re: Log a sample of transactions

From: Andrew Dunstan
Date: 28 March 2019, 12:24:57
Subject: Re: jsonpath

Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

Previous

Next