Home > mailing lists

Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: COPY FROM WHEN condition
Date	January 29, 2019 07:18:46
Msg-id	CAKJS1f91By42p=oQHOF+n-dAyVTYGYGjHK2qg74xL=5=6MC9ig@mail.gmail.com Whole thread
In response to	Re: COPY FROM WHEN condition (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: COPY FROM WHEN condition Re: COPY FROM WHEN condition
List	pgsql-hackers

Tree view

On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.

I agree that fixing the problem mentioned by separating out tuple and
batch contexts is a good idea, but I disagree with removing the
alternating batch context idea.  FWIW the reason the alternating
contexts went in wasn't just for fun, it was on performance grounds.
There's a lengthy discussion in [1] about not adding any new
performance regressions to COPY. To be more specific, Peter complains
about a regression of 5% in [2].

It's pretty disheartening to see the work there being partially undone.

Here are my performance tests of with and without your change to the
memory contexts (I missed where you posted your results).

$ cat bench.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n2\n2\n";
}
36a1281f86c: (with your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 26825.142 ms (00:26.825)
Time: 27202.117 ms (00:27.202)
Time: 26266.705 ms (00:26.267)

4be058fe9ec: (before your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 25645.460 ms (00:25.645)
Time: 25698.193 ms (00:25.698)
Time: 25737.251 ms (00:25.737)

So looks like your change slows this code down 4% for this test case.
That's about twice as bad as the 2.2% regression that I had to solve
for the multi-insert partition patch (of which you've removed much of
the code from)

I'd really like to see the alternating batch context code being put
back in to fix this. Of course, if you have a better idea, then we can
look into that, but just removing code that was carefully written and
benchmarked without any new benchmarks to prove that it's okay to do
so seems wrong to me.

[1]
https://www.postgresql.org/message-id/flat/CAKJS1f93DeHN%2B9RrD9jYn0iF_o89w2B%2BU8-Ao5V1kd8Cf7oSGQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/2b40468d-6be0-e795-c363-0015bc9fa747%402ndquadrant.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Amit Langote
Date: 29 January 2019, 06:42:43
Subject: Re: Delay locking partitions during query execution

From: "Takashi Menjo"
Date: 29 January 2019, 08:46:07
Subject: RE: static global variable openLogOff in xlog.c seems no longer used

Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

Previous

Next