Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From David Rowley
Subject Re: COPY FROM WHEN condition
Date
Msg-id CAKJS1f91By42p=oQHOF+n-dAyVTYGYGjHK2qg74xL=5=6MC9ig@mail.gmail.com
Whole thread Raw
In response to Re: COPY FROM WHEN condition  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: COPY FROM WHEN condition  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: COPY FROM WHEN condition  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.

I agree that fixing the problem mentioned by separating out tuple and
batch contexts is a good idea, but I disagree with removing the
alternating batch context idea.  FWIW the reason the alternating
contexts went in wasn't just for fun, it was on performance grounds.
There's a lengthy discussion in [1] about not adding any new
performance regressions to COPY. To be more specific, Peter complains
about a regression of 5% in [2].

It's pretty disheartening to see the work there being partially undone.

Here are my performance tests of with and without your change to the
memory contexts (I missed where you posted your results).

$ cat bench.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n2\n2\n";
}
36a1281f86c: (with your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 26825.142 ms (00:26.825)
Time: 27202.117 ms (00:27.202)
Time: 26266.705 ms (00:26.267)

4be058fe9ec: (before your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 25645.460 ms (00:25.645)
Time: 25698.193 ms (00:25.698)
Time: 25737.251 ms (00:25.737)

So looks like your change slows this code down 4% for this test case.
That's about twice as bad as the 2.2% regression that I had to solve
for the multi-insert partition patch (of which you've removed much of
the code from)

I'd really like to see the alternating batch context code being put
back in to fix this. Of course, if you have a better idea, then we can
look into that, but just removing code that was carefully written and
benchmarked without any new benchmarks to prove that it's okay to do
so seems wrong to me.

[1]
https://www.postgresql.org/message-id/flat/CAKJS1f93DeHN%2B9RrD9jYn0iF_o89w2B%2BU8-Ao5V1kd8Cf7oSGQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/2b40468d-6be0-e795-c363-0015bc9fa747%402ndquadrant.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Delay locking partitions during query execution
Next
From: "Takashi Menjo"
Date:
Subject: RE: static global variable openLogOff in xlog.c seems no longer used