Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: COPY FROM WHEN condition
Date
Msg-id b5fe1ea4-5007-2183-bd11-a8210cca3cd8@2ndquadrant.com
Whole thread Raw
In response to Re: COPY FROM WHEN condition  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: COPY FROM WHEN condition  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers

On 1/29/19 8:18 AM, David Rowley wrote:
> On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> It turned out to be a tad more complex due to partitioning, because when
>> we find the partitions do not match, the tuple is already allocated in
>> the "current" context (be it per-tuple or batch). So we can't just free
>> the whole context at that point. The old code worked around this by
>> alternating two contexts, but that seems a bit too cumbersome to me, so
>> the patch simply copies the tuple to the new context. That allows us to
>> reset the batch context always, right after emptying the buffer. I need
>> to do some benchmarking to see if the extra copy causes any regression.
> 
> I agree that fixing the problem mentioned by separating out tuple and
> batch contexts is a good idea, but I disagree with removing the
> alternating batch context idea.  FWIW the reason the alternating
> contexts went in wasn't just for fun, it was on performance grounds.
> There's a lengthy discussion in [1] about not adding any new
> performance regressions to COPY. To be more specific, Peter complains
> about a regression of 5% in [2].
> 
> It's pretty disheartening to see the work there being partially undone.
> 

Whoops :-(

> Here are my performance tests of with and without your change to the
> memory contexts (I missed where you posted your results).
> 
> $ cat bench.pl
> for (my $i=0; $i < 8912891; $i++) {
> print "1\n1\n2\n2\n";
> }
> 36a1281f86c: (with your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 26825.142 ms (00:26.825)
> Time: 27202.117 ms (00:27.202)
> Time: 26266.705 ms (00:26.267)
> 
> 4be058fe9ec: (before your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 25645.460 ms (00:25.645)
> Time: 25698.193 ms (00:25.698)
> Time: 25737.251 ms (00:25.737)
> 
> So looks like your change slows this code down 4% for this test case.
> That's about twice as bad as the 2.2% regression that I had to solve
> for the multi-insert partition patch (of which you've removed much of
> the code from)
> 
> I'd really like to see the alternating batch context code being put
> back in to fix this. Of course, if you have a better idea, then we can
> look into that, but just removing code that was carefully written and
> benchmarked without any new benchmarks to prove that it's okay to do
> so seems wrong to me.
> 

Yeah, that's not very nice. Do you suggest to revert 36a1281f86c0f, or
shall we try to massage it a bit first? ISTM we could simply create two
batch memory contexts and alternate those, just like with the expression
contexts before.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: pg_basebackup, walreceiver and wal_sender_timeout
Next
From: Amit Langote
Date:
Subject: Re: speeding up planning with partitions