Re: COPY FROM WHEN condition - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: COPY FROM WHEN condition |
Date | |
Msg-id | CAKJS1f91By42p=oQHOF+n-dAyVTYGYGjHK2qg74xL=5=6MC9ig@mail.gmail.com Whole thread Raw |
In response to | Re: COPY FROM WHEN condition (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: COPY FROM WHEN condition
Re: COPY FROM WHEN condition |
List | pgsql-hackers |
On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > It turned out to be a tad more complex due to partitioning, because when > we find the partitions do not match, the tuple is already allocated in > the "current" context (be it per-tuple or batch). So we can't just free > the whole context at that point. The old code worked around this by > alternating two contexts, but that seems a bit too cumbersome to me, so > the patch simply copies the tuple to the new context. That allows us to > reset the batch context always, right after emptying the buffer. I need > to do some benchmarking to see if the extra copy causes any regression. I agree that fixing the problem mentioned by separating out tuple and batch contexts is a good idea, but I disagree with removing the alternating batch context idea. FWIW the reason the alternating contexts went in wasn't just for fun, it was on performance grounds. There's a lengthy discussion in [1] about not adding any new performance regressions to COPY. To be more specific, Peter complains about a regression of 5% in [2]. It's pretty disheartening to see the work there being partially undone. Here are my performance tests of with and without your change to the memory contexts (I missed where you posted your results). $ cat bench.pl for (my $i=0; $i < 8912891; $i++) { print "1\n1\n2\n2\n"; } 36a1281f86c: (with your change) postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|'; COPY 35651564 Time: 26825.142 ms (00:26.825) Time: 27202.117 ms (00:27.202) Time: 26266.705 ms (00:26.267) 4be058fe9ec: (before your change) postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|'; COPY 35651564 Time: 25645.460 ms (00:25.645) Time: 25698.193 ms (00:25.698) Time: 25737.251 ms (00:25.737) So looks like your change slows this code down 4% for this test case. That's about twice as bad as the 2.2% regression that I had to solve for the multi-insert partition patch (of which you've removed much of the code from) I'd really like to see the alternating batch context code being put back in to fix this. Of course, if you have a better idea, then we can look into that, but just removing code that was carefully written and benchmarked without any new benchmarks to prove that it's okay to do so seems wrong to me. [1] https://www.postgresql.org/message-id/flat/CAKJS1f93DeHN%2B9RrD9jYn0iF_o89w2B%2BU8-Ao5V1kd8Cf7oSGQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/2b40468d-6be0-e795-c363-0015bc9fa747%402ndquadrant.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: