Re: Making "COPY partitioned_table FROM" faster - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Making "COPY partitioned_table FROM" faster |
Date | |
Msg-id | CAAKRu_ZvbATWE_22mMdT6ms2cEOQR7018Au12O5Q8C_xKBjigA@mail.gmail.com Whole thread Raw |
In response to | Re: Making "COPY partitioned_table FROM" faster (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Making "COPY partitioned_table FROM" faster
|
List | pgsql-hackers |
On Wed, Jul 25, 2018 at 7:24 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On patched code line 2564, there is a missing parenthesis. It might be better to remove the parentheses entirely and, while you're at it, there is a missing comma.
/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. (It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. (It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/
Should be
/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now, CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now, CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/
Regarding the performance improvement and code diff from v2 to v3:
The rearranging of the code in v3 makes sense and improves the flow of the code, however, I stared at it for awhile and couldn't quite work out in my head which part caused the significant improvement from v2.
I would have thought that a speedup from patch v2 to v3 would have come from doing multi-inserts less often when the target partition is switching a lot, however, looking at the v2 to v3 diff, I can't see how any of the changes would have caused a decrease in the number of multi-inserts given the same data and target table ddl.
So, I thinking I'm missing something. Which of the changes would cause the performance improvement from patch v2 to v3? My understanding is:
a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
test that sets leafpart_use_multi_insert.
This would short-circuit checking the other conditions when deciding how to set leafpart_use_multi_insert
b) Added an unlikely() when testing of the partition's resultRelInfo
has set to be initialised. This is only true the first time a
partition is inserted into during the COPY.
The addition of "unlikely" here seems like a good idea because there is a function call (ExecInitPartitionInfo) inside the if statement. However, would that cause such a difference in performance from v2 to v3? What would be the reason? Cache misses? Some kind of pre-evaluation of the expression?
c) Added unlikely() around lastPartitionSampleLineNo test. This will
only be true 1 in 1000, so pretty unlikely.
Even though this makes sense based on the frequency with which this condition will evaluate to true, I don't understand what the performance benefit would be for adding a compiler hint here around such a trivial calculation
d) Moved the nBufferedTuples > 0 test under the insertMethod ==
CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
doing CIM_SINGLE.
This is a good catch. It makes this part of the code more clear, as well. However, it doesn't seem like it would affect performance at all.
Let me know what I'm missing.
pgsql-hackers by date: