INSERT performance: less CPU when no indexes or triggers - Mailing list pgsql-hackers
From | Adam S |
---|---|
Subject | INSERT performance: less CPU when no indexes or triggers |
Date | |
Msg-id | CALWCfdKCPiqrrqd+qK+xDi=9ERCXQ+RQeq9sX48+59PB3eTe0w@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
I've been thinking about INSERT performance and noticed that copyfrom.c (COPY FROM) performs ~4 unnecessary pointer-deferences per record in the case when there's no indexes and no AFTER ROW INSERT triggers (i.e. when you just want to load data really fast!).
I moved the for-loop inside the per-batch if-checks and got a little speedup. Obviously, this only matters for CPU-bound INSERTs with very narrow tables - if there's other overhead (including parsing), this gain disappears into the noise. I'm not a regular contributor, apologies in advance if I got something wrong, and no worries if this is too small to bother. My patch below passes "make check". I'll of course post other wins as I find them, but this one seemed easy.
My reference test comes from a conversation on HN ( https://news.ycombinator.com/item?id=38864213 ) loading 100M tiny records from COPY TO ... BINARY on a GCP c2d-standard-8: https://gcloud-compute.com/c2-standard-8.html (8 vCPU, 32GB, network SSD).
time sh -c "echo \"drop table if exists tbl; create unlogged table tbl(city int2, temp int2);copy tbl FROM '/home/asah/citydata.bin' binary;\" | ./pg/bin/postgres --single -D tmp -p 9999 postgres";
results from 3 runs:
real 0m26.488s, user 0m14.745s, sys 0m3.299s
real 0m28.978s, user 0m14.010s, sys 0m3.288s
real 0m28.920s, user 0m14.028s, sys 0m3.201s
==>
real 0m24.483s, user 0m13.280s, sys 0m3.305s
real 0m28.668s, user 0m13.095s, sys 0m3.501s
patch: (passes "make check" - feel free to drop/replace my comments of course)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c..d3783678e0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -421,13 +421,14 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
buffer->bistate);
MemoryContextSwitchTo(oldcontext);
- for (i = 0; i < nused; i++)
- {
/*
* If there are any indexes, update them for all the inserted
* tuples, and run AFTER ROW INSERT triggers.
*/
if (resultRelInfo->ri_NumIndices > 0)
+ {
+ /* expensive inner loop hidden by if-check */
+ for (i = 0; i < nused; i++)
{
List *recheckIndexes;
@@ -441,6 +442,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
cstate->transition_capture);
list_free(recheckIndexes);
}
+ }
/*
* There's no indexes, but see if we need to run AFTER ROW INSERT
@@ -449,15 +451,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
else if (resultRelInfo->ri_TrigDesc != NULL &&
(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+ {
+ /* expensive inner loop hidden by if-check */
+ for (i = 0; i < nused; i++)
{
cstate->cur_lineno = buffer->linenos[i];
ExecARInsertTriggers(estate, resultRelInfo,
slots[i], NIL,
cstate->transition_capture);
- }
ExecClearTuple(slots[i]);
}
+ }
/* Update the row counter and progress of the COPY command */
*processed += nused;
My reference test comes from a conversation on HN ( https://news.ycombinator.com/item?id=38864213 ) loading 100M tiny records from COPY TO ... BINARY on a GCP c2d-standard-8: https://gcloud-compute.com/c2-standard-8.html (8 vCPU, 32GB, network SSD).
time sh -c "echo \"drop table if exists tbl; create unlogged table tbl(city int2, temp int2);copy tbl FROM '/home/asah/citydata.bin' binary;\" | ./pg/bin/postgres --single -D tmp -p 9999 postgres";
results from 3 runs:
real 0m26.488s, user 0m14.745s, sys 0m3.299s
real 0m28.978s, user 0m14.010s, sys 0m3.288s
real 0m28.920s, user 0m14.028s, sys 0m3.201s
==>
real 0m24.483s, user 0m13.280s, sys 0m3.305s
real 0m28.668s, user 0m13.095s, sys 0m3.501s
real 0m28.306s, user 0m13.032s, sys 0m3.505s
On my mac m1 air,
On my mac m1 air,
real 0m11.922s, user 0m10.220s, sys 0m1.302s
real 0m12.761s, user 0m10.137s, sys 0m1.401s
real 0m12.734s, user 0m10.146s, sys 0m1.376s
==>
real 0m12.173s, user 0m9.785s, sys 0m1.221s
real 0m12.462s, user 0m9.691s, sys 0m1.393s
real 0m12.266s, user 0m9.719s, sys 0m1.390s
real 0m12.761s, user 0m10.137s, sys 0m1.401s
real 0m12.734s, user 0m10.146s, sys 0m1.376s
==>
real 0m12.173s, user 0m9.785s, sys 0m1.221s
real 0m12.462s, user 0m9.691s, sys 0m1.393s
real 0m12.266s, user 0m9.719s, sys 0m1.390s
patch: (passes "make check" - feel free to drop/replace my comments of course)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c..d3783678e0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -421,13 +421,14 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
buffer->bistate);
MemoryContextSwitchTo(oldcontext);
- for (i = 0; i < nused; i++)
- {
/*
* If there are any indexes, update them for all the inserted
* tuples, and run AFTER ROW INSERT triggers.
*/
if (resultRelInfo->ri_NumIndices > 0)
+ {
+ /* expensive inner loop hidden by if-check */
+ for (i = 0; i < nused; i++)
{
List *recheckIndexes;
@@ -441,6 +442,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
cstate->transition_capture);
list_free(recheckIndexes);
}
+ }
/*
* There's no indexes, but see if we need to run AFTER ROW INSERT
@@ -449,15 +451,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
else if (resultRelInfo->ri_TrigDesc != NULL &&
(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+ {
+ /* expensive inner loop hidden by if-check */
+ for (i = 0; i < nused; i++)
{
cstate->cur_lineno = buffer->linenos[i];
ExecARInsertTriggers(estate, resultRelInfo,
slots[i], NIL,
cstate->transition_capture);
- }
ExecClearTuple(slots[i]);
}
+ }
/* Update the row counter and progress of the COPY command */
*processed += nused;
hope this helps,
adam
pgsql-hackers by date: