Re: Speeding up INSERTs and UPDATEs to partitioned tables - Mailing list pgsql-hackers

From: David Rowley
Subject: Re: Speeding up INSERTs and UPDATEs to partitioned tables
Msg-id: CAKJS1f9BdyEXa52+coxQQsmcwmTb8LdiVL2HD77zG=efyVk4MA@mail.gmail.com
In response to: Re: Speeding up INSERTs and UPDATEs to partitioned tables (David Rowley <david.rowley@2ndquadrant.com>)
List: pgsql-hackers
On 27 July 2018 at 04:19, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached a delta of the changes I made since your v2 delta and
> also a complete updated patch.

I did a very quick performance test of this patch on an AWS m5d.large
instance with fsync=off.

The test setup is the same as described in my initial email on this thread.

The test compares the performance of INSERTs into a partitioned table
with 10k partitions against the same INSERTs into a non-partitioned
table; a sketch of the setup follows.
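For reference, here's a minimal sketch of that kind of setup (the exact
scripts are in the initial email; the table definitions, partition
bounds, and script contents below are illustrative, not the originals):

-- partitioned target with 10k range partitions (run in psql)
CREATE TABLE partbench (id int NOT NULL) PARTITION BY RANGE (id);
SELECT format('CREATE TABLE partbench_%s PARTITION OF partbench
               FOR VALUES FROM (%s) TO (%s);',
              x, x * 1000, (x + 1) * 1000)
FROM generate_series(0, 9999) x
\gexec

-- non-partitioned comparison table
CREATE TABLE partbench_plain (id int NOT NULL);

-- partbench_insert.sql: each pgbench transaction inserts one row,
-- routed to a random partition
\set p random(0, 9999999)
INSERT INTO partbench VALUES (:p);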

With the v2 patch applied on master@39d51fe87:

-- partitioned
$ pgbench -n -T 60 -f partbench_insert.sql postgres
transaction type: partbench_insert.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1063764
latency average = 0.056 ms
tps = 17729.375930 (including connections establishing)
tps = 17729.855215 (excluding connections establishing)

-- non-partitioned
$ pgbench -n -T 60 -f partbench__insert.sql postgres
transaction type: partbench__insert.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1147273
latency average = 0.052 ms
tps = 19121.194366 (including connections establishing)
tps = 19121.695469 (excluding connections establishing)

Here we're within 92% of the non-partitioned performance
(17729.9 tps / 19121.7 tps ≈ 92.7%).

Looking back at the first email in this thread, where I tested the v1
patch, we were only within 82% (11001.6 / 13354.7 ≈ 82.4%):

-- partitioned
tps = 11001.602377 (excluding connections establishing)

-- non-partitioned
tps = 13354.656163 (excluding connections establishing)

Again, as with the v1 test, the v2 test was done with the locking
of all partitions removed, via:

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d7b18f52ed..6223c62094 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -80,9 +80,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
  PartitionTupleRouting *proute;
  ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;

- /* Lock all the partitions. */
- (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
-
  /*
  * Here we attempt to expend as little effort as possible in setting up
  * the PartitionTupleRouting.  Each partition's ResultRelInfo is built
@@ -442,7 +439,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
  * We locked all the partitions in ExecSetupPartitionTupleRouting
  * including the leaf partitions.
  */
- partrel = heap_open(partoid, NoLock);
+ partrel = heap_open(partoid, RowExclusiveLock);

  /*
  * Keep ResultRelInfo and other information for this partition in the

Again, the reduced locking is not intended for commit as part of this
patch. Changing the locking will require a discussion on its own thread.
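As an aside, the effect of that change is easy to observe from SQL:
with the find_all_inheritors() call in place, a single routed INSERT
takes RowExclusiveLock on every partition, whereas with it removed as
above, only the parent and the routed-to partition are locked. An
illustrative way to check, assuming the partbench table sketched
earlier:

BEGIN;
INSERT INTO partbench VALUES (1);

-- pg_locks shows our own granted locks, so this works from the
-- same session while the transaction is still open
SELECT count(*)
FROM pg_locks
WHERE locktype = 'relation'
  AND mode = 'RowExclusiveLock';
-- with the lock-all call: roughly one lock per partition (~10k)
-- with the change above: just the parent and the single partition
-- the row was routed to
ROLLBACK;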

And just for fun, the unpatched performance on the partitioned table:

ubuntu@ip-10-0-0-33:~$ pgbench -n -T 60 -f partbench_insert.sql postgres
transaction type: partbench_insert.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 5751
latency average = 10.434 ms
tps = 95.836052 (including connections establishing)
tps = 95.838490 (excluding connections establishing)

(a ~185x increase: 17729.4 tps / 95.8 tps ≈ 185)

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

