Re: [HACKERS] Slow synchronous logical replication - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] Slow synchronous logical replication
Date
Msg-id 4e8ad98c-7399-7795-dec9-07952952abb1@postgrespro.ru
In response to Re: [HACKERS] Slow synchronous logical replication  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Thank you for explanations.

On 08.10.2017 16:00, Craig Ringer wrote:
> I think it'd be helpful if you provided reproduction instructions,
> test programs, etc, making it very clear when things are / aren't
> related to your changes.

It will not be easy to provide a reproduction scenario, because it 
actually involves many components (postgres_fdw, pg_pathman, 
pg_shardman, logical replication, ...)
and requires a multi-node installation.
But let me try to explain what is going on:
We have implemented sharding: data is split between several remote 
tables using pg_pathman and postgres_fdw.
It means that an insert or update of the parent table causes inserts or 
updates of some derived partitions, which are forwarded by postgres_fdw 
to the corresponding nodes.
The number of shards is significantly larger than the number of nodes, 
i.e. for 5 nodes we have 50 shards, which means that each node holds 10 
shards.
To provide fault tolerance, each shard is replicated to one or more 
nodes using logical replication. Right now we have considered only 
redundancy level 1: each shard has exactly one replica.
So from each node we establish 10 logical replication channels.

We want a commit to wait until the data is actually stored on all 
replicas, so we use synchronous replication:
we set the synchronous_commit option to "on" and include all ten 
subscriptions in the synchronous_standby_names list.
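A rough sketch of the relevant settings on a publishing node (the 
subscription names here are hypothetical -- the real ones depend on the 
pg_shardman setup). Note that waiting for confirmation from all ten 
channels requires the num_sync prefix, since a bare comma-separated 
list only waits for the single highest-priority standby:

```
# postgresql.conf (sketch; subscription names are hypothetical)
synchronous_commit = on
# wait for all 10 replicas; a bare list 's1, s2, ...' would wait
# for only the first (highest-priority) one
synchronous_standby_names = 'FIRST 10 (sub_shard_0, sub_shard_1, sub_shard_2, sub_shard_3, sub_shard_4, sub_shard_5, sub_shard_6, sub_shard_7, sub_shard_8, sub_shard_9)'
```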

In this setup the commit latency is very large (about 100 ms, and most 
of the time is actually spent in commit) and performance is very bad:
pgbench shows about 300 TPS at the optimal number of clients (about 10; 
for larger numbers performance stays almost the same). Without logical 
replication the same setup gives about 6000 TPS.

I have checked the syncrepl.c file, particularly the 
SyncRepGetSyncRecPtr function. Each walsender independently calculates 
the minimal LSN among all synchronous replicas and wakes up the 
backends waiting for this LSN. It means that a transaction updating 
data in one shard will actually wait for confirmation from the 
replication channels of all shards.
If some shard is updated more rarely than the others, or is not updated 
at all (for example because the communication channel to its node is 
broken), then all backends will get stuck.
Also, all backends compete for the single SyncRepLock, which can be 
another contention point.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



