Re: [HACKERS] Slow synchronous logical replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Slow synchronous logical replication
Msg-id CAD21AoDA0eMeq+NvNXTLxGMf1z0GJjdT92ASkUVcQkfin8k58g@mail.gmail.com
In response to Re: [HACKERS] Slow synchronous logical replication  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
On Mon, Oct 9, 2017 at 4:37 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Thank you for the explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>>
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
>
> It will not be easy to provide a reproduction scenario, because it
> involves many components (postgres_fdw, pg_pathman, pg_shardman,
> LR, ...)
> and requires a multinode installation.
> But let me try to explain what is going on:
> We have implemented sharding: data is split between several remote tables
> using pg_pathman and postgres_fdw.
> This means that an insert or update of the parent table causes an insert or
> update of some derived partition, which postgres_fdw forwards to the
> corresponding node.
> The number of shards is significantly larger than the number of nodes, e.g.
> for 5 nodes we have 50 shards, which means that each node holds 10 shards.
> To provide fault tolerance, each shard is replicated using logical
> replication to one or more nodes. Right now we consider only redundancy
> level 1: each shard has only one replica.
> So from each node we establish 10 logical replication channels.
>
> We want commit to wait until the data is actually stored at all replicas,
> so we are using synchronous replication:
> we set the synchronous_commit option to "on" and include all 10
> subscriptions in the synchronous_standby_names list.
>
> In this setup commit latency is very large (about 100 msec, and most of the
> time is actually spent in commit) and performance is very bad: pgbench
> shows about 300 TPS for the optimal number of clients (about 10; for larger
> numbers performance is almost the same). Without logical replication, the
> same setup gives about 6000 TPS.
>
> I have checked the syncrepl.c file, particularly the SyncRepGetSyncRecPtr
> function.
> Each walsender independently calculates the minimal LSN among all
> synchronous replicas and wakes up the backends waiting for this LSN. This
> means that a transaction updating data in one shard will actually wait for
> confirmation from the replication channels of all shards.
> If some shard is updated more rarely than the others or is not updated at
> all (for example because the communication channel to its node is broken),
> then all backends will get stuck.
> Also, all backends compete for the single SyncRepLock, which can also
> be a contention point.
>
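
To make the behaviour described in the quoted text concrete, here is a toy
model in Python. It is only a sketch of the description above, not PostgreSQL
code: the function names, the subscription names and the integer LSNs are all
made up for illustration, and the real logic in syncrepl.c is more involved.

```python
from typing import Dict


def sync_release_lsn(confirmed_lsn: Dict[str, int]) -> int:
    """Highest LSN confirmed by *every* synchronous subscription.

    Toy stand-in for "the minimal LSN among all synchronous replicas".
    """
    return min(confirmed_lsn.values())


def can_release(commit_lsn: int, confirmed_lsn: Dict[str, int]) -> bool:
    """A waiting backend is woken only once all subscriptions have caught up."""
    return commit_lsn <= sync_release_lsn(confirmed_lsn)


# Hypothetical state: 10 subscriptions, one of which lags (or is disconnected).
confirmed = {f"sub_shard_{i}": 1000 for i in range(10)}
confirmed["sub_shard_3"] = 400

# A transaction that only touched shard 7 committed at LSN 900.
commit_lsn = 900
still_waiting = not can_release(commit_lsn, confirmed)
```

In this model the transaction keeps waiting although the subscription for the
shard it modified has already confirmed LSN 1000, because the lagging
subscription holds the minimum down.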

IIUC, you mean that in the current synchronous logical replication a
transaction has to wait for the updated table data to be replicated
even to servers that don't subscribe to the table. If we changed it so
that a transaction needs to wait only for the servers that subscribe
to the table, it would be more efficient, at least for your use case.
We send at least the begin and commit data to all subscriptions and
then wait for their replies, but could we skip waiting for them when,
for example, the walsender didn't actually send any data modified by
the transaction?
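
A minimal sketch of that idea, again as a toy Python model rather than a
patch: the names (can_release_filtered, the "touched" set, the subscription
names) are hypothetical, and whether skipping the wait is actually safe is
exactly the open question.

```python
from typing import Dict, Set


def can_release_filtered(commit_lsn: int,
                         confirmed_lsn: Dict[str, int],
                         touched: Set[str]) -> bool:
    """Wait only for subscriptions that were sent data changed by the transaction.

    `touched` is the (hypothetical) set of subscriptions to which the
    walsender sent at least one modified row for this transaction.
    An empty set means there is nothing to wait for.
    """
    return all(confirmed_lsn[name] >= commit_lsn for name in touched)


# Hypothetical state: sub_shard_3 lags, the transaction only touched shard 7.
confirmed = {f"sub_shard_{i}": 1000 for i in range(10)}
confirmed["sub_shard_3"] = 400

released = can_release_filtered(900, confirmed, {"sub_shard_7"})
blocked = not can_release_filtered(900, confirmed, {"sub_shard_3"})
```

Here the commit at LSN 900 is released as soon as sub_shard_7 has confirmed
it, while a transaction that did modify shard 3 would still wait for the
lagging subscription.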

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


