Logical replication and multimaster - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Logical replication and multimaster
Date
Msg-id 565C7756.9070604@postgrespro.ru
Whole thread Raw
Responses Re: Logical replication and multimaster  (Robert Haas <robertmhaas@gmail.com>)
Re: Logical replication and multimaster  (Craig Ringer <craig@2ndquadrant.com>)
Re: Logical replication and multimaster  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Hello all,

We have implemented ACID multimaster based on logical replication and 
our DTM (distributed transaction manager) plugin.
Good news is that it works and no inconsistency is detected.
But unfortunately it is very very slow...

At standalone PostgreSQL I am able to achieve about 30000 TPS with 10 
clients performing simple depbit-credit transactions.
And with multimaster consisting of three nodes spawned at the same 
system I got about 100 (one hundred) TPS.
There are two main reasons of such awful performance:

1. Logical replication serializes all transactions:  there is single 
connection between wal-sender and receiver BGW.
2. 2PC synchronizes transaction commit at all nodes.

None of these two reasons are show stoppers themselves.
If we remove DTM and do asynchronous logical replication then 
performance of multimaster is increased to 6000 TPS
(please notice that in this test all multimaster node are spawned at the 
same system, sharing its resources,
so 6k is not bad result comparing with 30k at standalone system).
And according to 2ndquadrant results, BDR performance is very close to 
hot standby.

On the other hand our previous experiments with DTM shows only about 2 
times slowdown comparing with vanilla PostgreSQL.
But result of combining DTM and logical replication is frustrating.

I wonder if it is principle limitation of logical replication approach 
which is efficient only for asynchronous replication or it can be 
somehow tuned/extended to efficiently support synchronous replication?

We have also considered alternative approaches:
1. Statement based replication.
2. Trigger-based replication.
3. Replication using custom nodes.

In case of statement based replication it is hard to guarantee identity 
of of data at different nodes.
Approaches 2 and 3 are much harder to implement and requiring to 
"reinvent" substantial part of logical replication.
Them also require some kind of connection pool which can be used to send 
replicated transactions to the peer nodes (to avoid serialization of 
parallel transactions as in case of logical replication).

But looks like there is not so much sense in having multiple network 
connection between one pair of nodes.
It seems to be better to have one connection between nodes, but provide 
parallel execution of received transactions at destination side. But it 
seems to be also nontrivial. We have now in PostgreSQL some 
infrastructure for background works, but there is still no abstraction 
of workers pool and job queue which can provide simple way to organize 
parallel execution of some jobs. I wonder if somebody is working now on 
it or we should try to propose our solution?

Best regards,
Konstantin





pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: [patch] Proposal for \rotate in psql
Next
From: Robert Haas
Date:
Subject: Re: parallel joins, and better parallel explain