Re: Logical replication and multimaster - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Logical replication and multimaster |
Date | |
Msg-id | CA+TgmoY1o3G0B-21zv2Pw5iEkpR8=J42GdsUOs4m0inKka3FEA@mail.gmail.com |
In response to | Logical replication and multimaster (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: Logical replication and multimaster
Re: Logical replication and multimaster |
List | pgsql-hackers |
On Mon, Nov 30, 2015 at 11:20 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> We have implemented ACID multimaster based on logical replication and our
> DTM (distributed transaction manager) plugin.
> Good news is that it works and no inconsistency is detected.
> But unfortunately it is very very slow...
>
> At standalone PostgreSQL I am able to achieve about 30000 TPS with 10
> clients performing simple debit-credit transactions.
> And with multimaster consisting of three nodes spawned at the same system I
> got about 100 (one hundred) TPS.
> There are two main reasons of such awful performance:
>
> 1. Logical replication serializes all transactions: there is single
> connection between wal-sender and receiver BGW.
> 2. 2PC synchronizes transaction commit at all nodes.
>
> None of these two reasons are show stoppers themselves.
> If we remove DTM and do asynchronous logical replication then performance of
> multimaster is increased to 6000 TPS
> (please notice that in this test all multimaster node are spawned at the
> same system, sharing its resources,
> so 6k is not bad result comparing with 30k at standalone system).
> And according to 2ndquadrant results, BDR performance is very close to hot
> standby.

Logical decoding only begins decoding a transaction once the transaction is complete. So I would guess that the sequence of operations here is something like this - correct me if I'm wrong:

1. Do the transaction.
2. PREPARE.
3. Replay the transaction.
4. PREPARE the replay.
5. COMMIT PREPARED on original machine.
6. COMMIT PREPARED on replica.

Step 3 introduces latency proportional to the amount of work the transaction did, which could be a lot. If you were doing synchronous physical replication, the commit would only need to wait for the replay of the commit record itself. But with synchronous logical replication, you've got to wait for the replay of the entire transaction.
That's a major bummer, especially if replay is single-threaded and there are a large number of backends generating transactions. Of course, the 2PC dance itself can also add latency - that's most likely to be the issue if the transactions are each very short.

What I'd suggest is trying to measure where the latency is coming from. You should be able to measure how much time each transaction spends (a) executing, (b) preparing itself, (c) waiting for the replay thread to begin replaying it, (d) waiting for the replay thread to finish replaying it, and (e) committing. Separating (c) and (d) might be a little bit tricky, but I bet it's worth putting some effort in, because the answer is probably important to understanding what sort of change will help here.

If (c) is the problem, you might be able to get around it by having multiple processes, though that only helps if applying is slower than decoding. But if (d) is the problem, then the only solution is probably to begin applying the transaction speculatively before it's prepared/committed. I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
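
A minimal sketch of the per-phase breakdown suggested in the message, (a) through (e), might look like the following. It is only an illustration: the names `TxnTimestamps`, `phase_durations`, `mean_breakdown`, and `dominant_phase` are hypothetical, and it assumes each transaction has somehow been instrumented to record a timestamp at every phase boundary, including one taken when the replay process starts applying it.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical phase names, one per item (a)-(e) in the message above.
PHASES = ("execute", "prepare", "wait_for_replay", "replay", "commit")


@dataclass
class TxnTimestamps:
    """Timestamps (seconds) at each phase boundary of one transaction.

    Assumption: all timestamps come from comparable clocks. In the test
    described in the thread every node runs on the same machine, so one
    clock would serve; across machines the clocks would need syncing.
    """
    start: float           # transaction begins executing
    executed: float        # end of (a); PREPARE begins
    prepared: float        # end of (b); now waiting for the replay process
    replay_started: float  # end of (c); replay process picks it up
    replay_done: float     # end of (d); replay is prepared on the replica
    committed: float       # end of (e); COMMIT PREPARED has finished


def phase_durations(t):
    """Return the time one transaction spent in each phase."""
    marks = (t.start, t.executed, t.prepared,
             t.replay_started, t.replay_done, t.committed)
    return {name: later - earlier
            for name, earlier, later in zip(PHASES, marks, marks[1:])}


def mean_breakdown(txns):
    """Average the per-phase durations over a sample of transactions."""
    per_txn = [phase_durations(t) for t in txns]
    return {name: mean(d[name] for d in per_txn) for name in PHASES}


def dominant_phase(breakdown):
    """Name the phase with the largest average duration."""
    return max(breakdown, key=breakdown.get)
```

Fed a sample of instrumented transactions, `dominant_phase(mean_breakdown(sample))` names the phase to look at first. Following the reasoning above, `wait_for_replay` would point toward multiple replay processes, while `replay` would point toward applying transactions speculatively before they are prepared.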