Re: Logical replication and multimaster - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: Logical replication and multimaster |
Date | |
Msg-id | 565F5208.3070100@postgrespro.ru |
In response to | Re: Logical replication and multimaster (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Logical replication and multimaster |
List | pgsql-hackers |
<div class="moz-cite-prefix">Thank you for the reply.<br /><br /> On 12/02/2015 08:30 PM, Robert Haas wrote:<br /></div>
<blockquote cite="mid:CA+TgmoY1o3G0B-21zv2Pw5iEkpR8=J42GdsUOs4m0inKka3FEA@mail.gmail.com" type="cite"><br /><pre wrap="">
Logical decoding only begins decoding a transaction once the
transaction is complete.  So I would guess that the sequence of
operations here is something like this - correct me if I'm wrong:

1. Do the transaction.
2. PREPARE.
3. Replay the transaction.
4. PREPARE the replay.
5. COMMIT PREPARED on original machine.
6. COMMIT PREPARED on replica.
</pre></blockquote><br />
Logical decoding is started after XLogFlush has been executed.<br />
So the transaction is actually not yet completed at this moment:<br />
- it is not marked as committed in clog<br />
- it is marked as in-progress in procarray<br />
- locks are not released<br /><br />
We are not using PostgreSQL two-phase commit here.<br />
Instead, our DTM takes control in TransactionIdCommitTree and sends a request to the arbiter, which in turn waits for the status of the committing transactions on the replicas.<br />
The problem is that transactions are delivered to a replica through a single channel: the logical replication slot.<br />
And while such a transaction is waiting for an acknowledgement from the arbiter, it blocks the replication channel, preventing other (parallel) transactions from being replicated and applied.<br /><br />
I have implemented a pool of background workers.
Maybe it will be useful not only for me.<br />
It consists of a one-producer, multiple-consumers queue implemented using a buffer in shared memory, a spinlock and two semaphores.<br />
The API is very simple:<br /><br />
typedef void(*BgwPoolExecutor)(int id, void* work, size_t size);<br />
typedef BgwPool*(*BgwPoolConstructor)(void);<br /><br />
extern void BgwPoolStart(int nWorkers, BgwPoolConstructor constructor);<br />
extern void BgwPoolInit(BgwPool* pool, BgwPoolExecutor executor, char const* dbname, size_t queueSize);<br />
extern void BgwPoolExecute(BgwPool* pool, void* work, size_t size);<br /><br />
You just place some chunk of bytes (work, size) in this queue, and then the first available worker dequeues it and executes it.<br /><br />
Using this pool and a larger number of accounts (reducing the probability of conflicts), I get better results.<br />
So now the receiver of logical replication does not execute transactions directly; instead, it places them in the queue and they are executed concurrently by the pool of background workers.<br /><br />
On a cluster with three nodes, the results of our debit-credit benchmark are the following:<br /><br />
<table border="1" cellpadding="2" cellspacing="2" height="112" width="366"><tbody>
<tr><td valign="top">Configuration<br /></td><td valign="top">TPS<br /></td></tr>
<tr><td valign="top">Multimaster (ACID transactions)<br /></td><td align="right" valign="top">12500<br /></td></tr>
<tr><td valign="top">Multimaster (async replication)<br /></td><td align="right" valign="top">34800<br /></td></tr>
<tr><td valign="top">Standalone PostgreSQL<br /></td><td align="right" valign="top">44000<br /></td></tr>
</tbody></table><br /><br />
We tested two modes: one where the client randomly distributes queries between the cluster nodes, and one where the client works with only one master node and the others are just used as replicas.
Performance is slightly better in the second case, but the difference is not very large (about 11000 TPS in the first case).<br /><br />
The number of workers in the pool has a significant impact on performance: with 8 workers we get about 7800 TPS, and with 16 workers, 12500.<br />
Performance also greatly depends on the number of accounts (and so on the probability of lock conflicts). With 100 accounts, speed is less than 1000 TPS.<br /><br /><br />
<blockquote cite="mid:CA+TgmoY1o3G0B-21zv2Pw5iEkpR8=J42GdsUOs4m0inKka3FEA@mail.gmail.com" type="cite"><pre wrap="">
Step 3 introduces latency proportional to the amount of work the
transaction did, which could be a lot.  If you were doing synchronous
physical replication, the replay of the COMMIT record would only need
to wait for the replay of the commit record itself.  But with
synchronous logical replication, you've got to wait for the replay of
the entire transaction.  That's a major bummer, especially if replay
is single-threaded and there a large number of backends generating
transactions.  Of course, the 2PC dance itself can also add latency -
that's most likely to be the issue if the transactions are each very
short.

What I'd suggest is trying to measure where the latency is coming
from.  You should be able to measure how much time each transaction
spends (a) executing, (b) preparing itself, (c) waiting for the replay
thread to begin replaying it, (d) waiting for the replay thread to
finish replaying it, and (e) committing.  Separating (c) and (d) might
be a little bit tricky, but I bet it's worth putting some effort in,
because the answer is probably important to understanding what sort of
change will help here.  If (c) is the problem, you might be able to
get around it by having multiple processes, though that only helps if
applying is slower than decoding.  But if (d) is the problem, then the
only solution is probably to begin applying the transaction
speculatively before it's prepared/committed.  I think.
</pre></blockquote><br />
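For readers who want to see the shape of the one-producer, multiple-consumers queue described in this message, here is a minimal sketch in C. It is not the BgwPool code: it uses POSIX threads instead of PostgreSQL background workers, an ordinary process-local buffer instead of shared memory, and a pthread mutex standing in for the spinlock. All names (SketchPool, sketch_pool_execute, sketch_demo, QUEUE_SLOTS, MAX_WORK, MAX_WORKERS) are hypothetical, and the fixed-size slots and the zero-length "stop" item are simplifying assumptions made only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <string.h>

#define QUEUE_SLOTS 8    /* assumption: fixed number of queue slots */
#define MAX_WORK    64   /* assumption: upper bound on one work item */
#define MAX_WORKERS 16   /* assumption: upper bound on pool size */

typedef void (*SketchExecutor)(int id, void *work, size_t size);

typedef struct {
    char            data[QUEUE_SLOTS][MAX_WORK];
    size_t          size[QUEUE_SLOTS];
    int             head;        /* next slot to dequeue */
    int             tail;        /* next slot to fill */
    pthread_mutex_t lock;        /* stands in for the spinlock */
    sem_t           available;   /* counts filled slots */
    sem_t           free_slots;  /* counts empty slots */
    SketchExecutor  executor;
} SketchPool;

typedef struct { SketchPool *pool; int id; } SketchWorker;

/* Producer side: wait for a free slot, then copy the bytes into it. */
int sketch_pool_execute(SketchPool *p, const void *work, size_t size)
{
    if (size > MAX_WORK)
        return -1;
    sem_wait(&p->free_slots);
    pthread_mutex_lock(&p->lock);
    if (size > 0)
        memcpy(p->data[p->tail], work, size);
    p->size[p->tail] = size;
    p->tail = (p->tail + 1) % QUEUE_SLOTS;
    pthread_mutex_unlock(&p->lock);
    sem_post(&p->available);
    return 0;
}

/* Consumer side: the first available worker dequeues an item and runs it. */
static void *sketch_worker_main(void *arg)
{
    SketchWorker *w = arg;
    SketchPool *p = w->pool;
    for (;;) {
        char work[MAX_WORK];
        size_t size;
        sem_wait(&p->available);
        pthread_mutex_lock(&p->lock);
        size = p->size[p->head];
        memcpy(work, p->data[p->head], size);
        p->head = (p->head + 1) % QUEUE_SLOTS;
        pthread_mutex_unlock(&p->lock);
        sem_post(&p->free_slots);
        if (size == 0)           /* zero-length item: stop marker */
            break;
        p->executor(w->id, work, size);
    }
    return NULL;
}

static atomic_long demo_sum;

/* Demo executor: each work item is one int, added to a shared sum. */
static void demo_executor(int id, void *work, size_t size)
{
    int value;
    (void) id;
    assert(size == sizeof value);
    memcpy(&value, work, sizeof value);
    atomic_fetch_add(&demo_sum, value);
}

/* Enqueue 1..n_items, let the workers sum them, and return the total. */
long sketch_demo(int n_workers, int n_items)
{
    SketchPool pool;
    SketchWorker workers[MAX_WORKERS];
    pthread_t threads[MAX_WORKERS];
    int i, v;

    if (n_workers < 1) n_workers = 1;
    if (n_workers > MAX_WORKERS) n_workers = MAX_WORKERS;
    pool.head = pool.tail = 0;
    pool.executor = demo_executor;
    pthread_mutex_init(&pool.lock, NULL);
    sem_init(&pool.available, 0, 0);
    sem_init(&pool.free_slots, 0, QUEUE_SLOTS);
    atomic_store(&demo_sum, 0);

    for (i = 0; i < n_workers; i++) {
        workers[i].pool = &pool;
        workers[i].id = i;
        pthread_create(&threads[i], NULL, sketch_worker_main, &workers[i]);
    }
    for (v = 1; v <= n_items; v++)
        sketch_pool_execute(&pool, &v, sizeof v);
    for (i = 0; i < n_workers; i++)
        sketch_pool_execute(&pool, NULL, 0);   /* one stop marker per worker */
    for (i = 0; i < n_workers; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&pool.available);
    sem_destroy(&pool.free_slots);
    pthread_mutex_destroy(&pool.lock);
    return atomic_load(&demo_sum);
}
```

Calling sketch_demo(4, 100) runs four workers over the items 1..100 and returns their sum. The two semaphores do the blocking in both directions: the producer sleeps when the buffer is full and the workers sleep when it is empty, so the lock is held only for the short copy in or out of a slot.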