Re: Synchronous replication patch built on SR - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Synchronous replication patch built on SR |
Date | |
Msg-id | 201004302057.o3UKvWj26902@momjian.us |
In response to | Synchronous replication patch built on SR (zb@cybertec.at) |
Responses | Re: Synchronous replication patch built on SR |
List | pgsql-hackers |
Please add it to the next commit-fest:

https://commitfest.postgresql.org/action/commitfest_view/inprogress

---------------------------------------------------------------------------

zb@cybertec.at wrote:
> Resending, my ISP lost my mail yesterday. :-(
>
> ===========================================================
>
> Hi,
>
> Attached is a patch that does $SUBJECT; we are submitting it for 9.1.
> I have updated it to today's CVS after the "wal_level" GUC went in.
>
> How does it work?
>
> First, the walreceiver and the walsender are now able to communicate
> in a duplex way on the same connection, so while COPY OUT is
> in progress from the primary server, the standby server is able to
> issue PQputCopyData() to pass back the transaction IDs that were seen
> in XLOG_XACT_COMMIT or XLOG_XACT_PREPARE records. I did this by
> adding a new protocol message type, with letter 'x', that is recognized
> only by the walsender process. The regular backend was intentionally
> left unchanged, so an SQL client that sends it gets a protocol error.
> A new libpq call, PQsetDuplexCopy(), sends this new message before
> sending START_REPLICATION. The primary makes a note of it in the
> walsender process's entry.
>
> I had to move the TransactionIdLatest(xid, nchildren, children) call
> that computes latestXid earlier in RecordTransactionCommit(), so
> it's in the critical section now, just before the
> XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT, rdata) call.
> Otherwise, there was a race condition between the primary
> and the standby server, where the standby server might have seen
> the XLOG_XACT_COMMIT record for some XIDs before the
> transaction in the primary server marked itself as waiting for this
> XID, resulting in stuck transactions.
>
> I have added 3 new options, two GUCs in postgresql.conf and one
> setting in recovery.conf. These options are:
>
> 1. min_sync_replication_clients = N
>
> where N is the number of reports for a given transaction before it's
> released as committed synchronously. 0 means completely asynchronous;
> the value is capped at max_wal_senders. Anything in between 0 and
> max_wal_senders means different levels of partially synchronous
> replication.
>
> 2. strict_sync_replication = boolean
>
> where the expected number of synchronous reports from standby
> servers is further limited to the actual number of connected synchronous
> standby servers if the value of this GUC is false. This means that if
> no standby servers are connected yet, then the replication is asynchronous
> and transactions are allowed to finish without waiting for synchronous
> reports. If the value of this GUC is true, then transactions wait until
> enough synchronous standbys connect and report back.
>
> 3. synchronous_slave = boolean (in recovery.conf)
>
> This instructs the standby server to tell the primary that it's a
> synchronous replication server and that it will send the committed
> XIDs back to the primary.
>
> I also added a contrib module for monitoring the synchronous replication,
> but it abuses the procarray.c code by exposing the procArray pointer,
> which is ugly. It either needs to be abandoned or moved to core if or
> when this code is discussed enough. :-)
>
> Best regards,
> Zoltán Böszörményi

[ Attachment, skipping... ]

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
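To picture the duplex traffic described in the quoted mail, here is a minimal sketch of PostgreSQL frontend/backend protocol v3 framing: a one-byte message type followed by an Int32 length that counts itself and the body but not the type byte. It assumes that the patch's 'x' message carries an empty body and that the standby's XID report is a CopyData ('d') payload of consecutive big-endian 32-bit XIDs. Both payload layouts, and the helper names frame_message() and encode_xid_report(), are hypothetical and not taken from the patch. In real code libpq's PQputCopyData() adds the CopyData framing itself, so a client would only build the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value in network (big-endian) byte order. */
static void put_uint32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
}

/*
 * Frame a protocol v3 message: type byte, then an Int32 length that counts
 * itself and the body (not the type byte). Returns the total number of
 * bytes written, or 0 if the output buffer is too small.
 */
size_t frame_message(char type, const unsigned char *body, uint32_t body_len,
                     unsigned char *out, size_t out_cap)
{
    size_t total = 1 + 4 + (size_t) body_len;

    if (out_cap < total)
        return 0;
    out[0] = (unsigned char) type;
    put_uint32_be(out + 1, 4 + body_len);
    if (body_len > 0)               /* avoid memcpy() with a NULL body */
        memcpy(out + 5, body, body_len);
    return total;
}

/*
 * HYPOTHETICAL payload for the standby's report: committed XIDs as
 * consecutive big-endian 32-bit integers. Returns the bytes written.
 */
size_t encode_xid_report(const uint32_t *xids, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        put_uint32_be(out + 4 * i, xids[i]);
    return 4 * n;
}
```

Framed this way, an empty 'x' message is 5 bytes long, and a report of two XIDs wrapped in CopyData is 13 bytes with a length field of 12.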