Re: Synchronous replication patch built on SR - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Synchronous replication patch built on SR |
Date | |
Msg-id | AANLkTinaX876sBFA84EdbjN3cm7xe8848W8C9dn3wfg4@mail.gmail.com |
In response to | Re: Synchronous replication patch built on SR (Boszormenyi Zoltan <zb@cybertec.at>) |
Responses | Re: Synchronous replication patch built on SR |
List | pgsql-hackers |
Thanks for your reply!

On Fri, May 14, 2010 at 10:33 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> In your design, the transaction commit on the master waits for its XID
>> to be read from the XLOG_XACT_COMMIT record and replied by the standby.
>> Right? This design seems not to be extensible to #2 and #3 since
>> walreceiver cannot read XID from the XLOG_XACT_COMMIT record.
>
> Yes, this was my problem, too. I would have had to
> implement a custom interpreter into walreceiver to
> process the WAL records and extract the XIDs.

Isn't reading the same WAL twice (by walreceiver and the startup process)
inefficient? In synchronous replication, the overhead of walreceiver
directly affects the performance of the master, so I think we should not
give walreceiver such heavy work.

> But at least the supporting details, i.e. not opening another
> connection, instead being able to do duplex COPY operations in
> a server-acknowledged way is acceptable, no? :-)

Though I might not understand your point (sorry), it's OK for the standby
to send the reply to the master by using a CopyData message. Currently
PQputCopyData() cannot be executed in COPY OUT, but we can relax that.

>> How about
>> using LSN instead of XID? That is, the transaction commit waits until
>> the standby has reached its LSN. LSN is easier to use for walreceiver
>> and the startup process, I think.
>>
>
> Indeed, using the LSN seems to be more appropriate for
> the walreceiver, but how would you extract the information
> that a certain LSN means a COMMITted transaction? Or
> we could release a locked transaction in case the master receives
> an LSN greater than or equal to the transaction's own LSN?

Yep. We can ensure that the transaction has been replicated by comparing
its own LSN with the smallest of the latest LSNs reported by each connected
"synchronous" standby.
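The LSN-based release rule described above can be sketched as follows. This is a minimal Python model, not code from either patch: it assumes an LSN is a plain integer WAL position and that each connected synchronous standby has replied with its latest LSN; the function name `can_release_commit` is hypothetical.

```python
from typing import Sequence

# Hypothetical model: an LSN is an integer byte position in the WAL stream.
Lsn = int


def can_release_commit(commit_lsn: Lsn, reported: Sequence[Lsn]) -> bool:
    """Return True once every connected "synchronous" standby has reached
    commit_lsn, i.e. commit_lsn <= min(latest LSN reported by each standby).

    With no standby connected, nothing has confirmed the commit, so this
    sketch keeps the transaction waiting.
    """
    if not reported:
        return False
    return commit_lsn <= min(reported)


# Latest LSNs replied by three connected synchronous standbys.
reported = [0x3000, 0x2800, 0x3400]
print(can_release_commit(0x2000, reported))  # True: 0x2000 <= 0x2800
print(can_release_commit(0x2C00, reported))  # False: one standby is behind
```

Because a reply carries only a WAL position, one reply can release every waiting commit at or below it, and walreceiver never has to decode XLOG_XACT_COMMIT records to find XIDs.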
> Sending back all the LSNs in case of long transactions would
> increase the network traffic compared to sending back only the
> XIDs, but the amount is not clear for me. What I am more
> worried about is the contention on the ProcArrayLock.
> XIDs are rarer than LSNs, no?

No. For example, when the WAL data sent by walsender at one time contains
two XLOG_XACT_COMMIT records, walreceiver would need to send two replies
in the XID approach. OTOH, in the LSN approach, only one reply, indicating
the last received location, would need to be sent.

>> What if the "synchronous" standby starts up from a very old backup?
>> Would the transaction on the master need to wait until a large amount of
>> outstanding WAL has been applied? I think that synchronous replication
>> should start with *asynchronous* replication, and should switch to the
>> sync level after the gap between servers has become small enough.
>> What's your opinion?
>>
>
> It's certainly one option, which I think I partly addressed
> with the "strict_sync_replication" knob below.
> If strict_sync_replication = off, then the master doesn't make
> its transactions wait for the synchronous reports, and the client(s)
> can work through their WALs. IIRC, the walreceiver connects
> to the master only very late in the recovery process, no?

No, the master might have a large number of WAL files which the standby
doesn't have.

>>> I have added 3 new options, two GUCs in postgresql.conf and one
>>> setting in recovery.conf. These options are:
>>>
>>> 1. min_sync_replication_clients = N
>>>
>>> where N is the number of reports for a given transaction before it's
>>> released as committed synchronously. 0 means completely asynchronous,
>>> and the value is capped at max_wal_senders. Anything
>>> in between 0 and max_wal_senders means different levels of partially
>>> synchronous replication.
>>>
>>> 2. strict_sync_replication = boolean
>>>
>>> where the expected number of synchronous reports from standby
>>> servers is further limited to the actual number of connected synchronous
>>> standby servers if the value of this GUC is false. This means that if
>>> no standby servers are connected yet then the replication is asynchronous
>>> and transactions are allowed to finish without waiting for synchronous
>>> reports. If the value of this GUC is true, then transactions wait until
>>> enough synchronous standbys connect and report back.
>>>
>>
>> Why are these options necessary?
>>
>> Can these options cover more than three synchronization levels?
>>
>
> I think I explained it in my mail.
>
> If min_sync_replication_clients == 0, then the replication is async.
> If min_sync_replication_clients == max_wal_senders then the
> replication is fully synchronous.
> If 0 < min_sync_replication_clients < max_wal_senders then
> the replication is partially synchronous, i.e. the master can wait
> only for, say, 50% of the clients to report back before it's considered
> synchronous and the relevant transactions get released from the wait.

Seems s/min_sync_replication_clients/max_sync_replication_clients

Is min_sync_replication_clients required to prevent an outside attacker
from connecting to the master as a "synchronous" standby and degrading
the performance on the master? Any other use case?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
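The interaction of the two proposed GUCs, as described in the quoted proposal, can be modeled roughly as below. This is a hypothetical sketch rather than the patch's implementation: the function names `required_reports` and `sync_level` are invented, and it assumes that a min_sync_replication_clients value above max_wal_senders is simply capped.

```python
def required_reports(min_sync_replication_clients: int,
                     max_wal_senders: int,
                     strict_sync_replication: bool,
                     connected_sync_standbys: int) -> int:
    """Number of standby reports a commit waits for (hypothetical model)."""
    # The proposal caps the setting at max_wal_senders.
    required = min(min_sync_replication_clients, max_wal_senders)
    if not strict_sync_replication:
        # Non-strict mode: never wait for more standbys than are connected,
        # so with no standby connected the commit does not wait at all.
        required = min(required, connected_sync_standbys)
    return required


def sync_level(min_sync_replication_clients: int, max_wal_senders: int) -> str:
    """Classify the configured level as in the quoted explanation."""
    required = min(min_sync_replication_clients, max_wal_senders)
    if required == 0:
        return "asynchronous"
    if required == max_wal_senders:
        return "fully synchronous"
    return "partially synchronous"


# max_wal_senders = 4, waiting for 2 reports, no standby connected yet.
print(required_reports(2, 4, False, 0))  # 0: commits finish without waiting
print(required_reports(2, 4, True, 0))   # 2: commits wait for standbys
print(sync_level(2, 4))                  # partially synchronous
```

In this model the non-strict count rises as soon as a standby connects, so a newly connected standby that is still far behind would be waited for immediately; that is the situation raised in the thread about a standby started from a very old backup.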