Re: Synchronous replication patch built on SR - Mailing list pgsql-hackers
From | Boszormenyi Zoltan |
---|---|
Subject | Re: Synchronous replication patch built on SR |
Date | |
Msg-id | 4BF3A442.4010504@cybertec.at |
In response to | Re: Synchronous replication patch built on SR (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Synchronous replication patch built on SR |
List | pgsql-hackers |
Fujii Masao wrote:
> Thanks for your reply!
>
> On Fri, May 14, 2010 at 10:33 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>
>>> In your design, the transaction commit on the master waits for its XID
>>> to be read from the XLOG_XACT_COMMIT record and replied by the standby.
>>> Right? This design seems not to be extensible to #2 and #3 since
>>> walreceiver cannot read XID from the XLOG_XACT_COMMIT record.
>>>
>> Yes, this was my problem, too. I would have had to
>> implement a custom interpreter into walreceiver to
>> process the WAL records and extract the XIDs.
>>
>
> Isn't reading the same WAL twice (by walreceiver and startup process)
> inefficient?

Yes, and I didn't implement that because it's inefficient.
I implemented a minimal communication between StartupXLOG()
and the walreceiver.

> In synchronous replication, the overhead of walreceiver
> directly affects the performance of the master. We should not assign
> such a hard work to walreceiver, I think.
>

Exactly.

>> But at least the supporting details, i.e. not opening another
>> connection, instead being able to do duplex COPY operations in
>> a server-acknowledged way is acceptable, no? :-)
>>
>
> Though I might not understand your point (sorry), it's OK for the standby
> to send the reply to the master by using CopyData message.

I thought about the same.

> Currently
> PQputCopyData() cannot be executed in COPY OUT, but we can relax
> that.
>

And I implemented just that, in a way that upon walreceiver startup
it sends a new protocol message to the walsender by calling
PQsetDuplexCopy() (see my patch) and the walsender response is ACK.
This protocol message is intentionally not handled by the normal
backend, so plain libpq clients cannot mess up their COPY streams.

>>> How about
>>> using LSN instead of XID? That is, the transaction commit waits until
>>> the standby has reached its LSN. LSN is more easy-used for walreceiver
>>> and startup process, I think.
>>>
>> Indeed, using the LSN seems to be more appropriate for
>> the walreceiver, but how would you extract the information
>> that a certain LSN means a COMMITted transaction? Or
>> we could release a locked transaction in case the master receives
>> an LSN greater than or equal to the transaction's own LSN?
>>
>
> Yep, we can ensure that the transaction has been replicated by
> comparing its own LSN with the smallest LSN in the latest LSNs
> of each connected "synchronous" standby.
>
>
>> Sending back all the LSNs in case of long transactions would
>> increase the network traffic compared to sending back only the
>> XIDs, but the amount is not clear for me. What I am more
>> worried about is the contention on the ProcArrayLock.
>> XIDs are rarer than LSNs, no?
>>
>
> No. For example, when WAL data sent by walsender at a time
> has two XLOG_XACT_COMMIT records, in XID approach, walreceiver
> would need to send two replies. OTOH, in LSN approach, only
> one reply which indicates the last received location would
> need to be sent.
>

I see.

>>> What if the "synchronous" standby starts up from the very old backup?
>>> The transaction on the master needs to wait until a large amount of
>>> outstanding WAL has been applied? I think that synchronous replication
>>> should start with *asynchronous* replication, and should switch to the
>>> sync level after the gap between servers has become enough small.
>>> What's your opinion?
>>>
>>>
>> It's certainly one option, which I think partly addressed
>> with the "strict_sync_replication" knob below.
>> If strict_sync_replication = off, then the master doesn't make
>> its transactions wait for the synchronous reports, and the client(s)
>> can work through their WALs. IIRC, the walreceiver connects
>> to the master only very late in the recovery process, no?
>>
>
> No, the master might have a large number of WAL files which
> the standby doesn't have.
>

We can change the walreceiver so it sends similarly encapsulated
messages as the walsender does. In our patch, the walreceiver
currently sends the raw XIDs. If we add a minimal protocol
encapsulation, we can distinguish between the XIDs (or later LSNs)
and the "mark me synchronous from now on" message. The only problem
is: what should be the point when such a client becomes synchronous
from the master's POV, so the XID/LSN reports will count and
transactions are made to wait for this client?

As a side note, the async walreceivers' behaviour should be kept
so they don't send anything back and the message that
PQsetDuplexCopy() sends to the master would then only prepare
the walsender that its client will become synchronous in the
near future.

>>>> I have added 3 new options, two GUCs in postgresql.conf and one
>>>> setting in recovery.conf. These options are:
>>>>
>>>> 1. min_sync_replication_clients = N
>>>>
>>>> where N is the number of reports for a given transaction before it's
>>>> released as committed synchronously. 0 means completely asynchronous,
>>>> the value is maximized by the value of max_wal_senders. Anything
>>>> in between 0 and max_wal_senders means different levels of partially
>>>> synchronous replication.
>>>>
>>>> 2. strict_sync_replication = boolean
>>>>
>>>> where the expected number of synchronous reports from standby
>>>> servers is further limited to the actual number of connected synchronous
>>>> standby servers if the value of this GUC is false. This means that if
>>>> no standby servers are connected yet then the replication is asynchronous
>>>> and transactions are allowed to finish without waiting for synchronous
>>>> reports. If the value of this GUC is true, then transactions wait until
>>>> enough synchronous standbys connect and report back.
>>>>
>>>>
>>> Why are these options necessary?
>>>
>>> Can these options cover more than three synchronization levels?
>>>
>>>
>> I think I explained it in my mail.
>>
>> If min_sync_replication_clients == 0, then the replication is async.
>> If min_sync_replication_clients == max_wal_senders then the
>> replication is fully synchronous.
>> If 0 < min_sync_replication_clients < max_wal_senders then
>> the replication is partially synchronous, i.e. the master can wait
>> only for say, 50% of the clients to report back before it's considered
>> synchronous and the relevant transactions get released from the wait.
>>
>
> Seems s/min_sync_replication_clients/max_sync_replication_clients
>

No, "min" indicates the minimum number of walreceiver reports
needed before a transaction can be released from the wait.
The other reports coming from walreceivers are ignored.

> min_sync_replication_clients is required to prevent outside attacker
> from connecting to the master as "synchronous" standby, and degrading
> the performance on the master?

??? A properly configured pg_hba.conf prevents outside attackers
from connecting as replication clients, no?

> Other usecase?
>
> Regards,
>
>

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
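The two release rules discussed in this message can be illustrated with a small model. This is only a sketch under stated assumptions, not code from the patch: the function names are hypothetical, LSNs are modelled as plain integers, and the behaviour with no connected standby in the LSN rule is a guess, since the thread does not settle it. `commit_is_replicated` follows the LSN comparison Fujii describes (commit LSN against the smallest of the latest LSNs reported by the connected synchronous standbys). `reports_needed` follows the description of `min_sync_replication_clients` and `strict_sync_replication` quoted above.

```python
from typing import Dict, Optional


def replicated_up_to(latest_lsn_by_standby: Dict[str, int]) -> Optional[int]:
    """Smallest of the latest LSNs reported by connected synchronous standbys.

    Returns None when no synchronous standby is connected.
    LSNs are plain integers here (an assumption made for illustration).
    """
    if not latest_lsn_by_standby:
        return None
    return min(latest_lsn_by_standby.values())


def commit_is_replicated(commit_lsn: int,
                         latest_lsn_by_standby: Dict[str, int]) -> bool:
    """LSN rule: release the commit once every connected synchronous
    standby has reported an LSN greater than or equal to the commit's LSN."""
    floor = replicated_up_to(latest_lsn_by_standby)
    # Assumption: with no standby connected, nothing counts as replicated.
    return floor is not None and floor >= commit_lsn


def reports_needed(min_sync_replication_clients: int,
                   strict_sync_replication: bool,
                   connected_sync_standbys: int,
                   max_wal_senders: int) -> int:
    """Number of standby reports a transaction waits for before release."""
    # The setting is capped by max_wal_senders.
    wanted = min(min_sync_replication_clients, max_wal_senders)
    if not strict_sync_replication:
        # Non-strict mode: never wait for more standbys than are connected.
        wanted = min(wanted, connected_sync_standbys)
    return wanted
```

For example, with `min_sync_replication_clients = 2` and no standby connected, `reports_needed(2, False, 0, 5)` yields 0 (the commit does not wait), while `reports_needed(2, True, 0, 5)` yields 2 (the commit waits until two synchronous standbys connect and report back).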