Re: Synchronous replay take III - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Synchronous replay take III |
Date | |
Msg-id | CAEepm=18jOAjFdTbhGeEyTy2JWcVGNe25jpQLMrpa3syK0c+WQ@mail.gmail.com |
In response to | Re: Synchronous replay take III (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: Synchronous replay take III, Re: Synchronous replay take III |
List | pgsql-hackers |
On Tue, Jan 15, 2019 at 11:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Regarding the current (v10 patch) design I have some questions and
> comments.

Hi Sawada-san,

Thanks for your testing and feedback.

> The patch introduces new GUC parameter synchronous_replay. We can set
> synchronous_commit = off while setting synchronous_replay = on. With
> this setting, the backend will synchronously wait for standbys to
> replay. I'm concerned that having two separate GUC parameters
> controlling the transaction commit behaviour would confuse users. It's
> a just idea but maybe we can use 'remote_apply' for synchronous replay
> purpose and introduce new parameter for standby server something like
> allow_stale_read.

That is an interesting idea. That choice means that the new mode
always implies synchronous_commit = on (since remote_apply is a
"higher" level). I wanted them to be independent, so you could
express your durability requirement separately from your visibility
requirement. Concretely, if none of your potential sync replay
standbys are keeping up and they are all dropped to "unavailable",
then you'd be able to see a difference: with your proposal we'd still
have a synchronous commit wait, but with mine that could
independently be on or off.

Generally, I think we are too restrictive in our durability levels,
and there was some discussion about whether it's OK to have a strict
linear knob (which your idea extends):

https://www.postgresql.org/message-id/flat/CAEepm%3D3FFaanSS4sugG%2BApzq2tCVjEYCO2wOQBod2d7GWb%3DDvA%40mail.gmail.com

Hmm, perhaps your way would be better for now anyway, just because
it's simpler to understand and explain. Perhaps you wouldn't need a
separate "allow_stale_read" GUC, you could just set
synchronous_commit to a lower level when talking to the standby.
(That is, give synchronous_commit a meaning on standbys, whereas
currently it has no effect there.)

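To make the difference between the two designs concrete, here is a rough Python model of what a committing backend would wait for in each case. It is only a sketch of the argument above, not code from the patch: the function names and the "durability"/"replay" labels are invented for illustration, and the durability wait is collapsed into a single flag even though the real synchronous_commit levels differ in what they wait for.

```python
# Toy model only: function names and wait labels are hypothetical,
# not taken from the patch or from PostgreSQL source.

def waits_independent(synchronous_commit, synchronous_replay, available_standbys):
    """Two separate knobs: durability and visibility are chosen independently."""
    waits = set()
    if synchronous_commit != "off":
        waits.add("durability")      # some form of synchronous commit wait
    if synchronous_replay and available_standbys > 0:
        waits.add("replay")          # wait for "available" standbys to replay
    return waits


def waits_linear(synchronous_commit, available_standbys):
    """One linear knob: remote_apply is the top level and implies the rest."""
    waits = set()
    if synchronous_commit != "off":
        waits.add("durability")
    if synchronous_commit == "remote_apply" and available_standbys > 0:
        waits.add("replay")
    return waits


# All sync replay standbys have dropped to "unavailable":
print(waits_independent("off", True, 0))   # no wait at all is expressible
print(waits_linear("remote_apply", 0))     # still a synchronous commit wait
```

With no available standbys, the independent design can express "no wait at all", while the linear design still leaves the synchronous commit wait in place, which is the difference described above.
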
> If while a transaction is waiting for all standbys to replay they
> became to unavailable state, should the waiter be released? the patch
> seems not to release the waiter. Similarly, wal senders are not aware
> of postgresql.conf change while waiting synchronous replay. I think we
> should call SyncReplayPotentialStandby() in SyncRepInitConfig().

Good point about the postgresql.conf change.

If all the standbys go to unavailable state, then a waiter should be
released once they have all either acknowledged that they are
unavailable (i.e. acknowledged that their lease has been revoked, via a
reply message with a serial number matching the revocation message),
or, if that doesn't happen (due to lost network connection, crashed
process etc), once any leases that have been issued have expired
(i.e. after a few seconds). Is that not what you see?

> With the setting synchronous_standby_names = '' and
> synchronous_replay_standby_names = '*' we would get the standby's
> status in pg_stat_replication, sync_state = 'async' and sync_replay =
> 'available'. It looks odd to me. Yes, this status is correct in
> principle. But considering the architecture of PostgreSQL replication
> this status is impossible.

Yes, this is essentially the same thing that you were arguing against
above. Perhaps you are right, and there are no people who would want
synchronous replay, but not synchronous commit.

> The synchronous_replay_standby_name = '*' setting means that the
> backend wait for all standbys connected to the master server to
> replay, is that right? In my test, even when some of synchronous
> replay standby servers got stuck and then therefore are revoked their
> lease, the backend could proceed transactions.

It means that it waits for all standbys that are "available" to
replay. It doesn't wait for the "unavailable" ones. Most of the patch
deals with the transitions between those states.

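The waiter-release rule described earlier in this message (every standby has either acknowledged the revocation with a matching serial number, or its lease has expired) can be sketched as below. This is an illustrative model under stated assumptions, not the patch's implementation: the Lease class, its field names, and can_release_waiter() are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    """Hypothetical per-standby bookkeeping on the primary."""
    expires_at: float              # time at which the issued lease runs out
    revoke_serial: int             # serial number of the revocation message sent
    acked_serial: Optional[int]    # serial number in the standby's reply, if any


def can_release_waiter(leases: List[Lease], now: float) -> bool:
    """A waiter may proceed once every standby has either acknowledged the
    revocation (matching serial number) or had its lease expire."""
    return all(
        lease.acked_serial == lease.revoke_serial or now >= lease.expires_at
        for lease in leases
    )


leases = [
    Lease(expires_at=10.0, revoke_serial=7, acked_serial=7),     # acknowledged
    Lease(expires_at=10.0, revoke_serial=8, acked_serial=None),  # silent standby
]
print(can_release_waiter(leases, now=5.0))    # False: still inside the lease
print(can_release_waiter(leases, now=10.0))   # True: the lease has expired
```

The second standby never replies (lost connection, crashed process), so the waiter is held until that standby's lease runs out and is released after that.
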
During an available->revoking->unavailable transition, we also wait
for the standby to know that it is unavailable (so that it begins to
raise errors), and during an unavailable->joining->available
transition we also wait for the standby to replay the transition LSN
(so that it stops raising errors). That way clients on the standby can
rely on the error (or lack of error) to tell them whether their
snapshot definitely contains every commit that has returned control on
the primary.

--
Thomas Munro
http://www.enterprisedb.com
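For readers following the thread, the four states and the two gated transitions described above can be written down as a small state machine. This is a sketch under assumptions, not the patch's code: the enum and function names are hypothetical, only the transitions named in this message are modelled, and lease expiry is included as the fallback mentioned earlier for a standby that never acknowledges.

```python
from enum import Enum


class SyncReplayState(Enum):
    AVAILABLE = "available"
    REVOKING = "revoking"
    UNAVAILABLE = "unavailable"
    JOINING = "joining"


def finish_revoking(state, standby_acked, lease_expired):
    """revoking -> unavailable only once the standby knows it is unavailable,
    or its lease has expired (fallback for a standby that never replies)."""
    if state is SyncReplayState.REVOKING and (standby_acked or lease_expired):
        return SyncReplayState.UNAVAILABLE
    return state


def finish_joining(state, replayed_lsn, transition_lsn):
    """joining -> available only once the standby has replayed the
    transition LSN, so that it stops raising errors at the right point."""
    if state is SyncReplayState.JOINING and replayed_lsn >= transition_lsn:
        return SyncReplayState.AVAILABLE
    return state


def commit_waits_for(state):
    """Commits on the primary wait for replay only on available standbys."""
    return state is SyncReplayState.AVAILABLE


state = SyncReplayState.REVOKING
state = finish_revoking(state, standby_acked=False, lease_expired=False)
print(state)   # still REVOKING: nothing has confirmed the revocation yet
state = finish_revoking(state, standby_acked=True, lease_expired=False)
print(state)   # UNAVAILABLE: the standby now raises errors
```

LSNs are modelled as plain integers here purely for comparison purposes.
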