Re: [Patch] ALTER SYSTEM READ ONLY - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [Patch] ALTER SYSTEM READ ONLY |
Date | |
Msg-id | CAA4eK1+5BDNS08XKXR7UPkq4tDaV66wyZia4kDMvRUm=a_A=Gg@mail.gmail.com |
In response to | [Patch] ALTER SYSTEM READ ONLY (amul sul <sulamul@gmail.com>) |
Responses |
Re: [Patch] ALTER SYSTEM READ ONLY
Re: [Patch] ALTER SYSTEM READ ONLY |
List | pgsql-hackers |
On Tue, Jun 16, 2020 at 7:26 PM amul sul <sulamul@gmail.com> wrote:
>
> Hi,
>
> Attached patch proposes $Subject feature which forces the system into read-only
> mode where insert write-ahead log will be prohibited until ALTER SYSTEM READ
> WRITE executed.
>
> The high-level goal is to make the availability/scale-out situation better. The feature
> will help HA setup where the master server needs to stop accepting WAL writes
> immediately and kick out any transaction expecting WAL writes at the end, in case
> of network down on master or replication connections failures.
>
> For example, this feature allows for a controlled switchover without needing to shut
> down the master. You can instead make the master read-only, wait until the standby
> catches up, and then promote the standby. The master remains available for read
> queries throughout, and also for WAL streaming, but without the possibility of any
> new write transactions. After switchover is complete, the master can be shut down
> and brought back up as a standby without needing to use pg_rewind. (Eventually, it
> would be nice to be able to make the read-only master into a standby without having
> to restart it, but that is a problem for another patch.)
>
> This might also help in failover scenarios. For example, if you detect that the master
> has lost network connectivity to the standby, you might make it read-only after 30 s,
> and promote the standby after 60 s, so that you never have two writable masters at
> the same time. In this case, there's still some split-brain, but it's still better than what
> we have now.
>
> Design:
> ----------
> The proposed feature is built atop of super barrier mechanism commit[1] to coordinate
> global state changes to all active backends. Backends which executed
> ALTER SYSTEM READ { ONLY | WRITE } command places request to checkpointer
> process to change the requested WAL read/write state aka WAL prohibited and WAL
> permitted state respectively.
> When the checkpointer process sees the WAL prohibit
> state change request, it emits a global barrier and waits until all backends that
> participate in the ProcSignal absorbs it. Once it has done the WAL read/write state in
> share memory and control file will be updated so that XLogInsertAllowed() returns
> accordingly.
>

Do we prohibit the checkpointer from writing dirty pages and a checkpoint
record as well? If so, does the checkpointer first write out the current
dirty pages and a checkpoint record, or do we skip that as well?

> If there are open transactions that have acquired an XID, the sessions are killed
> before the barrier is absorbed.
>

What about prepared transactions?

> They can't commit without writing WAL, and they
> can't abort without writing WAL, either, so we must at least abort the transaction. We
> don't necessarily need to kill the session, but it's hard to avoid in all cases because
> (1) if there are subtransactions active, we need to force the top-level abort record to
> be written immediately, but we can't really do that while keeping the subtransactions
> on the transaction stack, and (2) if the session is idle, we also need the top-level abort
> record to be written immediately, but can't send an error to the client until the next
> command is issued without losing wire protocol synchronization. For now, we just use
> FATAL to kill the session; maybe this can be improved in the future.
>
> Open transactions that don't have an XID are not killed, but will get an ERROR if they
> try to acquire an XID later, or if they try to write WAL without acquiring an XID (e.g. VACUUM).
>

What if VACUUM is run on an unlogged relation? Do we allow writes via
vacuum to an unlogged relation?

> To make that happen, the patch adds a new coding rule: a critical section that will write
> WAL must be preceded by a call to CheckWALPermitted(), AssertWALPermitted(), or
> AssertWALPermitted_HaveXID().
> The latter variants are used when we know for certain
> that inserting WAL here must be OK, either because we have an XID (we would have
> been killed by a change to read-only if one had occurred) or for some other reason.
>
> The ALTER SYSTEM READ WRITE command can be used to reverse the effects of
> ALTER SYSTEM READ ONLY. Both ALTER SYSTEM READ ONLY and ALTER
> SYSTEM READ WRITE update not only the shared memory state but also the control
> file, so that changes survive a restart.
>
> The transition between read-write and read-only is a pretty major transition, so we emit
> log message for each successful execution of a ALTER SYSTEM READ {ONLY | WRITE}
> command. Also, we have added a new GUC system_is_read_only which returns "on"
> when the system is in WAL prohibited state or recovery.
>
> Another part of the patch that quite uneasy and need a discussion is that when the
> shutdown in the read-only state we do skip shutdown checkpoint and at a restart, first
> startup recovery will be performed and latter the read-only state will be restored to
> prohibit further WAL write irrespective of recovery checkpoint succeed or not. The
> concern is here if this startup recovery checkpoint wasn't ok, then it will never happen
> even if it's later put back into read-write mode.
>

I am not able to understand this problem. What do you mean by "recovery
checkpoint succeed or not"? Do you add a try..catch and skip any error
while performing the recovery checkpoint?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
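
For readers following the thread, the coding rule quoted above (call CheckWALPermitted() before a critical section that writes WAL) can be illustrated with a toy model. This is a minimal sketch, not code from the patch: the names CheckWALPermitted() and AssertWALPermitted() come from the proposal, but their bodies, the boolean return value, and the helpers wal_prohibited, wal_records_written, set_system_read_only() and update_page() are assumptions made up for illustration. What it tries to show is the ordering: the check that can fail runs before the critical section, because in PostgreSQL an ERROR raised inside a critical section is escalated to PANIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the shared-memory "WAL prohibited" state. */
static bool wal_prohibited = false;

/* Hypothetical counter standing in for the WAL stream. */
static int wal_records_written = 0;

/* Toy model of ALTER SYSTEM READ { ONLY | WRITE }. */
static void set_system_read_only(bool read_only)
{
    wal_prohibited = read_only;
}

/*
 * Assumed behaviour: report an error if WAL is prohibited.  The real
 * function would presumably raise ERROR; this sketch returns false so
 * that it stays self-contained.
 */
static bool CheckWALPermitted(void)
{
    if (wal_prohibited)
    {
        fprintf(stderr, "ERROR: system is now read only\n");
        return false;
    }
    return true;
}

/* Assumed behaviour: a debug-only check for callers that already know WAL is OK. */
static void AssertWALPermitted(void)
{
    assert(!wal_prohibited);
}

/* A caller that follows the rule: check first, then enter the critical section. */
static bool update_page(void)
{
    if (!CheckWALPermitted())
        return false;           /* fail before the critical section starts */

    /* START_CRIT_SECTION() would go here in PostgreSQL code */
    AssertWALPermitted();
    wal_records_written++;      /* stands in for XLogInsert() */
    /* END_CRIT_SECTION() would go here */
    return true;
}
```

In this model, update_page() succeeds while the system is read-write, fails cleanly without touching the WAL counter after set_system_read_only(true), and succeeds again once the system is switched back to read-write.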