Thread: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
Hi, While the database is performing end-of-recovery checkpoint, the control file gets updated with db state as "shutting down" in CreateCheckPoint (see the code snippet at [1]) and at the end it sets it back to "shut down" for a brief moment and then finally to "in production". If the end-of-recovery checkpoint takes a lot of time or the db goes down during the end-of-recovery checkpoint for whatever reasons, the control file ends up having the wrong db state. Should we add a new db state something like DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or something else to represent the correct state? Thoughts? [1] void CreateCheckPoint(int flags) { /* * An end-of-recovery checkpoint is really a shutdown checkpoint, just * issued at a different time. */ if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) shutdown = true; else shutdown = false; if (shutdown) { LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); ControlFile->state = DB_SHUTDOWNING; UpdateControlFile(); LWLockRelease(ControlFileLock); } if (shutdown) ControlFile->state = DB_SHUTDOWNED; Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
"Bossart, Nathan"
Date:
On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > While the database is performing end-of-recovery checkpoint, the > control file gets updated with db state as "shutting down" in > CreateCheckPoint (see the code snippet at [1]) and at the end it sets > it back to "shut down" for a brief moment and then finally to "in > production". If the end-of-recovery checkpoint takes a lot of time or > the db goes down during the end-of-recovery checkpoint for whatever > reasons, the control file ends up having the wrong db state. > > Should we add a new db state something like > DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or > something else to represent the correct state? This seems like a reasonable change to me. From a quick glance, it looks like it should be a simple fix that wouldn't add too much divergence between the shutdown and end-of-recovery checkpoint code paths. Nathan
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Kyotaro Horiguchi
Date:
At Mon, 6 Dec 2021 19:28:03 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in > On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > While the database is performing end-of-recovery checkpoint, the > > control file gets updated with db state as "shutting down" in > > CreateCheckPoint (see the code snippet at [1]) and at the end it sets > > it back to "shut down" for a brief moment and then finally to "in > > production". If the end-of-recovery checkpoint takes a lot of time or > > the db goes down during the end-of-recovery checkpoint for whatever > > reasons, the control file ends up having the wrong db state. > > > > Should we add a new db state something like > > DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or > > something else to represent the correct state? > > This seems like a reasonable change to me. From a quick glance, it > looks like it should be a simple fix that wouldn't add too much > divergence between the shutdown and end-of-recovery checkpoint code > paths. Technically end-of-crash-recovery checkpointis actually a kind of shutdown checkpoint. In other words, a server that needs to run a crash recovery actually is once shut down then enters normal operation mode internally. So if the server crashed after the end-of-recovery checkpoint finished and before it enters DB_IN_PRODUCTION state, the server would start with a clean startup next time. We could treat DB_IN_END_OF_RECOVERY_CHECKPOINT as safe state to skip recovery but I don't think we need to preserve that behavior. In other places, server log and ps display specifically, we already make distinction between end-of-recovery checkopint and shutdown checkpoint. Finally, I agree to Nathan that it should be simple enough. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Tue, Dec 7, 2021 at 12:58 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > While the database is performing end-of-recovery checkpoint, the > > control file gets updated with db state as "shutting down" in > > CreateCheckPoint (see the code snippet at [1]) and at the end it sets > > it back to "shut down" for a brief moment and then finally to "in > > production". If the end-of-recovery checkpoint takes a lot of time or > > the db goes down during the end-of-recovery checkpoint for whatever > > reasons, the control file ends up having the wrong db state. > > > > Should we add a new db state something like > > DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or > > something else to represent the correct state? > > This seems like a reasonable change to me. From a quick glance, it > looks like it should be a simple fix that wouldn't add too much > divergence between the shutdown and end-of-recovery checkpoint code > paths. Here's a patch that I've come up with. Please see if this looks okay and let me know if we want to take it forward so that I can add a CF entry. The new status one would see is as follows: bharath@bharathubuntu3:~/postgres/inst/bin$ ./pg_controldata -D data pg_control version number: 1500 Catalog version number: 202111301 Database system identifier: 7038867865889221935 Database cluster state: in end-of-recovery checkpoint pg_control last modified: Tue Dec 7 08:06:18 2021 Latest checkpoint location: 0/14D24A0 Latest checkpoint's REDO location: 0/14D24A0 Latest checkpoint's REDO WAL file: 000000010000000000000001 Regards, Bharath Rupireddy.
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
"Bossart, Nathan"
Date:
On 12/7/21, 12:10 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > Here's a patch that I've come up with. Please see if this looks okay > and let me know if we want to take it forward so that I can add a CF > entry. Overall, the patch seems reasonable to me. + case DB_IN_END_OF_RECOVERY_CHECKPOINT: + ereport(LOG, + (errmsg("database system was interrupted while in end-of-recovery checkpoint at %s", + str_time(ControlFile->time)))); + break; I noticed that some (but not all) of the surrounding messages say "last known up at" the control file time. I'm curious why you chose not to use that phrasing in this case. Nathan
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Dec 8, 2021 at 2:50 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 12/7/21, 12:10 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > Here's a patch that I've come up with. Please see if this looks okay > > and let me know if we want to take it forward so that I can add a CF > > entry. > > Overall, the patch seems reasonable to me. Thanks for reviewing this. > + case DB_IN_END_OF_RECOVERY_CHECKPOINT: > + ereport(LOG, > + (errmsg("database system was interrupted while in end-of-recovery checkpoint at%s", > + str_time(ControlFile->time)))); > + break; > > I noticed that some (but not all) of the surrounding messages say > "last known up at" the control file time. I'm curious why you chose > not to use that phrasing in this case. If state is DB_IN_END_OF_RECOVERY_CHECKPOINT that means the db was interrupted while in end-of-recovery checkpoint, so I used the phrasing similar to DB_IN_CRASH_RECOVERY and DB_IN_ARCHIVE_RECOVERY cases. I would like to keep it as-is (in the v1 patch) unless anyone has other thoughts here? (errmsg("database system was interrupted while in recovery at %s", (errmsg("database system was interrupted while in recovery at log time %s", I added a CF entry here - https://commitfest.postgresql.org/36/3442/ Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
"Bossart, Nathan"
Date:
On 12/7/21, 5:21 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > On Wed, Dec 8, 2021 at 2:50 AM Bossart, Nathan <bossartn@amazon.com> wrote: >> I noticed that some (but not all) of the surrounding messages say >> "last known up at" the control file time. I'm curious why you chose >> not to use that phrasing in this case. > > If state is DB_IN_END_OF_RECOVERY_CHECKPOINT that means the db was > interrupted while in end-of-recovery checkpoint, so I used the > phrasing similar to DB_IN_CRASH_RECOVERY and DB_IN_ARCHIVE_RECOVERY > cases. I would like to keep it as-is (in the v1 patch) unless anyone > has other thoughts here? > (errmsg("database system was interrupted while in recovery at %s", > (errmsg("database system was interrupted while in recovery at log time %s", I think that's alright. The only other small suggestion I have would be to say "during end-of-recovery checkpoint" instead of "while in end-of-recovery checkpoint." Another option we might want to consider is to just skip updating the state entirely for end-of-recovery checkpoints. The state would instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. I don't know if it's crucial to have a dedicated control file state for end-of-recovery checkpoints. Nathan
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Dec 8, 2021 at 9:49 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 12/7/21, 5:21 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Dec 8, 2021 at 2:50 AM Bossart, Nathan <bossartn@amazon.com> wrote: > >> I noticed that some (but not all) of the surrounding messages say > >> "last known up at" the control file time. I'm curious why you chose > >> not to use that phrasing in this case. > > > > If state is DB_IN_END_OF_RECOVERY_CHECKPOINT that means the db was > > interrupted while in end-of-recovery checkpoint, so I used the > > phrasing similar to DB_IN_CRASH_RECOVERY and DB_IN_ARCHIVE_RECOVERY > > cases. I would like to keep it as-is (in the v1 patch) unless anyone > > has other thoughts here? > > (errmsg("database system was interrupted while in recovery at %s", > > (errmsg("database system was interrupted while in recovery at log time %s", > > I think that's alright. The only other small suggestion I have would > be to say "during end-of-recovery checkpoint" instead of "while in > end-of-recovery checkpoint." "while in" is being used by DB_IN_CRASH_RECOVERY and DB_IN_ARCHIVE_RECOVERY messages. I don't think it's a good idea to deviate from that and use "during". > Another option we might want to consider is to just skip updating the > state entirely for end-of-recovery checkpoints. The state would > instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. I > don't know if it's crucial to have a dedicated control file state for > end-of-recovery checkpoints. Please note that end-of-recovery can take a while in production systems (we have observed such things working with our customers) and anything can happen during that period of time. The end-of-recovery checkpoint is not something that gets finished momentarily. Therefore, having a separate db state in the control file is useful. Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
"Bossart, Nathan"
Date:
On 12/7/21, 8:42 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > On Wed, Dec 8, 2021 at 9:49 AM Bossart, Nathan <bossartn@amazon.com> wrote: >> I think that's alright. The only other small suggestion I have would >> be to say "during end-of-recovery checkpoint" instead of "while in >> end-of-recovery checkpoint." > > "while in" is being used by DB_IN_CRASH_RECOVERY and > DB_IN_ARCHIVE_RECOVERY messages. I don't think it's a good idea to > deviate from that and use "during". Fair enough. I don't have a strong opinion about this. >> Another option we might want to consider is to just skip updating the >> state entirely for end-of-recovery checkpoints. The state would >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. I >> don't know if it's crucial to have a dedicated control file state for >> end-of-recovery checkpoints. > > Please note that end-of-recovery can take a while in production > systems (we have observed such things working with our customers) and > anything can happen during that period of time. The end-of-recovery > checkpoint is not something that gets finished momentarily. Therefore, > having a separate db state in the control file is useful. Is there some useful distinction between the states for users? ISTM that users will be waiting either way, and I don't know that an extra control file state will help all that much. The main reason I bring up this option is that the list of states is pretty short and appears to be intended to indicate the high-level status of the server. Most of the states are over 20 years old, and the newest one is over 10 years old, so I don't think new states can be added willy-nilly. Of course, I could be off-base and others might agree that this new state would be nice to have. Nathan
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Dec 8, 2021 at 10:59 AM Bossart, Nathan <bossartn@amazon.com> wrote: > >> Another option we might want to consider is to just skip updating the > >> state entirely for end-of-recovery checkpoints. The state would > >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. I > >> don't know if it's crucial to have a dedicated control file state for > >> end-of-recovery checkpoints. > > > > Please note that end-of-recovery can take a while in production > > systems (we have observed such things working with our customers) and > > anything can happen during that period of time. The end-of-recovery > > checkpoint is not something that gets finished momentarily. Therefore, > > having a separate db state in the control file is useful. > > Is there some useful distinction between the states for users? ISTM > that users will be waiting either way, and I don't know that an extra > control file state will help all that much. The main reason I bring > up this option is that the list of states is pretty short and appears > to be intended to indicate the high-level status of the server. Most > of the states are over 20 years old, and the newest one is over 10 > years old, so I don't think new states can be added willy-nilly. Firstly, updating the control file with "DB_SHUTDOWNING" and "DB_SHUTDOWNED" for end-of-recovery checkpoint is wrong. I don't think having DB_IN_CRASH_RECOVERY for end-of-recovery checkpoint is a great idea. We have a checkpoint (which most of the time takes a while) in between the states DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. The state DB_IN_END_OF_RECOVERY_CHECKPOINT added by the v1 patch at [1] (in this thread) helps users to understand and clearly distinguish what state the db is in. IMHO, the age of the code doesn't stop us adding/fixing/improving the code. > Of course, I could be off-base and others might agree that this new > state would be nice to have. Let's see what others have to say about this. [1] - https://www.postgresql.org/message-id/CALj2ACVn5M8xgQ3RD%3D6rSTbbXRBdBWZ%3DTTOBOY_5%2BedMCkWjHA%40mail.gmail.com Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Kyotaro Horiguchi
Date:
At Wed, 8 Dec 2021 11:47:30 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Wed, Dec 8, 2021 at 10:59 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > >> Another option we might want to consider is to just skip updating the > > >> state entirely for end-of-recovery checkpoints. The state would > > >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. I > > >> don't know if it's crucial to have a dedicated control file state for > > >> end-of-recovery checkpoints. FWIW I find it simple but sufficient since I regarded the end-of-recovery checkpoint as a part of recovery. In that case what is strange here is only that the state transition passes the DB_SHUTDOWN(ING/ED) states. On the other hand, when a server is going to shutdown, the state stays at DB_IN_PRODUCTION if there are clinging clients even if the shutdown procedure has been already started and no new clients can connect to the server. There's no reason we need to be so particular about states for recovery-end. > > > Please note that end-of-recovery can take a while in production > > > systems (we have observed such things working with our customers) and > > > anything can happen during that period of time. The end-of-recovery > > > checkpoint is not something that gets finished momentarily. Therefore, > > > having a separate db state in the control file is useful. > > > > Is there some useful distinction between the states for users? ISTM > > that users will be waiting either way, and I don't know that an extra > > control file state will help all that much. The main reason I bring > > up this option is that the list of states is pretty short and appears > > to be intended to indicate the high-level status of the server. Most > > of the states are over 20 years old, and the newest one is over 10 > > years old, so I don't think new states can be added willy-nilly. > > Firstly, updating the control file with "DB_SHUTDOWNING" and > "DB_SHUTDOWNED" for end-of-recovery checkpoint is wrong. I don't think > having DB_IN_CRASH_RECOVERY for end-of-recovery checkpoint is a great > idea. We have a checkpoint (which most of the time takes a while) in > between the states DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. The state > DB_IN_END_OF_RECOVERY_CHECKPOINT added by the v1 patch at [1] (in this > thread) helps users to understand and clearly distinguish what state > the db is in. > > IMHO, the age of the code doesn't stop us adding/fixing/improving the code. > > > Of course, I could be off-base and others might agree that this new > > state would be nice to have. > > Let's see what others have to say about this. I see it a bit too complex for the advantage. When end-of-recovery checkpoint takes so long, that state is shown in server log, which operators would look into before the control file. > [1] - https://www.postgresql.org/message-id/CALj2ACVn5M8xgQ3RD%3D6rSTbbXRBdBWZ%3DTTOBOY_5%2BedMCkWjHA%40mail.gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Dec 8, 2021 at 1:05 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 8 Dec 2021 11:47:30 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > On Wed, Dec 8, 2021 at 10:59 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > > >> Another option we might want to consider is to just skip updating the > > > >> state entirely for end-of-recovery checkpoints. The state would > > > >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. I > > > >> don't know if it's crucial to have a dedicated control file state for > > > >> end-of-recovery checkpoints. > > FWIW I find it simple but sufficient since I regarded the > end-of-recovery checkpoint as a part of recovery. In that case what > is strange here is only that the state transition passes the > DB_SHUTDOWN(ING/ED) states. > > On the other hand, when a server is going to shutdown, the state stays > at DB_IN_PRODUCTION if there are clinging clients even if the shutdown > procedure has been already started and no new clients can connect to > the server. There's no reason we need to be so particular about states > for recovery-end. > > I see it a bit too complex for the advantage. When end-of-recovery > checkpoint takes so long, that state is shown in server log, which > operators would look into before the control file. Thanks for your thoughts. I'm fine either way, hence attaching two patches here with and I will leave it for the committer 's choice. 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch -- adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file. 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch -- just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in case of end-of-recovery checkpoint so that the state will be DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION. Regards, Bharath Rupireddy.
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
"Bossart, Nathan"
Date:
On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > Thanks for your thoughts. I'm fine either way, hence attaching two > patches here with and I will leave it for the committer 's choice. > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch -- > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file. > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch -- > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in > case of end-of-recovery checkpoint so that the state will be > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION. I've bumped this one to ready-for-committer. For the record, my preference is the second patch (for the reasons discussed upthread). Both patches might benefit from a small comment or two, too. Nathan
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Dec 8, 2021 at 11:02 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > Thanks for your thoughts. I'm fine either way, hence attaching two > > patches here with and I will leave it for the committer 's choice. > > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch -- > > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file. > > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch -- > > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in > > case of end-of-recovery checkpoint so that the state will be > > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION. > > I've bumped this one to ready-for-committer. For the record, my > preference is the second patch (for the reasons discussed upthread). > Both patches might benefit from a small comment or two, too. Thanks. I've added a comment to the patch v2-0001-Skip-control-file-db-state-updation-during-end-of.patch. The other patch remains the same as the new state DB_IN_END_OF_RECOVERY_CHECKPOINT introduced there says it all. Regards, Bharath Rupireddy.
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Jaime Casanova
Date:
On Thu, Dec 09, 2021 at 07:41:52AM +0530, Bharath Rupireddy wrote: > On Wed, Dec 8, 2021 at 11:02 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > > > On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Thanks for your thoughts. I'm fine either way, hence attaching two > > > patches here with and I will leave it for the committer 's choice. > > > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch -- > > > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file. > > > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch -- > > > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in > > > case of end-of-recovery checkpoint so that the state will be > > > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION. > > > > I've bumped this one to ready-for-committer. For the record, my > > preference is the second patch (for the reasons discussed upthread). > > Both patches might benefit from a small comment or two, too. > > Thanks. I've added a comment to the patch > v2-0001-Skip-control-file-db-state-updation-during-end-of.patch. The > other patch remains the same as the new state > DB_IN_END_OF_RECOVERY_CHECKPOINT introduced there says it all. > AFAIU is one patch or the other but not both, isn't it? This habit of putting two conflicting versions of patches on the same thread causes http://cfbot.cputube.org/ to fail. Now; I do think that the secondd patch, the one that just skips update of the state in control file, is the way to go. The other patch adds too much complexity for a small return. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Mon, Jan 10, 2022 at 10:58 AM Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > On Thu, Dec 09, 2021 at 07:41:52AM +0530, Bharath Rupireddy wrote: > > On Wed, Dec 8, 2021 at 11:02 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > > > > > On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Thanks for your thoughts. I'm fine either way, hence attaching two > > > > patches here with and I will leave it for the committer 's choice. > > > > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch -- > > > > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file. > > > > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch -- > > > > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in > > > > case of end-of-recovery checkpoint so that the state will be > > > > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION. > > > > > > I've bumped this one to ready-for-committer. For the record, my > > > preference is the second patch (for the reasons discussed upthread). > > > Both patches might benefit from a small comment or two, too. > > > > Thanks. I've added a comment to the patch > > v2-0001-Skip-control-file-db-state-updation-during-end-of.patch. The > > other patch remains the same as the new state > > DB_IN_END_OF_RECOVERY_CHECKPOINT introduced there says it all. > Now; I do think that the secondd patch, the one that just skips update > of the state in control file, is the way to go. The other patch adds too > much complexity for a small return. Thanks. Attaching the above patch. Regards, Bharath Rupireddy.
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Mon, Jan 10, 2022 at 11:04:05AM +0530, Bharath Rupireddy wrote: > On Mon, Jan 10, 2022 at 10:58 AM Jaime Casanova > <jcasanov@systemguards.com.ec> wrote: >> Now; I do think that the secondd patch, the one that just skips update >> of the state in control file, is the way to go. The other patch adds too >> much complexity for a small return. > > Thanks. Attaching the above patch. I agree that the addition of DB_IN_END_OF_RECOVERY_CHECKPOINT is not necessary as the control file state will be reflected in a live server once it the instance is ready to write WAL after promotion, as much as I agree that the state stored in the control file because of the end-of-recovery record does not reflect the reality. Now, I also find confusing the state of CreateCheckpoint() once this patch gets applied. Now the code and comments imply that an end-of-recovery checkpoint is a shutdown checkpoint because they perform the same actions, which is fine. Could it be less confusing to remove completely the "shutdown" variable instead and replace those checks with "flags"? What the patch is doing is one step in this direction. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
"Bossart, Nathan"
Date:
On 1/24/22, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Now, I also find confusing the state of CreateCheckpoint() once this > patch gets applied. Now the code and comments imply that an > end-of-recovery checkpoint is a shutdown checkpoint because they > perform the same actions, which is fine. Could it be less confusing > to remove completely the "shutdown" variable instead and replace those > checks with "flags"? What the patch is doing is one step in this > direction. I looked into removing the "shutdown" variable in favor of using "flags" everywhere, but the patch was quite messy and repetitive. I think another way to make things less confusing is to replace "shutdown" with an inverse variable called "online." The attached patch does it this way. Nathan
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Tue, Jan 25, 2022 at 07:20:05PM +0000, Bossart, Nathan wrote: > I looked into removing the "shutdown" variable in favor of using > "flags" everywhere, but the patch was quite messy and repetitive. I > think another way to make things less confusing is to replace > "shutdown" with an inverse variable called "online." The attached > patch does it this way. Yeah, that sounds like a good compromise. At least, I find the whole a bit easier to follow. Heikki was planning to commit a large refactoring of xlog.c, so we'd better wait for that to happen before concluding here. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Jan 26, 2022 at 6:28 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jan 25, 2022 at 07:20:05PM +0000, Bossart, Nathan wrote: > > I looked into removing the "shutdown" variable in favor of using > > "flags" everywhere, but the patch was quite messy and repetitive. I > > think another way to make things less confusing is to replace > > "shutdown" with an inverse variable called "online." The attached > > patch does it this way. > > Yeah, that sounds like a good compromise. At least, I find the whole > a bit easier to follow. v3 LGTM. > Heikki was planning to commit a large refactoring of xlog.c, so we'd > better wait for that to happen before concluding here. Will that patch set fix the issue reported here? Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Wed, Jan 26, 2022 at 08:09:16AM +0530, Bharath Rupireddy wrote: > Will that patch set fix the issue reported here? Looking at [1], the area of CreateCheckPoint() is left untouched. [1]: https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Wed, Jan 26, 2022 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 26, 2022 at 08:09:16AM +0530, Bharath Rupireddy wrote: > > Will that patch set fix the issue reported here? > > Looking at [1], the area of CreateCheckPoint() is left untouched. > > [1]: https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi I see. IMHO, we can fix the issue reported here then. Thoughts? Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Wed, Jan 26, 2022 at 12:13:03PM +0530, Bharath Rupireddy wrote: > I see. IMHO, we can fix the issue reported here then. Yes. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Wed, Jan 26, 2022 at 09:57:59AM +0900, Michael Paquier wrote: > Yeah, that sounds like a good compromise. At least, I find the whole > a bit easier to follow. So, I have been checking this idea in details, and spotted what looks like one issue in CreateRestartPoint(), as of: /* * Update pg_control, using current time. Check that it still shows * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; * this is a quick hack to make sure nothing really bad happens if somehow * we get here after the end-of-recovery checkpoint. */ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && ControlFile->checkPointCopy.redo < lastCheckPoint.redo) This change increases the window making this code path reachable if an end-of-recovery checkpoint is triggered but not finished at the end of recovery (possible of course at the end of crash recovery, but DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a promotion request), before updating ControlFile->checkPointCopy at the end of the checkpoint because the state could still be DB_IN_ARCHIVE_RECOVERY. The window is wider the longer the end-of-recovery checkpoint. And this would be the case of an instance restarted, when a restart point is created. > Heikki was planning to commit a large refactoring of xlog.c, so we'd > better wait for that to happen before concluding here. I have checked that as well, and both are independent. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Nathan Bossart
Date:
On Thu, Jan 27, 2022 at 02:06:40PM +0900, Michael Paquier wrote: > So, I have been checking this idea in details, and spotted what looks > like one issue in CreateRestartPoint(), as of: > /* > * Update pg_control, using current time. Check that it still shows > * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; > * this is a quick hack to make sure nothing really bad happens if somehow > * we get here after the end-of-recovery checkpoint. > */ > LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); > if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && > ControlFile->checkPointCopy.redo < lastCheckPoint.redo) > > This change increases the window making this code path reachable if an > end-of-recovery checkpoint is triggered but not finished at the end of > recovery (possible of course at the end of crash recovery, but > DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a > promotion request), before updating ControlFile->checkPointCopy at the > end of the checkpoint because the state could still be > DB_IN_ARCHIVE_RECOVERY. The window is wider the longer the > end-of-recovery checkpoint. And this would be the case of an instance > restarted, when a restart point is created. I wonder if this is actually a problem in practice. IIUC all of the values updated in this block should be reset at the end of the end-of-recovery checkpoint. Is the intent of the quick hack to prevent those updates after an end-of-recovery checkpoint completes, or is it trying to block them after one begins? It looks like the control file was being updated to DB_SHUTDOWNING at the beginning of end-of-recovery checkpoints when that change was first introduced (2de48a8), so I agree that we'd better be careful with this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com/
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Kyotaro Horiguchi
Date:
At Tue, 25 Jan 2022 19:20:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in > On 1/24/22, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > Now, I also find confusing the state of CreateCheckpoint() once this > > patch gets applied. Now the code and comments imply that an > > end-of-recovery checkpoint is a shutdown checkpoint because they > > perform the same actions, which is fine. Could it be less confusing > > to remove completely the "shutdown" variable instead and replace those > > checks with "flags"? What the patch is doing is one step in this > > direction. > > I looked into removing the "shutdown" variable in favor of using > "flags" everywhere, but the patch was quite messy and repetitive. I > think another way to make things less confusing is to replace > "shutdown" with an inverse variable called "online." The attached > patch does it this way. I find that change doesn't work. As Michael said the "shutdown" is implies "shutdown checkpoint". And end-of-recovery checkpoint is done online (means "not-shutdowning"). shutdown_checkpoint works for me. Renaming "shutdown checkpoint" as "exclusive checkpoint" or so also works for me but I think it would cause halation (or zealous objections).. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Kyotaro Horiguchi
Date:
At Thu, 27 Jan 2022 10:47:20 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in > On Thu, Jan 27, 2022 at 02:06:40PM +0900, Michael Paquier wrote: > > So, I have been checking this idea in details, and spotted what looks > > like one issue in CreateRestartPoint(), as of: > > /* > > * Update pg_control, using current time. Check that it still shows > > * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; > > * this is a quick hack to make sure nothing really bad happens if somehow > > * we get here after the end-of-recovery checkpoint. > > */ > > LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); > > if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && > > ControlFile->checkPointCopy.redo < lastCheckPoint.redo) > > > > This change increases the window making this code path reachable if an > > end-of-recovery checkpoint is triggered but not finished at the end of > > recovery (possible of course at the end of crash recovery, but > > DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a > > promotion request), before updating ControlFile->checkPointCopy at the > > end of the checkpoint because the state could still be > > DB_IN_ARCHIVE_RECOVERY. The window is wider the longer the > > end-of-recovery checkpoint. And this would be the case of an instance > > restarted, when a restart point is created. > > I wonder if this is actually a problem in practice. IIUC all of the values > updated in this block should be reset at the end of the end-of-recovery > checkpoint. Is the intent of the quick hack to prevent those updates after > an end-of-recovery checkpoint completes, or is it trying to block them > after one begins? It looks like the control file was being updated to > DB_SHUTDOWNING at the beginning of end-of-recovery checkpoints when that > change was first introduced (2de48a8), so I agree that we'd better be > careful with this change. Putting aside the readyness of the patch, I think what the patch intends is to avoid starnge state transitions happen during end-of-recovery checkpoint. So, I'm confused... End-of-recovery checkpoint is requested as CHECKPOINT_WAIT, which seems to me to mean the state is always DB_IN_ARCHIVE_RECOVERY while the checkpoint is running? If correct, if server is killed druing the end-of-recovery checkpoint, the state stays DB_IN_ARCHIVE_RECOVERY instead of DB_SHUTDOWNING or DB_SHUTDOWNED. AFAICS there's no differece between the first two at next startup. I dont' think DB_SHUTDOWNED case is not worth considering here. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Fri, Jan 28, 2022 at 06:32:27PM +0900, Kyotaro Horiguchi wrote: > End-of-recovery checkpoint is requested as CHECKPOINT_WAIT, which > seems to me to mean the state is always DB_IN_ARCHIVE_RECOVERY while > the checkpoint is running? With the patch, yes, we would keep the control file under DB_IN_{ARCHIVE,CRASH}_RECOVERY during the whole period of the end-of-recovery checkpoint. On HEAD, we would have DB_SHUTDOWNING until the end-of-recovery checkpoint completes, after which we switch to DB_SHUTDOWNED for a short time before DB_IN_PRODUCTION. > If correct, if server is killed druing the > end-of-recovery checkpoint, the state stays DB_IN_ARCHIVE_RECOVERY > instead of DB_SHUTDOWNING or DB_SHUTDOWNED. AFAICS there's no > differece between the first two at next startup. One difference is the hint given to the user at the follow-up startup. Crash and archive recovery states can freak people easily as they mention the risk of corruption. Using DB_SHUTDOWNING reduces this window. > I dont' think DB_SHUTDOWNED case is not worth considering here. Well, this patch also means there is a small window where we have DB_IN_ARCHIVE_RECOVERY in the control file with the new timeline and minRecoveryPoint not set, rather than DB_SHUTDOWNED. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Bharath Rupireddy
Date:
On Sat, Jan 29, 2022 at 7:43 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jan 28, 2022 at 06:32:27PM +0900, Kyotaro Horiguchi wrote: > > End-of-recovery checkpoint is requested as CHECKPOINT_WAIT, which > > seems to me to mean the state is always DB_IN_ARCHIVE_RECOVERY while > > the checkpoint is running? > > With the patch, yes, we would keep the control file under > DB_IN_{ARCHIVE,CRASH}_RECOVERY during the whole period of the > end-of-recovery checkpoint. On HEAD, we would have DB_SHUTDOWNING > until the end-of-recovery checkpoint completes, after which we switch > to DB_SHUTDOWNED for a short time before DB_IN_PRODUCTION. > > > If correct, if server is killed druing the > > end-of-recovery checkpoint, the state stays DB_IN_ARCHIVE_RECOVERY > > instead of DB_SHUTDOWNING or DB_SHUTDOWNED. AFAICS there's no > > differece between the first two at next startup. > > One difference is the hint given to the user at the follow-up startup. > Crash and archive recovery states can freak people easily as they > mention the risk of corruption. Using DB_SHUTDOWNING reduces this > window. If the server crashes in end-of-recovery, in the follow-up startup, the server has to start all the recovery right? In that case, DB_IN_{ARCHIVE, CRASH}_RECOVERY would represent the correct state to the user, not the DB_SHUTDOWNING/DB_SHUTDOWNED IMO. There's another option to have a new state DB_IN_END_OF_RECOVERY_CHECKPOINT, if the DB_IN_{ARCHIVE, CRASH}_RECOVERY really scares users of end-of-recovery crash? Regards, Bharath Rupireddy.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Sat, Jan 29, 2022 at 08:07:23PM +0530, Bharath Rupireddy wrote: > If the server crashes in end-of-recovery, in the follow-up startup, > the server has to start all the recovery right? In that case, > DB_IN_{ARCHIVE, CRASH}_RECOVERY would represent the correct state to > the user, not the DB_SHUTDOWNING/DB_SHUTDOWNED IMO. > > There's another option to have a new state > DB_IN_END_OF_RECOVERY_CHECKPOINT, if the DB_IN_{ARCHIVE, > CRASH}_RECOVERY really scares users of end-of-recovery crash? Well, an end-of-recovery checkpoint is a shutdown checkpoint, and it relies on the same assumptions in terms of checkpoint logic for the last 10 years or so, so the state of the control file is not wrong per-se, either. There are other cases that we may want to worry about with this change, like the fact that unlogged relation reset relies on the cluster to be cleanly shut down when we begin entering the replay loop. And it seems to me that this has not been looked at. A second thing would be the introduction of an invalid LSN minimum recovery point in the control file while in DB_IN_ARCHIVE_RECOVERY when we are done with the checkpoint, joining my point of upthread. At the end of the day, it may be better to just let this stuff be. Another argument for doing nothing is that this could cause hard-to-spot conflicts when it comes to back-patch something. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Nathan Bossart
Date:
On Mon, Jan 31, 2022 at 04:25:21PM +0900, Michael Paquier wrote: > At the end of the day, it may be better to just let this stuff be. > Another argument for doing nothing is that this could cause > hard-to-spot conflicts when it comes to back-patch something. This one has been quiet for a while. Should we mark it as returned-with-feedback? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote: > This one has been quiet for a while. Should we mark it as > returned-with-feedback? Yes, that's my feeling and I got cold feet about this change. This patch would bring some extra visibility for something that's not incorrect either on HEAD, as end-of-recovery checkpoints are the same things as shutdown checkpoints. And there is an extra argument where back-patching would become a bit more tricky in an area that's already a lot sensitive. -- Michael
Attachment
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Kyotaro Horiguchi
Date:
At Sat, 26 Feb 2022 12:11:15 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote: > > This one has been quiet for a while. Should we mark it as > > returned-with-feedback? > > Yes, that's my feeling and I got cold feet about this change. This > patch would bring some extra visibility for something that's not > incorrect either on HEAD, as end-of-recovery checkpoints are the same > things as shutdown checkpoints. And there is an extra argument where > back-patching would become a bit more tricky in an area that's already > a lot sensitive. That sounds like we should reject the patch as we don't agree to its objective. If someday end-of-recovery checkpoints functionally diverge from shutdown checkpoints but leave (somehow) the transition alone, we may visit this again but it would be another proposal. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
From
Michael Paquier
Date:
On Mon, Feb 28, 2022 at 10:51:06AM +0900, Kyotaro Horiguchi wrote: > That sounds like we should reject the patch as we don't agree to its > objective. If someday end-of-recovery checkpoints functionally > diverge from shutdown checkpoints but leave (somehow) the transition > alone, we may visit this again but it would be another proposal. The patch has been already withdrawn in the CF app. -- Michael