Thread: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?

Hi,

While the database is performing end-of-recovery checkpoint, the
control file gets updated with db state as "shutting down" in
CreateCheckPoint (see the code snippet at [1]) and at the end it sets
it back to "shut down" for a brief moment and then finally to "in
production". If the end-of-recovery checkpoint takes a lot of time or
the db goes down during the end-of-recovery checkpoint for whatever
reasons, the control file ends up having the wrong db state.

Should we add a new db state something like
DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or
something else to represent the correct state?

Thoughts?

[1]
void
CreateCheckPoint(int flags)
{
    bool        shutdown;

    /*
     * An end-of-recovery checkpoint is really a shutdown checkpoint, just
     * issued at a different time.
     */
    if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
        shutdown = true;
    else
        shutdown = false;

    if (shutdown)
    {
        LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
        ControlFile->state = DB_SHUTDOWNING;
        UpdateControlFile();
        LWLockRelease(ControlFileLock);
    }

    if (shutdown)
        ControlFile->state = DB_SHUTDOWNED;

Regards,
Bharath Rupireddy.



On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> While the database is performing end-of-recovery checkpoint, the
> control file gets updated with db state as "shutting down" in
> CreateCheckPoint (see the code snippet at [1]) and at the end it sets
> it back to "shut down" for a brief moment and then finally to "in
> production". If the end-of-recovery checkpoint takes a lot of time or
> the db goes down during the end-of-recovery checkpoint for whatever
> reasons, the control file ends up having the wrong db state.
>
> Should we add a new db state something like
> DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or
> something else to represent the correct state?

This seems like a reasonable change to me.  From a quick glance, it
looks like it should be a simple fix that wouldn't add too much
divergence between the shutdown and end-of-recovery checkpoint code
paths.

Nathan


At Mon, 6 Dec 2021 19:28:03 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > While the database is performing end-of-recovery checkpoint, the
> > control file gets updated with db state as "shutting down" in
> > CreateCheckPoint (see the code snippet at [1]) and at the end it sets
> > it back to "shut down" for a brief moment and then finally to "in
> > production". If the end-of-recovery checkpoint takes a lot of time or
> > the db goes down during the end-of-recovery checkpoint for whatever
> > reasons, the control file ends up having the wrong db state.
> >
> > Should we add a new db state something like
> > DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or
> > something else to represent the correct state?
> 
> This seems like a reasonable change to me.  From a quick glance, it
> looks like it should be a simple fix that wouldn't add too much
> divergence between the shutdown and end-of-recovery checkpoint code
> paths.

Technically, an end-of-crash-recovery checkpoint is actually a kind of
shutdown checkpoint. In other words, a server that needs to run crash
recovery is internally treated as having been shut down once before it
enters normal operation mode.  So if the server crashed after the
end-of-recovery checkpoint finished and before it entered the
DB_IN_PRODUCTION state, the server would start up cleanly next time.
We could treat DB_IN_END_OF_RECOVERY_CHECKPOINT as a safe state from
which to skip recovery, but I don't think we need to preserve that
behavior.

In other places, specifically the server log and the ps display, we
already make a distinction between end-of-recovery checkpoints and
shutdown checkpoints.

Finally, I agree with Nathan that it should be simple enough.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Dec 7, 2021 at 12:58 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > While the database is performing end-of-recovery checkpoint, the
> > control file gets updated with db state as "shutting down" in
> > CreateCheckPoint (see the code snippet at [1]) and at the end it sets
> > it back to "shut down" for a brief moment and then finally to "in
> > production". If the end-of-recovery checkpoint takes a lot of time or
> > the db goes down during the end-of-recovery checkpoint for whatever
> > reasons, the control file ends up having the wrong db state.
> >
> > Should we add a new db state something like
> > DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or
> > something else to represent the correct state?
>
> This seems like a reasonable change to me.  From a quick glance, it
> looks like it should be a simple fix that wouldn't add too much
> divergence between the shutdown and end-of-recovery checkpoint code
> paths.

Here's a patch that I've come up with. Please see if this looks okay
and let me know if we want to take it forward so that I can add a CF
entry.

The new state one would see is as follows:
bharath@bharathubuntu3:~/postgres/inst/bin$ ./pg_controldata -D data
pg_control version number:            1500
Catalog version number:               202111301
Database system identifier:           7038867865889221935
Database cluster state:               in end-of-recovery checkpoint
pg_control last modified:             Tue Dec  7 08:06:18 2021
Latest checkpoint location:           0/14D24A0
Latest checkpoint's REDO location:    0/14D24A0
Latest checkpoint's REDO WAL file:    000000010000000000000001

Regards,
Bharath Rupireddy.

On 12/7/21, 12:10 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Here's a patch that I've come up with. Please see if this looks okay
> and let me know if we want to take it forward so that I can add a CF
> entry.

Overall, the patch seems reasonable to me.

+        case DB_IN_END_OF_RECOVERY_CHECKPOINT:
+            ereport(LOG,
+                    (errmsg("database system was interrupted while in end-of-recovery checkpoint at %s",
+                            str_time(ControlFile->time))));
+            break;

I noticed that some (but not all) of the surrounding messages say
"last known up at" the control file time.  I'm curious why you chose
not to use that phrasing in this case.

Nathan


On Wed, Dec 8, 2021 at 2:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/7/21, 12:10 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Here's a patch that I've come up with. Please see if this looks okay
> > and let me know if we want to take it forward so that I can add a CF
> > entry.
>
> Overall, the patch seems reasonable to me.

Thanks for reviewing this.

> +               case DB_IN_END_OF_RECOVERY_CHECKPOINT:
> +                       ereport(LOG,
> +                                       (errmsg("database system was interrupted while in end-of-recovery checkpoint at %s",
> +                                                       str_time(ControlFile->time))));
> +                       break;
>
> I noticed that some (but not all) of the surrounding messages say
> "last known up at" the control file time.  I'm curious why you chose
> not to use that phrasing in this case.

If the state is DB_IN_END_OF_RECOVERY_CHECKPOINT, that means the db
was interrupted while in the end-of-recovery checkpoint, so I used
phrasing similar to the DB_IN_CRASH_RECOVERY and
DB_IN_ARCHIVE_RECOVERY cases. I would like to keep it as-is (in the v1
patch) unless anyone has other thoughts here?
(errmsg("database system was interrupted while in recovery at %s",
(errmsg("database system was interrupted while in recovery at log time %s",

I added a CF entry here - https://commitfest.postgresql.org/36/3442/

Regards,
Bharath Rupireddy.



On 12/7/21, 5:21 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Wed, Dec 8, 2021 at 2:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>> I noticed that some (but not all) of the surrounding messages say
>> "last known up at" the control file time.  I'm curious why you chose
>> not to use that phrasing in this case.
>
> If state is DB_IN_END_OF_RECOVERY_CHECKPOINT that means the db was
> interrupted while in end-of-recovery checkpoint, so I used the
> phrasing similar to DB_IN_CRASH_RECOVERY and DB_IN_ARCHIVE_RECOVERY
> cases. I would like to keep it as-is (in the v1 patch) unless anyone
> has other thoughts here?
> (errmsg("database system was interrupted while in recovery at %s",
> (errmsg("database system was interrupted while in recovery at log time %s",

I think that's alright.  The only other small suggestion I have would
be to say "during end-of-recovery checkpoint" instead of "while in
end-of-recovery checkpoint."

Another option we might want to consider is to just skip updating the
state entirely for end-of-recovery checkpoints.  The state would
instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION.  I
don't know if it's crucial to have a dedicated control file state for
end-of-recovery checkpoints.

Nathan


On Wed, Dec 8, 2021 at 9:49 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/7/21, 5:21 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Wed, Dec 8, 2021 at 2:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> >> I noticed that some (but not all) of the surrounding messages say
> >> "last known up at" the control file time.  I'm curious why you chose
> >> not to use that phrasing in this case.
> >
> > If state is DB_IN_END_OF_RECOVERY_CHECKPOINT that means the db was
> > interrupted while in end-of-recovery checkpoint, so I used the
> > phrasing similar to DB_IN_CRASH_RECOVERY and DB_IN_ARCHIVE_RECOVERY
> > cases. I would like to keep it as-is (in the v1 patch) unless anyone
> > has other thoughts here?
> > (errmsg("database system was interrupted while in recovery at %s",
> > (errmsg("database system was interrupted while in recovery at log time %s",
>
> I think that's alright.  The only other small suggestion I have would
> be to say "during end-of-recovery checkpoint" instead of "while in
> end-of-recovery checkpoint."

"while in" is being used by DB_IN_CRASH_RECOVERY and
DB_IN_ARCHIVE_RECOVERY messages. I don't think it's a good idea to
deviate from that and use "during".

> Another option we might want to consider is to just skip updating the
> state entirely for end-of-recovery checkpoints.  The state would
> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION.  I
> don't know if it's crucial to have a dedicated control file state for
> end-of-recovery checkpoints.

Please note that the end-of-recovery checkpoint can take a while on
production systems (we have observed this while working with our
customers), and anything can happen during that period of time. The
end-of-recovery checkpoint is not something that finishes momentarily.
Therefore, having a separate db state in the control file is useful.

Regards,
Bharath Rupireddy.



On 12/7/21, 8:42 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Wed, Dec 8, 2021 at 9:49 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>> I think that's alright.  The only other small suggestion I have would
>> be to say "during end-of-recovery checkpoint" instead of "while in
>> end-of-recovery checkpoint."
>
> "while in" is being used by DB_IN_CRASH_RECOVERY and
> DB_IN_ARCHIVE_RECOVERY messages. I don't think it's a good idea to
> deviate from that and use "during".

Fair enough.  I don't have a strong opinion about this.

>> Another option we might want to consider is to just skip updating the
>> state entirely for end-of-recovery checkpoints.  The state would
>> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION.  I
>> don't know if it's crucial to have a dedicated control file state for
>> end-of-recovery checkpoints.
>
> Please note that end-of-recovery can take a while in production
> systems (we have observed such things working with our customers) and
> anything can happen during that period of time. The end-of-recovery
> checkpoint is not something that gets finished momentarily. Therefore,
> having a separate db state in the control file is useful.

Is there some useful distinction between the states for users?  ISTM
that users will be waiting either way, and I don't know that an extra
control file state will help all that much.  The main reason I bring
up this option is that the list of states is pretty short and appears
to be intended to indicate the high-level status of the server.  Most
of the states are over 20 years old, and the newest one is over 10
years old, so I don't think new states can be added willy-nilly.

Of course, I could be off-base and others might agree that this new
state would be nice to have.

Nathan


On Wed, Dec 8, 2021 at 10:59 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> >> Another option we might want to consider is to just skip updating the
> >> state entirely for end-of-recovery checkpoints.  The state would
> >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION.  I
> >> don't know if it's crucial to have a dedicated control file state for
> >> end-of-recovery checkpoints.
> >
> > Please note that end-of-recovery can take a while in production
> > systems (we have observed such things working with our customers) and
> > anything can happen during that period of time. The end-of-recovery
> > checkpoint is not something that gets finished momentarily. Therefore,
> > having a separate db state in the control file is useful.
>
> Is there some useful distinction between the states for users?  ISTM
> that users will be waiting either way, and I don't know that an extra
> control file state will help all that much.  The main reason I bring
> up this option is that the list of states is pretty short and appears
> to be intended to indicate the high-level status of the server.  Most
> of the states are over 20 years old, and the newest one is over 10
> years old, so I don't think new states can be added willy-nilly.

Firstly, updating the control file with "DB_SHUTDOWNING" and
"DB_SHUTDOWNED" for an end-of-recovery checkpoint is wrong. I also
don't think keeping DB_IN_CRASH_RECOVERY during the end-of-recovery
checkpoint is a great idea: we have a checkpoint (which most of the
time takes a while) between the states DB_IN_CRASH_RECOVERY and
DB_IN_PRODUCTION. The state DB_IN_END_OF_RECOVERY_CHECKPOINT added by
the v1 patch at [1] (in this thread) helps users understand and
clearly distinguish what state the db is in.

IMHO, the age of the code doesn't stop us from adding/fixing/improving it.

> Of course, I could be off-base and others might agree that this new
> state would be nice to have.

Let's see what others have to say about this.

[1] - https://www.postgresql.org/message-id/CALj2ACVn5M8xgQ3RD%3D6rSTbbXRBdBWZ%3DTTOBOY_5%2BedMCkWjHA%40mail.gmail.com

Regards,
Bharath Rupireddy.



At Wed, 8 Dec 2021 11:47:30 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Wed, Dec 8, 2021 at 10:59 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> > >> Another option we might want to consider is to just skip updating the
> > >> state entirely for end-of-recovery checkpoints.  The state would
> > >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION.  I
> > >> don't know if it's crucial to have a dedicated control file state for
> > >> end-of-recovery checkpoints.

FWIW I find it simple but sufficient, since I regard the
end-of-recovery checkpoint as part of recovery.  In that view, the
only strange thing here is that the state transition passes through
the DB_SHUTDOWNING/DB_SHUTDOWNED states.

On the other hand, when a server is about to shut down, the state
stays at DB_IN_PRODUCTION if there are lingering clients, even though
the shutdown procedure has already started and no new clients can
connect to the server. So there's no reason we need to be so
particular about the states around recovery-end.

> > > Please note that end-of-recovery can take a while in production
> > > systems (we have observed such things working with our customers) and
> > > anything can happen during that period of time. The end-of-recovery
> > > checkpoint is not something that gets finished momentarily. Therefore,
> > > having a separate db state in the control file is useful.
> >
> > Is there some useful distinction between the states for users?  ISTM
> > that users will be waiting either way, and I don't know that an extra
> > control file state will help all that much.  The main reason I bring
> > up this option is that the list of states is pretty short and appears
> > to be intended to indicate the high-level status of the server.  Most
> > of the states are over 20 years old, and the newest one is over 10
> > years old, so I don't think new states can be added willy-nilly.
> 
> Firstly, updating the control file with "DB_SHUTDOWNING" and
> "DB_SHUTDOWNED" for end-of-recovery checkpoint is wrong. I don't think
> having DB_IN_CRASH_RECOVERY for end-of-recovery checkpoint is a great
> idea. We have a checkpoint (which most of the time takes a while) in
> between the states DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION. The state
> DB_IN_END_OF_RECOVERY_CHECKPOINT added by the v1 patch at [1] (in this
> thread) helps users to understand and clearly distinguish what state
> the db is in.
> 
> IMHO, the age of the code doesn't stop us adding/fixing/improving the code.
> 
> > Of course, I could be off-base and others might agree that this new
> > state would be nice to have.
> 
> Let's see what others have to say about this.

I see it as a bit too much complexity for the advantage.  When the
end-of-recovery checkpoint takes that long, its state is shown in the
server log, which operators would look at before the control file.

> [1] - https://www.postgresql.org/message-id/CALj2ACVn5M8xgQ3RD%3D6rSTbbXRBdBWZ%3DTTOBOY_5%2BedMCkWjHA%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Wed, Dec 8, 2021 at 1:05 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 8 Dec 2021 11:47:30 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Wed, Dec 8, 2021 at 10:59 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> > > >> Another option we might want to consider is to just skip updating the
> > > >> state entirely for end-of-recovery checkpoints.  The state would
> > > >> instead go straight from DB_IN_CRASH_RECOVERY to DB_IN_PRODUCTION.  I
> > > >> don't know if it's crucial to have a dedicated control file state for
> > > >> end-of-recovery checkpoints.
>
> FWIW I find it simple but sufficient since I regarded the
> end-of-recovery checkpoint as a part of recovery.  In that case what
> is strange here is only that the state transition passes the
> DB_SHUTDOWN(ING/ED) states.
>
> On the other hand, when a server is going to shutdown, the state stays
> at DB_IN_PRODUCTION if there are clinging clients even if the shutdown
> procedure has been already started and no new clients can connect to
> the server. There's no reason we need to be so particular about states
> for recovery-end.
>
> I see it a bit too complex for the advantage.  When end-of-recovery
> checkpoint takes so long, that state is shown in server log, which
> operators would look into before the control file.

Thanks for your thoughts. I'm fine either way, hence I'm attaching two
patches herewith and will leave the choice to the committer:
1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch --
adds the new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for the control file.
2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch --
just skips setting the db state to DB_SHUTDOWNING and DB_SHUTDOWNED in
the case of an end-of-recovery checkpoint, so that the state stays
DB_IN_CRASH_RECOVERY and then changes to DB_IN_PRODUCTION.

Regards,
Bharath Rupireddy.

On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks for your thoughts. I'm fine either way, hence attaching two
> patches here with and I will leave it for the committer 's choice.
> 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch --
> adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file.
> 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch --
> just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in
> case of end-of-recovery checkpoint so that the state will be
> DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION.

I've bumped this one to ready-for-committer.  For the record, my
preference is the second patch (for the reasons discussed upthread).
Both patches might also benefit from a small comment or two.

Nathan


On Wed, Dec 8, 2021 at 11:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Thanks for your thoughts. I'm fine either way, hence attaching two
> > patches here with and I will leave it for the committer 's choice.
> > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch --
> > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file.
> > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch --
> > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in
> > case of end-of-recovery checkpoint so that the state will be
> > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION.
>
> I've bumped this one to ready-for-committer.  For the record, my
> preference is the second patch (for the reasons discussed upthread).
> Both patches might benefit from a small comment or two, too.

Thanks. I've added a comment to the patch
v2-0001-Skip-control-file-db-state-updation-during-end-of.patch. The
other patch remains the same, as the new state
DB_IN_END_OF_RECOVERY_CHECKPOINT introduced there says it all.

Regards,
Bharath Rupireddy.

On Thu, Dec 09, 2021 at 07:41:52AM +0530, Bharath Rupireddy wrote:
> On Wed, Dec 8, 2021 at 11:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> >
> > On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Thanks for your thoughts. I'm fine either way, hence attaching two
> > > patches here with and I will leave it for the committer 's choice.
> > > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch --
> > > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file.
> > > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch --
> > > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in
> > > case of end-of-recovery checkpoint so that the state will be
> > > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION.
> >
> > I've bumped this one to ready-for-committer.  For the record, my
> > preference is the second patch (for the reasons discussed upthread).
> > Both patches might benefit from a small comment or two, too.
> 
> Thanks. I've added a comment to the patch
> v2-0001-Skip-control-file-db-state-updation-during-end-of.patch. The
> other patch remains the same as the new state
> DB_IN_END_OF_RECOVERY_CHECKPOINT introduced there says it all.
> 

AFAIU it's one patch or the other, but not both, isn't it?

This habit of putting two conflicting versions of patches on the same
thread causes http://cfbot.cputube.org/ to fail.

Now, I do think that the second patch, the one that just skips the
update of the state in the control file, is the way to go. The other
patch adds too much complexity for a small return.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



On Mon, Jan 10, 2022 at 10:58 AM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> On Thu, Dec 09, 2021 at 07:41:52AM +0530, Bharath Rupireddy wrote:
> > On Wed, Dec 8, 2021 at 11:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> > >
> > > On 12/8/21, 3:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > Thanks for your thoughts. I'm fine either way, hence attaching two
> > > > patches here with and I will leave it for the committer 's choice.
> > > > 1) v1-0001-Add-DB_IN_END_OF_RECOVERY_CHECKPOINT-state-for-co.patch --
> > > > adds new db state DB_IN_END_OF_RECOVERY_CHECKPOINT for control file.
> > > > 2) v1-0001-Skip-control-file-db-state-updation-during-end-of.patch --
> > > > just skips setting db state to DB_SHUTDOWNING and DB_SHUTDOWNED in
> > > > case of end-of-recovery checkpoint so that the state will be
> > > > DB_IN_CRASH_RECOVERY which then changes to DB_IN_PRODUCTION.
> > >
> > > I've bumped this one to ready-for-committer.  For the record, my
> > > preference is the second patch (for the reasons discussed upthread).
> > > Both patches might benefit from a small comment or two, too.
> >
> > Thanks. I've added a comment to the patch
> > v2-0001-Skip-control-file-db-state-updation-during-end-of.patch. The
> > other patch remains the same as the new state
> > DB_IN_END_OF_RECOVERY_CHECKPOINT introduced there says it all.

> Now, I do think that the second patch, the one that just skips the
> update of the state in the control file, is the way to go. The other
> patch adds too much complexity for a small return.

Thanks. Attaching the above patch.

Regards,
Bharath Rupireddy.

On Mon, Jan 10, 2022 at 11:04:05AM +0530, Bharath Rupireddy wrote:
> On Mon, Jan 10, 2022 at 10:58 AM Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
>> Now, I do think that the second patch, the one that just skips the
>> update of the state in the control file, is the way to go. The other
>> patch adds too much complexity for a small return.
>
> Thanks. Attaching the above patch.

I agree that the addition of DB_IN_END_OF_RECOVERY_CHECKPOINT is not
necessary, as the control file state will be reflected in a live
server once the instance is ready to write WAL after promotion, as
much as I agree that the state stored in the control file because of
the end-of-recovery record does not reflect reality.

Now, I also find the state of CreateCheckPoint() confusing once this
patch gets applied: the code and comments imply that an
end-of-recovery checkpoint is a shutdown checkpoint because they
perform the same actions, which is fine.  Could it be less confusing
to remove the "shutdown" variable completely and replace those checks
with "flags"?  What the patch is doing is one step in this direction.
--
Michael

On 1/24/22, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Now, I also find confusing the state of CreateCheckpoint() once this
> patch gets applied.  Now the code and comments imply that an
> end-of-recovery checkpoint is a shutdown checkpoint because they
> perform the same actions, which is fine.  Could it be less confusing
> to remove completely the "shutdown" variable instead and replace those
> checks with "flags"?  What the patch is doing is one step in this
> direction.

I looked into removing the "shutdown" variable in favor of using
"flags" everywhere, but the patch was quite messy and repetitive.  I
think another way to make things less confusing is to replace
"shutdown" with an inverse variable called "online."  The attached
patch does it this way.

Nathan


On Tue, Jan 25, 2022 at 07:20:05PM +0000, Bossart, Nathan wrote:
> I looked into removing the "shutdown" variable in favor of using
> "flags" everywhere, but the patch was quite messy and repetitive.  I
> think another way to make things less confusing is to replace
> "shutdown" with an inverse variable called "online."  The attached
> patch does it this way.

Yeah, that sounds like a good compromise.  At least, I find the whole
thing a bit easier to follow.

Heikki was planning to commit a large refactoring of xlog.c, so we'd
better wait for that to happen before concluding here.
--
Michael

On Wed, Jan 26, 2022 at 6:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 25, 2022 at 07:20:05PM +0000, Bossart, Nathan wrote:
> > I looked into removing the "shutdown" variable in favor of using
> > "flags" everywhere, but the patch was quite messy and repetitive.  I
> > think another way to make things less confusing is to replace
> > "shutdown" with an inverse variable called "online."  The attached
> > patch does it this way.
>
> Yeah, that sounds like a good compromise.  At least, I find the whole
> a bit easier to follow.

v3 LGTM.

> Heikki was planning to commit a large refactoring of xlog.c, so we'd
> better wait for that to happen before concluding here.

Will that patch set fix the issue reported here?

Regards,
Bharath Rupireddy.



On Wed, Jan 26, 2022 at 08:09:16AM +0530, Bharath Rupireddy wrote:
> Will that patch set fix the issue reported here?

Looking at [1], the area of CreateCheckPoint() is left untouched.

[1]: https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi
--
Michael

On Wed, Jan 26, 2022 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 26, 2022 at 08:09:16AM +0530, Bharath Rupireddy wrote:
> > Will that patch set fix the issue reported here?
>
> Looking at [1], the area of CreateCheckPoint() is left untouched.
>
> [1]: https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi

I see. IMHO, we can fix the issue reported here then. Thoughts?

Regards,
Bharath Rupireddy.



On Wed, Jan 26, 2022 at 12:13:03PM +0530, Bharath Rupireddy wrote:
> I see. IMHO, we can fix the issue reported here then.

Yes.
--
Michael

On Wed, Jan 26, 2022 at 09:57:59AM +0900, Michael Paquier wrote:
> Yeah, that sounds like a good compromise.  At least, I find the whole
> a bit easier to follow.

So, I have been checking this idea in details, and spotted what looks
like one issue in CreateRestartPoint(), as of:
    /*
     * Update pg_control, using current time.  Check that it still shows
     * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
     * this is a quick hack to make sure nothing really bad happens if somehow
     * we get here after the end-of-recovery checkpoint.
     */
    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
        ControlFile->checkPointCopy.redo < lastCheckPoint.redo)

This change widens the window in which this code path is reachable: if
an end-of-recovery checkpoint has been triggered but has not finished
at the end of recovery (possible of course at the end of crash
recovery, but DB_IN_ARCHIVE_RECOVERY is also possible when
!IsUnderPostmaster with a promotion request), the state could still be
DB_IN_ARCHIVE_RECOVERY until ControlFile->checkPointCopy is updated at
the end of the checkpoint.  The window gets wider the longer the
end-of-recovery checkpoint runs.  And this would be the case for an
instance being restarted, when a restart point is created.

> Heikki was planning to commit a large refactoring of xlog.c, so we'd
> better wait for that to happen before concluding here.

I have checked that as well, and both are independent.
--
Michael

On Thu, Jan 27, 2022 at 02:06:40PM +0900, Michael Paquier wrote:
> So, I have been checking this idea in details, and spotted what looks
> like one issue in CreateRestartPoint(), as of:
>     /*
>      * Update pg_control, using current time.  Check that it still shows
>      * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
>      * this is a quick hack to make sure nothing really bad happens if somehow
>      * we get here after the end-of-recovery checkpoint.
>      */
>     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>     if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
>         ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
> 
> This change widens the window in which this code path is reachable.
> If an end-of-recovery checkpoint is triggered but not finished at the
> end of recovery (possible of course at the end of crash recovery, but
> DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a
> promotion request), the state can still be DB_IN_ARCHIVE_RECOVERY
> until ControlFile->checkPointCopy is updated at the end of the
> checkpoint.  The longer the end-of-recovery checkpoint takes, the
> wider the window.  And this would be the case of an instance restarted
> while a restart point is created.

I wonder if this is actually a problem in practice.  IIUC all of the values
updated in this block should be reset at the end of the end-of-recovery
checkpoint.  Is the intent of the quick hack to prevent those updates after
an end-of-recovery checkpoint completes, or is it trying to block them
after one begins?  It looks like the control file was being updated to
DB_SHUTDOWNING at the beginning of end-of-recovery checkpoints when that
change was first introduced (2de48a8), so I agree that we'd better be
careful with this change.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/



At Tue, 25 Jan 2022 19:20:05 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 1/24/22, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > Now, I also find confusing the state of CreateCheckpoint() once this
> > patch gets applied.  Now the code and comments imply that an
> > end-of-recovery checkpoint is a shutdown checkpoint because they
> > perform the same actions, which is fine.  Could it be less confusing
> > to remove completely the "shutdown" variable instead and replace those
> > checks with "flags"?  What the patch is doing is one step in this
> > direction.
> 
> I looked into removing the "shutdown" variable in favor of using
> "flags" everywhere, but the patch was quite messy and repetitive.  I
> think another way to make things less confusing is to replace
> "shutdown" with an inverse variable called "online."  The attached
> patch does it this way.

I find that change doesn't work.  As Michael said, "shutdown" implies
"shutdown checkpoint", and an end-of-recovery checkpoint is done
online (meaning "not shutting down").  "shutdown_checkpoint" works for
me.  Renaming "shutdown checkpoint" to "exclusive checkpoint" or the
like would also work for me, but I think it would cause a backlash (or
zealous objections).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Thu, 27 Jan 2022 10:47:20 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> On Thu, Jan 27, 2022 at 02:06:40PM +0900, Michael Paquier wrote:
> > So, I have been checking this idea in detail, and spotted what looks
> > like one issue in CreateRestartPoint(), as of:
> >     /*
> >      * Update pg_control, using current time.  Check that it still shows
> >      * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
> >      * this is a quick hack to make sure nothing really bad happens if somehow
> >      * we get here after the end-of-recovery checkpoint.
> >      */
> >     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> >     if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
> >         ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
> > 
> > This change widens the window in which this code path is reachable.
> > If an end-of-recovery checkpoint is triggered but not finished at the
> > end of recovery (possible of course at the end of crash recovery, but
> > DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a
> > promotion request), the state can still be DB_IN_ARCHIVE_RECOVERY
> > until ControlFile->checkPointCopy is updated at the end of the
> > checkpoint.  The longer the end-of-recovery checkpoint takes, the
> > wider the window.  And this would be the case of an instance restarted
> > while a restart point is created.
> 
> I wonder if this is actually a problem in practice.  IIUC all of the values
> updated in this block should be reset at the end of the end-of-recovery
> checkpoint.  Is the intent of the quick hack to prevent those updates after
> an end-of-recovery checkpoint completes, or is it trying to block them
> after one begins?  It looks like the control file was being updated to
> DB_SHUTDOWNING at the beginning of end-of-recovery checkpoints when that
> change was first introduced (2de48a8), so I agree that we'd better be
> careful with this change.

Putting aside the readiness of the patch, I think what the patch
intends is to avoid strange state transitions happening during the
end-of-recovery checkpoint.

So, I'm confused...

End-of-recovery checkpoint is requested as CHECKPOINT_WAIT, which
seems to me to mean the state is always DB_IN_ARCHIVE_RECOVERY while
the checkpoint is running?  If correct, and the server is killed during
the end-of-recovery checkpoint, the state stays DB_IN_ARCHIVE_RECOVERY
instead of DB_SHUTDOWNING or DB_SHUTDOWNED.  AFAICS there's no
difference between the first two at the next startup.  I don't think
the DB_SHUTDOWNED case is worth considering here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Fri, Jan 28, 2022 at 06:32:27PM +0900, Kyotaro Horiguchi wrote:
> End-of-recovery checkpoint is requested as CHECKPOINT_WAIT, which
> seems to me to mean the state is always DB_IN_ARCHIVE_RECOVERY while
> the checkpoint is running?

With the patch, yes, we would keep the control file under
DB_IN_{ARCHIVE,CRASH}_RECOVERY during the whole period of the
end-of-recovery checkpoint.  On HEAD, we would have DB_SHUTDOWNING
until the end-of-recovery checkpoint completes, after which we switch
to DB_SHUTDOWNED for a short time before DB_IN_PRODUCTION.

> If correct, and the server is killed during
> the end-of-recovery checkpoint, the state stays DB_IN_ARCHIVE_RECOVERY
> instead of DB_SHUTDOWNING or DB_SHUTDOWNED.  AFAICS there's no
> difference between the first two at the next startup.

One difference is the hint given to the user at the follow-up startup.
The crash and archive recovery states can easily freak people out, as
they mention the risk of corruption.  Using DB_SHUTDOWNING narrows
this window.

> I don't think the DB_SHUTDOWNED case is worth considering here.

Well, this patch also means there is a small window where we have
DB_IN_ARCHIVE_RECOVERY in the control file with the new timeline and
minRecoveryPoint not set, rather than DB_SHUTDOWNED.
--
Michael

On Sat, Jan 29, 2022 at 7:43 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 28, 2022 at 06:32:27PM +0900, Kyotaro Horiguchi wrote:
> > End-of-recovery checkpoint is requested as CHECKPOINT_WAIT, which
> > seems to me to mean the state is always DB_IN_ARCHIVE_RECOVERY while
> > the checkpoint is running?
>
> With the patch, yes, we would keep the control file under
> DB_IN_{ARCHIVE,CRASH}_RECOVERY during the whole period of the
> end-of-recovery checkpoint.  On HEAD, we would have DB_SHUTDOWNING
> until the end-of-recovery checkpoint completes, after which we switch
> to DB_SHUTDOWNED for a short time before DB_IN_PRODUCTION.
>
> > If correct, and the server is killed during
> > the end-of-recovery checkpoint, the state stays DB_IN_ARCHIVE_RECOVERY
> > instead of DB_SHUTDOWNING or DB_SHUTDOWNED.  AFAICS there's no
> > difference between the first two at the next startup.
>
> One difference is the hint given to the user at the follow-up startup.
> The crash and archive recovery states can easily freak people out, as
> they mention the risk of corruption.  Using DB_SHUTDOWNING narrows
> this window.

If the server crashes during the end-of-recovery checkpoint, it has to
redo all of the recovery at the next startup, right?  In that case,
DB_IN_{ARCHIVE,CRASH}_RECOVERY would represent the correct state to
the user, not DB_SHUTDOWNING/DB_SHUTDOWNED, IMO.

There's another option: add a new state
DB_IN_END_OF_RECOVERY_CHECKPOINT, if DB_IN_{ARCHIVE,CRASH}_RECOVERY
really scares users after an end-of-recovery crash.

Regards,
Bharath Rupireddy.



On Sat, Jan 29, 2022 at 08:07:23PM +0530, Bharath Rupireddy wrote:
> If the server crashes during the end-of-recovery checkpoint, it has to
> redo all of the recovery at the next startup, right?  In that case,
> DB_IN_{ARCHIVE,CRASH}_RECOVERY would represent the correct state to
> the user, not DB_SHUTDOWNING/DB_SHUTDOWNED, IMO.
>
> There's another option: add a new state
> DB_IN_END_OF_RECOVERY_CHECKPOINT, if DB_IN_{ARCHIVE,CRASH}_RECOVERY
> really scares users after an end-of-recovery crash.

Well, an end-of-recovery checkpoint is a shutdown checkpoint, and it
has relied on the same assumptions in terms of checkpoint logic for the
last 10 years or so, so the state of the control file is not wrong
per se, either.  There are other cases that we may want to worry
about with this change, like the fact that the reset of unlogged
relations relies on the cluster having been cleanly shut down when we
begin entering the replay loop.  And it seems to me that this has not
been looked at.  A second thing would be the introduction of an
invalid LSN as the minimum recovery point in the control file while in
DB_IN_ARCHIVE_RECOVERY once we are done with the checkpoint, joining
my point from upthread.

At the end of the day, it may be better to just let this stuff be.
Another argument for doing nothing is that this could cause
hard-to-spot conflicts when it comes to back-patch something.
--
Michael

On Mon, Jan 31, 2022 at 04:25:21PM +0900, Michael Paquier wrote:
> At the end of the day, it may be better to just let this stuff be.
> Another argument for doing nothing is that this could cause
> hard-to-spot conflicts when it comes to back-patch something.

This one has been quiet for a while.  Should we mark it as
returned-with-feedback?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote:
> This one has been quiet for a while.  Should we mark it as
> returned-with-feedback?

Yes, that's my feeling, and I got cold feet about this change.  This
patch would bring some extra visibility for something that's not
incorrect on HEAD either, as end-of-recovery checkpoints are the same
thing as shutdown checkpoints.  And there is the extra argument that
back-patching would become a bit more tricky in an area that's already
quite sensitive.
--
Michael

At Sat, 26 Feb 2022 12:11:15 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote:
> > This one has been quiet for a while.  Should we mark it as
> > returned-with-feedback?
> 
> Yes, that's my feeling, and I got cold feet about this change.  This
> patch would bring some extra visibility for something that's not
> incorrect on HEAD either, as end-of-recovery checkpoints are the same
> thing as shutdown checkpoints.  And there is the extra argument that
> back-patching would become a bit more tricky in an area that's already
> quite sensitive.

That sounds like we should reject the patch, as we don't agree with
its objective.  If someday end-of-recovery checkpoints functionally
diverge from shutdown checkpoints but (somehow) leave the transition
alone, we may revisit this, but that would be another proposal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Mon, Feb 28, 2022 at 10:51:06AM +0900, Kyotaro Horiguchi wrote:
> That sounds like we should reject the patch, as we don't agree with
> its objective.  If someday end-of-recovery checkpoints functionally
> diverge from shutdown checkpoints but (somehow) leave the transition
> alone, we may revisit this, but that would be another proposal.

The patch has been already withdrawn in the CF app.
--
Michael
