Re: Make mesage at end-of-recovery less scary. - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: Make mesage at end-of-recovery less scary.
Date
Msg-id CAE9k0P=rs4+yAc4=+9Nmm+R_BGFj50wQObhYH2OSC9v7WNbVJw@mail.gmail.com
Whole thread Raw
In response to Re: Make mesage at end-of-recovery less scary.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Make mesage at end-of-recovery less scary.
List pgsql-hackers
Hi,

On Thu, Feb 10, 2022 at 11:47 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 9 Feb 2022 17:31:02 +0530, Ashutosh Sharma <ashu.coek88@gmail.com> wrote in
> > On Wed, Feb 9, 2022 at 1:14 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > This means archive-recovery is requested but not started yet. That is,
> > > we've just finished crash recovery.  The existing comment cited
> > > together is mentioning that.
> > >
> > > At the end of PITR (or archive recovery), the other code works.
> > >
> >
> > This is quite understandable, the point here is that the message that
> > we are emitting says, we have just finished reading the wal files in
> > the pg_wal directory during crash recovery and are now entering
> > archive recovery when we are actually doing point-in-time recovery
> > which seems a bit misleading.
>
> Here is the messages.
>
> > 2022-02-08 18:00:44.367 IST [86185] LOG:  starting point-in-time
> > recovery to WAL location (LSN) "0/5227790"
> > 2022-02-08 18:00:44.368 IST [86185] LOG:  database system was not
> > properly shut down; automatic recovery in progress
> > 2022-02-08 18:00:44.369 IST [86185] LOG:  redo starts at 0/14DC8D8
> > 2022-02-08 18:00:44.978 IST [86185] DEBUG1:  reached end of WAL at
> > 0/3FFFFD0 on timeline 1 in pg_wal during crash recovery, entering
> > archive recovery
>
> In the first place the last DEBUG1 is not on my part, but one of the
> messages added by this patch says the same thing.  Is your point that
> archive recovery is different thing from PITR?  In regard to the
> difference, I think PITR is a form of archive recovery.
>

No, I haven't tried to compare archive recovery to PITR or vice versa,
instead I was trying to compare crash recovery with PITR. The message
you're emitting says just before entering into the archive recovery is
- "reached end-of-WAL on ... in pg_wal *during crash recovery*,
entering archive recovery".  This message is static and can be emitted
not only during crash recovery, but also during PITR. I think we can
remove the "during crash recovery" part from this message, so "reached
the end of WAL at %X/%X on timeline %u in %s, entering archive
recovery". Also I don't think we need format specifier %s here, it can
be hard-coded with pg_wal as in this case we can only enter archive
recovery after reading wal from pg_wal, so current WAL source has to
be pg_wal, isn't it?

> That being said, after some thoughts on this, I changed my mind that
> we don't need to say what operation was being performed at the
> end-of-WAL.  So in the attached the end-of-WAL message is not
> accompanied by the kind of recovery.
>
> > LOG:  reached end of WAL at 0/3000000 on timeline 1
>
> I removed the archive-source part along with the operation mode.
> Because it make the message untranslatable.  It is now very simple but
> seems enough.
>
> While working on this, I noticed that we need to set EndOfWAL when
> WaitForWALToBecomeAvailable returned with failure.  That means the
> file does not exist at all so it is a kind of end-of-WAL.  In that
> sense the following existing comment in ReadRecord is a bit wrong.
>
> >        * We only end up here without a message when XLogPageRead()
> >        * failed - in that case we already logged something. In
> >        * StandbyMode that only happens if we have been triggered, so we
> >        * shouldn't loop anymore in that case.
>
> Actually there's a case we get there without a message and without
> logged something when a segment file is not found unless we're in
> standby mode.
>
> > > Well. I guess that the "automatic recovery" is ambiguous.  Does it
> > > make sense if the second line were like the follows instead?
> > >
> > > + 2022-02-08 18:00:44.368 IST [86185] LOG:  database system was not properly shut down; crash recovery in
progress
> > >
> >
> > Well, according to me the current message looks fine.
>
> Good to hear. (In the previos version I modified the message by accident..)
>
> > $chkptfile is declared twice in the same scope. We can probably remove
> > the first one.
>
> Ugh.. Fixed.  (I wonder why Perl doesn't complain on this..)
>
>
> In this version 12 I made the following changes.
>
> - Rewrote (halfly reverted) a comment in ReadRecord
>
> - Simplified the "reached end of WAL" message by removing recovery
>   mode and WAL source in ReadRecord.
>
> - XLogPageRead sets EndOfWAL flag in the ENOENT case.
>
> - Removed redundant declaration of the same variable in TAP script.
>

Thanks for the changes. Please note that I am not able to apply the
latest patch on HEAD. Could you please rebase it on HEAD and share the
new version. Thank you.

--
With Regards,
Ashutosh Sharma.



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Allow parallel plan for referential integrity checks?
Next
From: Tom Lane
Date:
Subject: Re: sockaddr_un.sun_len vs. reality