Re: Make mesage at end-of-recovery less scary. - Mailing list pgsql-hackers
From | Ashutosh Sharma |
---|---|
Subject | Re: Make mesage at end-of-recovery less scary. |
Date | |
Msg-id | CAE9k0P=9gBHdBff2Rhq9pwa9ghNft_aHHSmO51K+EdDnfnkpFQ@mail.gmail.com Whole thread Raw |
In response to | Re: Make mesage at end-of-recovery less scary. (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Make mesage at end-of-recovery less scary.
|
List | pgsql-hackers |
On Wed, Feb 9, 2022 at 1:14 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Hi, Ashutosh. > > At Tue, 8 Feb 2022 18:35:34 +0530, Ashutosh Sharma <ashu.coek88@gmail.com> wrote in > > Here are some of my review comments on the v11 patch: > > Thank you for taking a look on this. > > > - (errmsg_internal("reached end of WAL in > > pg_wal, entering archive recovery"))); > > + (errmsg_internal("reached end of WAL at %X/%X > > on timeline %u in %s during crash recovery, entering archive > > recovery", > > + LSN_FORMAT_ARGS(ErrRecPtr), > > + replayTLI, > > + xlogSourceNames[currentSource]))); > > > > Why crash recovery? Won't this message get printed even during PITR? > > It is in the if-block with the following condition. > > > * If archive recovery was requested, but we were still doing > > * crash recovery, switch to archive recovery and retry using the > > * offline archive. We have now replayed all the valid WAL in > > * pg_wal, so we are presumably now consistent. > ... > > if (!InArchiveRecovery && ArchiveRecoveryRequested) > > This means archive-recovery is requested but not started yet. That is, > we've just finished crash recovery. The existing comment cited > together is mentioning that. > > At the end of PITR (or archive recovery), the other code works. > This is quite understandable, the point here is that the message that we are emitting says, we have just finished reading the wal files in the pg_wal directory during crash recovery and are now entering archive recovery when we are actually doing point-in-time recovery which seems a bit misleading. > > /* > > * If we haven't emit an error message, we have safely reached the > > * end-of-WAL. > > */ > > if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) > > { > > char *fmt; > > > > if (StandbyMode) > > fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during standby mode"); > > else if (InArchiveRecovery) > > fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during archive recovery"); > > else > > fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during crash recovery"); > > The last among the above messages is choosed when archive-recovery is > not requested at all. > > > I just did a PITR and could see these messages in the logfile. > > Yeah, the log lines are describing that the server starting with crash > recovery to run PITR. > > > 2022-02-08 18:00:44.367 IST [86185] LOG: starting point-in-time > > recovery to WAL location (LSN) "0/5227790" > > 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not > > properly shut down; automatic recovery in progress > > Well. I guess that the "automatic recovery" is ambiguous. Does it > make sense if the second line were like the follows instead? > > + 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not properly shut down; crash recovery in progress > Well, according to me the current message looks fine. > > Lastly, are we also planning to backport this patch? > > This is apparent a behavioral change, not a bug fix, which I think we > regard as not appropriate for back-patching. > > > As the result, I made the following chages in the version 11. > > 1. Changed the condition for the "end-of-WAL" message from > emode_for_corrupt_record to the EndOfWAL flag. > > 2. Corrected the wording of end-of-wal to end-of-WAL. > > 3. In the "reached end of WAL" message, report the LSN of the > beginning of failed record instead of the beginning of the > last-succeeded record. > > 4. In the changed message in walreceiver.c, I swapped LSN and timeline > so that they are in the same order with other similar messages. > Thanks for sharing this information. == Here is one more comment: One more comment: +# identify REDO WAL file +my $cmd = "pg_controldata -D " . $node->data_dir(); +my $chkptfile; +$cmd = ['pg_controldata', '-D', $node->data_dir()]; +$stdout = ''; +$stderr = ''; +IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; +ok($stdout =~ /^Latest checkpoint's REDO WAL file:[ \t] *(.+)$/m, + "checkpoint file is identified"); +my $chkptfile = $1; $chkptfile is declared twice in the same scope. We can probably remove the first one. -- With Regards, Ashutosh Sharma.
pgsql-hackers by date: