Re: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Stronger safeguard for archive recovery not to miss data
Date
Msg-id 20210331.150328.1215243902394445240.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Stronger safeguard for archive recovery not to miss data  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Stronger safeguard for archive recovery not to miss data
List pgsql-hackers
At Wed, 31 Mar 2021 02:11:48 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>
>
> On 2021/03/26 22:14, David Steele wrote:
> > On 3/25/21 9:23 PM, Fujii Masao wrote:
> >>
> >> On 2021/03/25 23:21, David Steele wrote:
> >>> On 1/25/21 3:55 AM, Laurenz Albe wrote:
> >>>> On Mon, 2021-01-25 at 08:19 +0000, osumi.takamichi@fujitsu.com wrote:
> >>>>>> I think you should pst another patch where the second, now
> >>>>>> superfluous, error
> >>>>>> message is removed.
> >>>>>
> >>>>> Updated. This patch showed no failure during regression tests
> >>>>> and has been aligned by pgindent.
> >>>>
> >>>> Looks good to me.
> >>>> I'll set it to "ready for committer" again.
> >>>
> >>> Fujii, does the new patch in [1] address your concerns?
> >>
> >> No. I'm still not sure if this patch is good idea... I understand
> >> why this safeguard is necessary. OTOH I'm afraid it increases
> >> a bit the risk that users get unstartable database, i.e., lose whole
> >> database.
> >> But maybe I'm concerned about rare case and my opinion is minority
> >> one.
> >> So I'd like to hear more opinions about this patch.
> > After reviewing the patch I am +1. I think allowing corruption in
> > recovery by default is not a good idea. There is currently a warning
> > but I don't think that is nearly strong enough and is easily missed.
> > Also, "data may be missing" makes this sound like an acceptable
> > situation. What is really happening is corruption is being introduced
> > that may make the cluster unusable or at the very least lead to errors
> > during normal operation.
>
> Ok, now you, Osumi-san and Laurenz agree to this change
> while I'm only the person not to like this. So unless we can hear
> any other objections to this change, probably we should commit the
> patch.

So +1 from me in its direction.

> > If we want to allow recovery to play past this point I think it would
> > make more sense to have a GUC (like ignore_invalid_pages [1]) that
> > allows recovery to proceed and emits a warning instead of fatal.
> > Looking the patch, I see a few things:
> > 1) Typo in the tests:
> > This protection shold apply to recovery mode
> > should be:
> > This protection should apply to recovery mode
> > 2) I don't think it should be the job of this patch to restructure the
> > if conditions, even if it is more efficient. It just obscures the
> > purpose of the patch.
>
> +1
>
> > So, I would revert all the changes in xlog.c except changing the
> > warning to an error:
> > -        ereport(WARNING,
> > -                (errmsg("WAL was generated with wal_level=minimal,
> > -data may be missing"),
> > -                 errhint("This happens if you temporarily set
> > -wal_level=minimal without taking a new base backup.")));
> > +            ereport(FATAL,
> > +                    (errmsg("WAL was generated with
> > wal_level=minimal, cannot continue recovering"),
> > +                     errdetail("This happens if you temporarily set
> > wal_level=minimal on the server."),
> > +                     errhint("Run recovery again from a new base
> > backup taken after setting wal_level higher than minimal")));
> I guess that users usually encounter this error because they have not
> taken base backups yet after setting wal_level to higher than minimal
> and have to use the old base backup for archive recovery. So I'm not
> sure
> how much only this HINT is helpful for them. Isn't it better to append
> something like "If there is no such backup, recover to the point in
> time
> before wal_level is set to minimal even though which cause data loss,
> to start the server." into HINT?

I agree that the hint doesn't make sense.

HINT:  Restart with archive recovery turned off.  The past backups are no longer usable.  You need to take a new one
afterrestart. 

If it's the replica case, it would be..

HINT:  Start from a fresh standby created from the curent primary server.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Replication slot stats misgivings
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Stronger safeguard for archive recovery not to miss data