Re: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Stronger safeguard for archive recovery not to miss data |
Date | |
Msg-id | 20210331.150328.1215243902394445240.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Stronger safeguard for archive recovery not to miss data (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Stronger safeguard for archive recovery not to miss data
|
List | pgsql-hackers |
At Wed, 31 Mar 2021 02:11:48 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2021/03/26 22:14, David Steele wrote: > > On 3/25/21 9:23 PM, Fujii Masao wrote: > >> > >> On 2021/03/25 23:21, David Steele wrote: > >>> On 1/25/21 3:55 AM, Laurenz Albe wrote: > >>>> On Mon, 2021-01-25 at 08:19 +0000, osumi.takamichi@fujitsu.com wrote: > >>>>>> I think you should pst another patch where the second, now > >>>>>> superfluous, error > >>>>>> message is removed. > >>>>> > >>>>> Updated. This patch showed no failure during regression tests > >>>>> and has been aligned by pgindent. > >>>> > >>>> Looks good to me. > >>>> I'll set it to "ready for committer" again. > >>> > >>> Fujii, does the new patch in [1] address your concerns? > >> > >> No. I'm still not sure if this patch is good idea... I understand > >> why this safeguard is necessary. OTOH I'm afraid it increases > >> a bit the risk that users get unstartable database, i.e., lose whole > >> database. > >> But maybe I'm concerned about rare case and my opinion is minority > >> one. > >> So I'd like to hear more opinions about this patch. > > After reviewing the patch I am +1. I think allowing corruption in > > recovery by default is not a good idea. There is currently a warning > > but I don't think that is nearly strong enough and is easily missed. > > Also, "data may be missing" makes this sound like an acceptable > > situation. What is really happening is corruption is being introduced > > that may make the cluster unusable or at the very least lead to errors > > during normal operation. > > Ok, now you, Osumi-san and Laurenz agree to this change > while I'm only the person not to like this. So unless we can hear > any other objections to this change, probably we should commit the > patch. So +1 from me in its direction. > > If we want to allow recovery to play past this point I think it would > > make more sense to have a GUC (like ignore_invalid_pages [1]) that > > allows recovery to proceed and emits a warning instead of fatal. > > Looking the patch, I see a few things: > > 1) Typo in the tests: > > This protection shold apply to recovery mode > > should be: > > This protection should apply to recovery mode > > 2) I don't think it should be the job of this patch to restructure the > > if conditions, even if it is more efficient. It just obscures the > > purpose of the patch. > > +1 > > > So, I would revert all the changes in xlog.c except changing the > > warning to an error: > > - ereport(WARNING, > > - (errmsg("WAL was generated with wal_level=minimal, > > -data may be missing"), > > - errhint("This happens if you temporarily set > > -wal_level=minimal without taking a new base backup."))); > > + ereport(FATAL, > > + (errmsg("WAL was generated with > > wal_level=minimal, cannot continue recovering"), > > + errdetail("This happens if you temporarily set > > wal_level=minimal on the server."), > > + errhint("Run recovery again from a new base > > backup taken after setting wal_level higher than minimal"))); > I guess that users usually encounter this error because they have not > taken base backups yet after setting wal_level to higher than minimal > and have to use the old base backup for archive recovery. So I'm not > sure > how much only this HINT is helpful for them. Isn't it better to append > something like "If there is no such backup, recover to the point in > time > before wal_level is set to minimal even though which cause data loss, > to start the server." into HINT? I agree that the hint doesn't make sense. HINT: Restart with archive recovery turned off. The past backups are no longer usable. You need to take a new one afterrestart. If it's the replica case, it would be.. HINT: Start from a fresh standby created from the curent primary server. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: