Re: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Stronger safeguard for archive recovery not to miss data
Date
Msg-id 20210406.152421.159594903939861443.horikyota.ntt@gmail.com
In response to RE: Stronger safeguard for archive recovery not to miss data  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses RE: Stronger safeguard for archive recovery not to miss data
List pgsql-hackers
At Tue, 6 Apr 2021 04:11:35 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in 
> On Tuesday, April 6, 2021 9:41 AM Fujii Masao <masao.fujii@oss.nttdata.com>
> > On 2021/04/05 23:54, osumi.takamichi@fujitsu.com wrote:
> > >> This makes me think that we should document this risk.... Thought?
> > > +1. We should notify the risk when user changes
> > > the wal_level higher than minimal to minimal to invoke a carefulness
> > > of user for such kind of operation.
> > 
> > I removed the HINT message "or recover to the point in ..." and added the
> > following note into the docs.
> > 
> >      Note that changing <varname>wal_level</varname> to
> >      <literal>minimal</literal> makes any base backups taken before
> >      unavailable for archive recovery and standby server, which may
> >      lead to database loss.
> Thank you for updating the patch. Let's make the sentence more strict.
> 
> My suggestion for this explanation is
> "In order to prevent database corruption, changing
> wal_level to minimal from higher level in the middle of
> WAL archiving requires careful attention. It makes any base backups
> taken before the operation unavailable for archive recovery
> and standby server. Also, it may lead to whole database loss when
> archive recovery fails with an error for that change.
> Take a new base backup immediately after making wal_level back to higher level."

The first sentence looks somewhat nanny-ish.  The database is not
corrupt at the time of this error; we just lose the updates after the
last read segment at this point.  As Fujii-san said, we can continue
recovering using crash recovery, and we will end up with a corrupt
database after that.

About the last sentence, I prefer flatter wording, such as "You need
to take a new base backup..."

> Then, we can be consistent with our new hint message,
> "Use a backup taken after setting wal_level to higher than minimal.".
> 
> Is it better to add something similar to "Take an offline backup when you stop the server
> and change the wal_level" around the end of this part as another option for safeguard, also?

Backup policy is entirely a matter for DBAs.  If flipping wal_level
alone is highly likely to cause unstartable corruption, I think it is
a bug.

> For the performance technique part, what we need to explain is same.

Might be good, but in simpler wording.

> Another minor thing I felt we need to do might be to add double quotes to wrap minimal in errhint.

Since the error about hot_standby is gone, either will do for me.

> Other errhints do so when we use it in a sentence.
> 
> There is no more additional comment from me !

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


