RE: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Stronger safeguard for archive recovery not to miss data
Msg-id OSBPR01MB488847BC24220F046DB6BEB4ED769@OSBPR01MB4888.jpnprd01.prod.outlook.com
In response to Re: Stronger safeguard for archive recovery not to miss data  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Stronger safeguard for archive recovery not to miss data
List pgsql-hackers
On Tuesday, April 6, 2021 3:24 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Tue, 6 Apr 2021 04:11:35 +0000, "osumi.takamichi@fujitsu.com"
> <osumi.takamichi@fujitsu.com> wrote in
> > On Tuesday, April 6, 2021 9:41 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com>
> > > On 2021/04/05 23:54, osumi.takamichi@fujitsu.com wrote:
> > > >> This makes me think that we should document this risk... Thoughts?
> > > > +1. We should warn users about the risk of changing wal_level
> > > > from a level higher than minimal to minimal, so that they are
> > > > careful with this kind of operation.
> > >
> > > I removed the HINT message "or recover to the point in ..." and
> > > added the following note into the docs.
> > >
> > >      Note that changing <varname>wal_level</varname> to
> > >      <literal>minimal</literal> makes any base backups taken before the change
> > >      unavailable for archive recovery and standby servers, which may
> > >      lead to database loss.
> > Thank you for updating the patch. Let's make the sentence stricter.
> >
> > My suggestion for this explanation is:
> > "In order to prevent database corruption, changing wal_level to
> > minimal from a higher level in the middle of WAL archiving requires
> > careful attention. It makes any base backups taken before the
> > operation unavailable for archive recovery and standby servers. Also,
> > it may lead to the loss of the whole database when archive recovery
> > fails with an error because of that change.
> > Take a new base backup immediately after setting wal_level back to a
> > higher level."
>
> The first sentence sounds somewhat nanny-ish. The database is not
> corrupt at the time of this error.
Yes. Sorry for the misleading sentence.
I just wanted to explain why the error was introduced,
but that was not necessary.
We should fix the sentence by removing its first part.

> We just lose the updates after the last segment that was read at this
> point. As Fujii-san said, we can continue recovering using crash
> recovery, and we would end up with a corrupt database after that.
OK. Thank you for explanation.


> About the last sentence, I prefer flatter wording, such as "You need to take
> a new base backup..."
Either is fine with me.

> > Then, we can be consistent with our new hint message, "Use a backup
> > taken after setting wal_level to higher than minimal.".
>
> > Would it be better to also add something like "Take an offline backup when
> > you stop the server and change the wal_level" near the end of this part,
> > as another safeguard option?
>
> Backup policy is entirely up to DBAs.
OK. No problem. No need to add it.

> If flipping wal_level alone were likely to cause corruption that leaves
> the server unable to start, I think that would be a bug.
> > For the performance technique part, what we need to explain is the same.
>
> Might be good, but in simpler wording.
Yeah, I agree.

> > Another minor thing we might need to do is to add double quotes
> > around minimal in the errhint.
>
> Since the error about hot_standby is gone, either is fine with me.
Thanks for sharing your thoughts.


Best Regards,
    Takamichi Osumi