RE: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Stronger safeguard for archive recovery not to miss data
Date
Msg-id OSBPR01MB48882EF2845B2C152BBBA9F7ED779@OSBPR01MB4888.jpnprd01.prod.outlook.com
In response to Re: Stronger safeguard for archive recovery not to miss data  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On  Monday, April 5, 2021 9:16 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/04/05 16:13, Kyotaro Horiguchi wrote:
> > At Mon, 5 Apr 2021 12:34:53 +0900, Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote in
> >>
> >>
> >> On 2021/04/04 11:58, osumi.takamichi@fujitsu.com wrote:
> >>>> IMO it's better to comment why this server restart is necessary.
> >>>> As far as I understand correctly, this is necessary to ensure the
> >>>> WAL file containing the record about the change of wal_level (to
> >>>> minimal) is archived, so that the subsequent archive recovery will
> >>>> be able to replay it.
> >>> OK, added some comments. Further, I felt the way I wrote this part
> >>> was not good at all and self-evident and developers who read this
> >>> test would feel uneasy about that point.
> >>> So, I fixed that test a little so that we can be more certain that
> >>> the WAL has been archived.
> >>
> >> LGTM. Thanks for updating the patch!
> >>
> >> Attached is the updated version of the patch. I applied the following
> >> changes.
> >
> > +                 errhint("Use a backup taken after setting wal_level to higher than minimal "
> > +                         "or recover to the point in time before wal_level was changed to minimal even though it may cause data loss.")));
> >
> > Looking at the HINT message, I thought that it's hard to find the
> > point up to which I should recover.
> 
> Yes. And, what's worse, when archive recovery finds WAL generated with
> wal_level=minimal and fails, "minimal" is saved in pg_control's wal_level.
> This means that subsequent archive recovery always fails at the beginning of
> recovery (before entering the WAL replay main loop), in that case.
> So even if recovery_target_lsn is specified, archive recovery fails before
> checking that. Any recovery target settings have no effect in that case.
> 
> Maybe we can avoid this, for example, by changing xlog_redo() so that it calls
> CheckRequiredParameterValues() before UpdateControlFile().
> But I'm not sure if this change is safe. Probably we need more time to
> consider this, but right now there is not much time left at this stage.
> 
> At least the HINT message "or recover to the point in time before wal_level
> was changed to minimal even though it may cause data loss." should be
> removed because it's not helpful at all...
> 
> Ok, so if archive recovery finds WAL generated with wal_level=minimal and
> fails, and also there is no backup taken after wal_level is set to higher than
> minimal, basically [1] we lose the whole database. I think that those who set
> wal_level to minimal understand that this setting can cause data loss, for
> example, any data loaded with wal_level=minimal may be lost later. But I'm
> afraid that they might not understand the risk of losing the whole database.
> 
> Even if they take new backup just after they set wal_level to higher than
> minimal, there is still the risk of whole database loss until the backup is
> completed.
> 
> This makes me think that we should document this risk.... Thoughts?
+1. We should document the risk that arises when a user changes
wal_level from a value higher than minimal to minimal,
so that users take care with this kind of operation.
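
To make the ordering problem in the quoted text concrete, here is a minimal toy model in C. It is only a sketch and not PostgreSQL source: every type and function name below is hypothetical, with a plain struct standing in for pg_control and a boolean check standing in for CheckRequiredParameterValues(). It shows only why persisting wal_level=minimal before the check makes every later archive recovery fail at startup. It says nothing about whether reordering the real calls in xlog_redo() is safe, which the thread leaves open.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not PostgreSQL's. */
typedef enum { TOY_WAL_LEVEL_MINIMAL, TOY_WAL_LEVEL_REPLICA } ToyWalLevel;

typedef struct {
    ToyWalLevel wal_level;  /* stands in for the wal_level saved in pg_control */
} ToyControlFile;

/* Stand-in for the parameter check: archive recovery needs more than minimal. */
bool toy_level_ok_for_archive_recovery(ToyWalLevel level)
{
    return level != TOY_WAL_LEVEL_MINIMAL;
}

/* Ordering described in the thread: save the new level, then check it.
 * On failure, "minimal" has already been saved. */
bool toy_redo_update_then_check(ToyControlFile *cf, ToyWalLevel new_level)
{
    cf->wal_level = new_level;
    return toy_level_ok_for_archive_recovery(cf->wal_level);
}

/* Ordering floated in the thread: check first, save only on success. */
bool toy_redo_check_then_update(ToyControlFile *cf, ToyWalLevel new_level)
{
    if (!toy_level_ok_for_archive_recovery(new_level))
        return false;
    cf->wal_level = new_level;
    return true;
}

/* Check made at the beginning of a later archive recovery, before any
 * recovery target setting would be examined. */
bool toy_can_start_archive_recovery(const ToyControlFile *cf)
{
    return toy_level_ok_for_archive_recovery(cf->wal_level);
}
```

In this model, both orderings reject a change to minimal, but only the check-first variant leaves the saved level untouched, so a later recovery attempt can still get past the startup check.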


Best Regards,
    Takamichi Osumi

