RE: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Stronger safeguard for archive recovery not to miss data |
Date | |
Msg-id | OSBPR01MB48882EF2845B2C152BBBA9F7ED779@OSBPR01MB4888.jpnprd01.prod.outlook.com |
In response to | Re: Stronger safeguard for archive recovery not to miss data (Fujii Masao <masao.fujii@oss.nttdata.com>) |
List | pgsql-hackers |
On Monday, April 5, 2021 9:16 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/04/05 16:13, Kyotaro Horiguchi wrote:
> > At Mon, 5 Apr 2021 12:34:53 +0900, Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote in
> >>
> >> On 2021/04/04 11:58, osumi.takamichi@fujitsu.com wrote:
> >>>> IMO it's better to comment why this server restart is necessary.
> >>>> As far as I understand correctly, this is necessary to ensure the
> >>>> WAL file containing the record about the change of wal_level (to
> >>>> minimal) is archived, so that the subsequent archive recovery will
> >>>> be able to replay it.
> >>> OK, added some comments. Further, I felt the way I wrote this part
> >>> was not good at all and self-evident and developers who read this
> >>> test would feel uneasy about that point.
> >>> So, a little bit fixed that test so that we can get clearer
> >>> conviction for wal archive.
> >>
> >> LGTM. Thanks for updating the patch!
> >>
> >> Attached is the updated version of the patch. I applied the following
> >> changes.
> >
> > + errhint("Use a backup taken after setting wal_level to higher than minimal "
> > + "or recover to the point in time before wal_level was changed to minimal even though it may cause data loss.")));
> >
> > Looking the HINT message, I thought that it's hard to find where up to
> > I should recover.
>
> Yes. And, what's the worse, when archive recovery finds WAL generated with
> wal_level=minimal and fails, "minimal" is saved in pg_control's wal_level.
> This means that subsequent archive recovery always fails at the beginning of
> recovery (before entering WAL replay main loop), in that case.
> So even if recovery_target_lsn is specified, archive recovery fails before
> checking that. Any recovery target settings have no effect on that case.
>
> Maybe we can avoid this, for example, by changing xlog_redo() so that it calls
> CheckRequiredParameterValues() before UpdateControlFile().
> But I'm not sure if this change is safe. Probably we need more time to
> consider this, but right now there is no so much time left at this stage.
>
> At least the HINT message "or recover to the point in time before wal_level
> was changed to minimal even though it may cause data loss." should be
> removed because it's not helpful at all...
>
> Ok, so if archive recovery finds WAL generated with wal_level=minimal and
> fails, and also there is no backup taken after wal_level is set to higher than
> minimal, basically [1] we lose whole database. I think that those who set
> wal_level to minimal understand that this setting can cause data loss, for
> example, any data loaded with wal_level=minimal may be lost later. But I'm
> afraid that they might not understand the risk of whole database loss.
>
> Even if they take new backup just after they set wal_level to higher than
> minimal, there is still the risk of whole database loss until the backup is
> completed.
>
> This makes me think that we should document this risk.... Thought?

+1. We should warn users about this risk when they change wal_level from a
value higher than minimal to minimal, so that they take appropriate care with
this kind of operation.

Best Regards,
	Takamichi Osumi