RE: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Stronger safeguard for archive recovery not to miss data
Msg-id OSBPR01MB488886DB12B35261CDB44DBAED7B9@OSBPR01MB4888.jpnprd01.prod.outlook.com
In response to Re: Stronger safeguard for archive recovery not to miss data  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Stronger safeguard for archive recovery not to miss data  (Fujii Masao <masao.fujii@oss.nttdata.com>)
RE: Stronger safeguard for archive recovery not to miss data  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
Hi,


On Wednesday, March 31, 2021 3:06 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Wed, 31 Mar 2021 15:03:28 +0900 (JST), Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote in
> > At Wed, 31 Mar 2021 02:11:48 +0900, Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote in
> > > > So, I would revert all the changes in xlog.c except changing the
> > > > warning to an error:
> > > > -        ereport(WARNING,
> > > > -                (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
> > > > -                 errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
> > > > +            ereport(FATAL,
> > > > +                    (errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
> > > > +                     errdetail("This happens if you temporarily set wal_level=minimal on the server."),
> > > > +                     errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));
> > > I guess that users usually encounter this error because they have
> > > not taken base backups yet after setting wal_level to higher than
> > > minimal and have to use the old base backup for archive recovery. So
> > > I'm not sure how much only this HINT is helpful for them. Isn't it
> > > better to append something like "If there is no such backup, recover
> > > to the point in time before wal_level is set to minimal even though
> > > which cause data loss, to start the server." into HINT?
> >
> > I agree that the hint doesn't make sense.
> 
> For the primary case,
> 
> > HINT:  Restart with archive recovery turned off.  The past backups are no longer usable.  You need to take a new one after restart.
> >
> > If it's the replica case, it would be..
> >
> > HINT:  Start from a fresh standby created from the current primary server.
> 
> Start from a fresh backup...
Thank you for sharing your ideas about the hint. The message definitely needs to change.
In my opinion, combining your basic idea with Fujii-san's would be best.

I updated the patch and made v05. The changes are:

* reworded the errhint, although it has become long
* fixed the typo in the TAP test
* reverted my earlier changes so that the conditions in CheckRequiredParameterValues are no longer modified
* renamed the test file to 024_archive_recovery.pl, because two test files have been added
    since the last update of this patch
* ran pgindent again to check my alignment
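
As a rough illustration of the behavior discussed in this thread, here is a simplified, standalone model. It is not the patch itself: the Sketch* names are made up for this example, the real check lives in CheckRequiredParameterValues in xlog.c and reports through ereport(), and the message text is the one from the diff quoted above rather than the final v05 wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the server's WAL levels (illustration only). */
typedef enum
{
    SKETCH_WAL_LEVEL_MINIMAL,
    SKETCH_WAL_LEVEL_REPLICA,
    SKETCH_WAL_LEVEL_LOGICAL
} SketchWalLevel;

/* Outcome of the check; in the server this would be an ereport() level. */
typedef enum
{
    SKETCH_OK,
    SKETCH_FATAL
} SketchResult;

/*
 * Model of the check: if archive recovery is requested and the WAL was
 * generated with wal_level=minimal, stop instead of only warning.
 * The condition is unchanged; only the severity differs from before.
 */
SketchResult
sketch_check_wal_level(bool archive_recovery_requested, SketchWalLevel wal_level)
{
    assert(wal_level <= SKETCH_WAL_LEVEL_LOGICAL);

    if (archive_recovery_requested && wal_level == SKETCH_WAL_LEVEL_MINIMAL)
    {
        fprintf(stderr,
                "FATAL:  WAL was generated with wal_level=minimal, cannot continue recovering\n");
        return SKETCH_FATAL;
    }
    return SKETCH_OK;
}
```

In this model, only the combination of archive recovery and wal_level=minimal returns SKETCH_FATAL; crash recovery with wal_level=minimal, or archive recovery with a higher level, returns SKETCH_OK.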

By the way, when I build postgres with this patch and the --enable-coverage option,
the RT results become unstable. Does anyone know the reason?
When it fails, I get stderr output like the following:

t/001_start_stop.pl .. 10/24
#   Failed test 'pg_ctl start: no stderr'
#   at t/001_start_stop.pl line 48.
#          got: 'profiling:/home/k5user/new_disk/recheck/PostgreSQL-Source-Dev/src/backend/executor/execMain.gcda:Merge mismatch for function 15
# '
#     expected: ''
t/001_start_stop.pl .. 24/24 # Looks like you failed 1 test of 24.
t/001_start_stop.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/24 subtests

A similar phenomenon was observed in [1], and the solution there
seems to be upgrading gcc to a version higher than 7. I did so, but I still get this unstable error with
--enable-coverage. It does not happen when I remove the --enable-coverage option;
in that case make check-world passes.
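
To find out which .gcda files are affected when this appears in stderr, the message can be picked apart mechanically. The following is only a minimal sketch: it assumes lines have the "profiling:<path>:Merge mismatch for function <N>" shape seen in the output above, and parse_merge_mismatch is a hypothetical helper name, not part of gcov or PostgreSQL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of the form
 *   profiling:<path to .gcda>:Merge mismatch for function <N>
 * Returns 1 and fills path/func_no on success, 0 if the line does not match
 * or the path does not fit into the caller's buffer.
 */
int
parse_merge_mismatch(const char *line, char *path, size_t path_size,
                     unsigned *func_no)
{
    static const char prefix[] = "profiling:";
    static const char marker[] = ":Merge mismatch for function ";
    const char *start;
    const char *mark;
    size_t      len;

    assert(line != NULL && path != NULL && func_no != NULL);

    /* The line must begin with the "profiling:" prefix. */
    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return 0;
    start = line + sizeof(prefix) - 1;

    /* Everything between the prefix and the marker is the .gcda path. */
    mark = strstr(start, marker);
    if (mark == NULL)
        return 0;

    len = (size_t) (mark - start);
    if (len == 0 || len >= path_size)
        return 0;
    memcpy(path, start, len);
    path[len] = '\0';

    /* The function number follows the marker. */
    return sscanf(mark + sizeof(marker) - 1, "%u", func_no) == 1;
}
```

Feeding each stderr line through such a helper would give the list of .gcda files that report a mismatch, which may help narrow down whether only a few backend files are involved.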


[1] - https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg323147.html


Best Regards,
    Takamichi Osumi

