RE: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Stronger safeguard for archive recovery not to miss data
Date
Msg-id OSBPR01MB4888297D2CEA401A2B05C1BDED769@OSBPR01MB4888.jpnprd01.prod.outlook.com
In response to RE: Stronger safeguard for archive recovery not to miss data  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
On Tuesday, April 6, 2021 8:32 AM Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> On Monday, April 5, 2021 11:49 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com>
> > On Mon Apr 5, 2021 12:35 PM Fujii Masao <masao.fujii@oss.nttdata.com>
> > wrote:
> > > > >>> By the way, when I build postgres with this patch and the
> > > > >>> --enable-coverage option, the results of RT become unstable. Does
> > > > >>> someone know the reason?
> > > > >>> When it fails, I get stderr like below
> > > >>
> > > >> I have no idea about this. Does this happen even without the patch?
> > > > Unfortunately, no. I get this only with --enable-coverage and with
> > > > > my patch, although the regression tests have passed with this patch.
> > > > OSS HEAD doesn't produce the stderr even with --enable-coverage.
> > >
> > > Could you check whether the latest patch still causes this issue or not?
> > > > If it still does, could you check which part (the change of xlog.c
> > > or the addition of regression test) caused the issue?
> > v07 reproduces the phenomenon, even with make coverage-clean between
> > tests.
> > The probability is not high, though.
> >
> > We cannot do the regression test separately from xlog.c because it
> > uses the new error message of xlog.c.
> > Applying only the TAP test should fail because we get a warning, not an error.
> >
> > Therefore, I took the changes of xlog.c only and I'm doing the RT in a
> > loop now. If we can get the stderr again, then we can guess xlog.c is
> > the cause, right?
> >
> > I think I can report the result tomorrow.
> > Just in case, I'm running the RT for OSS HEAD in parallel...
> > although I cannot reproduce it with it at all.
> I really apologize: OSS HEAD also reproduced that stderr, with the RT
> succeeding.
> I executed check-world in parallel with the -j option, so the reason should be what
> Tsunakawa-san told us.
> Its probability is pretty low.
> I'm so sorry for making so much noise.
> Therefore, I don't have any concern left.
For future analysis: this is *not* due to the patch.
The phenomenon occurs with a very low probability. In other cases,
running make check-world on a build configured with --enable-coverage
fails with an error like the one below.
I used gcc 8.

#   Failed test 'pg_ctl start: no stderr'
#   at t/001_start_stop.pl line 48.
#          got: 'profiling:/home/(path/to/oss/head)/src/backend/utils/adt/regproc.gcda:Merge mismatch for function 24
# '
#     expected: ''
# Looks like you failed 1 test of 24.
make[2]: *** [Makefile:50: check] Error 1
make[1]: *** [Makefile:43: check-pg_ctl-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [GNUmakefile:71: check-world-src/bin-recurse] Error 2
make: *** Waiting for unfinished jobs....

The steps I used are:
$ git clone and cd to OSS HEAD
$ ./configure --enable-coverage --enable-cassert --enable-debug --enable-tap-tests --with-icu CFLAGS=-O0 \
    --prefix=/where/to/put/binary
$ make -j4 2> make.log
$ make check-world -j4 2> make_check_world.log
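
Since the message shows up only rarely, repeating the last step in a loop and searching each log for it may help anyone who wants to look into this later. The following is only a rough sketch, not something I ran as-is: the function names and the log file names are made up for illustration, and it assumes the "Merge mismatch" text ends up in the captured output of make check-world.

```shell
#!/bin/sh
# Sketch only: helper names and log file names are hypothetical.

# Print how many lines of the given log contain the gcov merge warning.
count_merge_mismatch() {
    # grep -c prints the match count but exits 1 when the count is zero,
    # so "|| true" keeps the function from reporting a failure in that case.
    grep -c 'profiling:.*Merge mismatch for function' "$1" || true
}

# Run "make check-world -j4" up to $1 times and stop at the first run
# whose log contains the warning.
repeat_check_world() {
    runs=$1
    i=1
    while [ "$i" -le "$runs" ]; do
        log="make_check_world_$i.log"
        # Capture stdout and stderr together; a failed run is still inspected.
        make check-world -j4 > "$log" 2>&1 || echo "run $i: check-world failed"
        hits=$(count_merge_mismatch "$log")
        if [ "$hits" -gt 0 ]; then
            echo "run $i: $hits merge-mismatch line(s) in $log"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no merge mismatch seen in $runs run(s)"
    return 1
}
```

After sourcing the file in the top of the build tree, something like "repeat_check_world 20" would run the loop; the block itself only defines the two functions and starts nothing on its own.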

Best Regards,
    Takamichi Osumi

