Re: snapper vs. HEAD - Mailing list pgsql-hackers

From Andres Freund
Subject Re: snapper vs. HEAD
Date
Msg-id 20200329231708.5yop3ni3rutjmkkh@alap3.anarazel.de
In response to snapper vs. HEAD  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: snapper vs. HEAD
List pgsql-hackers
Hi,

On 2020-03-28 23:50:32 -0400, Tom Lane wrote:
> Buildfarm member snapper has been crashing in the core regression tests
> since commit 17a28b0364 (well, there's a bit of a range of uncertainty
> there, but 17a28b0364 looks to be the only such commit that could have
> affected code in gistget.c where the crash is).  Curiously, its sibling
> skate is *not* failing, despite being on the same machine and compiler.

Hm. There's some difference in code-gen specific options.

snapper has:
'CFLAGS' => '-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security ',
'CPPFLAGS' => '-D_FORTIFY_SOURCE=2',
'LDFLAGS' => '-Wl,-z,relro -Wl,-z,now'
and specifies (among others)
                                      '--enable-thread-safety',
                                      '--with-gnu-ld',
whereas skate has --enable-cassert.

Not too hard to imagine that several of these could cause enough
code-gen differences so that one exhibits the bug, and the other
doesn't.


The different command lines for gistget.c end up being:

snapper:
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla
-Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g
-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o gistget.o gistget.c
 
skate:
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla
-Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g
-O2 -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o gistget.o gistget.c
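
To make the difference between the two easier to see, here is a minimal sketch that tokenizes both command lines and lists the flags that appear in only one of them. It assumes the spaces lost at the line wraps above are restored; the names SNAPPER, SKATE and flag_diff are made up for this illustration. Note that it only sees what is on the compiler command line, so configure-level differences such as --enable-cassert do not show up.

```python
import shlex

# Command lines for gistget.c as quoted above, with the spaces that were
# lost at line wraps restored (an assumption about the original text).
SNAPPER = (
    "ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith "
    "-Wdeclaration-after-statement -Werror=vla -Wendif-labels "
    "-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing "
    "-fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector "
    "--param=ssp-buffer-size=4 -Wformat -Werror=format-security "
    "-I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE "
    "-I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o gistget.o gistget.c"
)
SKATE = (
    "ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith "
    "-Wdeclaration-after-statement -Werror=vla -Wendif-labels "
    "-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing "
    "-fwrapv -fexcess-precision=standard -g -O2 -I../../../../src/include "
    "-D_GNU_SOURCE -I/usr/include/libxml2 -c -o gistget.o gistget.c"
)


def flag_diff(a, b):
    """Return (tokens only in a, tokens only in b), in first-seen order."""
    tokens_a, tokens_b = shlex.split(a), shlex.split(b)
    set_a, set_b = set(tokens_a), set(tokens_b)
    # dict.fromkeys() drops duplicates (e.g. the repeated -g) but keeps order.
    only_a = [t for t in dict.fromkeys(tokens_a) if t not in set_b]
    only_b = [t for t in dict.fromkeys(tokens_b) if t not in set_a]
    return only_a, only_b


if __name__ == "__main__":
    only_snapper, only_skate = flag_diff(SNAPPER, SKATE)
    print("only snapper:", " ".join(only_snapper))
    print("only skate:  ", " ".join(only_skate))
```

On these inputs the snapper-only set is the stack protector options, the extra format warnings, -D_FORTIFY_SOURCE=2 and the krb5 include path; skate's command line has nothing that snapper's lacks.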
 


> I looked into this by dint of setting up a similar environment in a
> qemu VM.  I might not have reproduced things exactly, but I managed
> to get the same kind of crash at approximately the same place, and
> what it looks like to me is a compiler bug.

What options were you using? Reproducing snapper as exactly as possible?


> It's unclear how 17a28b0364 would have affected this, but there is
> an elog() call elsewhere in the same function, so maybe the new
> coding for that changed register assignments or some other
> phase-of-the-moon effect.

Yea, wouldn't be too surprising.


> I doubt that anyone's going to take much interest in fixing this
> old compiler version, so my recommendation is to back off the
> optimization level on snapper to -O1, and probably on skate as
> well because there's no obvious reason why the same compiler bug
> might not bite skate at some point.  I was able to get through
> the core regression tests on my qemu VM after recompiling
> gistget.c at -O1 (with other flags the same as snapper is using).

If you still have the environment, it might make sense to check whether
it's related to one of the other options. But otherwise I wouldn't be
against the proposal.
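
One way to do that check, as a rough sketch rather than a tested recipe: rebuild gistget.c once per suspect option, each time with just that option removed from snapper's command line, and rerun the core regression tests. The snippet below only generates the candidate command lines; BASE is snapper's command line with wrap-damaged spaces restored, and SUSPECTS and one_flag_removed are hypothetical names picked for this example. The choice of suspects is my guess at the options most likely to affect code generation.

```python
import shlex

# snapper's compile command for gistget.c (spaces restored; see above).
BASE = (
    "ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith "
    "-Wdeclaration-after-statement -Werror=vla -Wendif-labels "
    "-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing "
    "-fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector "
    "--param=ssp-buffer-size=4 -Wformat -Werror=format-security "
    "-I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE "
    "-I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o gistget.o gistget.c"
)

# Options present on snapper but not skate that plausibly change code-gen.
# This selection is an assumption, not something established in the thread.
SUSPECTS = [
    "-fstack-protector",
    "--param=ssp-buffer-size=4",
    "-D_FORTIFY_SOURCE=2",
]


def one_flag_removed(base, suspects):
    """Yield (flag, command) pairs, each command lacking exactly that flag."""
    tokens = shlex.split(base)
    for flag in suspects:
        yield flag, shlex.join([t for t in tokens if t != flag])


if __name__ == "__main__":
    for flag, command in one_flag_removed(BASE, SUSPECTS):
        print(f"# without {flag}")
        print(command)
```

If the crash disappears for exactly one of the variants, that option is the likely trigger; if it persists for all of them, the -O2 vs. -O1 explanation looks more convincing.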

Greetings,

Andres Freund


