Thread: [HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

[HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

From
Christoph Berg
Date:
10beta3 and 9.6.4 are both failing during initdb on mips64el on
Debian/sid (unstable):

https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.6&arch=mips64el&ver=9.6.4-1&stamp=1502374949&raw=0
https://buildd.debian.org/status/fetch.php?pkg=postgresql-10&arch=mips64el&ver=10%7Ebeta3-1&stamp=1502535836&raw=0

All other architectures have succeeded, as well as the 9.6.4 build for
Debian/stretch (stable) on mips64el. The difference might be the
compiler version (6.3.0 vs 7.1.0).

Command was: "initdb" -D "/home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/src/test/regress/./tmp_check/data" --noclean --nosync > "/home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/src/test/regress/log/initdb.log" 2>&1

******** build/src/test/regress/log/initdb.log ********
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "myon".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  de_DE.utf8
  CTYPE:    de_DE.utf8
  MESSAGES: C
  MONETARY: de_DE.utf8
  NUMERIC:  de_DE.utf8
  TIME:     de_DE.utf8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "german".

Data page checksums are disabled.

creating directory /home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... Segmentation fault (core dumped)
child process exited with exit code 139


$ gdb build/tmp_install/usr/lib/postgresql/9.6/bin/postgres build/src/test/regress/tmp_check/data/core 
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
This GDB was configured as "mips64el-linux-gnuabi64".
Reading symbols from build/tmp_install/usr/lib/postgresql/9.6/bin/postgres...done.
[New LWP 24217]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/mips64el-linux-gnuabi64/libthread_db.so.1".
Core was generated by `/home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/tmp_install/usr/lib/postgr'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000aababa6634 in EventTriggerEndCompleteQuery ()
    at ./build/../src/backend/commands/event_trigger.c:1263
1263            MemoryContextDelete(currentEventTriggerState->cxt);
(gdb) bt full
#0  0x000000aababa6634 in EventTriggerEndCompleteQuery ()
    at ./build/../src/backend/commands/event_trigger.c:1263
        prevstate = <optimized out>
#1  0x000000aabad6d508 in ProcessUtilitySlow (parsetree=parsetree@entry=0xaac3688428,
    queryString=queryString@entry=0xaac3687888 "REVOKE ALL on pg_authid FROM public;\n",
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
    completionTag=completionTag@entry=0xffff985218 "", dest=0xaabb0a0378 <debugtupDR>)
    at ./build/../src/backend/tcop/utility.c:1582
        isTopLevel = 1 '\001'
        isCompleteQuery = 1 '\001'
        needCleanup = 0 '\000'
        commandCollected = <optimized out>
        address = {classId = 2, objectId = 0, objectSubId = 0}
        secondaryObject = {classId = 0, objectId = 0, objectSubId = 0}
#2  0x000000aabad6c6cc in standard_ProcessUtility (parsetree=0xaac3688428,
    queryString=0xaac3687888 "REVOKE ALL on pg_authid FROM public;\n",
    context=<optimized out>, params=0x0, dest=0xaabb0a0378 <debugtupDR>,
    completionTag=0xffff985218 "") at ./build/../src/backend/tcop/utility.c:907
        isTopLevel = 1 '\001'
        __func__ = "standard_ProcessUtility"
#3  0x000000aabad6d33c in ProcessUtility (parsetree=<optimized out>,
    queryString=<optimized out>, context=<optimized out>, params=<optimized out>,
    dest=<optimized out>, completionTag=<optimized out>)
    at ./build/../src/backend/tcop/utility.c:336
No locals.
#4  0x000000aabad68e80 in PortalRunUtility (portal=portal@entry=0xaac368a8a8,
    utilityStmt=utilityStmt@entry=0xaac3688428, isTopLevel=isTopLevel@entry=1 '\001',
    setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=0xaabb0a0378 <debugtupDR>,
    completionTag=0xffff985218 "") at ./build/../src/backend/tcop/pquery.c:1193
        snapshot = 0xaac368c8e8
        __func__ = "PortalRunUtility"
#5  0x000000aabad69d70 in PortalRunMulti (portal=portal@entry=0xaac368a8a8,
    isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
    dest=dest@entry=0xaabb0a0378 <debugtupDR>, altdest=altdest@entry=0xaabb0a0378 <debugtupDR>,
    completionTag=completionTag@entry=0xffff985218 "") at ./build/../src/backend/tcop/pquery.c:1349
        stmt = 0xaac3688428
        active_snapshot_set = 0 '\000'
        stmtlist_item = 0xaac3688738
#6  0x000000aabad6ac44 in PortalRun (portal=portal@entry=0xaac368a8a8, count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0xaabb0a0378 <debugtupDR>,
    altdest=altdest@entry=0xaabb0a0378 <debugtupDR>, completionTag=completionTag@entry=0xffff985218 "")
    at ./build/../src/backend/tcop/pquery.c:815
        save_exception_stack = 0xffff985068
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {{__pc = 733279071168, __sp = 1099504831664,
              __regs = {733422847016, 733422856360, 733422847096, 1, 733282435960, 733422844040, 733282649704, 733422428264},
              __fp = 733422847832, __gp = 733282512816, __glibc_reserved1 = -1,
              __fpregs = {-nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff),
                -nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff)}}},
            __mask_was_saved = 0, __saved_mask = {__val = {733282649704, 733422428264, 733282512816, 733422847832,
                733280473476, 733422856360, 1099165163056, 733282073632, 1, 733282512816, 733280604384, 733282512816,
                3618681818990495232, 733282512816, 3618681818990495232, 733422847016}}}}
        result = <optimized out>
        nprocessed = <optimized out>
        saveTopTransactionResourceOwner = 0xaac361d088
        saveTopTransactionContext = 0xaac3622068
        saveActivePortal = 0x0
        saveResourceOwner = 0xaac361d088
        savePortalContext = 0x0
        saveMemoryContext = 0xaac3622068
        __func__ = "PortalRun"
#7  0x000000aabad6802c in exec_simple_query (
    query_string=0xaac3687888 "REVOKE ALL on pg_authid FROM public;\n")
    at ./build/../src/backend/tcop/postgres.c:1094
        parsetree = 0xaac3688428
        portal = 0xaac368a8a8
        snapshot_set = <optimized out>
        commandTag = <optimized out>
        completionTag = "\000g\345\352\377\000\000\000w\245\252\377\377\000\000\000\034\b\327\352\377\000\000\000\060)\345\352\377\000\000\000\360g\345\352\377\000\000\000\320R\230\377\377\000\000\000\370\332\344\352\377\000\000\000\020\232\323\352\377\000\000"
        querytree_list = <optimized out>
        plantree_list = 0xaac3688758
        receiver = 0xaabb0a0378 <debugtupDR>
        format = 0
        dest = DestDebug
        parsetree_list = 0xaac3688498
        save_log_statement_stats = 0 '\000'
        was_logged = 0 '\000'
        msec_str = "\020\326^\352\000\000\000\070|\337\352\377\000\000\000\370\226\335\352\377\000\000\000\000\000\000\000\000\000\000"
        parsetree_item = 0xaac3688478
        isTopLevel = 1 '\001'
#8  PostgresMain (argc=<optimized out>, argv=<optimized out>,
    dbname=<optimized out>, username=<optimized out>)
    at ./build/../src/backend/tcop/postgres.c:4076
        query_string = 0xaac3687888 "REVOKE ALL on pg_authid FROM public;\n"
        input_message = {data = 0xaac3687888 "REVOKE ALL on pg_authid FROM public;\n", len = 38, maxlen = 1024, cursor = 38}
        local_sigjmp_buf = {{__jmpbuf = {{__pc = 733279051620, __sp = 1099504832144,
              __regs = {733282651656, 10, 733422077472, 733282527592, 0, 733422071296, 0, 1099506026728},
              __fp = 1099506034039, __gp = 733282512816, __glibc_reserved1 = 0,
              __fpregs = {-nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff),
                -nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff), -nan(0xfffffffffffff)}}},
            __mask_was_saved = 1, __saved_mask = {__val = {0, 0, 733422206784, 1099506026728, 1099111271600, 0,
                1099157352184, 1099157547312, 1099111271600, 0, 1099157352184, 1099157547312, 1024,
                1099157563376, 1099156624308, 1099109578624}}}}
        send_ready_for_query = 0 '\000'
        disable_idle_in_transaction_timeout = 0 '\000'
        __func__ = "PostgresMain"
#9  0x000000aabaa65658 in main (argc=<optimized out>, argv=0xaac35e7160)
    at ./build/../src/backend/main/main.c:224
No locals.
(gdb) l
1258            EventTriggerQueryState *prevstate;
1259    
1260            prevstate = currentEventTriggerState->previous;
1261    
1262            /* this avoids the need for retail pfree of SQLDropList items: */
1263            MemoryContextDelete(currentEventTriggerState->cxt);
1264    
1265            currentEventTriggerState = prevstate;
1266    }
1267    

$ gcc --version
gcc (Debian 7.1.0-11) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Christoph



Re: To PostgreSQL Hackers 2017-08-13 <20170813130127.g3tcyzzvuvlpzcxy@msg.df7cb.de>
> 10beta3 and 9.6.4 are both failing during initdb on mips64el on
> Debian/sid (unstable):
> 
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.6&arch=mips64el&ver=9.6.4-1&stamp=1502374949&raw=0
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-10&arch=mips64el&ver=10%7Ebeta3-1&stamp=1502535836&raw=0
> 
> All other architectures have succeeded, as well as the 9.6.4 build for
> Debian/stretch (stable) on mips64el. The difference might be the
> compiler version (6.3.0 vs 7.1.0).

Seems to be a gcc-7 problem affecting several packages on mips64el:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871514

Christoph



Christoph Berg <myon@debian.org> writes:
> 10beta3 and 9.6.4 are both failing during initdb on mips64el on
> Debian/sid (unstable):
> All other architectures have succeeded, as well as the 9.6.4 build for
> Debian/stretch (stable) on mips64el. The difference might be the
> compiler version (6.3.0 vs 7.1.0).

It's hard to explain that stack trace other than as a compiler bug.
There shouldn't be any event triggers active here, so 
EventTriggerBeginCompleteQuery should have done nothing and returned
false.  I don't put complete faith in gdb reports of local variable
values, but it says
        needCleanup = 0 '\000'
which agrees with that.  Also the core dump appears to be because
currentEventTriggerState is NULL (please check that), which is
expected if EventTriggerBeginCompleteQuery did nothing.  However, then
EventTriggerEndCompleteQuery should not have gotten called at all.
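The argument above relies on begin and end calls being paired. A heavily simplified, hypothetical model of the state stack (based only on the gdb listing of EventTriggerEndCompleteQuery above; the names are illustrative, and the real code also creates and deletes a memory context) shows why an unpaired end call must crash: when no event triggers exist, nothing is pushed, so the end function dereferences a NULL pointer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for EventTriggerQueryState. */
typedef struct QueryState
{
    struct QueryState *previous;
} QueryState;

/* Stand-in for currentEventTriggerState: NULL when nothing is pushed. */
QueryState *current_state = NULL;

/* Models EventTriggerBeginCompleteQuery: pushes a state only when event
 * triggers exist, and reports whether it did (the caller's needCleanup). */
bool begin_complete_query(bool have_event_triggers)
{
    if (!have_event_triggers)
        return false;

    QueryState *state = malloc(sizeof *state);
    if (state == NULL)
        abort();
    state->previous = current_state;
    current_state = state;
    return true;
}

/* Models EventTriggerEndCompleteQuery: pops unconditionally, so calling it
 * without a successful begin dereferences a NULL current_state. */
void end_complete_query(void)
{
    QueryState *prev = current_state->previous;

    free(current_state);
    current_state = prev;
}
```

Under this model, the only safe caller pattern is `if (needCleanup) end_complete_query();`, which is why a call with `needCleanup = 0` points at miscompiled code rather than a logic error.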

I suspect you could work around this with

    bool        isCompleteQuery = (context <= PROCESS_UTILITY_QUERY);
-    bool        needCleanup;
+    volatile bool    needCleanup;
    bool        commandCollected = false;

If that fixes it, it's definitely a compiler bug.  That function does
not change needCleanup after the sigsetjmp call, so per POSIX it
should not have to label the variable volatile.  This is far from
being the first such bug we've seen though.
        regards, tom lane
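The point about POSIX depends on the shape of the code: needCleanup is assigned before the sigsetjmp hidden inside PG_TRY and is only read afterwards, on both the normal and the error path. The sketch below mimics that shape with plain sigsetjmp/siglongjmp. All function names are hypothetical stand-ins rather than PostgreSQL's, and the volatile qualifier is the suggested workaround, not something POSIX requires for this variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stdbool.h>

static sigjmp_buf error_handler;   /* plays the role of PG_exception_stack */
int cleanup_calls = 0;

/* Hypothetical stand-ins: no event triggers exist, so begin returns false. */
static bool begin_tracking(void) { return false; }
static void end_tracking(void) { cleanup_calls++; }

/* Models a statement that may raise elog(ERROR). */
static void run_statement(bool fail)
{
    if (fail)
        siglongjmp(error_handler, 1);
}

/* Roughly the PG_TRY/PG_CATCH shape around needCleanup in ProcessUtilitySlow. */
int process_utility_sketch(bool fail)
{
    /* Assigned before sigsetjmp and never modified afterwards, so POSIX
     * guarantees its value after siglongjmp even without volatile.
     * volatile forces every access through memory, which is the workaround. */
    volatile bool need_cleanup = begin_tracking();

    if (sigsetjmp(error_handler, 0) != 0)
    {
        /* error path (PG_CATCH): clean up, then PostgreSQL would re-throw */
        if (need_cleanup)
            end_tracking();
        return -1;
    }

    run_statement(fail);

    /* normal path after PG_END_TRY */
    if (need_cleanup)
        end_tracking();
    return 0;
}
```

With a correct compiler, removing `volatile` must not change behavior: both paths skip `end_tracking()` because `need_cleanup` is false.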



Christoph Berg <myon@debian.org> writes:
> Seems to be a gcc-7 problem affecting several packages on mips64el:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871514

Hm, unless there is a use of sigsetjmp earlier in that clamav
routine, I would not assume that that's the same issue.  The bug
I suspect we are looking at here is very specific to sigsetjmp
callers: it usually amounts to the compiler unsafely trying to
use the same temporary location for multiple purposes.
        regards, tom lane
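For contrast, the POSIX rule does apply when an automatic variable is modified after sigsetjmp and before the siglongjmp back to it: only then is its value indeterminate unless it is declared volatile. A minimal sketch, using a hypothetical retry loop that is not PostgreSQL code:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>

static sigjmp_buf retry_point;

/* Returns how many attempts were needed. The counter is modified after
 * sigsetjmp, so it must be volatile for its value to be reliable once
 * siglongjmp transfers control back to the sigsetjmp call. */
int attempts_until_success(int failures_before_success)
{
    volatile int attempts = 0;

    (void) sigsetjmp(retry_point, 0);   /* siglongjmp resumes here */
    attempts++;
    if (attempts <= failures_before_success)
        siglongjmp(retry_point, 1);
    return attempts;
}
```

`needCleanup` does not fall into this category, which is why needing `volatile` for it indicates a compiler bug.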



Re: Tom Lane 2017-08-13 <14517.1502638417@sss.pgh.pa.us>
> I suspect you could work around this with
> 
>     bool        isCompleteQuery = (context <= PROCESS_UTILITY_QUERY);
> -    bool        needCleanup;
> +    volatile bool    needCleanup;
>     bool        commandCollected = false;
> 
> If that fixes it, it's definitely a compiler bug.  That function does
> not change needCleanup after the sigsetjmp call, so per POSIX it
> should not have to label the variable volatile.  This is far from
> being the first such bug we've seen though.

In the meantime, gcc-7 is at version 7.2.0-1, so I gave 9.6 on
mips64el a new try. It's still failing at initdb time, and indeed
adding "volatile" makes initdb proceed, but then the rest of the
testsuite fails in various ways:

DETAIL:  Failed process was running: CREATE TABLE enumtest_child (parent rainbow REFERENCES enumtest_parent);
DETAIL:  Failed process was running: create table trigtest2 (i int references trigtest(i) on delete cascade);
DETAIL:  Failed process was running: CREATE TABLE trunc_b (a int REFERENCES truncate_a);
DETAIL:  Failed process was running: CREATE SCHEMA evttrig
        CREATE TABLE one (col_a SERIAL PRIMARY KEY, col_b text DEFAULT 'forty two')
        CREATE INDEX one_idx ON one (col_b)
        CREATE TABLE two (col_c INTEGER CHECK (col_c > 0) REFERENCES one DEFAULT 42);

Hopefully the compiler gets fixed soonish on mips64el...

Thanks for the analysis,
Christoph



Re: [HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

From
Christoph Berg
Date:
Re: Tom Lane 2017-08-13 <14677.1502638689@sss.pgh.pa.us>
> Christoph Berg <myon@debian.org> writes:
> > Seems to be a gcc-7 problem affecting several packages on mips64el:
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871514
> 
> Hm, unless there is a use of sigsetjmp earlier in that clamav
> routine, I would not assume that that's the same issue.  The bug
> I suspect we are looking at here is very specific to sigsetjmp
> callers: it usually amounts to the compiler unsafely trying to
> use the same temporary location for multiple purposes.

It appears to have been the same issue - non-long ints spilled on the
stack and loaded back as long int:

Changes:
 gcc-7 (7.2.0-3) unstable; urgency=high
 .
   * Update to SVN 20170901 (r251583) from the gcc-7-branch.
     - Fix PR target/81504 (PPC), PR c++/82040.
   * Apply proposed patch for PR target/81803 (James Cowgill), conditionally
     for mips* targets. Closes: #871514.

The package built successfully on mips64el now.

Christoph
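One way to picture "non-long ints spilled on the stack and loaded back as long int": a 32-bit store fills only half of an 8-byte stack slot, so a 64-bit reload also picks up whatever stale bytes were left in the other half. The sketch below simulates that with memcpy on a byte buffer, assuming a little-endian layout as on mips64el; the function name is hypothetical and this is an illustration, not the code gcc actually generated. Under that model, a flag like needCleanup stored as 0 could test as nonzero, which would be consistent with the unpaired EventTriggerEndCompleteQuery call in the backtrace.

```c
#include <stdint.h>
#include <string.h>

/* Simulates spilling a 32-bit value into an 8-byte stack slot that still
 * holds stale data, then (incorrectly) reloading the slot as 64 bits.
 * Assumes a little-endian host, so the 32-bit store lands in the low half. */
int64_t reload_as_long(int32_t value, uint64_t stale_slot_contents)
{
    unsigned char slot[8];

    /* Leftover data from an earlier use of the same stack slot. */
    memcpy(slot, &stale_slot_contents, sizeof slot);

    /* Correct 32-bit spill: only the low 4 bytes are overwritten. */
    memcpy(slot, &value, sizeof value);

    /* Buggy 64-bit reload: the upper 4 bytes are stale, not sign-extended. */
    int64_t reloaded;
    memcpy(&reloaded, slot, sizeof reloaded);
    return reloaded;
}
```

For example, spilling `0` over a slot whose upper half holds `0xdeadbeef` reloads as a nonzero value, so a subsequent `if (flag)` test would take the wrong branch.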