Thread: buildfarm NetBSD/m68k tsearch regression failure
This result is now seen for HEAD on buildfarm member osprey:

================= pgsql.23138/contrib/tsearch/regression.diffs ===================
*** ./expected/tsearch.out Tue Dec 28 12:05:27 2004
--- ./results/tsearch.out Tue Dec 28 19:52:47 2004
***************
*** 312,322 ****
  (1 row)

  SELECT '1'::mquery_txt;
!  mquery_txt
! ------------
!  '1'
! (1 row)
!
  SELECT '1 '::mquery_txt;
   mquery_txt
  ------------
--- 312,318 ----
  (1 row)

  SELECT '1'::mquery_txt;
! ERROR: cache lookup failed for type 3095621458
  SELECT '1 '::mquery_txt;
   mquery_txt
  ------------
======================================================================

Note that this is the only failure on buildfarm for HEAD except for 2 machines where the failure is known to be related to old hardware / emulator software.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> This result is now seen for HEAD on buildfarm member osprey:

Yeah, I was wondering about that. It should be easy to reproduce the failure by hand (just run the tsearch regression test and then re-execute that query) --- can we see a debugger stack trace from the errfinish call?

regards, tom lane
On 28 Dec 04, at 23:36, Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>> This result is now seen for HEAD on buildfarm member osprey:
>
> Yeah, I was wondering about that. It should be easy to reproduce the
> failure by hand (just run the tsearch regression test and then
> re-execute that query) --- can we see a debugger stack trace from the
> errfinish call?

I'm trying to reproduce this. I've made an install with

    ./configure --prefix /data/postgresql/tests-install --enable-debug --enable-cassert

Then

    cd contrib/tsearch
    gmake
    gmake install
    gmake installcheck

The test fails, but differently than with the buildfarm setup (which fails consistently with the same error message each time).

Here is the tail of regression.diffs (which indicates that the first query failed):

    SELECT '1'::mquery_txt;
    ! server closed the connection unexpectedly
    ! This probably means the server terminated abnormally
    ! before or while processing the request.
    ! connection to server was lost

The server log is:

    ERROR: group "regressgroup1" does not exist
    NOTICE: type "txtidx" is not yet defined
    DETAIL: Creating a shell type definition.
    NOTICE: argument type txtidx is only a shell
    NOTICE: type "query_txt" is not yet defined
    DETAIL: Creating a shell type definition.
    NOTICE: argument type query_txt is only a shell
    NOTICE: type "mquery_txt" is not yet defined
    DETAIL: Creating a shell type definition.
    NOTICE: argument type mquery_txt is only a shell
    NOTICE: type "gtxtidx" is not yet defined
    DETAIL: Creating a shell type definition.
    NOTICE: argument type gtxtidx is only a shell
    LOG: server process (PID 23241) was terminated by signal 11
    LOG: terminating any other active server processes
    LOG: all server processes terminated; reinitializing
    LOG: database system was interrupted at 2004-12-29 15:36:27 CET
    LOG: checkpoint record is at 0/DA95E8
    LOG: redo record is at 0/DA95E8; undo record is at 0/0; shutdown FALSE
    LOG: next transaction ID: 1119; next OID: 66382
    LOG: database system was not properly shut down; automatic recovery in progress
    LOG: redo starts at 0/DA9628
    LOG: record with zero length at 0/E09988
    LOG: redo done at 0/E09958
    LOG: database system is ready

The cube contrib regression test passes with this setup. I'm now running gmake installcheck in contrib/ to see if there are dependencies between tests.

Regards,

Rémi Zara
--
Rémi Zara
http://www.remi-zara.net/
Rémi Zara <remi_zara@mac.com> writes:
> here is the tail of regression.diff (which indicates that the first
> query failed):

> SELECT '1'::mquery_txt;
> ! server closed the connection unexpectedly
> ! This probably means the server terminated abnormally
> ! before or while processing the request.
> ! connection to server was lost

Backtracing the core dump from that crash would do fine. It's probably the same failure --- what this looks like to me now is dereferencing a garbage pointer, which happens to pick up an irrelevant value in the one symptom and touch unmapped memory in the other.

regards, tom lane
On 29 Dec 04, at 18:05, Tom Lane wrote:

> Rémi Zara <remi_zara@mac.com> writes:
>> here is the tail of regression.diff (which indicates that the first
>> query failed):
>
>> SELECT '1'::mquery_txt;
>> ! server closed the connection unexpectedly
>> ! This probably means the server terminated abnormally
>> ! before or while processing the request.
>> ! connection to server was lost
>
> Backtracing the core dump from that crash would do fine.

Here you go

(gdb) bt
#0  0x0100000a in ?? ()
#1  0x046e9cce in queryin (buf=Cannot access memory at address 0x0
) at query.c:543
#2  0x046e9e44 in mqtxt_in (fcinfo=0xffffb688) at query.c:620
#3  0x0019d790 in OidFunctionCall3 (functionId=61367, arg1=2762304, arg2=0, arg3=4294967295) at fmgr.c:1408
#4  0x00091298 in stringTypeDatum (tp=0x2a26e9, string=0x2a2640 "1", atttypmod=-1) at parse_type.c:338
#5  0x00091968 in coerce_type (pstate=0x2a2610, node=0x2a2240, inputTypeId=2762304, targetTypeId=61366, targetTypeMod=-1, ccontext=98, cformat=COERCE_EXPLICIT_CAST) at parse_coerce.c:185
#6  0x0009157c in coerce_to_target_type (pstate=0x2a2518, expr=0x2a2240, exprtype=705, targettype=61366, targettypmod=-1, ccontext=COERCION_EXPLICIT, cformat=COERCE_EXPLICIT_CAST) at parse_coerce.c:80
#7  0x0008b440 in typecast_expression (pstate=0x2a2518, expr=0x2a2240, typename=0x2a2358) at parse_expr.c:1651
#8  0x0008a814 in transformExpr (pstate=0x2a2518, expr=0x2a23d8) at parse_expr.c:177
#9  0x00093224 in transformTargetEntry (pstate=0x2a2518, node=0x2a23d8, expr=0x0, colname=0x0, resjunk=0 '\0') at parse_target.c:72
#10 0x000932aa in transformTargetList (pstate=0x2a2518, targetlist=0xffffb688) at parse_target.c:148
#11 0x00077676 in transformSelectStmt (pstate=0x2a2518, stmt=0x2a2450) at analyze.c:1813
#12 0x00075496 in transformStmt (pstate=0x2a2518, parseTree=0x2a2450, extras_before=0xffffba80, extras_after=0xffffba84) at analyze.c:371
#13 0x00075230 in do_parse_analyze (parseTree=0x2a2450, pstate=0x2a2518) at analyze.c:245
#14 0x0007514c in parse_analyze (parseTree=0x2a2450, paramTypes=0x0, numParams=0) at analyze.c:169
#15 0x00138f3a in pg_analyze_and_rewrite (parsetree=0x2a2450, paramTypes=0x0, numParams=0) at postgres.c:555
#16 0x00139298 in exec_simple_query (query_string=0x2a2020 "SELECT '1'::mquery_txt;") at postgres.c:872
#17 0x0013b4c6 in PostgresMain (argc=4, argv=0x27f390, username=0x27f260 "rzara") at postgres.c:3007
#18 0x00114b7c in BackendRun (port=0x28f200) at postmaster.c:2817
#19 0x0011447e in BackendStartup (port=0x28f200) at postmaster.c:2453
#20 0x00112cd0 in ServerLoop () at postmaster.c:1198
#21 0x001126f0 in PostmasterMain (argc=3, argv=0xffffc674) at postmaster.c:917
#22 0x000e465e in main (argc=3, argv=0xffffc674) at main.c:268

Regards,

Rémi Zara
--
Rémi Zara
http://www.remi-zara.net/
Rémi Zara <remi_zara@mac.com> writes:
> On 29 Dec 04, at 18:05, Tom Lane wrote:
>> Backtracing the core dump from that crash would do fine.

> Here you go

> (gdb) bt
> #0 0x0100000a in ?? ()
> #1 0x046e9cce in queryin (buf=Cannot access memory at address 0x0
> ) at query.c:543
> #2 0x046e9e44 in mqtxt_in (fcinfo=0xffffb688) at query.c:620
> #3 0x0019d790 in OidFunctionCall3 (functionId=61367, arg1=2762304,
> arg2=0, arg3=4294967295) at fmgr.c:1408
> #4 0x00091298 in stringTypeDatum (tp=0x2a26e9, string=0x2a2640 "1",
> atttypmod=-1) at parse_type.c:338

Hmm. I was hoping to spot some obviously machine-dependent code nearby to the crash point, but I don't see anything wrong in that area.

You might try rebuilding tsearch with -O0 (if it wasn't already) in hopes that the backtrace becomes more accurate.

regards, tom lane
On 29 Dec 04, at 23:38, Tom Lane wrote:

> Rémi Zara <remi_zara@mac.com> writes:
>> On 29 Dec 04, at 18:05, Tom Lane wrote:
>>> Backtracing the core dump from that crash would do fine.
>
>> Here you go
>
>> (gdb) bt
>> #0 0x0100000a in ?? ()
>> #1 0x046e9cce in queryin (buf=Cannot access memory at address 0x0
>> ) at query.c:543
>> #2 0x046e9e44 in mqtxt_in (fcinfo=0xffffb688) at query.c:620
>> #3 0x0019d790 in OidFunctionCall3 (functionId=61367, arg1=2762304,
>> arg2=0, arg3=4294967295) at fmgr.c:1408
>> #4 0x00091298 in stringTypeDatum (tp=0x2a26e9, string=0x2a2640 "1",
>> atttypmod=-1) at parse_type.c:338
>
> Hmm. I was hoping to spot some obviously machine-dependent code nearby
> to the crash point, but I don't see anything wrong in that area.
>
> You might try rebuilding tsearch with -O0 (if it wasn't already) in
> hopes that the backtrace becomes more accurate.

The tsearch test passes when compiled with -O0 (postgres is still compiled with -O2).

regards,

Rémi Zara
--
Rémi Zara
http://www.remi-zara.net/
Rémi Zara <remi_zara@mac.com> writes:
>> Hmm. I was hoping to spot some obviously machine-dependent code nearby
>> to the crash point, but I don't see anything wrong in that area.
>>
>> You might try rebuilding tsearch with -O0 (if it wasn't already) in
>> hopes that the backtrace becomes more accurate.

> The tsearch test passes when compiled with -O0 (postgres is still
> compiled with -O2)

Ugh. That suggests it could be a compiler bug. Are you using the latest available compiler version for your platform?

regards, tom lane
On 30 Dec 04, at 16:05, Tom Lane wrote:

> Rémi Zara <remi_zara@mac.com> writes:
>>> Hmm. I was hoping to spot some obviously machine-dependent code
>>> nearby
>>> to the crash point, but I don't see anything wrong in that area.
>>>
>>> You might try rebuilding tsearch with -O0 (if it wasn't already) in
>>> hopes that the backtrace becomes more accurate.
>
>> The tsearch test passes when compiled with -O0 (postgres is still
>> compiled with -O2)
>
> Ugh. That suggests it could be a compiler bug. Are you using the
> latest available compiler version for your platform?

Hi,

The problem is that when compiled with -O2, the pushval_morph function address is 0x0 (in query.c). It goes away with the following patch, which might not be a proper solution...

Regards,

Rémi Zara

Index: query.c
===================================================================
RCS file: /projects/cvsroot/pgsql/contrib/tsearch/query.c,v
retrieving revision 1.16
diff -u -r1.16 query.c
--- query.c 9 Nov 2004 06:09:33 -0000 1.16
+++ query.c 30 Dec 2004 19:10:46 -0000
@@ -616,6 +616,7 @@
  char pbuf[16384],
  *cur;
 #endif
+ elog(DEBUG5, "pushval_morph address is %p", pushval_morph);
  initmorph();
  query = queryin((char *) PG_GETARG_POINTER(0),pushval_morph);
  res = clean_fakeval(GETQUERY(query), &len);

--
Rémi Zara
http://www.remi-zara.net/
Rémi Zara <remi_zara@mac.com> writes:
>> Ugh. That suggests it could be a compiler bug. Are you using the
>> latest available compiler version for your platform?

> The problem is that when compiled with -O2, the pushval_morph func
> address is 0x0 (in query.c).
> It goes away with the following patch, which might not be a proper
> solution....

If that isn't a compiler bug, I don't know what is. Report it to the gcc boys.

regards, tom lane
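
The symptom reported in this thread is a callback pointer (pushval_morph) that reportedly arrives as 0x0 in queryin and is then called, which either crashes or picks up garbage. The following is a minimal, self-contained sketch of how a callee could refuse a null callback instead of jumping through it. It is not the tsearch code: the names pushval_fn, parse_with_callback, and count_bytes are hypothetical, and the callback signature is invented for illustration. A check like this would not fix a miscompilation; at best it might turn the crash into a clean error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, loosely modeled on the function pointer
 * that mqtxt_in passes to queryin. The real signature differs. */
typedef int (*pushval_fn) (const char *token, size_t len, void *state);

/* Hypothetical stand-in for a parser that hands each space-separated
 * token to a callback. Returns the number of tokens pushed, or -1 if
 * the callback is NULL or reports an error. */
static int
parse_with_callback(const char *buf, pushval_fn pushval, void *state)
{
    int         pushed = 0;
    const char *p = buf;

    /* Fail loudly instead of calling through a null function pointer. */
    if (pushval == NULL)
        return -1;

    while (*p != '\0')
    {
        size_t      len;

        p += strspn(p, " ");    /* skip separators */
        len = strcspn(p, " ");  /* length of the next token */
        if (len == 0)
            break;              /* only trailing separators were left */
        if (pushval(p, len, state) != 0)
            return -1;
        pushed++;
        p += len;
    }
    return pushed;
}

/* Example callback: adds each token's length to a running total. */
static int
count_bytes(const char *token, size_t len, void *state)
{
    (void) token;
    *(size_t *) state += len;
    return 0;
}
```

Calling parse_with_callback("1 ", count_bytes, &total) pushes one token, while passing NULL as the callback returns -1 rather than jumping to address zero.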