Thread: buildfarm NetBSD/m68k tsearch regression failure

buildfarm NetBSD/m68k tsearch regression failure

From
Andrew Dunstan
Date:
This result is now seen for HEAD on buildfarm member osprey:


================= pgsql.23138/contrib/tsearch/regression.diffs ===================
*** ./expected/tsearch.out    Tue Dec 28 12:05:27 2004
--- ./results/tsearch.out    Tue Dec 28 19:52:47 2004
***************
*** 312,322 ****
  (1 row)
  
  SELECT '1'::mquery_txt;
!  mquery_txt 
! ------------
!  '1'
! (1 row)
! 
  SELECT '1 '::mquery_txt;
   mquery_txt 
  ------------
--- 312,318 ----
  (1 row)
  
  SELECT '1'::mquery_txt;
! ERROR:  cache lookup failed for type 3095621458
  SELECT '1 '::mquery_txt;
   mquery_txt 
  ------------

======================================================================

Note that this is the only failure on buildfarm for HEAD, except for 2 machines where the failure is known to be related
to old hardware / emulator software.
 

cheers

andrew



Re: buildfarm NetBSD/m68k tsearch regression failure

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> This result is now seen for HEAD on buildfarm member osprey:

Yeah, I was wondering about that.  It should be easy to reproduce the
failure by hand (just run the tsearch regression test and then
re-execute that query) --- can we see a debugger stack trace from the
errfinish call?
        regards, tom lane


Re: buildfarm NetBSD/m68k tsearch regression failure

From
Rémi Zara
Date:
On 28 Dec. 04, at 23:36, Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>> This result is now seen for HEAD on buildfarm member osprey:
>
> Yeah, I was wondering about that.  It should be easy to reproduce the
> failure by hand (just run the tsearch regression test and then
> re-execute that query) --- can we see a debugger stack trace from the
> errfinish call?

I'm trying to reproduce this.
I've made an install with ./configure --prefix
/data/postgresql/tests-install --enable-debug --enable-cassert
Then
cd contrib/tsearch
gmake
gmake install
gmake installcheck

The test fails, but differently than with the buildfarm setup (which
fails consistently with the same error message each time).

here is the tail of regression.diff (which indicates that the first
query failed):
  SELECT '1'::mquery_txt;
! server closed the connection unexpectedly
!       This probably means the server terminated abnormally
!       before or while processing the request.
! connection to server was lost

the log of the server is

ERROR:  group "regressgroup1" does not exist
NOTICE:  type "txtidx" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type txtidx is only a shell
NOTICE:  type "query_txt" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type query_txt is only a shell
NOTICE:  type "mquery_txt" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type mquery_txt is only a shell
NOTICE:  type "gtxtidx" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type gtxtidx is only a shell
LOG:  server process (PID 23241) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2004-12-29 15:36:27 CET
LOG:  checkpoint record is at 0/DA95E8
LOG:  redo record is at 0/DA95E8; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 1119; next OID: 66382
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/DA9628
LOG:  record with zero length at 0/E09988
LOG:  redo done at 0/E09958
LOG:  database system is ready

The cube contrib regression test passes with this setup.
I'm now running gmake installcheck in contrib/, to see if there
are dependencies between tests.

Regards,

Rémi Zara
--
Rémi Zara
http://www.remi-zara.net/

Re: buildfarm NetBSD/m68k tsearch regression failure

From
Tom Lane
Date:
Rémi Zara <remi_zara@mac.com> writes:
> here is the tail of regression.diff (which indicates that the first
> query failed):

>    SELECT '1'::mquery_txt;
> ! server closed the connection unexpectedly
> !       This probably means the server terminated abnormally
> !       before or while processing the request.
> ! connection to server was lost

Backtracing the core dump from that crash would do fine.  It's probably
the same failure --- what this looks like to me now is dereferencing a
garbage pointer, which happens to pick up an irrelevant value in the one
symptom and touch unmapped memory in the other.
        regards, tom lane


Re: buildfarm NetBSD/m68k tsearch regression failure

From
Rémi Zara
Date:
On 29 Dec. 04, at 18:05, Tom Lane wrote:

> Rémi Zara <remi_zara@mac.com> writes:
>> here is the tail of regression.diff (which indicates that the first
>> query failed):
>
>>    SELECT '1'::mquery_txt;
>> ! server closed the connection unexpectedly
>> !       This probably means the server terminated abnormally
>> !       before or while processing the request.
>> ! connection to server was lost
>
> Backtracing the core dump from that crash would do fine.

Here you go

(gdb) bt
#0  0x0100000a in ?? ()
#1  0x046e9cce in queryin (buf=Cannot access memory at address 0x0
) at query.c:543
#2  0x046e9e44 in mqtxt_in (fcinfo=0xffffb688) at query.c:620
#3  0x0019d790 in OidFunctionCall3 (functionId=61367, arg1=2762304,
arg2=0, arg3=4294967295) at fmgr.c:1408
#4  0x00091298 in stringTypeDatum (tp=0x2a26e9, string=0x2a2640 "1",
atttypmod=-1) at parse_type.c:338
#5  0x00091968 in coerce_type (pstate=0x2a2610, node=0x2a2240,
inputTypeId=2762304, targetTypeId=61366, targetTypeMod=-1, ccontext=98,
cformat=COERCE_EXPLICIT_CAST)    at parse_coerce.c:185
#6  0x0009157c in coerce_to_target_type (pstate=0x2a2518,
expr=0x2a2240, exprtype=705, targettype=61366, targettypmod=-1,
ccontext=COERCION_EXPLICIT,    cformat=COERCE_EXPLICIT_CAST) at parse_coerce.c:80
#7  0x0008b440 in typecast_expression (pstate=0x2a2518, expr=0x2a2240,
typename=0x2a2358) at parse_expr.c:1651
#8  0x0008a814 in transformExpr (pstate=0x2a2518, expr=0x2a23d8) at
parse_expr.c:177
#9  0x00093224 in transformTargetEntry (pstate=0x2a2518, node=0x2a23d8,
expr=0x0, colname=0x0, resjunk=0 '\0') at parse_target.c:72
#10 0x000932aa in transformTargetList (pstate=0x2a2518,
targetlist=0xffffb688) at parse_target.c:148
#11 0x00077676 in transformSelectStmt (pstate=0x2a2518, stmt=0x2a2450)
at analyze.c:1813
#12 0x00075496 in transformStmt (pstate=0x2a2518, parseTree=0x2a2450,
extras_before=0xffffba80, extras_after=0xffffba84) at analyze.c:371
#13 0x00075230 in do_parse_analyze (parseTree=0x2a2450,
pstate=0x2a2518) at analyze.c:245
#14 0x0007514c in parse_analyze (parseTree=0x2a2450, paramTypes=0x0,
numParams=0) at analyze.c:169
#15 0x00138f3a in pg_analyze_and_rewrite (parsetree=0x2a2450,
paramTypes=0x0, numParams=0) at postgres.c:555
#16 0x00139298 in exec_simple_query (query_string=0x2a2020 "SELECT
'1'::mquery_txt;") at postgres.c:872
#17 0x0013b4c6 in PostgresMain (argc=4, argv=0x27f390,
username=0x27f260 "rzara") at postgres.c:3007
#18 0x00114b7c in BackendRun (port=0x28f200) at postmaster.c:2817
#19 0x0011447e in BackendStartup (port=0x28f200) at postmaster.c:2453
#20 0x00112cd0 in ServerLoop () at postmaster.c:1198
#21 0x001126f0 in PostmasterMain (argc=3, argv=0xffffc674) at
postmaster.c:917
#22 0x000e465e in main (argc=3, argv=0xffffc674) at main.c:268

Regards,

Rémi Zara

--
Rémi Zara
http://www.remi-zara.net/

Re: buildfarm NetBSD/m68k tsearch regression failure

From
Tom Lane
Date:
Rémi Zara <remi_zara@mac.com> writes:
> On 29 Dec. 04, at 18:05, Tom Lane wrote:
>> Backtracing the core dump from that crash would do fine.

> Here you go

> (gdb) bt
> #0  0x0100000a in ?? ()
> #1  0x046e9cce in queryin (buf=Cannot access memory at address 0x0
> ) at query.c:543
> #2  0x046e9e44 in mqtxt_in (fcinfo=0xffffb688) at query.c:620
> #3  0x0019d790 in OidFunctionCall3 (functionId=61367, arg1=2762304,
> arg2=0, arg3=4294967295) at fmgr.c:1408
> #4  0x00091298 in stringTypeDatum (tp=0x2a26e9, string=0x2a2640 "1",
> atttypmod=-1) at parse_type.c:338

Hmm.  I was hoping to spot some obviously machine-dependent code nearby
to the crash point, but I don't see anything wrong in that area.

You might try rebuilding tsearch with -O0 (if it wasn't already) in
hopes that the backtrace becomes more accurate.
        regards, tom lane


Re: buildfarm NetBSD/m68k tsearch regression failure

From
Rémi Zara
Date:
On 29 Dec. 04, at 23:38, Tom Lane wrote:

> Rémi Zara <remi_zara@mac.com> writes:
>> On 29 Dec. 04, at 18:05, Tom Lane wrote:
>>> Backtracing the core dump from that crash would do fine.
>
>> Here you go
>
>> (gdb) bt
>> #0  0x0100000a in ?? ()
>> #1  0x046e9cce in queryin (buf=Cannot access memory at address 0x0
>> ) at query.c:543
>> #2  0x046e9e44 in mqtxt_in (fcinfo=0xffffb688) at query.c:620
>> #3  0x0019d790 in OidFunctionCall3 (functionId=61367, arg1=2762304,
>> arg2=0, arg3=4294967295) at fmgr.c:1408
>> #4  0x00091298 in stringTypeDatum (tp=0x2a26e9, string=0x2a2640 "1",
>> atttypmod=-1) at parse_type.c:338
>
> Hmm.  I was hoping to spot some obviously machine-dependent code nearby
> to the crash point, but I don't see anything wrong in that area.
>
> You might try rebuilding tsearch with -O0 (if it wasn't already) in
> hopes that the backtrace becomes more accurate.

The tsearch test passes when compiled with -O0 (postgres is still
compiled with -O2)

regards,

Rémi Zara

--
Rémi Zara
http://www.remi-zara.net/

Re: buildfarm NetBSD/m68k tsearch regression failure

From
Tom Lane
Date:
Rémi Zara <remi_zara@mac.com> writes:
>> Hmm.  I was hoping to spot some obviously machine-dependent code nearby
>> to the crash point, but I don't see anything wrong in that area.
>> 
>> You might try rebuilding tsearch with -O0 (if it wasn't already) in
>> hopes that the backtrace becomes more accurate.

> The tsearch test passes when compiled with -O0 (postgres is still
> compiled with -O2)

Ugh.  That suggests it could be a compiler bug.  Are you using the
latest available compiler version for your platform?
        regards, tom lane


Re: buildfarm NetBSD/m68k tsearch regression failure

From
Rémi Zara
Date:
On 30 Dec. 04, at 16:05, Tom Lane wrote:

> Rémi Zara <remi_zara@mac.com> writes:
>>> Hmm.  I was hoping to spot some obviously machine-dependent code
>>> nearby
>>> to the crash point, but I don't see anything wrong in that area.
>>>
>>> You might try rebuilding tsearch with -O0 (if it wasn't already) in
>>> hopes that the backtrace becomes more accurate.
>
>> The tsearch test passes when compiled with -O0 (postgres is still
>> compiled with -O2)
>
> Ugh.  That suggests it could be a compiler bug.  Are you using the
> latest available compiler version for your platform?

Hi,

The problem is that when compiled with -O2, the pushval_morph func
address is 0x0 (in query.c).
It goes away with the following patch, which might not be a proper
solution....

Regards,

Rémi Zara

Index: query.c
===================================================================
RCS file: /projects/cvsroot/pgsql/contrib/tsearch/query.c,v
retrieving revision 1.16
diff -u -r1.16 query.c
--- query.c     9 Nov 2004 06:09:33 -0000       1.16
+++ query.c     30 Dec 2004 19:10:46 -0000
@@ -616,6 +616,7 @@
        char            pbuf[16384],
                           *cur;
 #endif
+       elog(DEBUG5, "pushval_morph address is %p", pushval_morph);
        initmorph();
        query = queryin((char *) PG_GETARG_POINTER(0),pushval_morph);
        res = clean_fakeval(GETQUERY(query), &len);
--
Rémi Zara
http://www.remi-zara.net/

Re: buildfarm NetBSD/m68k tsearch regression failure

From
Tom Lane
Date:
Rémi Zara <remi_zara@mac.com> writes:
>> Ugh.  That suggests it could be a compiler bug.  Are you using the
>> latest available compiler version for your platform?

> The problem is that when compiled with -O2, the pushval_morph func
> address is 0x0 (in query.c).
> It goes away with the following patch, which might not be a proper
> solution....

If that isn't a compiler bug, I don't know what is.  Report it to the
gcc boys.
        regards, tom lane