Re: weird buildfarm failures on arm/mipsel and --with-tcl - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: weird buildfarm failures on arm/mipsel and --with-tcl
Date
Msg-id 45A34323.4080100@kaltenbrunner.cc
In response to Re: weird buildfarm failures on arm/mipsel and --with-tcl  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: weird buildfarm failures on arm/mipsel and --with-tcl  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> one of my new buildfarm boxes (a Debian/Etch based ARM box) is
>> sometimes failing to stop the database during the regression tests:
> 
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-01-08%2003:03:03
> 
>> this only seems to happen sometimes and only if --with-tcl is enabled on
>> quagga.
> 
>> lionfish (my mipsel box) is able to trigger that on every build if I
>> enable --with-tcl but it is nearly impossible to debug it there because
>> of the low amount of memory and diskspace it has.
> 
> Hm, could pl/tcl somehow be preventing the backend from exiting once
> it's run any pl/tcl stuff?  I have no idea why though, and even less
> why it wouldn't be repeatable. 
> 
>> After the stopdb failure we still have those processes running:
>> pgbuild   3488  0.0  2.4  43640  6300 ?        Ss   06:15   0:01
>> postgres: pgbuild pl_regression [local] idle
> 
> Can you get a stack trace from this process?

(gdb) bt
#0  0x406b9d80 in __pthread_sigsuspend () from /lib/libpthread.so.0
#1  0x406b8a7c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2  0x406b91f8 in pthread_onexit_process () from /lib/libpthread.so.0
#3  0x40438658 in exit () from /lib/libc.so.6
#4  0x40438658 in exit () from /lib/libc.so.6
Previous frame identical to this frame (corrupt stack?)



> 
>> pgbuild   3489  0.0  0.0      0     0 ?        Z    06:15   0:00
>> [postgres] <defunct>
> 
> This is a bit odd ... if that process is a direct child of the
> postmaster it should have been reaped promptly.  Could it be a child
> of the other backend?  If so, why was it started?  Please try the
> ps again with whatever switch it needs to list parent process ID.

looks like you are right - the defunct 3489 seems to be a child of 3488:
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1  3389 18341 18341 ?           -1 S     1001   0:03 /home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data
 3389  3391  3391  3391 ?           -1 Ss    1001   0:00 postgres: writer process
 3389  3392  3392  3392 ?           -1 Ss    1001   0:00 postgres: stats collector process
 3389  3488  3488  3488 ?           -1 Ss    1001   0:01 postgres: pgbuild pl_regression [local] idle
 3488  3489  3488  3488 ?           -1 Z     1001   0:00 [postgres] <defunct>
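
To make the parent/child relationship in the listing above easier to read, here is a small illustrative sketch that pairs each defunct process with its parent. It was not run on quagga. The helper name zombies_with_parents and the PS_ROWS tuple layout are made up for this example; the values are copied from the ps output above, reduced to the PPID, PID, STAT and COMMAND columns.

```python
# Hypothetical helper for illustration; not part of PostgreSQL or the
# buildfarm client. Each row is (ppid, pid, stat, command), copied from
# the ps listing in this message.
PS_ROWS = [
    (1, 3389, "S", "/home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data"),
    (3389, 3391, "Ss", "postgres: writer process"),
    (3389, 3392, "Ss", "postgres: stats collector process"),
    (3389, 3488, "Ss", "postgres: pgbuild pl_regression [local] idle"),
    (3488, 3489, "Z", "[postgres] <defunct>"),
]


def zombies_with_parents(rows):
    """Return (zombie_pid, parent_pid, parent_command) for each 'Z' process."""
    # Map every listed PID to its command so a parent can be looked up.
    commands = {pid: cmd for _ppid, pid, _stat, cmd in rows}
    return [
        (pid, ppid, commands.get(ppid, "<not listed>"))
        for ppid, pid, stat, _cmd in rows
        if stat.startswith("Z")  # a STAT beginning with Z marks a zombie
    ]


for pid, ppid, parent_cmd in zombies_with_parents(PS_ROWS):
    print(f"zombie {pid} is a child of {ppid} ({parent_cmd})")
```

With the rows above, the only zombie reported is 3489, and its parent is the idle pl_regression backend 3488 rather than the postmaster 3389.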


Stefan


