Re: Curious buildfarm failures - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Curious buildfarm failures
Date
Msg-id 20130114215016.GA22155@awork2.anarazel.de
In response to Curious buildfarm failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Curious buildfarm failures
List pgsql-hackers
On 2013-01-14 16:35:48 -0500, Tom Lane wrote:
> Since commit 2065dd2834e832eb820f1fbcd16746d6af1f6037, there have been
> a few buildfarm failures along the lines of
>   
>   -- Commit table drop
>   COMMIT PREPARED 'regress-two';
> ! PANIC:  failed to re-find shared proclock object
> ! PANIC:  failed to re-find shared proclock object
> ! connection to server was lost
> 
> Evidently I bollixed something, but what?  I've been unable to reproduce
> this locally so far.  Anybody see what's wrong?
> 
> Another thing is that dugong has been reproducibly failing with
> 
>  drop cascades to table testschema.atable
>   -- Should succeed
>   DROP TABLESPACE testspace;
> + ERROR:  tablespace "testspace" is not empty
> 
> since the elog-doesn't-return patch (b853eb97) went in.  Maybe this is
> some local problem there but I'm suspicious that there's a connection.
> But what?
> 
> Any insights out there?

It also has:

LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
FATAL:  could not open file "base/16384/28182": No such file or directory
CONTEXT:  writing block 6 of relation base/16384/28182
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1743)
LOG:  checkpointer process (PID 30366) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
LOG:  abnormal database system shutdown


================== stack trace: pgsql.9958/src/test/regress/tmp_check/data/core ==================
Using host libthread_db library "/lib/tls/libthread_db.so.1".

warning: Can't read pathname for load map: Input/output error.
Core was generated by `postgres: checkpointer process                                                '.
Program terminated with signal 6, Aborted.

#0  0xa000000000010620 in __kernel_syscall_via_break ()
#0  0xa000000000010620 in __kernel_syscall_via_break ()
#1  0x2000000000953bb0 in raise () from /lib/tls/libc.so.6.1
#2  0x2000000000956df0 in abort () from /lib/tls/libc.so.6.1
#3  0x4000000000b4b510 in ExceptionalCondition (conditionName=0x4000000000d76390 "!(PrivateRefCount[i] == 0)", errorType=0x4000000000d06500 "FailedAssertion", fileName=0x4000000000d76260 "bufmgr.c", lineNumber=1743) at assert.c:54
#4  0x40000000007a7d20 in AtProcExit_Buffers (code=1, arg=59) at bufmgr.c:1743
#5  0x40000000007c4e50 in shmem_exit (code=1) at ipc.c:221
#6  0x40000000007c4fa0 in proc_exit_prepare (code=1) at ipc.c:181
#7  0x40000000007c4ab0 in proc_exit (code=1) at ipc.c:96
#8  0x4000000000b5d390 in errfinish (dummy=0) at elog.c:518
#9  0x4000000000823380 in _mdfd_getseg (reln=0x6000000000155420, forknum=1397792, blkno=6, skipFsync=0 '\0', behavior=EXTENSION_FAIL) at md.c:577
#10 0x400000000081e5c0 in mdwrite (reln=0x6000000000155420, forknum=MAIN_FORKNUM, blocknum=6, buffer=0x2000000001432ea0 "", skipFsync=0 '\0') at md.c:735
#11 0x4000000000824690 in smgrwrite (reln=0x6000000000155420, forknum=MAIN_FORKNUM, blocknum=6, buffer=0x2000000001432ea0 "", skipFsync=0 '\0') at smgr.c:534
#12 0x400000000079e510 in FlushBuffer (buf=0x1, reln=0x6000000000155420) at bufmgr.c:1941
#13 0x40000000007a10b0 in SyncOneBuffer (buf_id=0, skip_recently_used=0 '\0') at bufmgr.c:1677
#14 0x40000000007a0c00 in CheckPointBuffers (flags=5) at bufmgr.c:1284
#15 0x40000000001fcbf0 in CheckPointGuts (checkPointRedo=80827000, flags=5) at xlog.c:7391
#16 0x40000000001fb2a0 in CreateCheckPoint (flags=5) at xlog.c:7240
#17 0x40000000001f6820 in ShutdownXLOG (code=14699520, arg=4611686018440093920) at xlog.c:6823
#18 0x400000000072d780 in _setjmp_lpad_CheckpointerMain_0$0$18 () at checkpointer.c:413
#19 0x4000000000235810 in AuxiliaryProcessMain (argc=496536, argv=0x60000fffff80e520) at bootstrap.c:433
#20 0x40000000007172b0 in StartChildProcess (type=508288) at postmaster.c:4956
#21 0x4000000000713f50 in reaper (postgres_signal_arg=30365) at postmaster.c:2568
#22 <signal handler called>
#23 0xa000000000010620 in __kernel_syscall_via_break ()
#24 0x2000000000953f70 in sigprocmask () from /lib/tls/libc.so.6.1
#25 0x4000000000720480 in ServerLoop () at postmaster.c:1521
#26 0x400000000071d9d0 in PostmasterMain (argc=6, argv=0x60000000000d85e0) at postmaster.c:1244
#27 0x4000000000577a30 in main (argc=6, argv=0x60000000000d8010) at main.c:197

in the log. So it seems it could also be related to the locking
changes, although I don't immediately see why.
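
For anyone reading the trace: the FATAL raised under _mdfd_getseg goes
through errfinish and proc_exit straight into AtProcExit_Buffers, which
asserts that no buffer pin counts are left non-zero. A toy sketch of that
shape follows. It is not PostgreSQL code; every name in it is a
hypothetical stand-in, and it assumes (my reading, not confirmed by the
trace) that the pin in question is the one taken around the flush.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BUFFERS 4

/* Hypothetical stand-in for a per-process array of buffer pin counts. */
static int private_ref_count[N_BUFFERS];

static void pin_buffer(int buf_id)   { private_ref_count[buf_id]++; }
static void unpin_buffer(int buf_id) { private_ref_count[buf_id]--; }

/* Simulated write: fails when the underlying file is missing. */
static bool write_buffer(int buf_id, bool file_exists)
{
    (void) buf_id;
    return file_exists;
}

/* Number of buffers still pinned. An exit-time check like the one in
 * the trace would assert that this is zero. */
static int count_leaked_pins(void)
{
    int leaked = 0;
    for (int i = 0; i < N_BUFFERS; i++)
        if (private_ref_count[i] != 0)
            leaked++;
    return leaked;
}

/* Flush one buffer. If the write fails we return at once without
 * unpinning, mimicking a FATAL error that jumps to process exit.
 * The return value is what the exit-time check would then see. */
static int sync_one_buffer(int buf_id, bool file_exists)
{
    pin_buffer(buf_id);
    if (!write_buffer(buf_id, file_exists))
        return count_leaked_pins();   /* exit path runs with the pin held */
    unpin_buffer(buf_id);
    return count_leaked_pins();
}
```

With the file present the count comes back zero. With the file missing
one pin is left over, which is the condition the assertion trips on. That
would make the assertion a consequence of the FATAL rather than a
separate bug, leaving the missing file as the thing to explain.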

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


