Re: Still something fishy in the fastpath lock stuff - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Still something fishy in the fastpath lock stuff
Date
Msg-id CA+TgmoYkeB+PhHBpAPt+xen7eaB2J3tpHxJG1XPR8y6U_zKF+w@mail.gmail.com
In response to Still something fishy in the fastpath lock stuff  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Still something fishy in the fastpath lock stuff
List pgsql-hackers
On Tue, Mar 25, 2014 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Buildfarm member prairiedog crashed on today's HEAD with this:
>
> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 1240)
>
> I have the core file, and gdb says:
>
> #0  0x90047dac in kill ()
> #1  0x9012d7b4 in abort ()
> #2  0x003b1c38 in ExceptionalCondition (conditionName=0x0, errorType=0x25 "", fileName=0x2 "", lineNumber=109) at assert.c:54
> #3  0x002960bc in RemoveLocalLock (locallock=0x280b414) at lock.c:1240
> #4  0x002968a4 in LockReleaseAll (lockmethodid=1, allLocks=0 '\0') at lock.c:2087
> #5  0x00299550 in ProcReleaseLocks (isCommit=1 '\001') at proc.c:752
> #6  0x003de2d4 in ResourceOwnerReleaseInternal (owner=0x2802184, phase=RESOURCE_RELEASE_LOCKS, isCommit=1 '\001', isTopLevel=1 '\001') at resowner.c:307
> #7  0x003de71c in ResourceOwnerRelease (owner=0x2802184, phase=RESOURCE_RELEASE_LOCKS, isCommit=1 '\001', isTopLevel=1 '\001') at resowner.c:212
> #8  0x00083b7c in CommitTransaction () at xact.c:1998
> #9  0x00083ea4 in CommitTransactionCommand () at xact.c:2713
> #10 0x002ab9b4 in finish_xact_command () at postgres.c:2408
> #11 0x002af2b0 in exec_simple_query (query_string=0x2807e1c "CREATE TABLE t3 (name TEXT, n INTEGER);") at postgres.c:1076
> #12 0x002b08bc in PostgresMain (argc=41975324, argv=0xbfffdd3c, dbname=0x2800b6c "regression", username=0x51 "") at postgres.c:4004
> #13 0x0023d5bc in ServerLoop () at postmaster.c:4089
> #14 0x0023f11c in PostmasterMain (argc=6, argv=0x46d9d0) at postmaster.c:1222
> #15 0x001bb89c in main (argc=6, argv=0x2100710) at main.c:203
>
> Since this machine has only been running the buildfarm for a week,
> I rather imagine we will see more of these.  Thoughts?
>
> (This is a PPC machine, but only a single processor, so it's hard
> to see how memory ordering issues might enter into it ...)

Well, it's possible that the issue could be related to compiler
reordering, since it's still the rule that SpinLockAcquire/Release
must be memory barriers but need not be compiler barriers, and
FastPathStrongRelationLocks is not volatile-ized.  I really think we
should change that rule; it leads to ugly code, and bugs.  But to
determine whether that's the issue, we'd probably need to disassemble
the relevant code and see whether the compiler did in fact shift
things around.
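
To make the "volatile-ized" point concrete, here is a minimal sketch of the idiom, not the actual lock.c code. Every name in it (DemoStrongLockData, demo_spin_acquire, demo_strong_lock_begin, and so on) is hypothetical, and the spinlock is a stand-in built on a C11 atomic_flag. The idea is that shared state touched between acquire and release is reached through a volatile-qualified pointer, so the compiler may not move those loads and stores outside the locked region. In this sketch the C11 atomic operations already constrain the compiler, so the qualifier is redundant here; it matters under a rule where the lock primitives are not guaranteed to be compiler barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_STRONG_LOCK_PARTITIONS 1024    /* hypothetical size */

/* Hypothetical stand-in for a spinlock-protected array of counters. */
typedef struct
{
    atomic_flag mutex;                      /* stand-in for a spinlock */
    uint32_t    count[DEMO_STRONG_LOCK_PARTITIONS];
} DemoStrongLockData;

static DemoStrongLockData demo_shared = { .mutex = ATOMIC_FLAG_INIT };

/* Busy-wait until the flag is ours. */
static void
demo_spin_acquire(volatile atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void
demo_spin_release(volatile atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Bump the counter for one slot while holding the lock. */
static void
demo_strong_lock_begin(uint32_t slot)
{
    /* All shared accesses go through a volatile-qualified pointer. */
    volatile DemoStrongLockData *shared = &demo_shared;

    demo_spin_acquire(&shared->mutex);
    shared->count[slot]++;
    demo_spin_release(&shared->mutex);
}

/* Drop the counter; the check mirrors the shape of the failed assertion. */
static void
demo_strong_lock_end(uint32_t slot)
{
    volatile DemoStrongLockData *shared = &demo_shared;

    demo_spin_acquire(&shared->mutex);
    assert(shared->count[slot] > 0);
    shared->count[slot]--;
    demo_spin_release(&shared->mutex);
}

/* Read one counter under the lock. */
static uint32_t
demo_strong_lock_count(uint32_t slot)
{
    volatile DemoStrongLockData *shared = &demo_shared;
    uint32_t    result;

    demo_spin_acquire(&shared->mutex);
    result = shared->count[slot];
    demo_spin_release(&shared->mutex);
    return result;
}
```

If the increment or decrement were done through a plain pointer and the lock calls were not compiler barriers, the compiler would be free to hoist or sink the access past the acquire or release, which is the kind of reordering a disassembly would reveal.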

But it seems equally possible that the bug is the result of some more
pedestrian error.  I don't have a great idea where to look for such a
mistake, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


