Re: Some bogus results from prairiedog - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Some bogus results from prairiedog
Msg-id CA+TgmoaGjfWA+Zz-D_mFRbLgiSXgcL3b8dUGkp1LqWUCXnORsQ@mail.gmail.com
In response to Re: Some bogus results from prairiedog  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Some bogus results from prairiedog  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jul 22, 2014 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Anyway, to cut to the chase, the crash seems to be from this:
>>> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 2957)
>>> So there is still something rotten in the fastpath lock logic.
>
>> Gosh, that sucks.
>
>> The inconstancy of this problem would seem to suggest some kind of
>> locking bug rather than a flat-out concurrency issue, but it looks to
>> me like everything relevant is marked volatile.
>
> I don't think that you need any big assumptions about machine-specific
> coding issues to spot the problem.

I don't think that I'm making what could be described as big
assumptions; I think we should fix and back-patch the PPC64 spinlock
change.

But...

> The assert in question is here:
>
>     /*
>      * Decrement strong lock count.  This logic is needed only for 2PC.
>      */
>     if (decrement_strong_lock_count
>         && ConflictsWithRelationFastPath(&lock->tag, lockmode))
>     {
>         uint32    fasthashcode = FastPathStrongLockHashPartition(hashcode);
>
>         SpinLockAcquire(&FastPathStrongRelationLocks->mutex);
>         Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
>         FastPathStrongRelationLocks->count[fasthashcode]--;
>         SpinLockRelease(&FastPathStrongRelationLocks->mutex);
>     }
>
> and it sure looks to me like that
> "ConflictsWithRelationFastPath(&lock->tag" is looking at the tag of the
> shared-memory lock object you just released.  If someone else had managed
> to recycle that locktable entry for some other purpose, the
> ConflictsWithRelationFastPath call might incorrectly return true.
>
> I think s/&lock->tag/locktag/ would fix it, but maybe I'm missing
> something.

...this is probably the real cause of the failures we've actually been
seeing.  I'll go back-patch that change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
