Thread: BUG #5918: SummarizeOldestCommittedSxact assertion failure

BUG #5918: SummarizeOldestCommittedSxact assertion failure

From
"YAMAMOTO Takashi"
Date:
The following bug has been logged online:

Bug reference:      5918
Logged by:          YAMAMOTO Takashi
Email address:      yamt@mwd.biglobe.ne.jp
PostgreSQL version: 9.1devel
Operating system:   NetBSD
Description:        SummarizeOldestCommittedSxact assertion failure
Details:

Running 05d93c38a791836eeceaf8edb0ea8cb19cdf2760 with my patch
from BUG #5915 applied, I got the following assertion failure.
Given that availableList is not empty and SxactGlobalXminCount == 0,
I guess it raced with ReleasePredicateLocks.

(gdb) bt
#0  0xbbba4cc7 in _lwp_kill () from /usr/lib/libc.so.12
#1  0xbbba4c85 in raise (s=6) at /siro/nbsd/src/lib/libc/gen/raise.c:48
#2  0xbbba445a in abort () at /siro/nbsd/src/lib/libc/stdlib/abort.c:74
#3  0x083dbfa4 in ExceptionalCondition (
    conditionName=0x854d904 "!(!SHMQueueEmpty(FinishedSerializableTransactions))",
    errorType=0x854d360 "FailedAssertion", fileName=0x854d354 "predicate.c",
    lineNumber=1311) at assert.c:57
#4  0x082e8423 in SummarizeOldestCommittedSxact () at predicate.c:1311
#5  0x082e87ab in RegisterSerializableTransactionInt (snapshot=0x8596aa0)
    at predicate.c:1451
#6  0x082e86ca in RegisterSerializableTransaction (snapshot=0x8596aa0)
    at predicate.c:1415
#7  0x0840ebb2 in GetTransactionSnapshot () at snapmgr.c:138
#8  0x082f44df in exec_bind_message (input_message=0xbfbfe2c4)
    at postgres.c:1545
#9  0x082f7839 in PostgresMain (argc=2, argv=0xbb9196a4,
    username=0xbb9195f8 "takashi") at postgres.c:3944
#10 0x082a8359 in BackendRun (port=0xbb94f0f0) at postmaster.c:3593
#11 0x082a7a1d in BackendStartup (port=0xbb94f0f0) at postmaster.c:3278
#12 0x082a4c9d in ServerLoop () at postmaster.c:1452
#13 0x082a444c in PostmasterMain (argc=3, argv=0xbfbfe594)
    at postmaster.c:1113
#14 0x0822571c in main (argc=3, argv=0xbfbfe594) at main.c:199
(gdb) p PredXact
$3 = (PredXactList) 0xbb53fc80
(gdb) p *PredXact
$4 = {availableList = {prev = 0xbb5479d0, next = 0xbb5407e4},
  activeList = {prev = 0xbb548e4c, next = 0xbb53fcc0}, SxactGlobalXmin = 0,
  SxactGlobalXminCount = 0, WritableSxactCount = 0,
  LastSxactCommitSeqNo = 4582775, CanPartialClearThrough = 4582775,
  HavePartialClearedThrough = 3576768, OldCommittedSxact = 0xbb53fcc8,
  element = 0xbb53fcc0}
(gdb)

Re: BUG #5918: SummarizeOldestCommittedSxact assertion failure

From
Heikki Linnakangas
Date:
On 08.03.2011 02:37, YAMAMOTO Takashi wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5918
> Logged by:          YAMAMOTO Takashi
> Email address:      yamt@mwd.biglobe.ne.jp
> PostgreSQL version: 9.1devel
> Operating system:   NetBSD
> Description:        SummarizeOldestCommittedSxact assertion failure
> Details:
>
> running 05d93c38a791836eeceaf8edb0ea8cb19cdf2760 with my patch
> in BUG #5915 applied, i got the following assertion failure.
> given that availableList is not empty and SxactGlobalXminCount == 0,
> i guess it was raced with ReleasePredicateLocks.

Yeah, that's what it looks like. One backend calls
RegisterSerializableTransaction() while all the serializablexact slots
are in use. So it releases SerializableXactHashLock and calls
SummarizeOldestCommittedSxact(). Before SummarizeOldestCommittedSxact()
acquires SerializableFinishedListLock, another backend calls
ReleasePredicateLocks(false), triggering cleanup of old predicate locks,
and ClearOldPredicateLocks() clears all old locks. Now when
SummarizeOldestCommittedSxact() finally gets the lock, it sees that
there are no old transactions to summarize, and trips the assertion.

I think we need to just treat an empty list as normal in
SummarizeOldestCommittedSxact(), patch attached.
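
As an illustration only (this is not the attached patch), the idea of
treating an empty list as normal can be sketched in standalone C. All
names below (FinishedList, finished_list_push, clear_old_entries,
summarize_oldest) are hypothetical stand-ins rather than PostgreSQL
APIs, and locking is left out, so the interleaving is replayed
sequentially instead of with real concurrent backends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one finished transaction in the list. */
typedef struct FinishedNode
{
    struct FinishedNode *next;
    int         xact_id;
} FinishedNode;

/* Hypothetical stand-in for the finished-transactions list. */
typedef struct FinishedList
{
    FinishedNode *head;         /* oldest entry */
    FinishedNode *tail;         /* newest entry */
    int         summarized_count;
} FinishedList;

void
finished_list_init(FinishedList *list)
{
    list->head = NULL;
    list->tail = NULL;
    list->summarized_count = 0;
}

/* Append a finished transaction at the tail (newest end). */
void
finished_list_push(FinishedList *list, FinishedNode *node)
{
    assert(node != NULL);
    node->next = NULL;
    if (list->tail == NULL)
        list->head = node;
    else
        list->tail->next = node;
    list->tail = node;
}

/* Plays the role of the other backend's cleanup: drops every entry. */
void
clear_old_entries(FinishedList *list)
{
    list->head = NULL;
    list->tail = NULL;
}

/*
 * Summarize the oldest entry if there is one.  An empty list is not
 * asserted against; it is reported as "nothing to do", because another
 * backend may have emptied the list before we got here.
 */
bool
summarize_oldest(FinishedList *list)
{
    FinishedNode *oldest = list->head;

    if (oldest == NULL)
        return false;           /* lost the race: nothing left to summarize */

    list->head = oldest->next;
    if (list->head == NULL)
        list->tail = NULL;
    oldest->next = NULL;
    list->summarized_count++;
    return true;
}
```

Calling finished_list_push(), then clear_old_entries(), then
summarize_oldest() replays the problematic ordering: the last call
returns false instead of tripping an assertion.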

Thanks for yet another excellent bug report!

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5918: SummarizeOldestCommittedSxact assertion failure

From
Dan Ports
Date:
On Tue, Mar 08, 2011 at 01:22:20PM +0200, Heikki Linnakangas wrote:
> I think we need to just treat an empty list as normal in
> SummarizeOldestCommittedSxact(), patch attached.

I just hit the same assertion. Testing this patch now.

Dan

--
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Re: BUG #5918: SummarizeOldestCommittedSxact assertion failure

From
Dan Ports
Date:
Looks good -- with this patch I didn't hit any assertion failures or other
errors during an hour of stress testing with DBT-2.

Dan

Re: BUG #5918: SummarizeOldestCommittedSxact assertion failure

From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 08.03.2011 02:37, YAMAMOTO Takashi wrote:

>> i got the following assertion failure. given that availableList
>> is not empty and SxactGlobalXminCount == 0, i guess it was raced
>> with ReleasePredicateLocks.
>
> Yeah, that's what it looks like. One backend calls
> RegisterSerializableTransaction() while all the serializablexact
> slots are in use. So it releases SerializableXactHashLock and
> calls SummarizeOldestCommittedSxact(). Before
> SummarizeOldestCommittedSxact() acquires
> SerializableFinishedListLock, another backend calls
> ReleasePredicateLocks(false), triggering cleanup of old predicate
> locks, and ClearOldPredicateLocks() clears all old locks. Now when
> SummarizeOldestCommittedSxact() finally gets the lock, it sees
> that there are no old transactions to summarize, and trips the
> assertion.
>
> I think we need to just treat an empty list as normal in
> SummarizeOldestCommittedSxact(), patch attached.

Looks good.  I suggest we get that one in before the alpha is cut.
Especially since Dan was able to hit that same assertion an hour
into DBT-2 testing, and didn't hit problems with this patch.

> Thanks for yet another excellent bug report!

Indeed!  I'm quite curious about the testing environment which is
finding these, and very much appreciate all the work to help in
making this feature so solid before we even hit an alpha release.

-Kevin

Re: BUG #5918: SummarizeOldestCommittedSxact assertion failure

From
Heikki Linnakangas
Date:
On 08.03.2011 18:27, Kevin Grittner wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
>> I think we need to just treat an empty list as normal in
>> SummarizeOldestCommittedSxact(), patch attached.
>
> Looks good.  I suggest we get that one in before the alpha is cut.
> Especially since Dan was able to hit that same assertion an hour
> into DBT-2 testing, and didn't hit problems with this patch.

Ok, committed.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com