Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Msg-id 5390BA0F.2030103@vmware.com
In response to BUG #10533: 9.4 beta1 assertion failure in autovacuum process  (levertond@googlemail.com)
Responses Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
On 06/05/2014 09:01 PM, levertond@googlemail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      10533
> Logged by:          David Leverton
> Email address:      levertond@googlemail.com
> PostgreSQL version: 9.4beta1
> Operating system:   RHEL 5 x86_64
> Description:
>
> Our application's test suite triggers an assertion failure in an autovacuum
> process under 9.4 beta1.  I wasn't able to reduce it to a nice test case,
> but I hope the backtrace illustrates the problem:

Yes, it does, thanks for the report!

> #0  0x00000032bae30265 in raise () from /lib64/libc.so.6
> #1  0x00000032bae31d10 in abort () from /lib64/libc.so.6
> #2  0x000000000078b69d in ExceptionalCondition (conditionName=<value
> optimized out>, errorType=<value optimized out>,
>      fileName=<value optimized out>, lineNumber=<value optimized out>) at
> assert.c:54
> #3  0x00000000007ad6e2 in palloc (size=16) at mcxt.c:670
> #4  0x00000000004d3592 in GetMultiXactIdMembers (multi=75092,
> members=0x7fff915f9468, allow_old=0 '\000') at multixact.c:1242
> #5  0x0000000000495c9c in MultiXactIdGetUpdateXid (xmax=17061,
> t_infomask=<value optimized out>) at heapam.c:6059
> #6  0x00000000007ba93c in HeapTupleHeaderIsOnlyLocked (tuple=0x42a5) at
> tqual.c:1539
> #7  0x00000000007baf2c in HeapTupleSatisfiesVacuum (htup=<value optimized
> out>, OldestXmin=67407, buffer=347) at tqual.c:1174
> #8  0x00000000005a96eb in heap_page_is_all_visible (onerel=0x2b1b020f3f58,
> blkno=86, buffer=347, tupindex=339, vacrelstats=0x1cfe3148,
>      vmbuffer=0x7fff915fa65c) at vacuumlazy.c:1788
> #9  lazy_vacuum_page (onerel=0x2b1b020f3f58, blkno=86, buffer=347,
> tupindex=339, vacrelstats=0x1cfe3148, vmbuffer=0x7fff915fa65c)
>      at vacuumlazy.c:1220
> ...

MultiXactIdGetUpdateXid() calls GetMultiXactIdMembers(), which can fail
if you run out of memory. That's not cool if you're in a critical
section, as the error will be promoted to PANIC; the assertion checks
that you don't call palloc() while in a critical section, to catch that
kind of problem early. The potential for a problem is there in 9.3 as
well, but the assertion was only added to 9.4 fairly recently.
GetMultiXactIdMembers() requires very little memory, so it's highly
unlikely to fail with OOM in practice, but in theory it could.
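
To illustrate the kind of check involved, here is a minimal standalone
sketch, not the actual backend code. All names in it
(crit_section_count, start_crit_section, end_crit_section,
alloc_allowed, checked_alloc) are made up for the example, and malloc()
stands in for palloc(). The idea is only that an allocator can assert
it is never entered while a critical-section counter is nonzero, so the
mistake shows up in assert-enabled builds long before a real
out-of-memory error could turn into a PANIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a critical-section nesting counter. */
static int crit_section_count = 0;

/* Enter a critical section; sections may nest. */
static void start_crit_section(void)
{
    crit_section_count++;
}

/* Leave a critical section; must pair with start_crit_section(). */
static void end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* Allocation is only safe when no critical section is open. */
static bool alloc_allowed(void)
{
    return crit_section_count == 0;
}

/*
 * Hypothetical allocator wrapper: the assertion fires in assert-enabled
 * builds if a caller allocates inside a critical section, even though
 * the allocation itself would almost always succeed.
 */
static void *checked_alloc(size_t size)
{
    assert(alloc_allowed());
    return malloc(size);
}
```

With this sketch, calling checked_alloc(16) between start_crit_section()
and end_crit_section() aborts on the assertion, which is roughly the
shape of the failure in the backtrace above.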

I think we'll need a variant of GetMultiXactIdMembers() that only
returns the update XID, avoiding the palloc(). The straightforward fix
would be to copy-paste the contents of GetMultiXactIdMembers() into
MultiXactIdGetUpdateXid(), and instead of returning the members in an
array, only return the update XID. But it's a long and complicated
function, so copy-pasting is not a good option. I think it needs to be
refactored into some kind of a helper function that both
MultiXactIdGetUpdateXid() and GetMultiXactIdMembers() could call.
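
One possible shape for such a refactoring, as a rough standalone sketch
rather than a proposed patch: a shared walker visits each member and
hands it to a callback, so one caller can collect everything into an
allocated array while the other just picks out the update XID without
allocating. Every name here (xid_t, member_t, walk_members,
get_members, get_update_xid) is hypothetical, the in-memory array
stands in for the real lookup of the members, and 0 is assumed to mean
"no update XID".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t xid_t;              /* hypothetical transaction id type */

typedef struct
{
    xid_t xid;
    bool  is_update;                 /* true if this member is the updater */
} member_t;

/* Callback invoked per member; return false to stop the walk early. */
typedef bool (*member_cb)(const member_t *m, void *arg);

/* Shared helper: the only place that knows how to enumerate members. */
static void walk_members(const member_t *stored, size_t n,
                         member_cb cb, void *arg)
{
    for (size_t i = 0; i < n; i++)
        if (!cb(&stored[i], arg))
            break;
}

typedef struct
{
    member_t *out;
    size_t    count;
} collect_state;

static bool collect_cb(const member_t *m, void *arg)
{
    collect_state *st = arg;

    st->out[st->count++] = *m;
    return true;                     /* keep going, we want all members */
}

/* Allocating variant: returns the member count, array in *members. */
static size_t get_members(const member_t *stored, size_t n,
                          member_t **members)
{
    collect_state st = { NULL, 0 };

    *members = NULL;
    if (n == 0)
        return 0;
    st.out = malloc(n * sizeof(member_t));
    if (st.out == NULL)
        return 0;                    /* out of memory: report no members */
    walk_members(stored, n, collect_cb, &st);
    *members = st.out;
    return st.count;
}

static bool find_update_cb(const member_t *m, void *arg)
{
    if (m->is_update)
    {
        *(xid_t *) arg = m->xid;
        return false;                /* found it, stop walking */
    }
    return true;
}

/* Non-allocating variant: returns the update XID, or 0 if none. */
static xid_t get_update_xid(const member_t *stored, size_t n)
{
    xid_t result = 0;

    walk_members(stored, n, find_update_cb, &result);
    return result;
}
```

The point of the callback is that the complicated enumeration logic
lives in one place, and only the allocating caller ever touches the
allocator.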

- Heikki
