Re: BUG #6425: Bus error in slot_deform_tuple - Mailing list pgsql-bugs

From Simon Riggs
Subject Re: BUG #6425: Bus error in slot_deform_tuple
Date
Msg-id CA+U5nMJkaLowf=Vksbh30MBHMQdT2D65fwZfTWF6SQfbT8429A@mail.gmail.com
In response to Re: BUG #6425: Bus error in slot_deform_tuple  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #6425: Bus error in slot_deform_tuple
List pgsql-bugs
On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I have not gotten very far with the coredump, except to observe that
>> gdb says the Assert ought to have passed: ...
>> This suggests very strongly that indeed the buffer was changing under
>> us.
>
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index cce87a3..b128bfd 100644
> *** a/src/backend/access/transam/xlog.c
> --- b/src/backend/access/transam/xlog.c
> *************** RestoreBkpBlocks(XLogRecPtr lsn, XLogRec
> *** 3716,3724 ****
>                 }
>                 else
>                 {
> -                       /* must zero-fill the hole */
> -                       MemSet((char *) page, 0, BLCKSZ);
>                         memcpy((char *) page, blk, bkpb.hole_offset);
>                         memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
>                                   blk + bkpb.hole_offset,
>                                   BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
> --- 3716,3724 ----
>                 }
>                 else
>                 {
>                         memcpy((char *) page, blk, bkpb.hole_offset);
> +                       /* must zero-fill the hole */
> +                       MemSet((char *) page + bkpb.hole_offset, 0, bkpb.hole_length);
>                         memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
>                                   blk + bkpb.hole_offset,
>                                   BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
>
>
> The existing code makes the page state transiently invalid (all zeroes)
> for no particularly good reason, and consumes useless cycles to do so,
> so this would be a good change in any case.  The reason it is relevant
> to our current problem is that even though RestoreBkpBlocks faithfully
> takes exclusive lock on the buffer, *that is not enough to guarantee
> that no one else is touching that buffer*.  Another backend that has
> already located a visible tuple on a page is entitled to keep accessing
> that tuple with only a buffer pin.  So the existing code transiently
> wipes the data from underneath the other backend's pin.
>
> It's clear how this explains the symptoms

Yes, that looks like the murder weapon.
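
For anyone following along, here is a rough single-threaded sketch of
why the ordering matters. It is not the xlog.c code: HoleInfo, Reader,
peek(), restore_old(), restore_new() and zero_sightings() are all
hypothetical names made up for illustration, and it assumes the page
already holds the same non-hole bytes as the backup image. The "reader"
simply looks at one byte outside the hole after each step, standing in
for a backend that holds only a pin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Hypothetical stand-in for the two backup-block fields the patch uses. */
typedef struct
{
    size_t hole_offset;         /* number of bytes before the hole */
    size_t hole_length;         /* number of bytes in the hole */
} HoleInfo;

/* Models a pin-only reader that keeps looking at one byte of a tuple. */
typedef struct
{
    size_t watch;               /* offset of a byte outside the hole */
    int    zero_sightings;      /* times that byte was seen as zero */
} Reader;

static void
peek(const char *page, Reader *r)
{
    if (page[r->watch] == 0)
        r->zero_sightings++;
}

/* Pre-patch order: wipe the whole page, then copy both pieces back in. */
static void
restore_old(char *page, const char *blk, HoleInfo h, Reader *r)
{
    size_t tail = h.hole_offset + h.hole_length;

    assert(tail <= BLCKSZ);
    memset(page, 0, BLCKSZ);
    peek(page, r);
    memcpy(page, blk, h.hole_offset);
    peek(page, r);
    memcpy(page + tail, blk + h.hole_offset, BLCKSZ - tail);
    peek(page, r);
}

/* Patched order: copy the head, zero only the hole, copy the tail. */
static void
restore_new(char *page, const char *blk, HoleInfo h, Reader *r)
{
    size_t tail = h.hole_offset + h.hole_length;

    assert(tail <= BLCKSZ);
    memcpy(page, blk, h.hole_offset);
    peek(page, r);
    memset(page + h.hole_offset, 0, h.hole_length);
    peek(page, r);
    memcpy(page + tail, blk + h.hole_offset, BLCKSZ - tail);
    peek(page, r);
}

/*
 * Restore an image over a page that already matches it and report how
 * often the reader saw its watched byte zeroed in between steps.
 */
int
zero_sightings(int use_new_order)
{
    static char page[BLCKSZ];
    static char blk[BLCKSZ];    /* image with the hole squeezed out */
    HoleInfo    h = {128, 4096};
    Reader      r = {128 + 4096 + 100, 0};  /* a byte after the hole */

    memset(blk, 0x5A, sizeof blk);
    memset(page, 0x5A, sizeof page);
    memset(page + h.hole_offset, 0, h.hole_length);

    if (use_new_order)
        restore_new(page, blk, h, &r);
    else
        restore_old(page, blk, h, &r);
    return r.zero_sightings;
}
```

Calling zero_sightings(0) exercises the pre-patch order and
zero_sightings(1) the patched one. In this model only the old order
ever shows the reader a zeroed byte outside the hole. A real failure
needs a concurrent backend, which this sketch does not attempt to
reproduce.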

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
