Re: BUG #6425: Bus error in slot_deform_tuple - Mailing list pgsql-bugs
From | Simon Riggs |
---|---|
Subject | Re: BUG #6425: Bus error in slot_deform_tuple |
Date | |
Msg-id | CA+U5nMJkaLowf=Vksbh30MBHMQdT2D65fwZfTWF6SQfbT8429A@mail.gmail.com |
In response to | Re: BUG #6425: Bus error in slot_deform_tuple (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: BUG #6425: Bus error in slot_deform_tuple |
List | pgsql-bugs |
On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I have not gotten very far with the coredump, except to observe that
>> gdb says the Assert ought to have passed: ...
>> This suggests very strongly that indeed the buffer was changing under
>> us.
>
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index cce87a3..b128bfd 100644
> *** a/src/backend/access/transam/xlog.c
> --- b/src/backend/access/transam/xlog.c
> *************** RestoreBkpBlocks(XLogRecPtr lsn, XLogRec
> *** 3716,3724 ****
>                 }
>                 else
>                 {
> -                       /* must zero-fill the hole */
> -                       MemSet((char *) page, 0, BLCKSZ);
>                         memcpy((char *) page, blk, bkpb.hole_offset);
>                         memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
>                                    blk + bkpb.hole_offset,
>                                    BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
> --- 3716,3724 ----
>                 }
>                 else
>                 {
>                         memcpy((char *) page, blk, bkpb.hole_offset);
> +                       /* must zero-fill the hole */
> +                       MemSet((char *) page + bkpb.hole_offset, 0, bkpb.hole_length);
>                         memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
>                                    blk + bkpb.hole_offset,
>                                    BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
>
>
> The existing code makes the page state transiently invalid (all zeroes)
> for no particularly good reason, and consumes useless cycles to do so,
> so this would be a good change in any case.  The reason it is relevant
> to our current problem is that even though RestoreBkpBlocks faithfully
> takes exclusive lock on the buffer, *that is not enough to guarantee
> that no one else is touching that buffer*.  Another backend that has
> already located a visible tuple on a page is entitled to keep accessing
> that tuple with only a buffer pin.  So the existing code transiently
> wipes the data from underneath the other backend's pin.
>
> It's clear how this explains the symptoms

Yes, that looks like the murder weapon.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
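
For readers following the thread, the difference between the two write
orders in the quoted patch can be sketched in a small standalone C file.
This is a minimal illustration, not the PostgreSQL code: BkpHoleSketch,
PinnedReader, peek, restore_old, restore_new and count_torn_reads are
hypothetical names, and the sketch assumes the restored image leaves the
watched tuple's bytes unchanged. peek stands in for a backend that holds
only a pin and looks at its tuple between write steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Hypothetical stand-in for the two backup-block fields the patch uses. */
typedef struct
{
    size_t      hole_offset;    /* where the unused "hole" starts */
    size_t      hole_length;    /* how many bytes the hole spans */
} BkpHoleSketch;

/* Hypothetical reader that holds only a pin on one tuple's bytes. */
typedef struct
{
    size_t      off;            /* tuple position within the page */
    size_t      len;            /* tuple length */
    const char *expect;         /* bytes seen when the tuple was located */
    int         torn;           /* how often the bytes looked different */
} PinnedReader;

/* Called between write steps, as if the reader looked at the page then. */
static void
peek(PinnedReader *r, const char *page)
{
    if (memcmp(page + r->off, r->expect, r->len) != 0)
        r->torn++;
}

/* Pre-patch order: zero the whole page, then copy head and tail. */
static void
restore_old(char *page, const char *blk, BkpHoleSketch b, PinnedReader *r)
{
    size_t      tail = b.hole_offset + b.hole_length;

    assert(tail <= BLCKSZ);
    memset(page, 0, BLCKSZ);
    peek(r, page);
    memcpy(page, blk, b.hole_offset);
    peek(r, page);
    memcpy(page + tail, blk + b.hole_offset, BLCKSZ - tail);
    peek(r, page);
}

/* Patched order: copy the head, zero only the hole, copy the tail. */
static void
restore_new(char *page, const char *blk, BkpHoleSketch b, PinnedReader *r)
{
    size_t      tail = b.hole_offset + b.hole_length;

    assert(tail <= BLCKSZ);
    memcpy(page, blk, b.hole_offset);
    peek(r, page);
    memset(page + b.hole_offset, 0, b.hole_length);
    peek(r, page);
    memcpy(page + tail, blk + b.hole_offset, BLCKSZ - tail);
    peek(r, page);
}

/* Restore onto a page that already matches the image; count torn reads. */
int
count_torn_reads(int use_old)
{
    static char page[BLCKSZ];
    static char blk[BLCKSZ];    /* backup image with the hole squeezed out */
    BkpHoleSketch b = {100, 4000};  /* arbitrary illustrative values */
    size_t      tail = b.hole_offset + b.hole_length;
    PinnedReader r;
    size_t      i;

    /* Fill the image with nonzero bytes so zeroing is always visible. */
    for (i = 0; i < BLCKSZ - b.hole_length; i++)
        blk[i] = (char) (1 + i % 250);

    /* Assumption: the page already holds the same bytes as the image. */
    memset(page, 0, BLCKSZ);
    memcpy(page, blk, b.hole_offset);
    memcpy(page + tail, blk + b.hole_offset, BLCKSZ - tail);

    /* Watch a 64-byte "tuple" at the end of the page, past the hole. */
    r.off = BLCKSZ - 64;
    r.len = 64;
    r.expect = blk + (r.off - b.hole_length);
    r.torn = 0;

    if (use_old)
        restore_old(page, blk, b, &r);
    else
        restore_new(page, blk, b, &r);
    return r.torn;
}
```

Under these assumptions count_torn_reads(1) reports two torn
observations (one after the full-page zeroing, one after the head copy),
while count_torn_reads(0) reports none, because the patched order only
ever zeroes the hole and rewrites the tuple with identical bytes.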