Re: Reviewing freeze map code - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Reviewing freeze map code
Msg-id 20160623224224.ubztdvhfx6bwkphu@alap3.anarazel.de
In response to Re: Reviewing freeze map code  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Reviewing freeze map code  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On 2016-06-21 16:32:03 -0400, Robert Haas wrote:
> On Tue, Jun 21, 2016 at 3:46 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-06-21 15:38:25 -0400, Robert Haas wrote:
> >> On Tue, Jun 21, 2016 at 1:49 PM, Andres Freund <andres@anarazel.de> wrote:
> >> >> I'm also a bit dubious that LockAcquire is safe to call in general
> >> >> with interrupts held.
> >> >
> >> > Looks like we could just acquire the tuple-lock *before* doing the
> >> > toast_insert_or_update/RelationGetBufferForTuple, but after releasing
> >> > the buffer lock. That'd allow us to avoid doing the nested locking,
> >> > should make the recovery just a goto l2;, ...
> >>
> >> Why isn't that racey?  Somebody else can grab the tuple lock after we
> >> release the buffer content lock and before we acquire the tuple lock.
> >
> > Sure, but by the time the tuple lock is released, they'd have updated
> > xmax. So once we acquired that we can just do
> >                 if (xmax_infomask_changed(oldtup.t_data->t_infomask,
> >                                                                   infomask) ||
> >                         !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
> >                                                                  xwait))
> >                         goto l2;
> > which is fine, because we've not yet done the toasting.
> 
> I see.
> 
> > I'm not sure whether this approach is better than deleting potentially
> > toasted data though. It's probably faster, but will likely touch more
> > places in the code, and eat up an infomask bit (infomask & HEAP_MOVED
> > == HEAP_MOVED in my prototype).
> 
> Ugh.  That's not very desirable at all.

I'm looking into three approaches right now:

1) Flag approach from above
2) Undo toasting on concurrent activity, retry
3) Use WAL logging for the already_marked = true case.


1) primarily suffers from a significant amount of complexity. I still
have a bug in there that sometimes triggers "attempted to update
invisible tuple" ERRORs.  Otherwise it seems to perform decently -
even on workloads with many backends hitting the same tuple, the
retry rate is low.

2) Seems to work too, but due to the amount of time the tuple is not
locked, the retry rate can be really high. As we perform a significant
amount of work (toast insertion & index manipulation, or extending a
file) while the tuple is not locked, it's quite likely that another
session tries to modify the tuple in between.  I think it's possible to
essentially livelock.

3) This approach so far seems the best. It's possible to reuse the
xl_heap_lock record (in an, AFAICS, backwards-compatible manner), and in
most cases the overhead isn't that large.  It's of course annoying to
emit more WAL, but the overhead isn't that big compared to extending a
file, or to toasting.  It's also by far the simplest fix.


Comments?


