Thread: On conflict update & hint bits

On conflict update & hint bits

From
Konstantin Knizhnik
Date:
Hi,

I have run into a rarely reproducible problem in our multimaster (it never 
happens with vanilla Postgres).
We are using our own customized transaction manager, so the problem may 
well be in our multimaster.
But the stack trace looks suspicious, which is why I want to consult 
people familiar with this code about whether it is a bug in 
ExecOnConflictUpdate or not.

Briefly: ExecOnConflictUpdate tries to set a hint bit without holding a lock 
on the buffer, and so gets an assertion failure in MarkBufferDirtyHint.

Here is the stack trace:

#0  0x00007fe3b940acc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fe3b940e0d8 in __GI_abort () at abort.c:89
#2  0x000000000097b996 in ExceptionalCondition (conditionName=0xb4d970 "!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", errorType=0xb4d2e9 "FailedAssertion", fileName=0xb4d2e0 "bufmgr.c", lineNumber=3380) at assert.c:54
#3  0x00000000007e365b in MarkBufferDirtyHint (buffer=946, buffer_std=1 '\001') at bufmgr.c:3380
#4  0x00000000009c3660 in SetHintBits (tuple=0x7fe396a9d858, buffer=946, infomask=256, xid=1398) at tqual.c:136
#5  0x00000000009c5194 in HeapTupleSatisfiesMVCC (htup=0x7ffc00169030, snapshot=0x2e79778, buffer=946) at tqual.c:1065
#6  0x00000000006ace83 in ExecCheckHeapTupleVisible (estate=0x2e81ae8, tuple=0x7ffc00169030, buffer=946) at nodeModifyTable.c:197
#7  0x00000000006ae343 in ExecOnConflictUpdate (mtstate=0x2e81d50, resultRelInfo=0x2e81c38, conflictTid=0x7ffc001690c0, planSlot=0x2e82428, excludedSlot=0x2e82428, estate=0x2e81ae8, canSetTag=1 '\001', returning=0x7ffc001690c8) at nodeModifyTable.c:1173
#8  0x00000000006ad256 in ExecInsert (mtstate=0x2e81d50, slot=0x2e82428, planSlot=0x2e82428, arbiterIndexes=0x2e7eeb0, onconflict=ONCONFLICT_UPDATE, estate=0x2e81ae8, canSetTag=1 '\001') at nodeModifyTable.c:395
#9  0x00000000006aebe3 in ExecModifyTable (node=0x2e81d50) at nodeModifyTable.c:1496

In ExecOnConflictUpdate the buffer is pinned but not locked:

    /*
     * Lock tuple for update.  Don't follow updates when tuple cannot be
     * locked without doing so.  A row locking conflict here means our
     * previous conclusion that the tuple is conclusively committed is not
     * true anymore.
     */
    tuple.t_self = *conflictTid;
    test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
                           lockmode, LockWaitBlock, false, &buffer,
                           &hufd);

heap_lock_tuple pins the buffer but does not lock it:

    *	*buffer: set to buffer holding tuple (pinned but not locked at exit)
 

Later we try to check tuple visibility:
    ExecCheckHeapTupleVisible(estate, &tuple, buffer);

and inside HeapTupleSatisfiesMVCC we try to set a hint bit.

MarkBufferDirtyHint assumes that the buffer is locked:

    * 2. The caller might have only share-lock instead of exclusive-lock on the
    *    buffer's content lock.

and we get an assertion failure at:

    /* here, either share or exclusive lock is OK */
    Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

So the question is whether it is correct that ExecOnConflictUpdate tries 
to access and update a tuple without holding a lock on the buffer?

Thanks in advance,

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Later we try to check tuple visibility:
>
>     ExecCheckHeapTupleVisible(estate, &tuple, buffer);
>
> and inside HeapTupleSatisfiesMVCC try to set hint bit.

So, you're using repeatable read or serializable isolation level?

-- 
Peter Geoghegan



Re: On conflict update & hint bits

From
Konstantin Knizhnik
Date:

On 30.09.2016 19:37, Peter Geoghegan wrote:
> On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Later we try to check tuple visibility:
>>
>>      ExecCheckHeapTupleVisible(estate, &tuple, buffer);
>>
>> and inside HeapTupleSatisfiesMVCC try to set hint bit.
> So, you're using repeatable read or serializable isolation level?
>
Repeatable read.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> So the question is whether it is correct that ExecOnConflictUpdate tries to
> access and update tuple without holding lock on the buffer?

You're right -- this is a bug in Postgres.

I'm travelling from Ireland to the USA this weekend, but will work on
this early next week. I don't think it's a particularly tricky fix --
as you say, it is necessary to have at least a shared buffer lock to
call HeapTupleSatisfiesVisibility(), and we quite simply fail to
ensure that. We must have a shared buffer lock in the visibility-check
path for ON CONFLICT DO UPDATE where isolation level > READ COMMITTED
-- a buffer pin is not enough.

It also looks like the DO NOTHING variant is similarly affected, even
when the isolation level is READ COMMITTED.    :-(

Thanks for the detailed report.

-- 
Peter Geoghegan



Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Sat, Oct 1, 2016 at 1:15 PM, Peter Geoghegan <pg@heroku.com> wrote:
> It also looks like the DO NOTHING variant is similarly affected, even
> when the isolation level is READ COMMITTED.    :-(

Actually, the DO NOTHING variant is also only affected in isolation
levels greater than READ COMMITTED. Not sure why I thought otherwise.


-- 
Peter Geoghegan



Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Sat, Oct 1, 2016 at 5:15 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> So the question is whether it is correct that ExecOnConflictUpdate tries to
>> access and update tuple without holding lock on the buffer?
>
> You're right -- this is a bug in Postgres.

(Thomas Munro and Kevin Grittner added to CC list.)

Attached is a revision of Thomas' patch to fix a related issue; it now
also fixes this no-buffer-lock-held issue. He posted his version of
this to a -general mailing list thread a week ago [1]. This was a
thread about spurious serialization failure with ON CONFLICT DO
NOTHING. I understand that Kevin is still not happy with the behavior
under SSI even with our fix, since serialization failures will still
occur more often than necessary (see other thread for details of what
I'm talking about).

I believe this patch to be a strict improvement on master, and I
suggest it be applied in advance of the deadline to get fixes in for
the next set of point releases. Note that this revision includes
Thomas' isolation test, which is completely unchanged. I still intend
to work with Kevin to do better than this under SSI, though that will
probably be for a future point release. The behavior proposed by my
fix here is the right thing for REPEATABLE READ isolation level, which
has nothing to do with Kevin's proposed more careful handling under
SSI. Besides, the buffer pin bug reported by Konstantin on this thread
really should be addressed ASAP.

[1] https://www.postgresql.org/message-id/CAEepm=3Ra9NgDHocDBtB4iiB7MWdavQybNS3F47SvKh1Mk-mFQ@mail.gmail.com
--
Peter Geoghegan

Attachment

Re: On conflict update & hint bits

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> Attached is a revision of Thomas' patch to fix a related issue; it now
> also fixes this no-buffer-lock-held issue.

I think this needs more work.

1. Why are you replacing ExecOnConflictUpdate's ExecCheckHeapTupleVisible
call with ExecCheckTIDVisible?  That appears to demand a re-fetch of the
tuple ExecOnConflictUpdate already has its hands on.  Wouldn't it be
better to just put a buffer lock/unlock around the existing code?

2. Your proposed coding

+    if (!heap_fetch(rel, estate->es_snapshot, &tuple, &buffer, true, NULL))
+ ...
+        if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))

will dump core in the case where heap_fetch returns false with
tuple.t_data set to null.  If that case is impossible here,
it's not apparent why, and it'd surely deserve at least a comment,
maybe an Assert.  But I'd rather just assume it's possible.

I'm not taking a position on whether this is OK at the higher level
of whether the SSI behavior could be better.  However, unless Kevin
has objections to committing something like this now, I think we
should fix the above-mentioned problems and push it.  The existing
behavior is clearly bad.
        regards, tom lane



Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Sat, Oct 22, 2016 at 9:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. Why are you replacing ExecOnConflictUpdate's ExecCheckHeapTupleVisible
> call with ExecCheckTIDVisible?  That appears to demand a re-fetch of the
> tuple ExecOnConflictUpdate already has its hands on.  Wouldn't it be
> better to just put a buffer lock/unlock around the existing code?

I thought that that was less than idiomatic within nodeModifyTable.c
-- executor modules rarely directly acquire buffer locks. More
importantly, while the lighter weight ExecCheckHeapTupleVisible()
variant seemed to make sense when there was no buffer lock
acquisition, now that it's clear that a buffer lock acquisition is
necessary, I am skeptical. I now doubt that the added complexity pays
for itself, which is why I proposed to remove
ExecCheckHeapTupleVisible(). Besides, ON CONFLICT seems like a feature
that is significantly less compelling at higher isolation levels; that
must be why it took this long to hear a problem report like
Konstantin's.

> 2. Your proposed coding
>
> +       if (!heap_fetch(rel, estate->es_snapshot, &tuple, &buffer, true, NULL))
> + ...
> +               if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
>
> will dump core in the case where heap_fetch returns false with
> tuple.t_data set to null.  If that case is impossible here,
> it's not apparent why, and it'd surely deserve at least a comment,
> maybe an Assert.  But I'd rather just assume it's possible.

You could say the same thing about at least one other existing place
that more or less assumes a conflictTid heap_fetch() or similar cannot
allow that to happen, I think. Maybe this is just as good a place to
talk about it as any other, though. I defer to you.

(It's safe because our snapshot is a kind of interlock against vacuum
killing the conflictTid tuple.)

> I'm not taking a position on whether this is OK at the higher level
> of whether the SSI behavior could be better.  However, unless Kevin
> has objections to committing something like this now, I think we
> should fix the above-mentioned problems and push it.  The existing
> behavior is clearly bad.

I don't have any strong feelings on it myself just yet; I trust that
Kevin's judgement that we should do better under SSI is sound. I
should have mentioned that Kevin encouraged me to do this much first
in my description of the patch. I'm pretty confident that he will have
no objection to doing something like this for now.

-- 
Peter Geoghegan



Re: On conflict update & hint bits

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Sat, Oct 22, 2016 at 9:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. Why are you replacing ExecOnConflictUpdate's ExecCheckHeapTupleVisible
>> call with ExecCheckTIDVisible?  That appears to demand a re-fetch of the
>> tuple ExecOnConflictUpdate already has its hands on.  Wouldn't it be
>> better to just put a buffer lock/unlock around the existing code?

> I thought that that was less than idiomatic within nodeModifyTable.c
> -- executor modules rarely directly acquire buffer locks.

"Rarely" is not "never".  A bigger problem though is that heap_fetch
does more than just lock the buffer: there are also PredicateLockTuple
and CheckForSerializableConflictOut calls in there.  It's possible that
those are no-ops in this usage (because after all we already fetched
the tuple once), or maybe they're even desirable because they would help
resolve Kevin's concerns.  But it hasn't been analyzed and so I'm not
prepared to risk a behavioral change in this already under-tested area
the day before a release wrap.

AFAICT, it's very hard to get to an actual failure from the lack of
buffer lock here.  You would need to have a situation where the tuple
was not hintable when originally fetched but has become so by the time
ON CONFLICT is rechecking it.  That's possible, eg if we're using
async commit and the previous transaction's commit record has gotten
flushed to disk in between, but it's not likely.  Even then, no actual
problem would ensue (in non-assert builds) from setting a hint bit without
buffer lock, except in very hard-to-hit race conditions.  So the buffer
lock issue doesn't seem urgent enough to me to justify making a fix under
time pressure.

The business about not throwing a serialization failure due to actions
of our own xact does seem worth fixing now, but if I understand correctly,
we can deal with that by adding a test for xmin-is-our-own-xact into
the existing code structure.  I propose doing that much (and adding
Munro's regression test case) and calling it good for today.
        regards, tom lane



Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Sun, Oct 23, 2016 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Rarely" is not "never".  A bigger problem though is that heap_fetch
> does more than just lock the buffer: there are also PredicateLockTuple
> and CheckForSerializableConflictOut calls in there.  It's possible that
> those are no-ops in this usage (because after all we already fetched
> the tuple once), or maybe they're even desirable because they would help
> resolve Kevin's concerns.  But it hasn't been analyzed and so I'm not
> prepared to risk a behavioral change in this already under-tested area
> the day before a release wrap.

The heap_fetch() contract doesn't ask its callers to consider any of
that. Besides, those actions (PredicateLockTuple +
CheckForSerializableConflictOut) are very probably redundant no-ops as
you say. (They won't help with Kevin's additional concerns, BTW,
because Kevin is concerned about excessive serialization failures with
SSI.)

> AFAICT, it's very hard to get to an actual failure from the lack of
> buffer lock here.  You would need to have a situation where the tuple
> was not hintable when originally fetched but has become so by the time
> ON CONFLICT is rechecking it.  That's possible, eg if we're using
> async commit and the previous transaction's commit record has gotten
> flushed to disk in between, but it's not likely.  Even then, no actual
> problem would ensue (in non-assert builds) from setting a hint bit without
> buffer lock, except in very hard-to-hit race conditions.  So the buffer
> lock issue doesn't seem urgent enough to me to justify making a fix under
> time pressure.
>
> The business about not throwing a serialization failure due to actions
> of our own xact does seem worth fixing now, but if I understand correctly,
> we can deal with that by adding a test for xmin-is-our-own-xact into
> the existing code structure.  I propose doing that much (and adding
> Munro's regression test case) and calling it good for today.

I'm surprised at how you've assessed the risk here. I think that the
risk of not proceeding with fixing the buffer lock issue is greater
than the risk of breaking something else with the proposed fix. Even
if the former risk isn't such a big one.

If there are a non-trivial number of users that are particularly
attached to the precise behavior of higher isolation levels in master
(assuming it would be altered by the proposed fix), then it's
surprising that it took so many months for someone to complain about
the ON CONFLICT DO NOTHING behavior being blatantly broken.

-- 
Peter Geoghegan



Re: On conflict update & hint bits

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Sun, Oct 23, 2016 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Rarely" is not "never".  A bigger problem though is that heap_fetch
>> does more than just lock the buffer: there are also PredicateLockTuple
>> and CheckForSerializableConflictOut calls in there.  It's possible that
>> those are no-ops in this usage (because after all we already fetched
>> the tuple once), or maybe they're even desirable because they would help
>> resolve Kevin's concerns.  But it hasn't been analyzed and so I'm not
>> prepared to risk a behavioral change in this already under-tested area
>> the day before a release wrap.

> I'm surprised at how you've assessed the risk here.

What's bothering me is (a) it's less than 24 hours to release wrap and
(b) this patch changes SSI-relevant behavior and hasn't been approved
by Kevin.  I'm not familiar enough with that logic to commit a change
in it on my own authority, especially with so little time for problems
to be uncovered.

I'm okay with adding an explicit buffer lock/unlock pair, and in fact
plan to go have a look at that in a bit.  I'm not okay with doing a
refactoring that might change the behavior in more ways than that
under these circumstances.
        regards, tom lane



Re: On conflict update & hint bits

From
Peter Geoghegan
Date:
On Sun, Oct 23, 2016 at 2:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What's bothering me is (a) it's less than 24 hours to release wrap and
> (b) this patch changes SSI-relevant behavior and hasn't been approved
> by Kevin.  I'm not familiar enough with that logic to commit a change
> in it on my own authority, especially with so little time for problems
> to be uncovered.

I should point out that I knew that the next set of point releases had
been moved forward much later than you did. I had to work on this fix
during the week, which was pretty far from ideal for me for my own
reasons.

> I'm okay with adding an explicit buffer lock/unlock pair, and in fact
> plan to go have a look at that in a bit.  I'm not okay with doing a
> refactoring that might change the behavior in more ways than that
> under these circumstances.

Fair enough. As long as we do that much, I'm comfortable.

-- 
Peter Geoghegan



Re: On conflict update & hint bits

From
Konstantin Knizhnik
Date:

On 24.10.2016 00:49, Peter Geoghegan wrote:
> On Sun, Oct 23, 2016 at 2:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What's bothering me is (a) it's less than 24 hours to release wrap and
>> (b) this patch changes SSI-relevant behavior and hasn't been approved
>> by Kevin.  I'm not familiar enough with that logic to commit a change
>> in it on my own authority, especially with so little time for problems
>> to be uncovered.
> I should point out that I knew that the next set of point releases had
> been moved forward much later than you did. I had to work on this fix
> during the week, which was pretty far from ideal for me for my own
> reasons.
>
Just for information: I know that you are working on this issue, but since 
we need to proceed further with our testing of multimaster,
I have made the following obvious changes and they fix the problem (at 
least this assertion failure does not happen any more):

src/backend/executor/nodeModifyTable.c

@@ -1087,6 +1087,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
                            lockmode, LockWaitBlock, false, &buffer,
                            &hufd);
+    /*
+     * We must hold share lock on the buffer content while examining tuple
+     * visibility.  Afterwards, however, the tuples we have found to be
+     * visible are guaranteed good as long as we hold the buffer pin.
+     */
+    LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
     switch (test)
     {
         case HeapTupleMayBeUpdated:
@@ -1142,6 +1149,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
              * loop here, as the new version of the row might not conflict
              * anymore, or the conflicting tuple has actually been deleted.
              */
+            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
             ReleaseBuffer(buffer);
             return false;
@@ -1175,6 +1183,8 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     /* Store target's existing tuple in the state's dedicated slot */
     ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
 
+    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
     /*
      * Make tuple and any needed join variables available to ExecQual and
      * ExecProject.  The EXCLUDED tuple is installed in ecxt_innertuple, while

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: On conflict update & hint bits

From
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> Just for information: I know that you are working on this issue, but as
> far as we need to proceed further with our testing of multimaster,
> I have done the following obvious changes and it fixes the problem (at
> least this assertion failure is not happen any more):

This seems kind of the hard way --- why didn't you put the buffer lock
calls into ExecCheckHeapTupleVisible?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8f1fb7d621b0e6bd2eb0ba2ac9634c5b5a03564b
        regards, tom lane



Re: On conflict update & hint bits

From
Kevin Grittner
Date:
On Sun, Oct 23, 2016 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The business about not throwing a serialization failure due to actions
> of our own xact does seem worth fixing now, but if I understand correctly,
> we can deal with that by adding a test for xmin-is-our-own-xact into
> the existing code structure.  I propose doing that much (and adding
> Munro's regression test case) and calling it good for today.

Thanks.  This is the only part of it that I consider an actual
*bug* (since you can retry the serialization failure forever and
never move on because there is no other transaction involved which
can finish to clear the problem) as opposed to an opportunity to
optimize (by reducing false positive serialization failures
actually involving other transactions).  I never saw anything in
the literature addressing the intersection of UPSERT and SSI, and I
think we do need to think about and discuss this a bit more before
we can be sure of the best fix.  This is probably not the thread on
which to have that discussion.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company