Re: Assertion failing in master, predicate.c - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Assertion failing in master, predicate.c
Date
Msg-id a9a03acc-20cf-d074-c39b-fe581b3f9276@gmail.com
Whole thread Raw
In response to Re: Assertion failing in master, predicate.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Assertion failing in master, predicate.c
List pgsql-hackers

On 11/22/19 3:25 PM, Tom Lane wrote:
> I wrote:
>> Mark Dilger <hornschnorter@gmail.com> writes:
>>> `git bisect` shows the problem occurs earlier than that, and by
>>> chance the first bad commit was one of yours.  I'm not surprised
>>> that your commit was regarding LISTEN/NOTIFY, as the error is
>>> always triggered with a LISTEN statement.  (I've now hit this
>>> many times in many tests of multiple SQL statements, and the
>>> last statement before the error is always a LISTEN.)
> 
>> Oh my, that's interesting!  I had wondered a bit about the LISTEN
>> changes, but it's hard to see how those could have any connection
>> to serializable mode.  This will be an entertaining debugging
>> exercise ...
> 
> It looks to me like this is an ancient bug that just happened to be
> made more probable by 51004c717.  That Assert in predicate.c is
> basically firing because MySerializableXact got created *after*
> PreCommit_CheckForSerializationFailure, which is what should have
> marked it as prepared.  And that will happen, if we're in serializable
> mode and this is the first LISTEN of the session, because
> CommitTransaction() calls PreCommit_Notify after it calls
> PreCommit_CheckForSerializationFailure, and PreCommit_Notify calls
> asyncQueueReadAllNotifications which wants to get a snapshot, and
> the transaction had no snapshot before.
> 
> The only reason it's showing up now is that actually the logic is
> 
>      if (!QUEUE_POS_EQUAL(max, head))
>          asyncQueueReadAllNotifications();
> 
> that is, we'll skip the problematic call if the notify queue is
> visibly empty.  But 51004c717 changed how aggressively we move
> the queue tail forward, so that in this simple example we will
> now see the queue as possibly not empty, where we would have
> decided it was empty before.

Right, I've been staring at that code for the last couple hours,
trying to see a problem with it.  I tried making the code a bit
more aggressive about moving the tail forward to see if that
would help, but the only fix that worked was completely reverting
yours and Martijn's commit.  It makes sense now.

> Of course, the bug exists anyway, because concurrent NOTIFY traffic
> could certainly cause the queue to be nonempty at this point.
> I venture that the only reason we've not seen field reports of
> this issue is that people don't run with asserts on in production
> (and, I guess, the problem is actually harmless except for the
> Assert).  Or maybe people don't use serializable mode in apps
> that use LISTEN/NOTIFY?
> 
> Anyway, it seems like the simplest fix is to swap the order of
> the PreCommit_CheckForSerializationFailure and PreCommit_Notify
> steps in CommitTransaction.  There's also PrepareTransaction
> to think about, but there again we could just move AtPrepare_Notify
> up; it's only going to throw an error anyway, so we might as well
> do that sooner.

I changed PrepareTransaction and CommitTransaction in the manner
you suggest, and the tests pass now.  I have not yet looked over
all the other possible implications of this change, so I'll go
do that for a while.

> 
> An alternative idea is to use some other way of getting a snapshot
> in asyncQueueReadAllNotifications, one that always gets a current
> snapshot and doesn't enter predicate.c.  But that might have semantic
> consequences on the timing of notifications.  I'm not really sure
> that anybody's ever thought hard about how async.c ought to act
> in serializable mode, so this might or might not be a good change.

The semantics of receiving a notification in serializable mode are
not clear, unless you just insist on not receiving any.  The whole
point of serializable mode, as I understand it, it to be given the
impression that all your work happens either before or after other
transactions' work.  It hardly makes sense to receive a notification
mid transaction informing you of some other transaction having just
changed something.

I don't propose any changes to this, though, since it may break
existing applications.  I prefer the simplicity of your suggestion
above.

-- 
Mark Dilger



pgsql-hackers by date:

Previous
From: Nikita Glukhov
Date:
Subject: Re: Avoid full GIN index scan when possible
Next
From: Ranier Vilela
Date:
Subject: RE: [BUG] (firsttupleslot)==NULL is redundant or is possible nulldereference?