Thread: Assertion failing in master, predicate.c

Assertion failing in master, predicate.c

From
Mark Dilger
Date:
Hackers,

I stumbled upon an assertion while testing master for possible
bugs.  I am reporting it here in the hope that this report will
be useful.  The attached small regression test patch consistently
triggers an assert in predicate.c:

   TRAP: FailedAssertion("!isCommit || 
SxactIsPrepared(MySerializableXact)", File: "predicate.c", Line: 3372)

I originally hit this from sources with less than recent
code checked out, but the error is the same in a recent,
fresh `git clone` (4a0aab14dcb35550b55e623a3c194442c5666084)
The problem does not reproduce for me in REL_12_STABLE, though the
same assertion does exist in that branch.

I built on my laptop:

   Linux 4.19.0-5-amd64 #1 SMP Debian 4.19.37-3 (2019-05-15) x86_64 
GNU/Linux

I built using

   `./configure --enable-cassert --enable-tap-tests --with-perl 
--with-python --with-tcl`

The perl, python, and tcl options don't appear to matter, as nothing
changes using

   `./configure --enable-cassert && make -j4 && make check-world`

-- 
Mark Dilger

Attachment

Re: Assertion failing in master, predicate.c

From
Mark Dilger
Date:

On 11/21/19 6:20 PM, Mark Dilger wrote:
> Hackers,
> 
> I stumbled upon an assertion while testing master for possible
> bugs.  I am reporting it here in the hope that this report will
> be useful.  The attached small regression test patch consistently
> triggers an assert in predicate.c:
> 
>    TRAP: FailedAssertion("!isCommit || 
> SxactIsPrepared(MySerializableXact)", File: "predicate.c", Line: 3372)

I have winnowed down the test a bit further.  The attached
smaller patch still triggers the same assertion as the prior
patch did.

-- 
Mark Dilger

Attachment

Re: Assertion failing in master, predicate.c

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> I have winnowed down the test a bit further.  The attached
> smaller patch still triggers the same assertion as the prior
> patch did.

FWIW, I can reproduce the assertion failure with your first test,
but not with this simplified one.

I also confirm that it only happens in HEAD, not v12.  I've not
actually bisected, but a look at the git history for predicate.c
sure makes it look like db2687d1f ("Optimize PredicateLockTuple")
must be to blame.

            regards, tom lane



Re: Assertion failing in master, predicate.c

From
Mark Dilger
Date:

On 11/21/19 8:03 PM, Tom Lane wrote:
> Mark Dilger <hornschnorter@gmail.com> writes:
>> I have winnowed down the test a bit further.  The attached
>> smaller patch still triggers the same assertion as the prior
>> patch did.
> 
> FWIW, I can reproduce the assertion failure with your first test,
> but not with this simplified one.

Thanks for checking!

> I also confirm that it only happens in HEAD, not v12.  I've not
> actually bisected, but a look at the git history for predicate.c
> sure makes it look like db2687d1f ("Optimize PredicateLockTuple")
> must be to blame.

`git bisect` shows the problem occurs earlier than that, and by
chance the first bad commit was one of yours.  I'm not surprised
that your commit was regarding LISTEN/NOTIFY, as the error is
always triggered with a LISTEN statement.  (I've now hit this
many times in many tests of multiple SQL statements, and the
last statement before the error is always a LISTEN.)

I'm cc'ing Martijn because he is mentioned in that commit.


51004c7172b5c35afac050f4d5849839a06e8d3b is the first bad commit
commit 51004c7172b5c35afac050f4d5849839a06e8d3b
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Sep 22 11:46:29 2019 -0400

     Make some efficiency improvements in LISTEN/NOTIFY.

     Move the responsibility for advancing the NOTIFY queue tail pointer
     from the listener(s) to the notification sender, and only have the
     sender do it once every few queue pages, rather than after every batch
     of notifications as at present.  This reduces the number of times we
     execute asyncQueueAdvanceTail, and reduces contention when there are
     multiple listeners (since that function requires exclusive lock).
     This change relies on the observation that we don't really need the 
tail
     pointer to be exactly up-to-date.  It's certainly not necessary to
     attempt to release disk space more often than once per SLRU segment.
     The only other usage of the tail pointer is that an incoming listener,
     if it's the only listener in its database, will need to scan the queue
     forward from the tail; but that's surely a less performance-critical
     path than routine sending and receiving of notifies.  We compromise by
     advancing the tail pointer after every 4 pages of output, so that it
     shouldn't get more than a few pages behind.

     Also, when sending signals to other backends after adding notify
     message(s) to the queue, recognize that only backends in our own
     database are going to care about those messages, so only such
     backends really need to be awakened promptly.  Backends in other
     databases should get kicked if they're well behind on reading the
     queue, else they'll hold back the global tail pointer; but wakening
     them for every single message is pointless.  This change can
     substantially reduce signal traffic if listeners are spread among
     many databases.  It won't help for the common case of only a single
     active database, but the extra check costs very little.

     Martijn van Oosterhout, with some adjustments by me

     Discussion: 
https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
     Discussion: 
https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com


-- 
Mark Dilger



Re: Assertion failing in master, predicate.c

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> On 11/21/19 8:03 PM, Tom Lane wrote:
>> I also confirm that it only happens in HEAD, not v12.  I've not
>> actually bisected, but a look at the git history for predicate.c
>> sure makes it look like db2687d1f ("Optimize PredicateLockTuple")
>> must be to blame.

> `git bisect` shows the problem occurs earlier than that, and by
> chance the first bad commit was one of yours.  I'm not surprised
> that your commit was regarding LISTEN/NOTIFY, as the error is
> always triggered with a LISTEN statement.  (I've now hit this
> many times in many tests of multiple SQL statements, and the
> last statement before the error is always a LISTEN.)

Oh my, that's interesting!  I had wondered a bit about the LISTEN
changes, but it's hard to see how those could have any connection
to serializable mode.  This will be an entertaining debugging
exercise ...

            regards, tom lane



Re: Assertion failing in master, predicate.c

From
Mark Dilger
Date:

On 11/22/19 11:07 AM, Tom Lane wrote:
> Mark Dilger <hornschnorter@gmail.com> writes:
>> On 11/21/19 8:03 PM, Tom Lane wrote:
>>> I also confirm that it only happens in HEAD, not v12.  I've not
>>> actually bisected, but a look at the git history for predicate.c
>>> sure makes it look like db2687d1f ("Optimize PredicateLockTuple")
>>> must be to blame.
> 
>> `git bisect` shows the problem occurs earlier than that, and by
>> chance the first bad commit was one of yours.  I'm not surprised
>> that your commit was regarding LISTEN/NOTIFY, as the error is
>> always triggered with a LISTEN statement.  (I've now hit this
>> many times in many tests of multiple SQL statements, and the
>> last statement before the error is always a LISTEN.)
> 
> Oh my, that's interesting!  I had wondered a bit about the LISTEN
> changes, but it's hard to see how those could have any connection
> to serializable mode.  This will be an entertaining debugging
> exercise ...

predicate.c was changed a few times after REL_12_STABLE was
branched from master but before Thomas's change that you
initially thought might be to blame.  I checked whether
rolling back the changes in predicate.c while keeping your
LISTEN/NOTIFY changes might fix the bug, but alas the bug
is still present.

I'll go familiarize myself with your LISTEN/NOTIFY changes
now....

-- 
Mark Dilger



Re: Assertion failing in master, predicate.c

From
Mark Dilger
Date:

On 11/22/19 11:22 AM, Mark Dilger wrote:
> predicate.c was changed a few times after REL_12_STABLE was
> branched from master but before Thomas's change that you
> initially thought might be to blame.  I checked whether
> rolling back the changes in predicate.c while keeping your
> LISTEN/NOTIFY changes might fix the bug, but alas the bug
> is still present.

On closer inspection, those changes were merely cosmetic
changes to code comments.  It is no surprise rolling those
back made no difference.

-- 
Mark Dilger



Re: Assertion failing in master, predicate.c

From
Tom Lane
Date:
I wrote:
> Mark Dilger <hornschnorter@gmail.com> writes:
>> `git bisect` shows the problem occurs earlier than that, and by
>> chance the first bad commit was one of yours.  I'm not surprised
>> that your commit was regarding LISTEN/NOTIFY, as the error is
>> always triggered with a LISTEN statement.  (I've now hit this
>> many times in many tests of multiple SQL statements, and the
>> last statement before the error is always a LISTEN.)

> Oh my, that's interesting!  I had wondered a bit about the LISTEN
> changes, but it's hard to see how those could have any connection
> to serializable mode.  This will be an entertaining debugging
> exercise ...

It looks to me like this is an ancient bug that just happened to be
made more probable by 51004c717.  That Assert in predicate.c is
basically firing because MySerializableXact got created *after*
PreCommit_CheckForSerializationFailure, which is what should have
marked it as prepared.  And that will happen, if we're in serializable
mode and this is the first LISTEN of the session, because
CommitTransaction() calls PreCommit_Notify after it calls
PreCommit_CheckForSerializationFailure, and PreCommit_Notify calls
asyncQueueReadAllNotifications which wants to get a snapshot, and
the transaction had no snapshot before.

The only reason it's showing up now is that actually the logic is

    if (!QUEUE_POS_EQUAL(max, head))
        asyncQueueReadAllNotifications();

that is, we'll skip the problematic call if the notify queue is
visibly empty.  But 51004c717 changed how aggressively we move
the queue tail forward, so that in this simple example we will
now see the queue as possibly not empty, where we would have
decided it was empty before.

Of course, the bug exists anyway, because concurrent NOTIFY traffic
could certainly cause the queue to be nonempty at this point.
I venture that the only reason we've not seen field reports of
this issue is that people don't run with asserts on in production
(and, I guess, the problem is actually harmless except for the
Assert).  Or maybe people don't use serializable mode in apps
that use LISTEN/NOTIFY?

Anyway, it seems like the simplest fix is to swap the order of
the PreCommit_CheckForSerializationFailure and PreCommit_Notify
steps in CommitTransaction.  There's also PrepareTransaction
to think about, but there again we could just move AtPrepare_Notify
up; it's only going to throw an error anyway, so we might as well
do that sooner.

An alternative idea is to use some other way of getting a snapshot
in asyncQueueReadAllNotifications, one that always gets a current
snapshot and doesn't enter predicate.c.  But that might have semantic
consequences on the timing of notifications.  I'm not really sure
that anybody's ever thought hard about how async.c ought to act
in serializable mode, so this might or might not be a good change.

            regards, tom lane



Re: Assertion failing in master, predicate.c

From
Mark Dilger
Date:

On 11/22/19 3:25 PM, Tom Lane wrote:
> I wrote:
>> Mark Dilger <hornschnorter@gmail.com> writes:
>>> `git bisect` shows the problem occurs earlier than that, and by
>>> chance the first bad commit was one of yours.  I'm not surprised
>>> that your commit was regarding LISTEN/NOTIFY, as the error is
>>> always triggered with a LISTEN statement.  (I've now hit this
>>> many times in many tests of multiple SQL statements, and the
>>> last statement before the error is always a LISTEN.)
> 
>> Oh my, that's interesting!  I had wondered a bit about the LISTEN
>> changes, but it's hard to see how those could have any connection
>> to serializable mode.  This will be an entertaining debugging
>> exercise ...
> 
> It looks to me like this is an ancient bug that just happened to be
> made more probable by 51004c717.  That Assert in predicate.c is
> basically firing because MySerializableXact got created *after*
> PreCommit_CheckForSerializationFailure, which is what should have
> marked it as prepared.  And that will happen, if we're in serializable
> mode and this is the first LISTEN of the session, because
> CommitTransaction() calls PreCommit_Notify after it calls
> PreCommit_CheckForSerializationFailure, and PreCommit_Notify calls
> asyncQueueReadAllNotifications which wants to get a snapshot, and
> the transaction had no snapshot before.
> 
> The only reason it's showing up now is that actually the logic is
> 
>      if (!QUEUE_POS_EQUAL(max, head))
>          asyncQueueReadAllNotifications();
> 
> that is, we'll skip the problematic call if the notify queue is
> visibly empty.  But 51004c717 changed how aggressively we move
> the queue tail forward, so that in this simple example we will
> now see the queue as possibly not empty, where we would have
> decided it was empty before.

Right, I've been staring at that code for the last couple hours,
trying to see a problem with it.  I tried making the code a bit
more aggressive about moving the tail forward to see if that
would help, but the only fix that worked was completely reverting
yours and Martijn's commit.  It makes sense now.

> Of course, the bug exists anyway, because concurrent NOTIFY traffic
> could certainly cause the queue to be nonempty at this point.
> I venture that the only reason we've not seen field reports of
> this issue is that people don't run with asserts on in production
> (and, I guess, the problem is actually harmless except for the
> Assert).  Or maybe people don't use serializable mode in apps
> that use LISTEN/NOTIFY?
> 
> Anyway, it seems like the simplest fix is to swap the order of
> the PreCommit_CheckForSerializationFailure and PreCommit_Notify
> steps in CommitTransaction.  There's also PrepareTransaction
> to think about, but there again we could just move AtPrepare_Notify
> up; it's only going to throw an error anyway, so we might as well
> do that sooner.

I changed PrepareTransaction and CommitTransaction in the manner
you suggest, and the tests pass now.  I have not yet looked over
all the other possible implications of this change, so I'll go
do that for a while.

> 
> An alternative idea is to use some other way of getting a snapshot
> in asyncQueueReadAllNotifications, one that always gets a current
> snapshot and doesn't enter predicate.c.  But that might have semantic
> consequences on the timing of notifications.  I'm not really sure
> that anybody's ever thought hard about how async.c ought to act
> in serializable mode, so this might or might not be a good change.

The semantics of receiving a notification in serializable mode are
not clear, unless you just insist on not receiving any.  The whole
point of serializable mode, as I understand it, it to be given the
impression that all your work happens either before or after other
transactions' work.  It hardly makes sense to receive a notification
mid transaction informing you of some other transaction having just
changed something.

I don't propose any changes to this, though, since it may break
existing applications.  I prefer the simplicity of your suggestion
above.

-- 
Mark Dilger



Re: Assertion failing in master, predicate.c

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> On 11/22/19 3:25 PM, Tom Lane wrote:
>> An alternative idea is to use some other way of getting a snapshot
>> in asyncQueueReadAllNotifications, one that always gets a current
>> snapshot and doesn't enter predicate.c.  But that might have semantic
>> consequences on the timing of notifications.  I'm not really sure
>> that anybody's ever thought hard about how async.c ought to act
>> in serializable mode, so this might or might not be a good change.

> The semantics of receiving a notification in serializable mode are
> not clear, unless you just insist on not receiving any.  The whole
> point of serializable mode, as I understand it, it to be given the
> impression that all your work happens either before or after other
> transactions' work.  It hardly makes sense to receive a notification
> mid transaction informing you of some other transaction having just
> changed something.

Well, you don't: notifications are only sent to the client between
transactions.  After sleeping on it I have these thoughts:

* The other two callers of asyncQueueReadAllNotifications have just
started fresh transactions, so they have no issue.  Regardless of
the session isolation level, they'll be reading the queue with a
freshly-taken snapsnot.

* The point of calling asyncQueueReadAllNotifications in
Exec_ListenPreCommit is to advance over already-committed queue entries
before we start sending any events to the client.  Without this, a
newly-listening client could be sent some very stale events.  (Note
that 51004c717 changed this from "somewhat stale" to "very stale".
I had thought briefly about whether we could fix the problem by just
removing this call of asyncQueueReadAllNotifications, but I do not
think people would find that side-effect acceptable.)

* Given that the idea is to ignore already-committed entries, it makes
sense that if LISTEN is called inside a serializable transaction block
then the cutoff ought to be the transaction's snapshot.  Otherwise we'd
be dropping notifications from transactions that the calling session
can't have seen the effects of.  That defeats the whole point.

* This says that not only is it okay to use a serializable snapshot
as the reference, but we *should* do so; that is, it's actually wrong
to use GetLatestSnapshot here, we should use GetTransactionSnapshot.
There's not going to be any real difference in read-committed mode,
but in repeatable-read or serializable mode we are risking dropping
events that it'd be better to send to the client.

I'm disinclined to make such a change in the back branches, but it'd
be reasonable to do so in HEAD.

Meanwhile, as far as fixing the assertion failure goes, I don't see
any alternative except to rearrange the order of operations during
commit.

            regards, tom lane



Re: Assertion failing in master, predicate.c

From
Tom Lane
Date:
I wrote:
> * Given that the idea is to ignore already-committed entries, it makes
> sense that if LISTEN is called inside a serializable transaction block
> then the cutoff ought to be the transaction's snapshot.  Otherwise we'd
> be dropping notifications from transactions that the calling session
> can't have seen the effects of.  That defeats the whole point.

> * This says that not only is it okay to use a serializable snapshot
> as the reference, but we *should* do so; that is, it's actually wrong
> to use GetLatestSnapshot here, we should use GetTransactionSnapshot.
> There's not going to be any real difference in read-committed mode,
> but in repeatable-read or serializable mode we are risking dropping
> events that it'd be better to send to the client.

> I'm disinclined to make such a change in the back branches, but it'd
> be reasonable to do so in HEAD.

I spent some time working on this, but then realized that the idea
has a fatal problem.  We cannot guarantee receipt of all notifications
since the transaction snapshot, because if our session isn't yet
listening, there's nothing to stop other transactions from trimming
away notify queue entries as soon as all the already-listening sessions
have read them.

One could imagine changing the queue-trimming rules to avoid this,
but I think it's pointless.  The right way to use LISTEN is to be
sure you commit it before inspecting database state, and that's
independent of isolation level.

I'd written some documentation and comment changes around this,
claiming falsely that Repeatable Read or Serializable isolation
would now let you make more assumptions about the timing of the
first received notification.  I'll get rid of those claims and
just commit the docs changes --- it seems worthwhile to clarify
what is and isn't safe use of LISTEN.  But the code should be
left as-is, I'm thinking.

            regards, tom lane