Thread: uninterruptable loop: concurrent delete in progress within table

uninterruptable loop: concurrent delete in progress within table

From
Sandro Santilli
Date:
The attached script shows a plpgsql function that enters
an infinite loop which is not interrupted by statement_timeout.

Looping message is:

 WARNING:  concurrent delete in progress within table "crash9", xid is 4458893, self is 4458894/4458889
 CONTEXT:  SQL statement "CREATE INDEX ON crash9 USING GIST ( the_geom)"
 PL/pgSQL function crash(regclass) line 148 at EXECUTE statement

Note that the xid and self parts of the WARNING message were added by me
at the request of "andres" (on the freenode IRC channel).

I could reproduce with both 9.3.0 and 9.3.4 (current 9.3 stable branch).

Sounds like a PostgreSQL bug to me, what do you think?
Is there anything I can do from the plpgsql function to work around it?

--strk;

 ()  ASCII ribbon campaign  --  Keep it simple !
 /\  http://strk.keybit.net/rants/ascii_mails.txt

Re: uninterruptable loop: concurrent delete in progress within table

From
Alvaro Herrera
Date:
Sandro Santilli wrote:
> The attached script shows a plpgsql function that enters
> an infinite loop which is not interrupted by statement_timeout.
>
> Looping message is:
>
>  WARNING:  concurrent delete in progress within table "crash9", xid is 4458893, self is 4458894/4458889
>  CONTEXT:  SQL statement "CREATE INDEX ON crash9 USING GIST ( the_geom)"
>  PL/pgSQL function crash(regclass) line 148 at EXECUTE statement
>
> Note that the xid and self parts of the WARNING message were added by me
> at the request of "andres" (on the freenode IRC channel).
>
> I could reproduce with both 9.3.0 and 9.3.4 (current 9.3 stable branch).
>
> Sounds like a PostgreSQL bug to me, what do you think?

Yes, probably. Can you share the function and the WARNING patch you
mention above?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
Hi,

On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> The attached script shows a plpgsql function that enters
> an infinite loop which is not interrupted by statement_timeout.

There's nothing attached... :)

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Sandro Santilli
Date:
On Fri, May 30, 2014 at 04:45:35PM +0200, Andres Freund wrote:
> Hi,
>
> On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > The attached script shows a plpgsql function that enters
> > an infinite loop which is not interrupted by statement_timeout.
>
> There's nothing attached... :)

Oops, sorry.
I've sent it (and will send it) on request to interested parties.

Besides, I just tried against 9.1.13 and it fails there too
(this time officially packaged by Ubuntu).

--strk;

Re: uninterruptable loop: concurrent delete in progress within table

From
Sandro Santilli
Date:
On Fri, May 30, 2014 at 04:59:10PM +0200, Sandro Santilli wrote:
> On Fri, May 30, 2014 at 04:45:35PM +0200, Andres Freund wrote:
> > Hi,
> >
> > On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > > The attached script shows a plpgsql function that enters
> > > an infinite loop which is not interrupted by statement_timeout.
> >
> > There's nothing attached... :)
>
> Oops, sorry.
> I've sent it (and will send it) on request to interested parties.

Actually, I further reduced it to not rely on PostGIS
and it is attached here.

--strk;

 ()  ASCII ribbon campaign  --  Keep it simple !
 /\  http://strk.keybit.net/rants/ascii_mails.txt

Attachment

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-05-30 17:28:03 +0200, Sandro Santilli wrote:
> On Fri, May 30, 2014 at 04:59:10PM +0200, Sandro Santilli wrote:
> > On Fri, May 30, 2014 at 04:45:35PM +0200, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > > > The attached script shows a plpgsql function that enters
> > > > an infinite loop which is not interrupted by statement_timeout.
> > >
> > > There's nothing attached... :)
> >
> > Oops, sorry.
> > I've sent it (and will send it) on request to interested parties.
>
> Actually, I further reduced it to not rely on PostGIS
> and it is attached here.

Thanks. I've reproduced it and I am looking at it right now.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-05-30 17:33:22 +0200, Andres Freund wrote:
> > Actually, I further reduced it to not rely on PostGIS
> > and it is attached here.
>
> Thanks. I've reproduced it and I am looking at it right now.

The problem is that HeapTupleSatisfiesVacuum() doesn't really check xmax
in the TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple))
case. That seems to be a longstanding omission.
Thus the case where xmin = current xid, xmax = aborted subtransaction of
the current backend isn't handled correctly and
HEAPTUPLE_DELETE_IN_PROGRESS is returned. That in turn leads to an
endless goto loop in IndexBuildHeapScan() because waiting for an aborted
xact doesn't do much good.
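
A minimal sketch of a sequence that produces exactly that state - a
reconstruction with invented names, not the original attached script:

    CREATE TABLE crash9 (the_geom point);

    DO $$
    BEGIN
      INSERT INTO crash9
        SELECT point(i, i) FROM generate_series(1, 10) i;
      BEGIN
        -- this EXCEPTION block runs as a subtransaction
        DELETE FROM crash9;   -- stamps xmax on the freshly inserted tuples
        RAISE EXCEPTION 'force the subtransaction to abort';
      EXCEPTION WHEN OTHERS THEN
        NULL;                 -- swallow the error; the deleting subxact
      END;                    -- is now aborted
      -- here xmin = our xid (in progress), xmax = aborted subxact; before
      -- the fix HTSV said DELETE_IN_PROGRESS and IndexBuildHeapScan spun
      -- forever, ignoring statement_timeout
      EXECUTE 'CREATE INDEX ON crash9 USING GIST (the_geom)';
    END $$;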

That itself is obviously easy to fix, but I wonder whether there aren't
other such omissions in HeapTupleSatisfiesVacuum() - from the looks of
it, it wasn't originally written to support the case where the current
xact performed changes itself. Not an unreasonable assumption
originally, but long since obsolete.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
Hi,

On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> The attached script shows a plpgsql function that enters
> an infinite loop which is not interrupted by statement_timeout.

>  WARNING:  concurrent delete in progress within table "crash9", xid is 4458893, self is 4458894/4458889
>  CONTEXT:  SQL statement "CREATE INDEX ON crash9 USING GIST ( the_geom)"
>  PL/pgSQL function crash(regclass) line 148 at EXECUTE statement

I've attached a patch including an explanatory commit message. It
includes a regression test that fails with an endless loop on all
supported releases before the fix is applied.

Tom, Alvaro, all: since the fix touches HeapTupleSatisfiesVacuum() and
those routines are tricky, I'd very much welcome a look over the
changes - as evidenced e.g. by the fact that exactly the buggy lines
have been whacked around repeatedly in the past.
Especially the slight behavioural change of HTSV deserves some review.

I do wonder if any of the other existing callers of HTSV are affected. I
don't understand predicate.c well enough to be sure, but it looks to me
like it could in theory lead to missed conflicts. Seems fairly
unlikely to matter in practice though.

I have to say I really hate the amount of repetitive code inside the
individual visibility routines. Obviously it's nothing backpatchable and
may become moot to a certain degree with the CSN work, but those are
pretty close to being unmaintainable.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: uninterruptable loop: concurrent delete in progress within table

From
Alvaro Herrera
Date:
Andres Freund wrote:

> I do wonder if any of the other existing callers of HTSV are affected. I
> don't understand predicate.c well enough to be sure, but it looks to me
> like it could in theory lead to missed conflicts. Seems fairly
> unlikely to matter in practice though.

One place where the difference in the new logic would cause a change is
the tupleIsAlive bit in IndexBuildHeapScan(), when building unique
indexes.  Right now if the tuple is inserted by a remote running
transaction, and updated by it, we return DELETE_IN_PROGRESS, so we set
tupleIsAlive = false; so we wouldn't block if we see a tuple with that
value elsewhere.   If we instead return INSERT_IN_PROGRESS we set
tupleIsAlive = true, and we would block until the inserter is done.
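
To make that concrete, here is a hypothetical two-session illustration
(table, column and index names are invented):

    -- setup, beforehand:
    CREATE TABLE t (i int);

    -- session 1: insert a row and update it, leaving the xact open
    BEGIN;
    INSERT INTO t VALUES (1);
    UPDATE t SET i = 2;
    -- (transaction left running)

    -- session 2: build a unique index while session 1 is in progress
    CREATE UNIQUE INDEX CONCURRENTLY t_i_key ON t (i);
    -- for the old row version: DELETE_IN_PROGRESS gives tupleIsAlive =
    -- false, so it is ignored for uniqueness checks; INSERT_IN_PROGRESS
    -- gives tupleIsAlive = true and the build waits for session 1's xmin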

I think it'd be better to ensure that DELETE_IN_PROGRESS is returned as
appropriate in the new XidIsInProgress(xmin) block.

> I have to say I really hate the amount of repetitive code inside the
> individual visibility routines. Obviously it's nothing backpatchable and
> may become moot to a certain degree with the CSN work, but those are
> pretty close to being unmaintainable.

One thing I'm now wondering while reading this code is those cases where
XMAX_INVALID is not set, but the value in Xmax is InvalidXid.  We have
discussed these scenarios elsewhere and the conclusion seems to be that
they are rare but possible and therefore we should cater for them.
(IIRC the scenario is a tuple that is frozen without a full-page write
and no checksums; if we crash before writing the buffer, we would
recover the invalid value in Xmax but the hint bit would not become set.
I think this is not possible anymore after we tweaked the freezing
protocol.)

So instead of a check like
    if (tuple->t_infomask & HEAP_XMAX_INVALID)
we would have something like
    if (HeapTupleHeaderXmaxIsInvalid(tuple))
which checks both things -- and is easier to read, IMHO.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-02 12:48:01 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I do wonder if any of the other existing callers of HTSV are affected. I
> > don't understand predicate.c well enough to be sure, but it looks to me
> > like it could in theory lead to missed conflicts. Seems fairly
> > unlikely to matter in practice though.
>
> One place where the difference in the new logic would cause a change is
> the tupleIsAlive bit in IndexBuildHeapScan(), when building unique
> indexes.  Right now if the tuple is inserted by a remote running
> transaction, and updated by it, we return DELETE_IN_PROGRESS, so we set
> tupleIsAlive = false; so we wouldn't block if we see a tuple with that
> value elsewhere.   If we instead return INSERT_IN_PROGRESS we set
> tupleIsAlive = true, and we would block until the inserter is done.

Hm. Maybe I am missing something, but if either INSERT or
DELETE_IN_PROGRESS is returned for a unique index in
IndexBuildHeapScan() we'll wait for the xmin/xmax respectively and
recheck the tuple, right? That recheck will then return a non-ephemeral
status.
The only change in the case you describe that I can see is whether we
loop once or twice?

> I think it'd be better to ensure that DELETE_IN_PROGRESS is returned as
> appropriate in the new XidIsInProgress(xmin) block.

That's still a valid position.

> > I have to say I really hate the amount of repetitive code inside the
> > individual visibility routines. Obviously it's nothing backpatchable and
> > may become moot to a certain degree with the CSN work, but those are
> > pretty close to being unmaintainable.
>
> One thing I'm now wondering while reading this code is those cases where
> XMAX_INVALID is not set, but the value in Xmax is InvalidXid.  We have
> discussed these scenarios elsewhere and the conclusion seems to be that
> they are rare but possible and therefore we should cater for them.

Is there really a situation where that combination is possible for
XMAX_INVALID? I don't see how.

Are you possibly thinking of the case where 9.3 assumed wrongly for a
while that a set XMAX_INVALID would be sufficient for freezing?

> (IIRC the scenario is a tuple that is frozen without a full-page write
> and no checksums; if we crash before writing the buffer, we would
> recover the invalid value in Xmax but the hint bit would not become set.
> I think this is not possible anymore after we tweaked the freezing
> protocol.)

I don't think that could have happened before either. After the crash
historically we'd have gone through the same freezing routine. That'd
have either left the whole thing intact (if xmin went backwards) or it'd
have set HEAP_XMAX_INVALID together with setting InvalidTransactionId as
xmax.

> So instead of a check like
>     if (tuple->t_infomask & HEAP_XMAX_INVALID)
> we would have something like
>     if (HeapTupleHeaderXmaxIsInvalid(tuple))
> which checks both things -- and is easier to read, IMHO.

Note that that bit of code isn't new. It just has moved a tiny
bit. Similar tests are splattered all over tqual.c - so I have a hard
time thinking this could be broken.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2014-06-02 12:48:01 -0400, Alvaro Herrera wrote:
> > Andres Freund wrote:
> >
> > > I do wonder if any of the other existing callers of HTSV are affected. I
> > > don't understand predicate.c well enough to be sure, but it looks to me
> > > like it could in theory lead to missed conflicts. Seems fairly
> > > unlikely to matter in practice though.
> >
> > One place where the difference in the new logic would cause a change is
> > the tupleIsAlive bit in IndexBuildHeapScan(), when building unique
> > indexes.  Right now if the tuple is inserted by a remote running
> > transaction, and updated by it, we return DELETE_IN_PROGRESS, so we set
> > tupleIsAlive = false; so we wouldn't block if we see a tuple with that
> > value elsewhere.   If we instead return INSERT_IN_PROGRESS we set
> > tupleIsAlive = true, and we would block until the inserter is done.
>
> Hm. Maybe I am missing something, but if either INSERT or
> DELETE_IN_PROGRESS is returned for a unique index in
> IndexBuildHeapScan() we'll wait for the xmin/xmax respectively and
> recheck the tuple, right? That recheck will then return a non-ephemeral
> status.

Yeah, that's how I interpret that maze of little loops using callbacks.
I don't think waiting on xmin is the same as waiting on xmax, however;
the xmin could stay running for a while, but the xmax could be a
quickly aborted subtransaction, for instance.

> > > I have to say I really hate the amount of repetitive code inside the
> > > individual visibility routines. Obviously it's nothing backpatchable and
> > > may become moot to a certain degree with the CSN work, but those are
> > > pretty close to being unmaintainable.
> >
> > One thing I'm now wondering while reading this code is those cases where
> > XMAX_INVALID is not set, but the value in Xmax is InvalidXid.  We have
> > discussed these scenarios elsewhere and the conclusion seems to be that
> > they are rare but possible and therefore we should cater for them.
>
> Is there really a situation where that combination is possible for
> XMAX_INVALID? I don't see how.

I vaguely remember Robert describing how this would work, though the
precise details evidently escape me because I can't quite trace it now.
I can't remember what thread this was on, either.

> Are you possibly thinking of the case where 9.3 assumed wrongly for a
> while that a set XMAX_INVALID would be sufficient for freezing?

Nope, that's not it IIRC.  It wasn't a bug but a valid corner case.  I
think it involved a crash.

> > So instead of a check like
> >     if (tuple->t_infomask & HEAP_XMAX_INVALID)
> > we would have something like
> >     if (HeapTupleHeaderXmaxIsInvalid(tuple))
> > which checks both things -- and is easier to read, IMHO.
>
> Note that that bit of code isn't new. It just has moved a tiny
> bit. Similar tests are splattered all over tqual.c - so I have a hard
> time thinking this could be broken.

Yeah, I'm not saying this is a new bug; I was only reacting to your
suggestion that we dump all of tqual.c and rewrite it from scratch
(heh).  Also, the corner case that produced this exact scenario is very
low probability IIRC, so even if it's broken today, it might not be all
that obvious that this is the cause.

I agree that the visibility routines are pretty hard to follow; and
evidently given the number of times we've messed up, it's hard to get
all the little corner cases right.  I don't know what a more
maintainable version of them would look like, however, particularly
keeping in mind that they are performance-critical.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-02 13:27:17 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-06-02 12:48:01 -0400, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > >
> > > > I do wonder if any of the other existing callers of HTSV are affected. I
> > > > don't understand predicate.c well enough to be sure, but it looks to me
> > > > like it could in theory lead to missed conflicts. Seems fairly
> > > > unlikely to matter in practice though.
> > >
> > > One place where the difference in the new logic would cause a change is
> > > the tupleIsAlive bit in IndexBuildHeapScan(), when building unique
> > > indexes.  Right now if the tuple is inserted by a remote running
> > > transaction, and updated by it, we return DELETE_IN_PROGRESS, so we set
> > > tupleIsAlive = false; so we wouldn't block if we see a tuple with that
> > > value elsewhere.   If we instead return INSERT_IN_PROGRESS we set
> > > tupleIsAlive = true, and we would block until the inserter is done.
> >
> > Hm. Maybe I am missing something, but if either INSERT or
> > DELETE_IN_PROGRESS is returned for a unique index in
> > IndexBuildHeapScan() we'll wait for the xmin/xmax respectively and
> > recheck the tuple, right? That recheck will then return a non-ephemeral
> > status.
>
> Yeah, that's how I interpret that maze of little loops using callbacks.
> I don't think waiting on xmin is the same as waiting on xmax, however;
> the xmin could stay running for a while, but the xmax could be a
> quickly aborted subtransaction, for instance.

It essentially is. If xmax aborts you still have to deal with an
'INSERT_IN_PROGRESS' tuple because, as you say, xmin is still
running. That's essentially my point. INSERT_IN_PROGRESS isn't a wrong
answer.

> > > One thing I'm now wondering while reading this code is those cases where
> > > XMAX_INVALID is not set, but the value in Xmax is InvalidXid.  We have
> > > discussed these scenarios elsewhere and the conclusion seems to be that
> > > they are rare but possible and therefore we should cater for them.
> >
> > Is there really a situation where that combination is possible for
> > XMAX_INVALID? I don't see how.
>
> I vaguely remember Robert describing how this would work, though the
> precise details evidently escape me because I can't quite trace it now.
> I can't remember what thread this was on, either.

Robert: Do you remember that case?

Alvaro: In the end it'd not be very harmful - if it happens
TransactionIdDidCommit() will return false (there's special case code
for it).

> Yeah, I'm not saying this is a new bug; I was only reacting to your
> suggestion that we dump all of tqual.c and rewrite it from scratch
> (heh).

Heh. Not going to start with that today. Especially not in a patch that
needs to be backpatched :)

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2014-06-02 13:27:17 -0400, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2014-06-02 12:48:01 -0400, Alvaro Herrera wrote:
> > > > Andres Freund wrote:
> > > >
> > > > > I do wonder if any of the other existing callers of HTSV are affected. I
> > > > > don't understand predicate.c well enough to be sure, but it looks to me
> > > > > like it could in theory lead to missed conflicts. Seems fairly
> > > > > unlikely to matter in practice though.
> > > >
> > > > One place where the difference in the new logic would cause a change is
> > > > the tupleIsAlive bit in IndexBuildHeapScan(), when building unique
> > > > indexes.  Right now if the tuple is inserted by a remote running
> > > > transaction, and updated by it, we return DELETE_IN_PROGRESS, so we set
> > > > tupleIsAlive = false; so we wouldn't block if we see a tuple with that
> > > > value elsewhere.   If we instead return INSERT_IN_PROGRESS we set
> > > > tupleIsAlive = true, and we would block until the inserter is done.
> > >
> > > Hm. Maybe I am missing something, but if either INSERT or
> > > DELETE_IN_PROGRESS is returned for a unique index in
> > > IndexBuildHeapScan() we'll wait for the xmin/xmax respectively and
> > > recheck the tuple, right? That recheck will then return a non-ephemeral
> > > status.
> >
> > Yeah, that's how I interpret that maze of little loops using callbacks.
> > I don't think waiting on xmin is the same as waiting on xmax, however;
> > the xmin could stay running for a while, but the xmax could be a
> > quickly aborted subtransaction, for instance.
>
> It essentially is. If xmax aborts you still have to deal with an
> 'INSERT_IN_PROGRESS' tuple because, as you say, xmin is still
> running. That's essentially my point. INSERT_IN_PROGRESS isn't a wrong
> answer.

Uh.  Actually it strikes me that DELETE_IN_PROGRESS is a wrong answer in
the XidIsInProgress(xmin) block anyhow.  If the tuple is being inserted
remotely, there is no way we can return DELETE_IN_PROGRESS, is there?
We can never be sure that the deleting subxact is not going to abort.
The only way for this transaction to say that the tuple is
DELETE_IN_PROGRESS is if the *inserting* subtransaction is an aborted
subxact; but we already tell that case apart.  So DELETE_IN_PROGRESS is
only a valid return in the new block, XidIsCurrentXact(xmin).


Another thing I wanted to comment on originally is that I'm not sure
about the CHECK_FOR_INTERRUPTS() additions.  Aren't those only to serve
the case of the current bug, or bugs of similar ilk?  There is no way
that a valid, non-bug situation is going to cause us to loop forever
there.  That said, there isn't much harm in having them there either,
but I think I'd put them before the xact wait, not after.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-02 14:39:22 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > It essentially is. If xmax aborts you still have to deal with an
> > 'INSERT_IN_PROGRESS' tuple because, as you say, xmin is still
> > running. That's essentially my point. INSERT_IN_PROGRESS isn't a wrong
> > answer.
>
> Uh.  Actually it strikes me that DELETE_IN_PROGRESS is a wrong answer in
> the XidIsInProgress(xmin) block anyhow.  If the tuple is being inserted
> remotely, there is no way we can return DELETE_IN_PROGRESS, is there?

I don't think it'd break stuff if we did it in the !IsCurrent()
case. But yes, that's why I made the TransactionIdIsInProgress(xmin)
block unconditionally return INSERT_IN_PROGRESS.

> Another thing I wanted to comment on originally is that I'm not sure
> about the CHECK_FOR_INTERRUPTS() additions.  Aren't those only to serve
> the case of the current bug, or bugs of similar ilk?  There is no way
> that a valid, non-bug situation is going to cause us to loop forever
> there.

I think there are other possible cases where it could loop for a long
while. Think e.g. of a row that's updated under contention by several
backends or such.
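
For instance, a hypothetical schedule (schema invented) in which the
same tuple's xmax keeps changing:

    -- setup, beforehand:
    CREATE TABLE t (pk int PRIMARY KEY, i int);
    INSERT INTO t VALUES (1, 0);

    -- sessions 2..N, repeatedly:
    BEGIN;
    UPDATE t SET i = i + 1 WHERE pk = 1;
    ROLLBACK;   -- each aborted update leaves an xmax behind that the
                -- next updater replaces with its own xid

    -- session 1, meanwhile:
    CREATE UNIQUE INDEX CONCURRENTLY t_i_key ON t (i);
    -- every recheck may find a fresh in-progress xmax on the same tuple,
    -- so the wait-and-recheck loop can iterate for a long time; the
    -- CHECK_FOR_INTERRUPTS() at least keeps it cancellable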

> That said, there isn't much harm in having them there either,
> but I think I'd put them before the xact wait, not after.

XactLockTableWait() internally honors interrupts if they arrive while
waiting, so that's covered. I am mainly concerned with xmax changing
frequently...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Robert Haas
Date:
On Mon, Jun 2, 2014 at 1:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Robert: Do you remember that case?
>
> Alvaro: In the end it'd not be very harmful - if it happens
> TransactionIdDidCommit() will return false (there's special case code
> for it).

Not specifically, but I'd be surprised if it isn't possible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
Hi Sandro,

On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> The attached script shows a plpgsql function that enters
> an infinite loop which is not interrupted by statement_timeout.

Pushed a fix. Thanks for the report!

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-04 19:03:15 -0400, Robert Haas wrote:
> On Mon, Jun 2, 2014 at 1:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Robert: Do you remember that case?
> >
> > Alvaro: In the end it'd not be very harmful - if it happens
> > TransactionIdDidCommit() will return false (there's special case code
> > for it).
>
> Not specifically, but I'd be surprised if it isn't possible.

Wouldn't that mean that every single visibility routine in tqual.c is
buggy?

I am not convinced there aren't further bugs in some corner cases in
tqual.c. But even a low-likelihood scenario of
xmax = InvalidTransactionId && (infomask & XMAX_INVALID) == 0
would have become visible by now given how widespread/central those
tests are?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Robert Haas
Date:
On Wed, Jun 4, 2014 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-04 19:03:15 -0400, Robert Haas wrote:
>> On Mon, Jun 2, 2014 at 1:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > Robert: Do you remember that case?
>> >
>> > Alvaro: In the end it'd not be very harmful - if it happens
>> > TransactionIdDidCommit() will return false (there's special case code
>> > for it).
>>
>> Not specifically, but I'd be surprised if it isn't possible.
>
> Wouldn't that mean that every single visibility routine in tqual.c is
> buggy?
>
> I am not convinced there aren't further bugs in some corner cases in
> tqual.c. But even a low-likelihood scenario of
> xmax = InvalidTransactionId && (infomask & XMAX_INVALID) == 0
> would have become visible by now given how widespread/central those
> tests are?

What specifically do you think will break?  For example, in
HeapTupleSatisfiesMVCC(), if no xmax-related hint bits are set, it
looks to me like we'll end up here:

        if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
        {
            /* it must have aborted or crashed */
            SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
                        InvalidTransactionId);
            return true;
        }

Well, TransactionIdDidCommit() will call TransactionLogFetch(), which
has special-case coding to handle non-normal transaction IDs, and which
for InvalidTransactionId will return TRANSACTION_STATUS_ABORTED.  Now
maybe that is dead code and always has been, but that wouldn't be my
bet going into it: I bet that got there for a reason.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: uninterruptable loop: concurrent delete in progress within table

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> What specifically do you think will break?  For example, in
> HeapTupleSatisfiesMVCC(), if no xmax-related hint bits are set, it
> looks to me like we'll end up here:

>         if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
>         {
>             /* it must have aborted or crashed */
>             SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
>                         InvalidTransactionId);
>             return true;
>         }

> Well, TransactionIdDidCommit() will call TransactionLogFetch(), which
> has special-case coding to handle non-normal transaction IDs, and which
> for InvalidTransactionId will return TRANSACTION_STATUS_ABORTED.  Now
> maybe that is dead code and always has been, but that wouldn't be my
> bet going into it: I bet that got there for a reason.

No, the code to return that for InvalidTransactionId is just there because
it has to do *something*, and returning status COMMITTED is clearly not
too sane.  I don't believe there was ever any expectation that xmax equal
to InvalidTransactionId would not have the XMAX_INVALID hint bit set.
If anyone had thought about that case, they'd probably have chosen to
put in a should-not-happen error for it, because it strongly suggests
you're looking at corrupted data.

            regards, tom lane

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-04 19:27:54 -0400, Robert Haas wrote:
> On Wed, Jun 4, 2014 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-04 19:03:15 -0400, Robert Haas wrote:
> >> On Mon, Jun 2, 2014 at 1:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > Robert: Do you remember that case?
> >> >
> >> > Alvaro: In the end it'd not be very harmful - if it happens
> >> > TransactionIdDidCommit() will return false (there's special case code
> >> > for it).
> >>
> >> Not specifically, but I'd be surprised if it isn't possible.
> >
> > Wouldn't that mean that every single visibility routine in tqual.c is
> > buggy?
> >
> > I am not convinced there aren't further bugs in some corner cases in
> > tqual.c. But even a low-likelihood scenario of
> > xmax = InvalidTransactionId && (infomask & XMAX_INVALID) == 0
> > would have become visible by now given how widespread/central those
> > tests are?
>
> What specifically do you think will break?

If XMAX_INVALID may accidentally not be set even though there's no xmax,
other xmax flags could be bogus as well. Afaics there's no difference in
the handling between them.
And then tests for LOCKED_ONLY, IS_MULTI and such would be dodgy.

The only things that - afaics - are allowed to change the actual xmax
transactionid are:
* freezing
* heap_(update|delete|lock)_tuple

And of those only freezing will unset xmax to InvalidTransactionId. But
it re-sets XMAX_INVALID (and clears all other xmax flags).

So I don't really see a danger. But you're right, even if there were
cases where that case happens - without any other odd flags - we'd still
be safe.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Sandro Santilli
Date:
On Thu, Jun 05, 2014 at 01:04:51AM +0200, Andres Freund wrote:
> Hi Sandro,
>
> On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > The attached script shows a plpgsql function that enters
> > an infinite loop which is not interrupted by statement_timeout.
>
> Pushed a fix. Thanks for the report!

Thanks, will test asap.

BTW, I noticed the commit contained no testcase; would it be safer
to include one? The smallest test I got it down to (not requiring PostGIS)
is here: http://strk.keybit.net/tmp/crash.sql


--strk;

Re: uninterruptable loop: concurrent delete in progress within table

From
Sandro Santilli
Date:
On Thu, Jun 05, 2014 at 07:58:56AM +0200, Sandro Santilli wrote:
> On Thu, Jun 05, 2014 at 01:04:51AM +0200, Andres Freund wrote:
> > Hi Sandro,
> >
> > On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > > The attached script shows a plpgsql function that enters
> > > an infinite loop which is not interrupted by statement_timeout.
> >
> > Pushed a fix. Thanks for the report!
>
> Thanks, will test asap.

I confirm the crash doesn't happen anymore, thanks again.

--strk;

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-05 07:58:56 +0200, Sandro Santilli wrote:
> On Thu, Jun 05, 2014 at 01:04:51AM +0200, Andres Freund wrote:
> > Hi Sandro,
> >
> > On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > > The attached script shows a plpgsql function that enters
> > > an infinite loop which is not interrupted by statement_timeout.
> >
> > Pushed a fix. Thanks for the report!
>
> Thanks, will test asap.
>
> BTW, I noticed the commit contained no testcase; would it be safer
> to include one?

I have a separate commit with a testcase prepared. We're currently
discussing some infrastructure that would make it more precise...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: uninterruptable loop: concurrent delete in progress within table

From
Sandro Santilli
Date:
On Thu, Jun 05, 2014 at 10:35:01AM +0200, Andres Freund wrote:
> On 2014-06-05 07:58:56 +0200, Sandro Santilli wrote:
> > On Thu, Jun 05, 2014 at 01:04:51AM +0200, Andres Freund wrote:
> > > Hi Sandro,
> > >
> > > On 2014-05-30 16:31:50 +0200, Sandro Santilli wrote:
> > > > The attached script shows a plpgsql function that enters
> > > > an infinite loop which is not interrupted by statement_timeout.
> > >
> > > Pushed a fix. Thanks for the report!
> >
> > Thanks, will test asap.
> >
> > BTW, I noticed the commit contained no testcase; would it be safer
> > to include one?
>
> I have a separate commit with a testcase prepared. We're currently
> discussing some infrastructure that would make it more precise...

Great, thanks again to everyone involved in this fix!

--strk;

Re: uninterruptable loop: concurrent delete in progress within table

From
Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:

> I do wonder if any of the other existing callers of HTSV are
> affected. I don't understand predicate.c well enough to be sure,
> but it looks to me like it could in theory lead to missed
> conflicts. Seems fairly unlikely to matter in practice though.

Early in the evaluation of the multi-xact bugs around 9.3 someone
pointed out that serializable transactions could be affected. I
looked at it then, and it was very hard to set up a case to
actually hit that, and there were no field reports of it, so I
figured I should let the dust settle on fixes in this area before
deciding what to do.  It appeared that the fixes being made for
other reasons might fix this as well.  So far I haven't gotten the
feeling that this area has yet settled down enough to get back to
this.  Perhaps we've now gotten to that point?

Have we really pinned down the semantics of the return values
for HeapTupleSatisfiesVacuum() at this point?  If so, I guess the
next question is, where is the exact meaning of each return value
documented?  The five-to-seven-word comments in the enum value
declaration *seem* clear on the face of it, but I don't actually
have confidence in what they really mean when, for example, a
subtransaction has rolled back.  It appears that
HEAPTUPLE_INSERT_IN_PROGRESS can be returned for a DELETE under
some circumstances, but it would take a lot of work to
reverse-engineer exactly what the various conditions that can cause
that return value are.  It seems like someone who knows what these
values really mean should capture that in a code comment somewhere.

What the code in predicate.c is using this for is to determine, for
a given tuple which the calling process is reading, whether it is
visible to the calling process but has been deleted by another
transaction (whose work this snapshot can't see), or is not visible
to the calling process and has been inserted by another transaction
(whose work this transaction can't see).  To qualify, the change
from the other transaction must be either committed or still
pending (i.e., it might commit).  With the tuple's visibility to
the current process known, what return values answer that question
without further checking, and what further checking is needed for
the other return codes?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: uninterruptable loop: concurrent delete in progress within table

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Have we really pinned down the semantics of the return values
> for HeapTupleSatisfiesVacuum() at this point?

Well, now that Andres unilaterally redefined HEAPTUPLE_INSERT_IN_PROGRESS
as meaning whatever the heck was easiest, I'd say no ;-).  But we have to
think about what we do want from it.  I'm not sure that we can positively
guarantee to distinguish all the possible states of an update-in-progress
tuple from outside the updating backend, and it likely would not be worth
the trouble to try since the answer could change at any instant anyway.

For the statistical purposes I was on about in the other thread, I would
be satisfied if we could distinguish "will be live if all relevant
transactions commit" from "will be dead if all relevant transactions
commit".  For other combinations like will-be-live-if-deleting-
subtransaction-rolls-back-but-outer-transaction-commits, there's really
no way for VACUUM to decide whether to count the tuple as live anyway.
Yet it'd be nice if we produced a live-tuple-count estimate that had some
relationship to reality in *some* scenario, and everything-commits seems
like the best choice of assumption.

It sounds like SSI might have some different requirements --- can you
specify what you need?

> What the code in predicate.c is using this for is to determine, for
> a given tuple which the calling process is reading, whether it is
> visible to the calling process but has been deleted by another
> transaction (whose work this snapshot can't see), or is not visible
> to the calling process and has been inserted by another transaction
> (whose work this transaction can't see).

I'm not sure why you'd need HeapTupleSatisfiesVacuum for that at all
--- it sounds like a plain visible-according-to-query-snapshot test.
Certainly HTSV is unconcerned with visibility as such, so it can't
satisfy this requirement by itself.

            regards, tom lane

Re: uninterruptable loop: concurrent delete in progress within table

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Have we really pinned down the semantics of the return values
>> for HeapTupleSatisfiesVacuum() at this point?
>
> Well, now that Andres unilaterally redefined HEAPTUPLE_INSERT_IN_PROGRESS
> as meaning whatever the heck was easiest, I'd say no ;-).

That's kinda what I was afraid of.

> But we have to
> think about what we do want from it.  I'm not sure that we can positively
> guarantee to distinguish all the possible states of an update-in-progress
> tuple from outside the updating backend, and it likely would not be worth
> the trouble to try since the answer could change at any instant anyway.

That sounds like way more than SSI needs anyway.

> For the statistical purposes I was on about in the other thread, I would
> be satisfied if we could distinguish "will be live if all relevant
> transactions commit" from "will be dead if all relevant transactions
> commit".

If I'm understanding you, and if HEAPTUPLE_LIVE and HEAPTUPLE_DEAD
are clear-cut "nothing is in progress" cases, I think that might be
enough.  We know coming in whether the tuple is visible to our own
transaction; the question is whether its existence has been or is
potentially being changed by a top level transaction whose work we
can't see.  Ideally we would ignore the effects of a subtransaction
underneath such a top level transaction if that subtransaction has
rolled back, but it would only compromise efficiency (not
correctness) if we didn't ignore the work of subtransactions which
have already been rolled back on an uncommitted top-level transaction.

>> What the code in predicate.c is using this for is to determine, for
>> a given tuple which the calling process is reading, whether it is
>> visible to the calling process but has been deleted by another
>> transaction (whose work this snapshot can't see), or is not visible
>> to the calling process and has been inserted by another transaction
>> (whose work this transaction can't see).
>
> I'm not sure why you'd need HeapTupleSatisfiesVacuum for that at all
> --- it sounds like a plain visible-according-to-query-snapshot test.

That is *one* of the things we need, and it is passed in as a bool
parameter.  We also need to know whether another transaction has
committed, or might commit, something which would delete a visible
tuple or insert a tuple which is not visible to us (but which has
been read on a scan).  In other words, would the results of this
scan be different if run again after commit of the other
transaction?  We're talking about the
CheckForSerializableConflictOut() function, in case my description
here isn't clear; perhaps the function comments and/or code will
clarify things.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/predicate.c;h=7c8d53e6a5a44e8bbbda453c730bb5073f8a3842;hb=master#l3850

> Certainly HTSV is unconcerned with visibility as such, so it can't
> satisfy this requirement by itself.

Right.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: uninterruptable loop: concurrent delete in progress within table

From
Andres Freund
Date:
On 2014-06-07 15:20:07 -0700, Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Kevin Grittner <kgrittn@ymail.com> writes:
> >> Have we really pinned down the semantics of the return values
> >> for HeapTupleSatisfiesVacuum() at this point?
> >
> > Well, now that Andres unilaterally redefined HEAPTUPLE_INSERT_IN_PROGRESS
> > as meaning whatever the heck was easiest, I'd say no ;-).

Meh. It's not about defining it as whatever was easiest. The issue is that
there's simply no clear definition of what's correct to return when both
xmin and xmax are in progress. You can argue that
HEAPTUPLE_INSERT_IN_PROGRESS is correct because it's not clear that the
row has ever been inserted. You can argue that DELETE_IN_PROGRESS is
more correct since it has actually been deleted. The latter has the
problem that callers might reasonably expect xmin to actually have
committed.

Note that the previous definition was clearly more bogus (and was
actively causing bugs!): when xmax was set but aborted, it still returned
DELETE_IN_PROGRESS.

> > But we have to
> > think about what we do want from it. I'm not sure that we can positively
> > guarantee to distinguish all the possible states of an update-in-progress
> > tuple from outside the updating backend, and it likely would not be worth
> > the trouble to try since the answer could change at any instant anyway.
>
> That sounds like way more than SSI needs anyway.

Note that the current code *does* give an accurate answer in that
case. The only behavioural change is when xmin and xmax are *both* in
progress and are *not* running in the current backend.

Before my change it was:

        else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
        {
            if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
                return HEAPTUPLE_INSERT_IN_PROGRESS;
            /* only locked? run infomask-only check first, for performance */
            if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) ||
                HeapTupleHeaderIsOnlyLocked(tuple))
                return HEAPTUPLE_INSERT_IN_PROGRESS;
            /* inserted and then deleted by same xact */
            return HEAPTUPLE_DELETE_IN_PROGRESS;
        }
now it's
        else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple)))
        {
            if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
                return HEAPTUPLE_INSERT_IN_PROGRESS;
            /* only locked? run infomask-only check first, for performance */
            if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) ||
                HeapTupleHeaderIsOnlyLocked(tuple))
                return HEAPTUPLE_INSERT_IN_PROGRESS;
            /* inserted and then deleted by same xact */
            if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple)))
                return HEAPTUPLE_DELETE_IN_PROGRESS;
            /* deleting subtransaction must have aborted */
            return HEAPTUPLE_INSERT_IN_PROGRESS;
        }
        else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
        {
            /*
             * It'd be possible to discern between INSERT/DELETE in progress
             * here by looking at xmax - but that doesn't seem beneficial for
             * the majority of callers and even detrimental for some. We'd
             * rather have callers look at/wait for xmin than xmax. It's
             * always correct to return INSERT_IN_PROGRESS because that's
             * what's happening from the view of other backends.
             */
            return HEAPTUPLE_INSERT_IN_PROGRESS;
        }

Disregarding all that verbiage, the old behaviour was:

If xmin is in progress in this or another backend and xmax was ever set:
HEAPTUPLE_DELETE_IN_PROGRESS

now it's

If xmin is in progress and it's running in the current backend:
return INSERT/DELETE_IN_PROGRESS according to xmax
otherwise if xmin is in progress
INSERT_IN_PROGRESS

All the rest of the cases are unchanged.
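
As a concrete illustration of that changed case, a hypothetical
two-session script (names invented):

    -- setup, beforehand:
    CREATE TABLE t (i int);

    -- session A:
    BEGIN;
    INSERT INTO t VALUES (1);
    SAVEPOINT s;
    DELETE FROM t;
    ROLLBACK TO s;   -- xmax is an aborted subxact; xmin still in progress
    -- (transaction left open)

    -- Seen from any other backend, HTSV used to report this tuple as
    -- DELETE_IN_PROGRESS despite the aborted xmax; it now reports
    -- INSERT_IN_PROGRESS, so callers wait on xmin instead of the xmax.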

> > For the statistical purposes I was on about in the other thread, I would
> > be satisfied if we could distinguish "will be live if all relevant
> > transactions commit" from "will be dead if all relevant transactions
> > commit".
>
> If I'm understanding you, and if HEAPTUPLE_LIVE and HEAPTUPLE_DEAD
> are clear-cut "nothing is in progress" cases, I think that might be
> enough.

Ok. Those haven't changed.

> We know coming in whether the tuple is visible to our own
> transaction;

So do I understand correctly that predicate.c will never invoke
HeapTupleSatisfiesVacuum() when the tuple has been created (INSERT, new
UPDATE version) by a currently running backend? Does that include the
case where the looked-at tuple has been created by the current backend?

> the question is whether its existence has been or is
> potentially being changed by a top level transaction whose work we
> can't see.

I.e. this is effectively only looking for changes of xmax? Because
otherwise we'd not be looking at the tuple?

> Ideally we would ignore the effects of a subtransaction
> underneath such a top level transaction if that subtransaction has
> rolled back, but it would only compromise efficiency (not
> correctness) if we didn't ignore the work of subtransactions which
> have already been rolled back on an uncommitted top-level transaction.

That's the case.

> >> What the code in predicate.c is using this for is to determine, for
> >> a given tuple which the calling process is reading, whether it is
> >> visible to the calling process but has been deleted by another
> >> transaction (whose work this snapshot can't see), or is not visible
> >> to the calling process and has been inserted by another transaction
> >> (whose work this transaction can't see).
> >
> > I'm not sure why you'd need HeapTupleSatisfiesVacuum for that at all
> > --- it sounds like a plain visible-according-to-query-snapshot test.
>
> That is *one* of the things we need, and it is passed in as a bool
> parameter.  We also need to know whether another transaction has
> committed, or might commit, something which would delete a visible
> tuple or insert a tuple which is not visible to us (but which has
> been read on a scan).  In other words, would the results of this
> scan be different if run again after commit of the other
> transaction?  We're talking about the
> CheckForSerializableConflictOut() function, in case my description
> here isn't clear; perhaps the function comments and/or code will
> clarify things.

I'm a bit confused by this - now you're talking about it touching tuples
which aren't visible to the current backend?
The relevant snippets are these cases:
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            xid = HeapTupleHeaderGetUpdateXid(tuple->t_data);
            break;
        case HEAPTUPLE_INSERT_IN_PROGRESS:
            xid = HeapTupleHeaderGetXmin(tuple->t_data);
            break;
Does it I) cause a bug if you're looking at xmax because
HEAPTUPLE_DELETE_IN_PROGRESS was returned even though xmin hasn't even
committed? That's the case 621a99a fixed. It looks like it doesn't, because
a) you're chasing the xid to the toplevel xid anyway, and b) you're ignoring
the case where it was created by our own backend anyway.

Does it II) cause a bug if you're looking at xmin because
HEAPTUPLE_INSERT_IN_PROGRESS was returned even though xmax was set *iff*
xmin is also in progress in the same backend? It doesn't look like it
either, for reason I)(a); i.e. you'd in the end look at the same xid anyway.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services