Re: out-of-order XID insertion in KnownAssignedXids - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: out-of-order XID insertion in KnownAssignedXids
Date
Msg-id 20181005080422.GA7863@paquier.xyz
In response to out-of-order XID insertion in KnownAssignedXids  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: out-of-order XID insertion in KnownAssignedXids  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Re: out-of-order XID insertion in KnownAssignedXids  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
> As you can notice, XID 2004495308 is encountered twice, which causes an
> error in KnownAssignedXidsAdd:
>
>     if (head > tail &&
>         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>     {
>         KnownAssignedXidsDisplay(LOG);
>         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>     }
>
> The probability of this error is very small but it can quite easily be
> reproduced: just set a breakpoint in a debugger after the call to
> MarkAsPrepared in twophase.c and then try to prepare any transaction.
> MarkAsPrepared will add the GXACT to the proc array, and at this moment
> there will be two entries in procarray with the same XID:
>
> [snip]
>
> Now the generated RUNNING_XACTS record contains duplicated XIDs.

So, I have been doing exactly that, and if you trigger a manual
checkpoint then things happen quite correctly if you let the first
session finish:
rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn: 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608 latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606

If you keep the debugger stopped right after the call to MarkAsPrepared,
then the manual checkpoint blocks.  If you instead leave the debugger
stopped and wait for a checkpoint timeout to happen, then I can see the
incorrect record.  It is impressive that your customer was able to hit
that first, and then that you were able to get into that state with
simple steps.
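
For anyone following along, here is a minimal standalone sketch of why a
repeated XID trips the check quoted above.  It is not the PostgreSQL
source: the type Xid and the helpers xid_follows_or_equals() and
first_out_of_order() are hypothetical stand-ins, and the comparison is
simplified in that it ignores the special XIDs below 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* hypothetical stand-in for TransactionId */

/*
 * Modulo-2^32 comparison: true if "a" is logically at or after "b".
 * Simplified assumption: special (non-normal) XIDs are not handled.
 */
static bool
xid_follows_or_equals(Xid a, Xid b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Walk a list of XIDs in insertion order.  Return the index of the first
 * entry that is not strictly after its predecessor, or -1 if the whole
 * list is strictly ascending.  A duplicate counts as out of order, because
 * the predecessor "follows or equals" it.
 */
static int
first_out_of_order(const Xid *xids, int n)
{
    assert(xids != NULL || n == 0);

    for (int i = 1; i < n; i++)
    {
        if (xid_follows_or_equals(xids[i - 1], xids[i]))
            return i;
    }
    return -1;
}
```

With the list 606, 607 this returns -1, while a list such as 606, 607,
607 is rejected at index 2, which is the situation a RUNNING_XACTS record
with a duplicated XID creates on the standby side.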

> I want to ask the opinion of the community about the best way of fixing
> this problem.  Should we avoid storing duplicated XIDs in procarray (by
> invalidating the XID in the original pgxact) or eliminate/change the
> check for duplicates in KnownAssignedXidsAdd (for example just ignore
> duplicates)?

Hmmmmm...  Please let me think through that first.  It seems to me that
the record should not be generated to begin with.  At least I am able to
confirm what you see.
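
To make the "just ignore duplicates" option from the question concrete,
here is a rough sketch of sorting a list of XIDs and dropping repeated
entries before they are inserted.  This is only an illustration of that
option, not a proposed patch: Xid, xid_cmp() and sort_and_dedup_xids() are
hypothetical names, and the comparator assumes all XIDs in the list lie
within 2^31 of each other so that the circular comparison is a consistent
ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Xid;           /* hypothetical stand-in for TransactionId */

/*
 * qsort comparator using modulo-2^32 arithmetic.
 * Assumption: every pair of XIDs in the array is less than 2^31 apart.
 */
static int
xid_cmp(const void *a, const void *b)
{
    Xid     x = *(const Xid *) a;
    Xid     y = *(const Xid *) b;
    int32_t diff = (int32_t) (x - y);

    return (diff > 0) - (diff < 0);
}

/*
 * Sort the array in place and squeeze out adjacent duplicates.
 * Returns the number of distinct XIDs left at the front of the array.
 */
static int
sort_and_dedup_xids(Xid *xids, int n)
{
    int     last = 0;           /* index of the last entry kept so far */

    assert(n >= 0);
    if (n == 0)
        return 0;

    qsort(xids, (size_t) n, sizeof(Xid), xid_cmp);

    for (int i = 1; i < n; i++)
    {
        if (xids[i] != xids[last])
            xids[++last] = xids[i];
    }
    return last + 1;
}
```

Given the input 607, 606, 607 this leaves 606, 607 at the front of the
array and returns 2.  Whether filtering on the consumer side is the right
place at all is exactly the open question here.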
--
Michael
