Re: out-of-order XID insertion in KnownAssignedXids - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: out-of-order XID insertion in KnownAssignedXids |
Date | |
Msg-id | 20181008163049.nvaui5kjrsav2ojn@alap3.anarazel.de |
In response to | Re: out-of-order XID insertion in KnownAssignedXids (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: out-of-order XID insertion in KnownAssignedXids
Re: out-of-order XID insertion in KnownAssignedXids |
List | pgsql-hackers |
On 2018-10-08 18:28:52 +0300, Konstantin Knizhnik wrote:
>
>
> On 08.10.2018 18:24, Andres Freund wrote:
> >
> > On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> > >
> > >
> > > On 05.10.2018 11:04, Michael Paquier wrote:
> > > > On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
> > > > > As you can notice, XID 2004495308 is encountered twice which causes an error in
> > > > > KnownAssignedXidsAdd:
> > > > >
> > > > >         if (head > tail &&
> > > > >             TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
> > > > >         {
> > > > >             KnownAssignedXidsDisplay(LOG);
> > > > >             elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
> > > > >         }
> > > > >
> > > > > The probability of this error is very small but it can quite easily be
> > > > > reproduced: you should just set breakpoint in debugger after calling
> > > > > MarkAsPrepared in twophase.c and then try to prepare any transaction.
> > > > > MarkAsPrepared will add GXACT to proc array and at this moment there will
> > > > > be two entries in procarray with the same XID:
> > > > >
> > > > > [snip]
> > > > >
> > > > > Now generated RUNNING_XACTS record contains duplicated XIDs.
> > > > So, I have been doing exactly that, and if you trigger a manual
> > > > checkpoint then things happen quite correctly if you let the first
> > > > session finish:
> > > > rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
> > > > 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> > > > latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
> > > >
> > > > If you still maintain the debugger after calling MarkAsPrepared, then
> > > > the manual checkpoint would block. Now if you actually keep the
> > > > debugger, and wait for a checkpoint timeout to happen, then I can see
> > > > the incorrect record. It is impressive that your customer has been able
> > > > to see that first, and then that you have been able to get into that
> > > > state with simple steps.
> > > >
> > > > > I want to ask opinion of community about the best way of fixing this
> > > > > problem. Should we avoid storing duplicated XIDs in procarray (by
> > > > > invalidating XID in original pgxact) or eliminate/change check for
> > > > > duplicate in KnownAssignedXidsAdd (for example just ignore
> > > > > duplicates)?
> > > > Hmmmmm... Please let me think through that first. It seems to me that
> > > > the record should not be generated to begin with. At least I am able to
> > > > confirm what you see.
> > > The simplest way to fix the problem is to ignore duplicates before
> > > adding them to KnownAssignedXids.
> > > We in any case perform sort in this place...
> > I vehemently object to that as the proper course.
> And what about adding qsort to GetRunningTransactionData or
> LogCurrentRunningXacts and excluding duplicates here?

Sounds less terrible, but still pretty bad. I think we should fix the
underlying data inconsistency, not paper over it a couple hundred meters
away.

Greetings,

Andres Freund
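
For readers following the thread, here is a minimal standalone sketch of the two things being argued about: the ordering check that a duplicated XID trips, and the sort-and-deduplicate workaround that was proposed and objected to above. It is not PostgreSQL source. `ToyKnownXids`, `toy_add`, `xid_follows_or_equals`, and `sort_and_dedupe` are hypothetical names, the comparison is a simplified modulo-2^32 test that ignores special XIDs, and the sort uses plain integer order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define MAX_XIDS 16

/* Hypothetical toy stand-in for the KnownAssignedXids array. */
typedef struct
{
    TransactionId xids[MAX_XIDS];
    int           tail;
    int           head;
} ToyKnownXids;

/* Simplified wraparound-aware test: is id1 logically >= id2?
 * Assumption: both are normal XIDs; special XIDs are ignored here. */
static bool
xid_follows_or_equals(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff >= 0;
}

/* Append one XID. Returns false where the quoted check would raise
 * "out-of-order XID insertion" (or when the toy array is full). */
static bool
toy_add(ToyKnownXids *k, TransactionId from_xid)
{
    if (k->head > k->tail &&
        xid_follows_or_equals(k->xids[k->head - 1], from_xid))
        return false;           /* duplicate or out-of-order XID */
    if (k->head >= MAX_XIDS)
        return false;
    k->xids[k->head++] = from_xid;
    return true;
}

/* Plain unsigned comparison, for illustration only (no wraparound). */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/* The proposed workaround: sort, then drop adjacent duplicates.
 * Returns the new element count. */
static int
sort_and_dedupe(TransactionId *xids, int n)
{
    int out = 1;

    if (n <= 1)
        return n;
    qsort(xids, (size_t) n, sizeof(TransactionId), xid_cmp);
    for (int i = 1; i < n; i++)
    {
        if (xids[i] != xids[out - 1])
            xids[out++] = xids[i];
    }
    return out;
}
```

Feeding the toy array 606, 607, 607 makes the third `toy_add` call fail, which mirrors the duplicated-XID case described in the quoted report. Running `sort_and_dedupe` first hides the duplicate, but the source of the duplicate is untouched, which is the objection raised in the reply above.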