out-of-order XID insertion in KnownAssignedXids - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | out-of-order XID insertion in KnownAssignedXids |
Date | |
Msg-id | 0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru Whole thread Raw |
Responses |
Re: out-of-order XID insertion in KnownAssignedXids
|
List | pgsql-hackers |
Hi hackers, Looks like there is a bug with logging running transactions XIDs and prepared transactions. One of our customers get error "FATAL: out-of-order XID insertion in KnownAssignedXids" trying to apply backup. WAL contains the following record: rmgr: Standby len (rec/tot): 98/ 98, tx: 0, lsn: 1418/A9A76C90, prev 1418/A9A76C48, desc: RUNNING_XACTS nextXid 2004495309 latestCompletedXid 2004495307 oldestRunningXid 2004495290; 3 xacts: 2004495290 2004495308 2004495308 As you can notice, XID 2004495308 is encountered twice which cause error in KnownAssignedXidsAdd: if (head > tail && TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid)) { KnownAssignedXidsDisplay(LOG); elog(ERROR, "out-of-order XID insertion in KnownAssignedXids"); } The probability of this error is very small but it can quite easily reproduced: you should just set breakpoint in debugger after calling MarkAsPrepared in twophase.c and then try to prepare any transaction. MarkAsPrepared will add GXACT to proc array and at this moment there will be two entries in procarray with the same XID: (gdb) p procArray->numProcs $2 = 4 (gdb) p allPgXact[procArray->pgprocnos[0]] $4 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 1 '\001', nxids = 0 '\000', used = 0 '\000', parent = 0x0} (gdb) p allPgXact[procArray->pgprocnos[1]] $5 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 0 '\000', nxids = 0 '\000', used = 0 '\000', parent = 0x0} (gdb) p allPgXact[procArray->pgprocnos[2]] $6 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 0 '\000', nxids = 0 '\000', used = 0 '\000', parent = 0x0} (gdb) p allPgXact[procArray->pgprocnos[3]] $7 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 0 '\000', nxids = 0 '\000', used = 0 '\000', parent = 0x0} Then you should just wait for sometime until checkpoint timeout is triggered and it logs snapshot: (gdb) bt #0 0x00000000007f3dab in GetRunningTransactionData () at procarray.c:2240 #1 0x00000000007fab22 in LogStandbySnapshot () at standby.c:943 #2 0x000000000077cde8 in BackgroundWriterMain () at bgwriter.c:331 #3 0x00000000005377f3 in AuxiliaryProcessMain (argc=2, argv=0x7ffe00aa00e0) at bootstrap.c:446 #4 0x000000000078e07e in StartChildProcess (type=BgWriterProcess) at postmaster.c:5323 #5 0x000000000078b6f0 in reaper (postgres_signal_arg=17) at postmaster.c:2948 #6 <signal handler called> #7 0x00007f1356d665b3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84 #8 0x0000000000789931 in ServerLoop () at postmaster.c:1765 #9 0x000000000078906c in PostmasterMain (argc=3, argv=0x1902640) at postmaster.c:1406 #10 0x00000000006d0e4f in main (argc=3, argv=0x1902640) at main.c:228 Now generated RUNNING_XACTS record contains duplicated XIDs. I want to ask opinion of community about the best way of fixing this problem. Should we avoid storing duplicated XIDs in procarray (by invalidating XID in original pgaxct) or eliminate/change check for duplicate in KnownAssignedXidsAdd (for example just ignore duplicates)?
pgsql-hackers by date: