txid failed epoch increment, again, aka 6291 - Mailing list pgsql-hackers
| From | Daniel Farina |
|---|---|
| Subject | txid failed epoch increment, again, aka 6291 |
| Date | |
| Msg-id | CAAZKuFbnPYje8Xax2mMpCVfC4Mh8CZ3aT2Hd9z3W5GOLkce+_Q@mail.gmail.com |
| Responses | Re: txid failed epoch increment, again, aka 6291 |
| List | pgsql-hackers |
It seems this has reproduced once more. And once again, there
doesn't appear to be any funny business in pg_control (the structs
are pasted below for your re-check), and the updates to it are
successful and sensible. The primary is running 9.0.6.
However, we do have a new piece of data: there was a very brief period
where txid_snapshot did report an xmin greater than 2^33, our next
epoch boundary, by a few thousand transactions. That could be because
the reporting function GetNextXidAndEpoch does its own epoch
calculation: before the checkpoint it notices that nextXid has wrapped
past the last checkpoint's xid and bumps the epoch on the fly, but
after a checkpoint that forgets to increment the epoch (while still
advancing the saved xid past the wrap), that post-facto adjustment no
longer applies and the reported value falls back.
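For reference, that calculation looks roughly like this, paraphrased
from memory of 9.0's xlog.c (not a verbatim quote; check the real
source for details):

void
GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch)
{
    uint32          ckptXidEpoch;
    TransactionId   ckptXid;
    TransactionId   nextXid;

    /* snapshot the epoch/xid pair saved by the last checkpoint */
    SpinLockAcquire(&XLogCtl->info_lck);
    ckptXidEpoch = XLogCtl->ckptXidEpoch;
    ckptXid = XLogCtl->ckptXid;
    SpinLockRelease(&XLogCtl->info_lck);

    /* then read the live counter */
    nextXid = ReadNewTransactionId();

    /*
     * nextXid is logically later than ckptXid, so if it compares
     * numerically smaller we must have wrapped into the next epoch
     * since that checkpoint: report the saved epoch plus one.
     */
    if (nextXid < ckptXid)
        ckptXidEpoch++;

    *xid = nextXid;
    *epoch = ckptXidEpoch;
}

So the reported epoch only stays correct as long as the checkpoint
remembers to bump nextXidEpoch when it advances the saved xid past the
wrap.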
I've been reviewing the relevant code in CreateCheckPoint for this on
and off for a couple of days, but so far I haven't come up with a
convincing mechanism. However, given that historically this bug seems
more likely to surface than *not* surface on this system, perhaps we
can try for a sometimes-reproducing test case. I'm still struggling
for even a hint of a solution, though, so toss your thoughts here.
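For what it's worth, here's a tiny standalone sketch (ordinary C; the
numbers are hypothetical, only borrowed from the dumps below to set the
scale) of the failure shape I think we're seeing: the on-the-fly
adjustment reports epoch 2 while the saved checkpoint xid still
predates the wrap, and then a checkpoint that copies the post-wrap
nextXid without bumping the epoch makes the reported value fall back
below 2^33:

#include <stdint.h>
#include <stdio.h>

/*
 * The same comparison the reporting path makes: if the live xid is
 * numerically smaller than the xid saved by the last checkpoint,
 * assume we wrapped since then and add one to the epoch.
 */
static uint64_t
report_txid(uint32_t ckpt_epoch, uint32_t ckpt_xid, uint32_t next_xid)
{
    uint32_t epoch = ckpt_epoch;

    if (next_xid < ckpt_xid)
        epoch++;                /* wrapped since the last checkpoint */
    return ((uint64_t) epoch << 32) | next_xid;
}

int
main(void)
{
    uint32_t ckpt_epoch = 1;
    uint32_t ckpt_xid   = 4115479553u; /* saved before the wrap (hypothetical) */
    uint32_t next_xid   = 9904084u;    /* already wrapped (hypothetical) */

    /* before the bad checkpoint: the adjustment fires, txid > 2^33 */
    printf("before: %llu\n",
           (unsigned long long) report_txid(ckpt_epoch, ckpt_xid, next_xid));

    /* a checkpoint copies next_xid but forgets the epoch increment */
    ckpt_xid = next_xid;               /* ckpt_epoch left at 1 */

    /* afterwards the adjustment no longer fires: txid drops back below 2^33 */
    printf("after:  %llu\n",
           (unsigned long long) report_txid(ckpt_epoch, ckpt_xid, next_xid));
    return 0;
}

That prints a value above 2^33 and then one below it, which matches the
brief excursion we observed.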
$2 = {Insert = {LogwrtResult = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}},
    PrevRecord = {xlogid = 0, xrecoff = 0}, curridx = 0, currpage = 0x7ff4ed04a000, currpos = 0x0,
    RedoRecPtr = {xlogid = 18751, xrecoff = 1200832888}, forcePageWrites = 0 '\000'},
  LogwrtRqst = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}},
  LogwrtResult = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}},
  ckptXidEpoch = 1, ckptXid = 9904084, asyncXactLSN = {xlogid = 0, xrecoff = 0},
  lastRemovedLog = 0, lastRemovedSeg = 0,
  Write = {LogwrtResult = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}},
    curridx = 0, lastSegSwitchTime = 0},
  pages = 0x7ff4ed04a000 "", xlblocks = 0x7ff4ed0471d8, XLogCacheBlck = 1023,
  ThisTimeLineID = 0, RecoveryTargetTLI = 6,
  archiveCleanupCommand = '\000' <repeats 1023 times>, SharedRecoveryInProgress = 1 '\001',
  lastCheckPointRecPtr = {xlogid = 18751, xrecoff = 1671519088},
  lastCheckPoint = {redo = {xlogid = 18751, xrecoff = 1200832888}, ThisTimeLineID = 6,
    nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti = 1119,
    nextMultiOffset = 3115, oldestXid = 4115479553, oldestXidDB = 1, time = 1346746796,
    oldestActiveXid = 9776547},
  replayEndRecPtr = {xlogid = 18751, xrecoff = 1748623656},
  recoveryLastRecPtr = {xlogid = 18751, xrecoff = 1748623656},
  recoveryLastXTime = 400062234671833, info_lck = 0 '\000'}
(gdb) p ControlFile
$3 = (ControlFileData *) 0x7ff4ed046bf8
(gdb) p *ControlFile
$4 = {system_identifier = 5613733157253676693, pg_control_version = 903,
  catalog_version_no = 201008051, state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898,
  checkPoint = {xlogid = 18751, xrecoff = 1072693824},
  prevCheckPoint = {xlogid = 18751, xrecoff = 1072693824},
  checkPointCopy = {redo = {xlogid = 18751, xrecoff = 602482536}, ThisTimeLineID = 6,
    nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti = 1119,
    nextMultiOffset = 3115, oldestXid = 4115479553, oldestXidDB = 1, time = 1346746496,
    oldestActiveXid = 9558248},
  minRecoveryPoint = {xlogid = 18751, xrecoff = 1748623656},
  backupStartPoint = {xlogid = 0, xrecoff = 0}, wal_level = 2,
  MaxConnections = 500, max_prepared_xacts = 500, max_locks_per_xact = 64,
  maxAlign = 8, floatFormat = 1234567, blcksz = 8192, relseg_size = 131072,
  xlog_blcksz = 8192, xlog_seg_size = 16777216, nameDataLen = 64, indexMaxKeys = 32,
  toast_max_chunk_size = 1996, enableIntTimes = 1 '\001', float4ByVal = 1 '\001',
  float8ByVal = 1 '\001', crc = 3725972657}
--
fdr