
From Noah Misch
Subject Re: txid failed epoch increment, again, aka 6291
Date
Msg-id 20120904150425.GA28600@tornado.leadboat.com
In response to txid failed epoch increment, again, aka 6291  (Daniel Farina <daniel@heroku.com>)
Responses Re: txid failed epoch increment, again, aka 6291  (Daniel Farina <daniel@heroku.com>)
List pgsql-hackers
On Tue, Sep 04, 2012 at 02:07:30AM -0700, Daniel Farina wrote:
> It seems like this has reproduced once more.  And once again, there
> doesn't appear to be any funny business in pg_control (but the structs
> are pasted here for your re-check), and there are successful sensical
> updates to it.  The primary is running 9.0.6.

What version is the standby running?

> However, we do have a new piece of data: there was a very brief period
> where txid_snapshot did report an xmin greater than 2^33, our next
> epoch boundary, by a few thousand transactions.  That could be because
> the reporting function GetNextXidAndEpoch does its own epoch
> calculation: before the checkpoint it post-facto adjusts the epoch,
> but after a checkpoint that forgets to increment the epoch that
> adjustment no longer applies.

That makes sense.
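
For what it's worth, here is a toy, standalone sketch of that arithmetic
(not the backend's code; reported_txid and the numbers are made up for
illustration).  It shows why the reported value could briefly exceed 2^33
before the bad checkpoint and then fall back afterward:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/*
 * Toy approximation of the epoch adjustment made when reporting a 64-bit
 * xid: if the live nextXid is numerically smaller than the last
 * checkpoint's nextXid, the counter must have wrapped since that
 * checkpoint, so report the next epoch.
 */
static uint64_t
reported_txid(uint32_t ckptXidEpoch, TransactionId ckptXid,
              TransactionId nextXid)
{
    uint32_t epoch = ckptXidEpoch;

    if (nextXid < ckptXid)
        epoch++;                /* post-facto adjustment */

    return ((uint64_t) epoch << 32) | nextXid;
}

int
main(void)
{
    /*
     * Before the bad checkpoint: pg_control still says epoch 1 and carries
     * a pre-wrap nextXid, but the live counter has already wrapped, so the
     * adjustment fires and the report exceeds 2^33 (8589934592).
     */
    printf("before: %" PRIu64 "\n", reported_txid(1, 4000000000u, 5000000u));

    /*
     * After a checkpoint that advances its nextXid copy but forgets to bump
     * the epoch: the adjustment no longer fires and the report drops back
     * below 2^33, matching the brief blip described above.
     */
    printf("after:  %" PRIu64 "\n", reported_txid(1, 5000000u, 5000010u));
    return 0;
}

Running it prints a value above 2^33 for the first case and below it for
the second.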

> I've been reviewing the logic in CreateCheckPoint for this on and
> off for a couple of days, but so far I haven't come up with a
> convincing mechanism. However, given that it seems historically that
> this bug is more likely to surface than *not* surface on this system,
> perhaps we can try for a sometimes-reproducing test case. I'm still
> struggling for a hint of a solution, though, so toss your thoughts
> here.

The cause is not apparent to me, either, and simple tests here do not
reproduce the problem.

I am suspicious of xlog_redo() updating ControlFile->checkPointCopy without
acquiring ControlFileLock.  If a restartpoint is finishing at about the same
time, ControlFile->checkPointCopy.nextXid might come from the just-read
checkpoint record while ControlFile->checkPointCopy.nextXidEpoch bears the
value from the older checkpoint just adopted as a restartpoint.  The resulting
inconsistency would, however, vanish at the next ControlFile->checkPointCopy
update.  This does not explain the symptoms you have reported, and it cannot
explain much of anything on a primary.
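
Purely to illustrate that kind of interleaving, here is a toy, standalone
program (not PostgreSQL code; the struct, field, and thread names merely
mirror the fields involved).  One thread stores the two fields one at a
time without taking the lock, the way xlog_redo() updates the copy; the
other overwrites the whole struct under a lock, the way a restartpoint
does; a reader can then catch a new nextXid paired with the old
nextXidEpoch:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
    uint32_t nextXidEpoch;
    uint32_t nextXid;
} CkptCopy;

static CkptCopy shared = {1, 100};      /* stands in for checkPointCopy */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static volatile bool done = false;

/* Unlocked field-by-field update, like replaying a new checkpoint record. */
static void *
redo_thread(void *arg)
{
    (void) arg;
    for (int i = 0; i < 5000000 && !done; i++)
    {
        shared.nextXidEpoch = 2;
        shared.nextXid = 200;
    }
    done = true;
    return NULL;
}

/* Locked whole-struct overwrite with the older checkpoint, like a
 * restartpoint adopting it. */
static void *
restartpoint_thread(void *arg)
{
    CkptCopy older = {1, 100};

    (void) arg;
    while (!done)
    {
        pthread_mutex_lock(&lock);
        shared = older;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int
main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, redo_thread, NULL);
    pthread_create(&t2, NULL, restartpoint_thread, NULL);

    while (!done)
    {
        CkptCopy snap = shared;         /* racy read, like any reader */

        if (snap.nextXidEpoch == 1 && snap.nextXid == 200)
        {
            printf("mixed state seen: epoch=%u nextXid=%u\n",
                   snap.nextXidEpoch, snap.nextXid);
            done = true;
        }
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Build with -pthread; being a race, the mixed pairing is not guaranteed to
appear on any given run.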

> (gdb) p *ControlFile
> $4 = {system_identifier = 5613733157253676693, pg_control_version =
> 903, catalog_version_no = 201008051,
>   state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898, checkPoint =

This capture, it seems, is from a standby.

> {xlogid = 18751, xrecoff = 1072693824}, prevCheckPoint = {
>     xlogid = 18751, xrecoff = 1072693824}, checkPointCopy = {redo =
> {xlogid = 18751, xrecoff = 602482536}, ThisTimeLineID = 6,
>     nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti
> = 1119, nextMultiOffset = 3115, oldestXid = 4115479553,
>     oldestXidDB = 1, time = 1346746496, oldestActiveXid = 9558248},

You expected checkPointCopy = { ... nextXidEpoch = 2, ... }, correct?  (Well,
nextXidEpoch = 3 if you count the previous missed increment.)  Does pg_control
on the primary also bear epoch 1 where epoch 2 is expected?

Your last restartpoint was exactly five minutes before your last checkpoint,
so there's no evidence of a dearth of safe restartpoint opportunities.

nm


