
Re: txid failed epoch increment, again, aka 6291 - Mailing list pgsql-hackers

From	Noah Misch
Subject	Re: txid failed epoch increment, again, aka 6291
Date	September 6, 2012 13:04:13
Msg-id	20120906100406.GA2399@tornado.leadboat.com
In response to	Re: txid failed epoch increment, again, aka 6291 (Daniel Farina <daniel@heroku.com>)
Responses	Re: txid failed epoch increment, again, aka 6291
List	pgsql-hackers


On Tue, Sep 04, 2012 at 09:46:58AM -0700, Daniel Farina wrote:
> I might try to find the segments leading up to the overflow point and
> try xlogdumping them to see what we can see.

That would be helpful to see.

Just to grasp at yet-flimsier straws, could you post (URL preferred, else
private mail) the output of "objdump -dS" on your "postgres" executable?

> If there's anything to note about the workload, I'd say that it does
> tend to make fairly pervasive use of long running transactions which
> can span probably more than one checkpoint, and the txid reporting
> functions, and a concurrency level of about 300 or so backends ... but
> per my reading of the mechanism so far, it doesn't seem like any of
> this should matter.

Thanks for the details; I agree none of that sounds suspicious.

After some further pondering and testing, this remains a mystery to me.  These
symptoms imply a proper update of ControlFile->checkPointCopy.nextXid without
having properly updated ControlFile->checkPointCopy.nextXidEpoch.  After
recovery, only CreateCheckPoint() updates ControlFile->checkPointCopy at all.
Its logic for doing so looks simple and correct.
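
For readers following along, here is a minimal sketch of the epoch bookkeeping
described above. It is a simplified model, not the PostgreSQL source: the
struct CheckPointModel and the functions next_checkpoint(), wide_xid(), and
demo_missed_epoch() are hypothetical names invented for illustration. Only the
field names nextXid and nextXidEpoch come from the message. The sketch assumes
the epoch is bumped whenever the new checkpoint's nextXid is numerically
smaller than the previous one, and that a 64-bit transaction ID is formed as
(epoch << 32) | xid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two checkpoint fields named in the message. */
typedef struct
{
    uint32_t nextXidEpoch;  /* number of 32-bit XID wraparounds so far */
    uint32_t nextXid;       /* next 32-bit transaction ID to assign */
} CheckPointModel;

/*
 * Model of building a new checkpoint record from the previous one.
 * Assumption: a wraparound is detected when the 32-bit counter is
 * numerically smaller than it was at the previous checkpoint.
 */
static CheckPointModel
next_checkpoint(CheckPointModel prev, uint32_t current_next_xid)
{
    CheckPointModel cp;

    cp.nextXid = current_next_xid;
    cp.nextXidEpoch = prev.nextXidEpoch;
    if (cp.nextXid < prev.nextXid)
        cp.nextXidEpoch++;      /* counter wrapped since last checkpoint */
    return cp;
}

/* Combine epoch and 32-bit XID into a 64-bit value (assumed layout). */
static uint64_t
wide_xid(CheckPointModel cp)
{
    return ((uint64_t) cp.nextXidEpoch << 32) | cp.nextXid;
}

/* Illustrates the symptom: nextXid advances past the wrap, epoch does not. */
static void
demo_missed_epoch(void)
{
    CheckPointModel prev = {0, 4000000000u};
    CheckPointModel good = next_checkpoint(prev, 1000u);
    CheckPointModel bad = {prev.nextXidEpoch, 1000u};  /* epoch not bumped */

    assert(wide_xid(good) > wide_xid(prev));  /* 64-bit value keeps rising */
    assert(wide_xid(bad) < wide_xid(prev));   /* 64-bit value jumps backward */
}
```

In this model, a record that carries the wrapped nextXid but the old
nextXidEpoch yields a 64-bit value that appears to go backward, which is the
kind of state the symptoms point to.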
