Thread: Standbys, txid_current_snapshot, wraparound

Standbys, txid_current_snapshot, wraparound

From
Daniel Farina
Date:
Some time ago I reported bug 6291[0], which reported a Xid wraparound,
both as reported in pg_controldata and by txid_current_snapshot.
Unfortunately, nobody could reproduce it.

Today, the same system of ours just passed the wraparound mark
successfully at this time, incrementing the epoch.  However, two
standbys have not done the same: they have wrapped to a low txid.  At
this time, pg_controldata does report the correct epoch, as I read it,
unlike the original case.

I have not yet tried to reproduce this in a minimal way, but I wanted
to relate this information as soon as possible.

These systems are 9.0.6, on Ubuntu 10.04 LTS, amd64.

[0]: http://archives.postgresql.org/pgsql-bugs/2011-11/msg00094.php

-- 
fdr


Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Fri, Mar 23, 2012 at 12:26 AM, Daniel Farina <daniel@heroku.com> wrote:

> Some time ago I reported bug 6291[0], which reported a Xid wraparound,
> both as reported in pg_controldata and by txid_current_snapshot.
> Unfortunately, nobody could reproduce it.
>
> Today, the same system of ours just passed the wraparound mark
> successfully at this time, incrementing the epoch.  However, two
> standbys have not done the same: they have wrapped to a low txid.  At
> this time, pg_controldata does report the correct epoch, as I read it,
> unlike the original case.
>
> I have not yet tried to reproduce this in a minimal way, but I wanted
> to relate this information as soon as possible.
>
> These systems are 9.0.6, on Ubuntu 10.04 LTS, amd64.
>
> [0]: http://archives.postgresql.org/pgsql-bugs/2011-11/msg00094.php

So we have this?

Master pg_controldata - OK txid_current_snapshot() - OK
Standby pg_controldata - OK txid_current_snapshot() - lower value

Are there just 2 standbys? So all standbys have acted identically?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Standbys, txid_current_snapshot, wraparound

From
Daniel Farina
Date:
On Fri, Mar 23, 2012 at 1:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> So we have this?
>
> Master pg_controldata - OK txid_current_snapshot() - OK
> Standby pg_controldata - OK txid_current_snapshot() - lower value
>
> Are there just 2 standbys? So all standbys have acted identically?

Yes, I believe this is the situation. All have acted identically.
Also, some new data:

I took a new base backup after the epoch increment and started a new
standby, and it reported txid_current_snapshot correctly, at least
moments after it became consistent.  This morning, however, it does
not, and reports the 0-epoch number.

-- 
fdr


Re: Standbys, txid_current_snapshot, wraparound

From
Marko Kreen
Date:
On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
> Master pg_controldata - OK txid_current_snapshot() - OK
> Standby pg_controldata - OK txid_current_snapshot() - lower value

On Skytools list is report about master with slaves, but the
lower value appears on master too:
 http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html

Cc'd original reporter too.

-- 
marko



Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote:
> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
>> Master pg_controldata - OK txid_current_snapshot() - OK
>> Standby pg_controldata - OK txid_current_snapshot() - lower value
>
> On Skytools list is report about master with slaves, but the
> lower value appears on master too:
>
>  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html
>
> Cc'd original reporter too.

Thanks. Am looking.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote:
>> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
>>> Master pg_controldata - OK txid_current_snapshot() - OK
>>> Standby pg_controldata - OK txid_current_snapshot() - lower value
>>
>> On Skytools list is report about master with slaves, but the
>> lower value appears on master too:
>>
>>  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html
>>
>> Cc'd original reporter too.
>
> Thanks. Am looking.

I can't see how this could happen on the master at all.

On the standby, it can happen if we skip restartpoints for about a
couple of billion xids. Which would be a problem.

More on this tomorrow.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Standbys, txid_current_snapshot, wraparound

From
Marko Kreen
Date:
On Wed, Mar 28, 2012 at 10:54:58PM +0100, Simon Riggs wrote:
> On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote:
> >> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
> >>> Master pg_controldata - OK txid_current_snapshot() - OK
> >>> Standby pg_controldata - OK txid_current_snapshot() - lower value
> >>
> >> On Skytools list is report about master with slaves, but the
> >> lower value appears on master too:
> >>
> >>  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html
> >>
> >> Cc'd original reporter too.
> >
> > Thanks. Am looking.
> 
> I can't see how this could happen on the master at all.
> 
> On the standby, it can happen if we skip restartpoints for about a
> couple of billion xids. Which would be a problem.
> 
> More on this tomorrow.

I can't find a place where WAL replay updates values under XLogCtl.
If that really does not happen, that would explain why standbys can
see wrong epoch.

No clue yet how master can get broken.

-- 
marko



Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Wed, Mar 28, 2012 at 10:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote:
>>> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
>>>> Master pg_controldata - OK txid_current_snapshot() - OK
>>>> Standby pg_controldata - OK txid_current_snapshot() - lower value
>>>
>>> On Skytools list is report about master with slaves, but the
>>> lower value appears on master too:
>>>
>>>  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html
>>>
>>> Cc'd original reporter too.
>>
>> Thanks. Am looking.
>
> I can't see how this could happen on the master at all.
>
> On the standby, it can happen if we skip restartpoints for about a
> couple of billion xids. Which would be a problem.
>
> More on this tomorrow.

I've not been able to recreate the problem up till now. But "knowing"
there is a bug helps develop a theory based upon the code.

CreateCheckpoint() increments the epoch on the master at the next
checkpoint after wraparound. (I'd be happier if there was an explicit
link between those two points; there's not but I can't yet see a
problem).

When the standby receives the checkpoint record, it stores the
information in 2 places:
i) directly into ControlFile->checkPointCopy
ii) and then into XLogCtl when a safe restartpoint occurs

It's possible that a safe restartpoint could be delayed. When that
happens, the XLogCtl copy grows stale. If the delay is long enough,
then the NextXid counter will increase and will eventually be higher
than the last XLogCtl, causing the epoch returned by
GetNextXidAndEpoch() to go backwards by 1. If it is delayed even
further it would wrap again and increase again by one, then decrease,
then increase. Given enough time and/or a very busy server using lots
of xids.

At the same time, when we do UpdateControlFile() the other copy of the
epoch is written to the controlfile, so the controlfile shows the
accurate value for the epoch.

So I can explain how we get two different answers from the standby,
and I can explain how the error is very hard to reproduce and
apparently transient. I can't explain anything involving the master.

The key is "what delays the restartpoint?". And the answer there is
probably bugs in index code which purport to see invalid pages when
they probably shouldn't be there. So a REINDEX would likely help.

I'll look at a patch to improve this. Definite bug and will be backpatched.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Standbys, txid_current_snapshot, wraparound

From
Marko Kreen
Date:
On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote:
> When the standby receives the checkpoint record, it stores the
> information in 2 places:
> i) directly into ControlFile->checkPointCopy
> ii) and then into XLogCtl when a safe restartpoint occurs

In RecoveryRestartPoint() I see:

- memcpy(&XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint));

but I still don't see how are the ckptXid* values updated?

What am I missing?

-- 
marko



Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Thu, Mar 29, 2012 at 11:12 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote:
>> When the standby receives the checkpoint record, it stores the
>> information in 2 places:
>> i) directly into ControlFile->checkPointCopy
>> ii) and then into XLogCtl when a safe restartpoint occurs
>
> In RecoveryRestartPoint() I see:
>
> - memcpy(&XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint));
>
> but I still don't see how are the ckptXid* values updated?
>
> What am I missing?

It's updated on the master and then copied into place.

Patch coming in a few hours.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Patch coming in a few hours.

This is more straightforward than I was thinking. We just need to
initialise XLogCtl at the right place.


--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Standbys, txid_current_snapshot, wraparound

From
Marko Kreen
Date:
On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote:
> On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Patch coming in a few hours.
> 
> This is more straightforward than I was thinking. We just need to
> initialise XLogCtl at the right place.

Looks good to me.  That should fix the problem with standbys then.


Next question: how can flipping archive_mode on and off,
with restarts, near wraparound point, break epoch on master?
 http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html

Any ideas?

Could WAL playback from backend with different archive_mode setting
cause it somehow?

-- 
marko



Re: Standbys, txid_current_snapshot, wraparound

From
Simon Riggs
Date:
On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen <markokr@gmail.com> wrote:
> On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote:
>> On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Patch coming in a few hours.
>>
>> This is more straightforward than I was thinking. We just need to
>> initialise XLogCtl at the right place.
>
> Looks good to me.  That should fix the problem with standbys then.
>
>
> Next question: how can flipping archive_mode on and off,
> with restarts, near wraparound point, break epoch on master?
>
>  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html
>
> Any ideas?
>
> Could WAL playback from backend with different archive_mode setting
> cause it somehow?

The current code would allow it, if you managed to go for 4 billion
xids between checkpoints. If that happened, it would give an off by
one error each time it happened.

I don't believe that is possible.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Standbys, txid_current_snapshot, wraparound

From
Marko Kreen
Date:
On Thu, Mar 29, 2012 at 03:23:01PM +0100, Simon Riggs wrote:
> On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen <markokr@gmail.com> wrote:
> > Next question: how can flipping archive_mode on and off,
> > with restarts, near wraparound point, break epoch on master?
> >
> >  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html
> >
> > Any ideas?
> >
> > Could WAL playback from backend with different archive_mode setting
> > cause it somehow?
> 
> The current code would allow it, if you managed to go for 4 billion
> xids between checkpoints. If that happened, it would give an off by
> one error each time it happened.
> 
> I don't believe that is possible.

Yes, that does not sound like probable case.

-- 
marko



Re: Standbys, txid_current_snapshot, wraparound

From
Andrey Velikoredchanin
Date:
2012/3/29 Marko Kreen <markokr@gmail.com>
I can't find a place where WAL replay updates values under XLogCtl.
If that really does not happen, that would explain why standbys can
see wrong epoch.

No clue yet how master can get broken.


Details about environment:

Debian

Linux db 2.6.32-5-amd64 #1 SMP Fri Sep 9 20:23:16 UTC 2011 x86_64 GNU/Linux

# SELECT version();
 PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

Re: Standbys, txid_current_snapshot, wraparound

From
Daniel Farina
Date:
On Thu, Mar 22, 2012 at 5:26 PM, Daniel Farina <daniel@heroku.com> wrote:
> Some time ago I reported bug 6291[0], which reported a Xid wraparound,
> both as reported in pg_controldata and by txid_current_snapshot.
> Unfortunately, nobody could reproduce it.
>
> Today, the same system of ours just passed the wraparound mark
> successfully at this time, incrementing the epoch.  However, two
> standbys have not done the same: they have wrapped to a low txid.  At
> this time, pg_controldata does report the correct epoch, as I read it,
> unlike the original case.

Hmm. Here we are again, to revive this old thread (whose archives can
be seen at http://archives.postgresql.org/pgsql-hackers/2012-03/msg01437.php)

So the system *has* wrapped this time -- and so has its standby -- but
not to the zero-epoch, but rather the epoch that it was previously in,
2^33 to 2^32.  That's not something I've seen before.  There was
another report of something similar in that thread also. So,
that's...different.

However, since we've started using txids for some
streaming/incremental copying of data roughly a year ago (and thus
started be alerted to this problem), two of the three known
wraparounds have been eventful, and the third exposed a potentially
unrelated bug in the standby whereby epochs were not being loaded.
This database is itself descended from from an old timeline on a
version of postgres that undoubtably showed this defect.

Version: 9.0.6
Ubuntu 10.04 LTS
amd64

--
fdr