Thread: Standbys, txid_current_snapshot, wraparound
Some time ago I reported bug 6291[0], which reported a Xid wraparound, both as reported in pg_controldata and by txid_current_snapshot. Unfortunately, nobody could reproduce it. Today, the same system of ours just passed the wraparound mark successfully at this time, incrementing the epoch. However, two standbys have not done the same: they have wrapped to a low txid. At this time, pg_controldata does report the correct epoch, as I read it, unlike the original case. I have not yet tried to reproduce this in a minimal way, but I wanted to relate this information as soon as possible. These systems are 9.0.6, on Ubuntu 10.04 LTS, amd64. [0]: http://archives.postgresql.org/pgsql-bugs/2011-11/msg00094.php -- fdr
On Fri, Mar 23, 2012 at 12:26 AM, Daniel Farina <daniel@heroku.com> wrote: > Some time ago I reported bug 6291[0], which reported a Xid wraparound, > both as reported in pg_controldata and by txid_current_snapshot. > Unfortunately, nobody could reproduce it. > > Today, the same system of ours just passed the wraparound mark > successfully at this time, incrementing the epoch. However, two > standbys have not done the same: they have wrapped to a low txid. At > this time, pg_controldata does report the correct epoch, as I read it, > unlike the original case. > > I have not yet tried to reproduce this in a minimal way, but I wanted > to relate this information as soon as possible. > > These systems are 9.0.6, on Ubuntu 10.04 LTS, amd64. > > [0]: http://archives.postgresql.org/pgsql-bugs/2011-11/msg00094.php So we have this? Master pg_controldata - OK txid_current_snapshot() - OK Standby pg_controldata - OK txid_current_snapshot() - lower value Are there just 2 standbys? So all standbys have acted identically? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 23, 2012 at 1:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > So we have this? > > Master pg_controldata - OK txid_current_snapshot() - OK > Standby pg_controldata - OK txid_current_snapshot() - lower value > > Are there just 2 standbys? So all standbys have acted identically? Yes, I believe this is the situation. All have acted identically. Also, some new data: I took a new base backup after the epoch increment and started a new standby, and it reported txid_current_snapshot correctly, at least moments after it became consistent. This morning, however, it does not, and reports the 0-epoch number. -- fdr
On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote: > Master pg_controldata - OK txid_current_snapshot() - OK > Standby pg_controldata - OK txid_current_snapshot() - lower value On Skytools list is report about master with slaves, but the lower value appears on master too: http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html Cc'd original reporter too. -- marko
On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote: > On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote: >> Master pg_controldata - OK txid_current_snapshot() - OK >> Standby pg_controldata - OK txid_current_snapshot() - lower value > > On Skytools list is report about master with slaves, but the > lower value appears on master too: > > http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html > > Cc'd original reporter too. Thanks. Am looking. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote: >> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote: >>> Master pg_controldata - OK txid_current_snapshot() - OK >>> Standby pg_controldata - OK txid_current_snapshot() - lower value >> >> On Skytools list is report about master with slaves, but the >> lower value appears on master too: >> >> http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html >> >> Cc'd original reporter too. > > Thanks. Am looking. I can't see how this could happen on the master at all. On the standby, it can happen if we skip restartpoints for about a couple of billion xids. Which would be a problem. More on this tomorrow. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 28, 2012 at 10:54:58PM +0100, Simon Riggs wrote: > On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote: > >> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote: > >>> Master pg_controldata - OK txid_current_snapshot() - OK > >>> Standby pg_controldata - OK txid_current_snapshot() - lower value > >> > >> On Skytools list is report about master with slaves, but the > >> lower value appears on master too: > >> > >> http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html > >> > >> Cc'd original reporter too. > > > > Thanks. Am looking. > > I can't see how this could happen on the master at all. > > On the standby, it can happen if we skip restartpoints for about a > couple of billion xids. Which would be a problem. > > More on this tomorrow. I can't find a place where WAL replay updates values under XLogCtl. If that really does not happen, that would explain why standbys can see wrong epoch. No clue yet how master can get broken. -- marko
On Wed, Mar 28, 2012 at 10:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <markokr@gmail.com> wrote: >>> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote: >>>> Master pg_controldata - OK txid_current_snapshot() - OK >>>> Standby pg_controldata - OK txid_current_snapshot() - lower value >>> >>> On Skytools list is report about master with slaves, but the >>> lower value appears on master too: >>> >>> http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html >>> >>> Cc'd original reporter too. >> >> Thanks. Am looking. > > I can't see how this could happen on the master at all. > > On the standby, it can happen if we skip restartpoints for about a > couple of billion xids. Which would be a problem. > > More on this tomorrow. I've not been able to recreate the problem up till now. But "knowing" there is a bug helps develop a theory based upon the code. CreateCheckpoint() increments the epoch on the master at the next checkpoint after wraparound. (I'd be happier if there was an explicit link between those two points; there's not but I can't yet see a problem). When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs It's possible that a safe restartpoint could be delayed. When that happens, the XLogCtl copy grows stale. If the delay is long enough, then the NextXid counter will increase and will eventually be higher than the last XLogCtl, causing the epoch returned by GetNextXidAndEpoch() to go backwards by 1. If it is delayed even further it would wrap again and increase again by one, then decrease, then increase. Given enough time and/or a very busy server using lots of xids. At the same time, when we do UpdateControlFile() the other copy of the epoch is written to the controlfile, so the controlfile shows the accurate value for the epoch. So I can explain how we get two different answers from the standby, and I can explain how the error is very hard to reproduce and apparently transient. I can't explain anything involving the master. The key is "what delays the restartpoint?". And the answer there is probably bugs in index code which purport to see invalid pages when they probably shouldn't be there. So a REINDEX would likely help. I'll look at a patch to improve this. Definite bug and will be backpatched. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote: > When the standby receives the checkpoint record, it stores the > information in 2 places: > i) directly into ControlFile->checkPointCopy > ii) and then into XLogCtl when a safe restartpoint occurs In RecoveryRestartPoint() I see: - memcpy(&XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint)); but I still don't see how are the ckptXid* values updated? What am I missing? -- marko
On Thu, Mar 29, 2012 at 11:12 AM, Marko Kreen <markokr@gmail.com> wrote: > On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote: >> When the standby receives the checkpoint record, it stores the >> information in 2 places: >> i) directly into ControlFile->checkPointCopy >> ii) and then into XLogCtl when a safe restartpoint occurs > > In RecoveryRestartPoint() I see: > > - memcpy(&XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint)); > > but I still don't see how are the ckptXid* values updated? > > What am I missing? It's updated on the master and then copied into place. Patch coming in a few hours. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Patch coming in a few hours. This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote: > On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Patch coming in a few hours. > > This is more straightforward than I was thinking. We just need to > initialise XLogCtl at the right place. Looks good to me. That should fix the problem with standbys then. Next question: how can flipping archive_mode on and off, with restarts, near wraparound point, break epoch on master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html Any ideas? Could WAL playback from backend with different archive_mode setting cause it somehow? -- marko
On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen <markokr@gmail.com> wrote: > On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote: >> On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Patch coming in a few hours. >> >> This is more straightforward than I was thinking. We just need to >> initialise XLogCtl at the right place. > > Looks good to me. That should fix the problem with standbys then. > > > Next question: how can flipping archive_mode on and off, > with restarts, near wraparound point, break epoch on master? > > http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html > > Any ideas? > > Could WAL playback from backend with different archive_mode setting > cause it somehow? The current code would allow it, if you managed to go for 4 billion xids between checkpoints. If that happened, it would give an off by one error each time it happened. I don't believe that is possible. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 29, 2012 at 03:23:01PM +0100, Simon Riggs wrote: > On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen <markokr@gmail.com> wrote: > > Next question: how can flipping archive_mode on and off, > > with restarts, near wraparound point, break epoch on master? > > > > http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html > > > > Any ideas? > > > > Could WAL playback from backend with different archive_mode setting > > cause it somehow? > > The current code would allow it, if you managed to go for 4 billion > xids between checkpoints. If that happened, it would give an off by > one error each time it happened. > > I don't believe that is possible. Yes, that does not sound like probable case. -- marko
2012/3/29 Marko Kreen <markokr@gmail.com>
Details about environment:
Debian
Linux db 2.6.32-5-amd64 #1 SMP Fri Sep 9 20:23:16 UTC 2011 x86_64 GNU/Linux
# SELECT version();
PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
I can't find a place where WAL replay updates values under XLogCtl.
If that really does not happen, that would explain why standbys can
see wrong epoch.
No clue yet how master can get broken.
Details about environment:
Debian
Linux db 2.6.32-5-amd64 #1 SMP Fri Sep 9 20:23:16 UTC 2011 x86_64 GNU/Linux
# SELECT version();
PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
On Thu, Mar 22, 2012 at 5:26 PM, Daniel Farina <daniel@heroku.com> wrote: > Some time ago I reported bug 6291[0], which reported a Xid wraparound, > both as reported in pg_controldata and by txid_current_snapshot. > Unfortunately, nobody could reproduce it. > > Today, the same system of ours just passed the wraparound mark > successfully at this time, incrementing the epoch. However, two > standbys have not done the same: they have wrapped to a low txid. At > this time, pg_controldata does report the correct epoch, as I read it, > unlike the original case. Hmm. Here we are again, to revive this old thread (whose archives can be seen at http://archives.postgresql.org/pgsql-hackers/2012-03/msg01437.php) So the system *has* wrapped this time -- and so has its standby -- but not to the zero-epoch, but rather the epoch that it was previously in, 2^33 to 2^32. That's not something I've seen before. There was another report of something similar in that thread also. So, that's...different. However, since we've started using txids for some streaming/incremental copying of data roughly a year ago (and thus started be alerted to this problem), two of the three known wraparounds have been eventful, and the third exposed a potentially unrelated bug in the standby whereby epochs were not being loaded. This database is itself descended from from an old timeline on a version of postgres that undoubtably showed this defect. Version: 9.0.6 Ubuntu 10.04 LTS amd64 -- fdr