Thread: PITR potentially broken in 9.2
Doing PITR in 9.2.1, the system claims that it reached a consistent recovery state immediately after redo starts.

This leads to various mysterious failures, when it should instead throw a "requested recovery stop point is before consistent recovery point" error. (If you are unlucky, I think it might even silently start up in a corrupt state.)

This seems to have been introduced in:

commit 8366c7803ec3d0591cf2d1226fea1fee947d56c3
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Wed Jan 25 18:02:04 2012 +0000

    Allow pg_basebackup from standby node with safety checking.
    Base backup follows recommended procedure, plus goes to great
    lengths to ensure that partial page writes are avoided.

    Jun Ishizuka and Fujii Masao, with minor modifications

The backup file:

START WAL LOCATION: 1/CD89E48 (file 00000001000000010000000C)
STOP WAL LOCATION: 1/1AFA11A0 (file 00000001000000010000001A)
CHECKPOINT LOCATION: 1/188D8120
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-11-27 09:40:13 PST
LABEL: label
STOP TIME: 2012-11-27 09:42:10 PST

(The file 00000001000000010000000C was archived at 9:37.)

recovery.conf:

restore_command = 'cp /tmp/archivedir/%f %p'
recovery_target_time = '2012-11-27 09:38:00 PST'

Log file:

22110 2012-11-27 09:49:15.220 PST LOG:  database system was interrupted; last known up at 2012-11-27 09:40:13 PST
22110 2012-11-27 09:49:15.235 PST LOG:  starting point-in-time recovery to 2012-11-27 09:38:00-08
22110 2012-11-27 09:49:15.271 PST LOG:  restored log file "000000010000000100000018" from archive
22110 2012-11-27 09:49:15.367 PST LOG:  restored log file "00000001000000010000000C" from archive
22110 2012-11-27 09:49:15.372 PST LOG:  redo starts at 1/CD89E48
22110 2012-11-27 09:49:15.374 PST LOG:  consistent recovery state reached at 1/CD8B7F0
22110 2012-11-27 09:49:15.490 PST LOG:  restored log file "00000001000000010000000D" from archive
22110 2012-11-27 09:49:15.775 PST LOG:  restored log file "00000001000000010000000E" from archive
22110 2012-11-27 09:49:16.078 PST LOG:  restored log file "00000001000000010000000F" from archive
22110 2012-11-27 09:49:16.345 PST LOG:  restored log file "000000010000000100000010" from archive
22110 2012-11-27 09:49:16.533 PST LOG:  recovery stopping before commit of transaction 951967, time 2012-11-27 09:38:00.000689-08
22110 2012-11-27 09:49:16.533 PST LOG:  redo done at 1/10F41900
22110 2012-11-27 09:49:16.533 PST LOG:  last completed transaction was at log time 2012-11-27 09:37:59.998496-08
22110 2012-11-27 09:49:16.537 PST LOG:  selected new timeline ID: 2
22110 2012-11-27 09:49:16.584 PST LOG:  archive recovery complete
22113 2012-11-27 09:49:16.599 PST LOG:  checkpoint starting: end-of-recovery immediate wait
22113 2012-11-27 09:49:17.815 PST LOG:  checkpoint complete: wrote 8336 buffers (12.7%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.097 s, sync=1.115 s, total=1.230 s; sync files=14, longest=0.578 s, average=0.079 s
22110 2012-11-27 09:49:17.929 PST FATAL:  could not access status of transaction 1014015
22110 2012-11-27 09:49:17.929 PST DETAIL:  Could not read from file "pg_clog/0000" at offset 245760: Success.
22109 2012-11-27 09:49:17.932 PST LOG:  startup process (PID 22110) exited with exit code 1
22109 2012-11-27 09:49:17.932 PST LOG:  terminating any other active server processes

Cheers,

Jeff
On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote:
> Doing PITR in 9.2.1, the system claims that it reached a consistent
> recovery state immediately after redo starts.
> This leads to various mysterious failures, when it should instead
> throw a "requested recovery stop point is before consistent recovery
> point" error.
> (If you are unlucky, I think it might even silently start up in a
> corrupt state.)

I observed a similar problem with 9.2. Despite a restore_command that failed every time, startup from a hot backup completed. At the time, I suspected a mistake on my part.
On 2012-11-27 10:08:12 -0800, Jeff Janes wrote:
> Doing PITR in 9.2.1, the system claims that it reached a consistent
> recovery state immediately after redo starts.
> This leads to various mysterious failures, when it should instead
> throw a "requested recovery stop point is before consistent recovery
> point" error.
> (If you are unlucky, I think it might even silently start up in a
> corrupt state.)
>
> This seems to have been introduced in:
> commit 8366c7803ec3d0591cf2d1226fea1fee947d56c3
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   Wed Jan 25 18:02:04 2012 +0000
>
>     Allow pg_basebackup from standby node with safety checking.
>     Base backup follows recommended procedure, plus goes to great
>     lengths to ensure that partial page writes are avoided.
>
>     Jun Ishizuka and Fujii Masao, with minor modifications
>
> The backup file:
>
> START WAL LOCATION: 1/CD89E48 (file 00000001000000010000000C)
> STOP WAL LOCATION: 1/1AFA11A0 (file 00000001000000010000001A)
> CHECKPOINT LOCATION: 1/188D8120
> BACKUP METHOD: pg_start_backup
> BACKUP FROM: master
> START TIME: 2012-11-27 09:40:13 PST
> LABEL: label
> STOP TIME: 2012-11-27 09:42:10 PST
>
> (The file 00000001000000010000000C was archived at 9:37.)
>
> recovery.conf:
> restore_command = 'cp /tmp/archivedir/%f %p'
> recovery_target_time = '2012-11-27 09:38:00 PST'
>
> Log file:
>
> 22110 2012-11-27 09:49:15.220 PST LOG:  database system was
> interrupted; last known up at 2012-11-27 09:40:13 PST
> 22110 2012-11-27 09:49:15.235 PST LOG:  starting point-in-time
> recovery to 2012-11-27 09:38:00-08
> 22110 2012-11-27 09:49:15.271 PST LOG:  restored log file
> "000000010000000100000018" from archive
> 22110 2012-11-27 09:49:15.367 PST LOG:  restored log file
> "00000001000000010000000C" from archive
> 22110 2012-11-27 09:49:15.372 PST LOG:  redo starts at 1/CD89E48

Hm. Are you sure it's actually reading your backup file? It's hard to say without DEBUG1 output, but I would tentatively say it's not reading it at all, because the "redo starts at ..." message indicates it's not using the checkpoint location from the backup file.

Can you reproduce the issue? If so, can you give an exact guide? If not, do you still have the datadir et al. from above?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28.11.2012 06:27, Noah Misch wrote:
> On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote:
>> Doing PITR in 9.2.1, the system claims that it reached a consistent
>> recovery state immediately after redo starts.
>> This leads to various mysterious failures, when it should instead
>> throw a "requested recovery stop point is before consistent recovery
>> point" error.
>> (If you are unlucky, I think it might even silently start up in a
>> corrupt state.)
>
> I observed a similar problem with 9.2. Despite a restore_command that failed
> every time, startup from a hot backup completed. At the time, I suspected a
> mistake on my part.

I believe this was caused by this little typo/thinko:

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6763,7 +6763,7 @@ StartupXLOG(void)
>  		/* Pop the error context stack */
>  		error_context_stack = errcontext.previous;
>
> -		if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
> +		if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
>  			XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
>  		{
>  			/*

Normally, backupEndPoint is invalid, and we rely on seeing an end-of-backup WAL record to mark the location. backupEndPoint is only set when restoring from a backup that was taken from a standby, but thanks to the above, recovery incorrectly treats that as end-of-backup.

Fixed, thanks for the report!

- Heikki
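To see why that one-word thinko had the effect Jeff reported, here is a minimal standalone sketch (simplified single-integer LSNs and a stub ControlFile, not the actual xlog.c code) contrasting the two conditions. For an ordinary backup taken from a master, backupStartPoint is set but backupEndPoint stays invalid (zero), so the buggy test succeeds on the very first replayed record and recovery declares consistency immediately; the corrected test only fires for a standby backup, where backupEndPoint is actually set.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins: real 9.2 uses a two-part XLogRecPtr struct. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

static bool XLogRecPtrIsInvalid(XLogRecPtr p) { return p == InvalidXLogRecPtr; }
static bool XLByteLE(XLogRecPtr a, XLogRecPtr b) { return a <= b; }

struct
{
    XLogRecPtr backupStartPoint;  /* set for any backup restored from backup_label */
    XLogRecPtr backupEndPoint;    /* only set for a backup taken from a standby */
} ControlFile;

/* The pre-fix test: checks the wrong field. */
static bool end_of_backup_buggy(XLogRecPtr EndRecPtr)
{
    return !XLogRecPtrIsInvalid(ControlFile.backupStartPoint) &&
           XLByteLE(ControlFile.backupEndPoint, EndRecPtr);
}

/* The corrected test from the quoted diff. */
static bool end_of_backup_fixed(XLogRecPtr EndRecPtr)
{
    return !XLogRecPtrIsInvalid(ControlFile.backupEndPoint) &&
           XLByteLE(ControlFile.backupEndPoint, EndRecPtr);
}

int main(void)
{
    /* Ordinary master backup: start point known, end point not yet seen. */
    ControlFile.backupStartPoint = 0x10CD89E48;   /* 1/CD89E48 */
    ControlFile.backupEndPoint   = InvalidXLogRecPtr;

    XLogRecPtr firstReplayedRec = 0x10CD8B7F0;    /* 1/CD8B7F0 */

    /* buggy: true right away (0 <= anything); fixed: false until the
     * end-of-backup WAL record actually sets backupEndPoint */
    printf("buggy says end-of-backup: %d\n", end_of_backup_buggy(firstReplayedRec));
    printf("fixed says end-of-backup: %d\n", end_of_backup_fixed(firstReplayedRec));
    return 0;
}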
On 28.11.2012 15:26, Andres Freund wrote:
> Hm. Are you sure it's actually reading your backup file? It's hard to say
> without DEBUG1 output, but I would tentatively say it's not reading it at
> all, because the "redo starts at ..." message indicates it's not using
> the checkpoint location from the backup file.

By backup file, you mean the backup history file? Since 9.0, recovery does not read the backup history file; it's for informational/debugging purposes only. All the information recovery needs is in the backup_label, and an end-of-backup WAL record marks the location where pg_stop_backup() was called, ie. how far the WAL must be replayed for the backup to be consistent.

> Can you reproduce the issue? If so, can you give an exact guide? If not,
> do you still have the datadir et al. from above?

I just committed a fix for this, but if you can, it would still be nice if you could double-check that it now really works.

- Heikki
On 2012-11-28 15:37:38 +0200, Heikki Linnakangas wrote:
> On 28.11.2012 15:26, Andres Freund wrote:
>> Hm. Are you sure it's actually reading your backup file? It's hard to say
>> without DEBUG1 output, but I would tentatively say it's not reading it at
>> all, because the "redo starts at ..." message indicates it's not using
>> the checkpoint location from the backup file.
>
> By backup file, you mean the backup history file? Since 9.0, recovery does
> not read the backup history file; it's for informational/debugging purposes
> only. All the information recovery needs is in the backup_label, and an
> end-of-backup WAL record marks the location where pg_stop_backup() was
> called, ie. how far the WAL must be replayed for the backup to be
> consistent.

I mean the label read by read_backup_label(). Jeff's mail indicated it had CHECKPOINT LOCATION at 1/188D8120, but redo started at 1/CD89E48.

>> Can you reproduce the issue? If so, can you give an exact guide? If not,
>> do you still have the datadir et al. from above?
>
> I just committed a fix for this, but if you can, it would still be nice if
> you could double-check that it now really works.

Yuck. Too bad that that got in.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28.11.2012 15:47, Andres Freund wrote:
> I mean the label read by read_backup_label(). Jeff's mail indicated it
> had CHECKPOINT LOCATION at 1/188D8120, but redo started at 1/CD89E48.

That's correct. The checkpoint was at 1/188D8120, but its redo pointer was earlier, at 1/CD89E48, so that's where redo had to start.

- Heikki
On 2012-11-28 16:34:55 +0200, Heikki Linnakangas wrote:
> On 28.11.2012 15:47, Andres Freund wrote:
>> I mean the label read by read_backup_label(). Jeff's mail indicated it
>> had CHECKPOINT LOCATION at 1/188D8120, but redo started at 1/CD89E48.
>
> That's correct. The checkpoint was at 1/188D8120, but its redo pointer was
> earlier, at 1/CD89E48, so that's where redo had to start.

Heh. I just compared 18 with CD and didn't notice the different lengths of the two...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 28.11.2012 06:27, Noah Misch wrote:
>> I observed a similar problem with 9.2. Despite a restore_command that failed
>> every time, startup from a hot backup completed. At the time, I suspected a
>> mistake on my part.

> I believe this was caused by this little typo/thinko:

Is this related at all to the problem discussed over at
http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php ?
The conclusion-so-far in that thread seems to be that an error ought to be thrown for recovery_target_time earlier than the backup stop time, but one is not getting thrown.

			regards, tom lane
On Wed, Nov 28, 2012 at 5:37 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 28.11.2012 15:26, Andres Freund wrote:
>> Can you reproduce the issue? If so, can you give an exact guide? If not,
>> do you still have the datadir et al. from above?

Yes, it is reliable enough to be used for "git bisect":

rm /tmp/archivedir/0000000*
initdb
## edit postgresql.conf to set up archiving etc. and set checkpoint_segments to 60
pg_ctl -D /tmp/data -l /tmp/data/logfile_master start -w
createdb
pgbench -i -s 10
pgbench -T 36000000 &
sleep 120
psql -c "SELECT pg_start_backup('label');"
cp -rp /tmp/data/ /tmp/data_slave
sleep 120
psql -c "SELECT pg_stop_backup();"
rm /tmp/data_slave/pg_xlog/0*
rm /tmp/data_slave/postmaster.*
rm /tmp/data_slave/logfile_master
cp src/backend/access/transam/recovery.conf.sample /tmp/data_slave/recovery.conf
## edit /tmp/data_slave/recovery.conf to set up restore command and stop point.
cp -rpi /tmp/data_slave /tmp/data_slave2
pg_ctl -D /tmp/data_slave2/ start -o "--port=9876"

At some point, kill the pgbench:

pg_ctl -D /tmp/data stop -m fast

I run the master with fsync off, otherwise it takes too long to accumulate archived log files. The checkpoint associated with pg_start_backup takes ~2.5 minutes, so for the PITR end time, pick a time that is 1.25 minutes before the time reported in the backup history or backup_label file.

I copy data_slave to data_slave2 so that I can try different things without having to restart the whole process from the beginning.

> I just committed a fix for this, but if you can, it would still be nice if
> you could double-check that it now really works.

Thanks. In REL9_2_STABLE, it now correctly gives the "requested recovery stop point is before consistent recovery point" error.

Also, if recovery is started with hot_standby=on and with no recovery_target_time, in patched REL9_2_STABLE the database becomes "ready to accept read only connections" at the appropriate time, once the end-of-backup WAL record has been replayed. In 9.2.0 and 9.2.1, it instead opened for read-only connections at the point where the end-of-checkpoint record (for the checkpoint associated with the pg_start_backup) had been replayed, which I think is too early.

Cheers,

Jeff
On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 28.11.2012 06:27, Noah Misch wrote:
>>> I observed a similar problem with 9.2. Despite a restore_command that failed
>>> every time, startup from a hot backup completed. At the time, I suspected a
>>> mistake on my part.
>
>> I believe this was caused by this little typo/thinko:
>
> Is this related at all to the problem discussed over at
> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php ?
> The conclusion-so-far in that thread seems to be that an error
> ought to be thrown for recovery_target_time earlier than the
> backup stop time, but one is not getting thrown.

It is not directly related. That thread was about 9.1.6.

In the newly fixed 9_2_STABLE, that problem still shows up the same as it does in 9.1.6.

(In 9.2.1, recovery sometimes blows up before that particular problem can be detected, which is what led me here in the first place--that is the extent of the relationship, AFAIK.)

To see this one, follow the instructions in my previous email, but set recovery_target_time to a time just after the end of the pg_start_backup checkpoint, rather than just before it, and turn on hot_standby.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is this related at all to the problem discussed over at
>> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php ?
>> The conclusion-so-far in that thread seems to be that an error
>> ought to be thrown for recovery_target_time earlier than the
>> backup stop time, but one is not getting thrown.

> It is not directly related. That thread was about 9.1.6.

> In the newly fixed 9_2_STABLE, that problem still shows up the same as
> it does in 9.1.6.

> (In 9.2.1, recovery sometimes blows up before that particular problem
> can be detected, which is what led me here in the first place--that
> is the extent of the relationship, AFAIK.)

> To see this one, follow the instructions in my previous email, but set
> recovery_target_time to a time just after the end of the
> pg_start_backup checkpoint, rather than just before it, and turn on
> hot_standby.

I tried to reproduce this as per your directions, and see no problem in HEAD. Recovery advances to the specified stop time, hot standby mode wakes up, and it pauses waiting for me to do pg_xlog_replay_resume(). But I can connect to the standby and do that. So I'm unsure what the problem is.

			regards, tom lane
On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Is this related at all to the problem discussed over at
>>> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php ?
>>> The conclusion-so-far in that thread seems to be that an error
>>> ought to be thrown for recovery_target_time earlier than the
>>> backup stop time, but one is not getting thrown.
>
>> It is not directly related. That thread was about 9.1.6.
>
>> In the newly fixed 9_2_STABLE, that problem still shows up the same as
>> it does in 9.1.6.
>
>> (In 9.2.1, recovery sometimes blows up before that particular problem
>> can be detected, which is what led me here in the first place--that
>> is the extent of the relationship, AFAIK.)
>
>> To see this one, follow the instructions in my previous email, but set
>> recovery_target_time to a time just after the end of the
>> pg_start_backup checkpoint, rather than just before it, and turn on
>> hot_standby.
>
> I tried to reproduce this as per your directions, and see no problem in
> HEAD. Recovery advances to the specified stop time, hot standby mode
> wakes up,

Hot standby should only wake up once recovery has proceeded beyond the pg_stop_backup() point.

If the specified stop point is after pg_start_backup() finishes, but before pg_stop_backup(), then hot standby should not start up (and with the newest HEAD, in my hands, it does not). Are you sure you set the stop time to just after pg_start_backup, not to just after pg_stop_backup?

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> In the newly fixed 9_2_STABLE, that problem still shows up the same as
>>> it does in 9.1.6.

>> I tried to reproduce this as per your directions, and see no problem in
>> HEAD. Recovery advances to the specified stop time, hot standby mode
>> wakes up,

> Hot standby should only wake up once recovery has proceeded beyond the
> pg_stop_backup() point.

> If the specified stop point is after pg_start_backup() finishes, but
> before pg_stop_backup(), then hot standby should not start up (and
> with the newest HEAD, in my hands, it does not). Are you sure you set
> the stop time to just after pg_start_backup, not to just after
> pg_stop_backup?

I'm confused. Are you now saying that this problem only exists in 9.1.x? I tested current HEAD because you indicated the problem was still there.

			regards, tom lane
On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Jeff Janes <jeff.janes@gmail.com> writes:
>>>> In the newly fixed 9_2_STABLE, that problem still shows up the same as
>>>> it does in 9.1.6.
>
>>> I tried to reproduce this as per your directions, and see no problem in
>>> HEAD. Recovery advances to the specified stop time, hot standby mode
>>> wakes up,
>
>> Hot standby should only wake up once recovery has proceeded beyond the
>> pg_stop_backup() point.
>
>> If the specified stop point is after pg_start_backup() finishes, but
>> before pg_stop_backup(), then hot standby should not start up (and
>> with the newest HEAD, in my hands, it does not). Are you sure you set
>> the stop time to just after pg_start_backup, not to just after
>> pg_stop_backup?
>
> I'm confused. Are you now saying that this problem only exists in
> 9.1.x? I tested current HEAD because you indicated the problem was
> still there.

No, I'm saying the problem exists both in 9.1.x and in hypothetical 9.2.2 and in hypothetical 9.3, but not in 9.2.[01], because in those it is masked by that other problem which has just been fixed.

I'll try it again in b1346822f3048ede254647f3a46 just to be sure, but I'm already fairly sure.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm confused. Are you now saying that this problem only exists in
>> 9.1.x? I tested current HEAD because you indicated the problem was
>> still there.

> No, I'm saying the problem exists both in 9.1.x and in hypothetical
> 9.2.2 and in hypothetical 9.3, but not in 9.2.[01], because in those it
> is masked by that other problem which has just been fixed.

I'm still confused. I've now tried this in both HEAD and 9.1 branch tip, and I do not see any misbehavior. If I set recovery_target_time to before the pg_stop_backup time, I get "FATAL: requested recovery stop point is before consistent recovery point", which is what I expect; and if I set it to after the pg_stop_backup time, it starts up as expected. So if there's a remaining unfixed bug here, I don't understand what that is.

			regards, tom lane
On Sun, Dec 2, 2012 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm confused. Are you now saying that this problem only exists in
>>> 9.1.x? I tested current HEAD because you indicated the problem was
>>> still there.
>
>> No, I'm saying the problem exists both in 9.1.x and in hypothetical
>> 9.2.2 and in hypothetical 9.3, but not in 9.2.[01], because in those it
>> is masked by that other problem which has just been fixed.
>
> I'm still confused. I've now tried this in both HEAD and 9.1 branch
> tip, and I do not see any misbehavior. If I set recovery_target_time to
> before the pg_stop_backup time, I get "FATAL: requested recovery stop
> point is before consistent recovery point", which is what I expect; and
> if I set it to after the pg_stop_backup time, it starts up as expected.
> So if there's a remaining unfixed bug here, I don't understand what
> that is.

I've reproduced it again using the just-tagged 9.2.2, and uploaded a 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to Google Drive. The data directory contains the recovery.conf, which is set to end recovery between the two critical time points.

https://docs.google.com/open?id=0Bzqrh1SO9FcES181YXRVdU5NSlk

This is the command line I use to start recovery, and the resulting log output:

https://docs.google.com/open?id=0Bzqrh1SO9FcEaTQ2QXhFdDZYaUE

I can't connect to the standby to execute pg_xlog_replay_resume() because:

FATAL:  the database system is starting up

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to Google
> Drive. The data directory contains the recovery.conf, which is set to
> end recovery between the two critical time points.

Hmmm ... I can reproduce this with current 9.2 branch tip. However, more or less by accident I first tried it with a 9.2-branch postmaster from a couple weeks ago, and it works as expected with that: the log output looks like

LOG:  restored log file "00000001000000000000001B" from archive
LOG:  restored log file "00000001000000000000001C" from archive
LOG:  restored log file "00000001000000000000001D" from archive
LOG:  database system is ready to accept read only connections
LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.

and I can connect and do the pg_xlog_replay_resume() thing. Note the "ready to accept read only connections" line, which does not show up with branch tip.

So apparently this is something we broke since Nov 18. Don't know what yet --- any thoughts? Also, I am still not seeing what the connection is to the original report against 9.1.6.

			regards, tom lane
I wrote:
> So apparently this is something we broke since Nov 18. Don't know what
> yet --- any thoughts?

Further experimentation shows that reverting commit ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's something wrong/incomplete about that fix.

This is a bit urgent since we now have to consider whether to withdraw 9.2.2 and issue a hasty 9.2.3. Do we have a regression here since 9.2.1, and if so how bad is it?

			regards, tom lane
On 2012-12-04 19:35:48 -0500, Tom Lane wrote:
> I wrote:
>> So apparently this is something we broke since Nov 18. Don't know what
>> yet --- any thoughts?
>
> Further experimentation shows that reverting commit
> ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's
> something wrong/incomplete about that fix.

ISTM that the code should check ControlFile->backupEndRequired, not just check for an invalid backupEndPoint. I haven't looked into the specific issue though.

> This is a bit urgent since we now have to consider whether to withdraw
> 9.2.2 and issue a hasty 9.2.3. Do we have a regression here since
> 9.2.1, and if so how bad is it?

Not sure.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 4, 2012 at 4:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
>> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to Google
>> Drive. The data directory contains the recovery.conf, which is set to
>> end recovery between the two critical time points.
>
> Hmmm ... I can reproduce this with current 9.2 branch tip. However,
> more or less by accident I first tried it with a 9.2-branch postmaster
> from a couple weeks ago, and it works as expected with that: the log
> output looks like
>
> LOG:  restored log file "00000001000000000000001B" from archive
> LOG:  restored log file "00000001000000000000001C" from archive
> LOG:  restored log file "00000001000000000000001D" from archive
> LOG:  database system is ready to accept read only connections
> LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
> LOG:  recovery has paused
> HINT:  Execute pg_xlog_replay_resume() to continue.
>
> and I can connect and do the pg_xlog_replay_resume() thing.

But the key is, the database was not actually consistent at that point, and so opening hot standby was a dangerous thing to do.

The bug that allowed the database to open early (the original topic of this email chain) was masking this secondary issue.

> So apparently this is something we broke since Nov 18. Don't know what
> yet --- any thoughts? Also, I am still not seeing what the connection
> is to the original report against 9.1.6.

The behavior that we both see in 9.2.2, where it waits for a pg_xlog_replay_resume() that cannot be delivered because the database is not yet open, is the same thing I'm seeing in 9.1.6. I'll see if I can repeat it in 9.1.7 and post the tarball of the data directory.

Cheers,

Jeff
On 2012-12-04 19:20:44 -0500, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
>> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to Google
>> Drive. The data directory contains the recovery.conf, which is set to
>> end recovery between the two critical time points.
>
> Hmmm ... I can reproduce this with current 9.2 branch tip. However,
> more or less by accident I first tried it with a 9.2-branch postmaster
> from a couple weeks ago, and it works as expected with that: the log
> output looks like
>
> LOG:  restored log file "00000001000000000000001B" from archive
> LOG:  restored log file "00000001000000000000001C" from archive
> LOG:  restored log file "00000001000000000000001D" from archive
> LOG:  database system is ready to accept read only connections
> LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
> LOG:  recovery has paused
> HINT:  Execute pg_xlog_replay_resume() to continue.
>
> and I can connect and do the pg_xlog_replay_resume() thing.
> Note the "ready to accept read only connections" line, which
> does not show up with branch tip.
>
> So apparently this is something we broke since Nov 18. Don't know what
> yet --- any thoughts? Also, I am still not seeing what the connection
> is to the original report against 9.1.6.

Maybe I am blind, but, looking at the backup label:

START WAL LOCATION: 0/110006A8 (file 000000010000000000000011)
CHECKPOINT LOCATION: 0/1D776B50
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-12-02 15:08:52 PST
LABEL: label

This was a straight archive-based hot backup with manual pg_start/stop_backup. Which means we ought to find an XLOG_BACKUP_END record; before that we shouldn't be consistent. Satoshi's version of xlogdump outputs this (besides heaps of errors ...):

~/bin/xlogdump-9.2 ~/tmp/recover/tmp/archivedir/0000000100000000000000* 2>&1 | grep "backup end"
[cur:0/22C361E0, xid:0, rmid:0(XLOG), len:8/40, prev:0/22C361B0] backup end: started at 0/110006A8.

But the recovery target, again according to xlogdump:

[cur:0/1D861CD8, xid:305610, rmid:1(Transaction), len:12/44, prev:0/1D861C80] compact commit at 2012-12-03 00:08:54 CET

So the target xid is far before the backup end, and it seems absolutely correct not to permit access yet. So maybe I am (again) missing something here, but everything looks fine.

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
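The comparison Andres is drawing here can be checked mechanically. A small standalone sketch (assuming only the conventional hi/lo hex LSN notation, not any PostgreSQL tool) that parses the two locations quoted from the xlogdump output above and orders them:

#include <inttypes.h>
#include <stdio.h>

/* Parse an LSN in the usual "hi/lo" hex notation into a single 64-bit value. */
static uint64_t parse_lsn(const char *s)
{
    unsigned int hi, lo;
    if (sscanf(s, "%X/%X", &hi, &lo) != 2)
        return 0;
    return ((uint64_t) hi << 32) | lo;
}

int main(void)
{
    /* Locations quoted from the xlogdump output in the mail above. */
    uint64_t backup_end    = parse_lsn("0/22C361E0");  /* XLOG_BACKUP_END record */
    uint64_t target_commit = parse_lsn("0/1D861CD8");  /* commit of xid 305610 */

    printf("target commit %" PRIX64 " %s backup end %" PRIX64 "\n",
           target_commit,
           target_commit < backup_end ? "precedes" : "follows",
           backup_end);
    /* Prints "precedes": the requested stop point lies before the backup
     * becomes consistent, so refusing to open for hot standby is correct. */
    return 0;
}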
On Tue, Dec 4, 2012 at 4:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> So apparently this is something we broke since Nov 18. Don't know what
>> yet --- any thoughts?
>
> Further experimentation shows that reverting commit
> ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's
> something wrong/incomplete about that fix.

I can't independently vouch for the correctness of that fix, but I can vouch that there is so far no evidence that it is incorrect. It is re-revealing an undesirable (but safe, as far as we know) behavior that is present in 9.1.x but which was temporarily hidden by a corruption-risk bug in 9.2.0 and 9.2.1.

> This is a bit urgent since we now have to consider whether to withdraw
> 9.2.2 and issue a hasty 9.2.3. Do we have a regression here since
> 9.2.1, and if so how bad is it?

I don't think this is urgent. The error-message issue in 9.1.6 and 9.2.2 is merely annoying, while the early-opening one in 9.2.0 and 9.2.1 seems fundamentally unsafe.

Cheers,

Jeff
On 2012-12-04 18:05:15 -0800, Jeff Janes wrote:
> On Tue, Dec 4, 2012 at 4:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
>>> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to Google
>>> Drive. The data directory contains the recovery.conf, which is set to
>>> end recovery between the two critical time points.
>>
>> Hmmm ... I can reproduce this with current 9.2 branch tip. However,
>> more or less by accident I first tried it with a 9.2-branch postmaster
>> from a couple weeks ago, and it works as expected with that: the log
>> output looks like
>>
>> LOG:  restored log file "00000001000000000000001B" from archive
>> LOG:  restored log file "00000001000000000000001C" from archive
>> LOG:  restored log file "00000001000000000000001D" from archive
>> LOG:  database system is ready to accept read only connections
>> LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
>> LOG:  recovery has paused
>> HINT:  Execute pg_xlog_replay_resume() to continue.
>>
>> and I can connect and do the pg_xlog_replay_resume() thing.
>
> But the key is, the database was not actually consistent at that
> point, and so opening hot standby was a dangerous thing to do.
>
> The bug that allowed the database to open early (the original topic of
> this email chain) was masking this secondary issue.
>
>> So apparently this is something we broke since Nov 18. Don't know what
>> yet --- any thoughts? Also, I am still not seeing what the connection
>> is to the original report against 9.1.6.
>
> The behavior that we both see in 9.2.2, where it waits for a
> pg_xlog_replay_resume() that cannot be delivered because the database
> is not yet open, is the same thing I'm seeing in 9.1.6. I'll see if I
> can repeat it in 9.1.7 and post the tarball of the data directory.

Could you check whether the attached patch fixes the behaviour?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[Attachment: patch]
Andres Freund <andres@2ndquadrant.com> writes:
>> But the key is, the database was not actually consistent at that
>> point, and so opening hot standby was a dangerous thing to do.
>>
>> The bug that allowed the database to open early (the original topic of
>> this email chain) was masking this secondary issue.

> Could you check whether the attached patch fixes the behaviour?

Yeah, I had just come to the same conclusion: the bug is not with Heikki's fix, but with the pause logic. The comment says that it shouldn't pause unless users can connect to un-pause, but the actual implementation of that test is several bricks shy of a load.

Your patch is better, but it's still missing a bet: what we need to be sure of is not merely that we *could* have told the postmaster to start hot standby, but that we *actually have done so*. Otherwise, it's flow-of-control-dependent whether it works or not; we could fail if the main loop hasn't gotten to CheckRecoveryConsistency since the conditions became true. Looking at the code in CheckRecoveryConsistency, testing LocalHotStandbyActive seems appropriate.

I also thought it was pretty dangerous that this absolutely critical test was not placed in recoveryPausesHere() itself, rather than relying on the call sites to remember to do it.

So the upshot is that I propose a patch more like the attached. This is not a regression because the pause logic is broken this same way since 9.1. So I no longer think that we need a rewrap.

			regards, tom lane

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 50e2b22..593dcee 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** recoveryStopsHere(XLogRecord *record, bo
*** 5923,5928 ****
--- 5923,5932 ----
  static void
  recoveryPausesHere(void)
  {
+ 	/* Don't pause unless users can connect! */
+ 	if (!LocalHotStandbyActive)
+ 		return;
+ 
  	ereport(LOG,
  			(errmsg("recovery has paused"),
  			 errhint("Execute pg_xlog_replay_resume() to continue.")));
*************** StartupXLOG(void)
*** 6697,6707 ****
  				 */
  				if (recoveryStopsHere(record, &recoveryApply))
  				{
! 					/*
! 					 * Pause only if users can connect to send a resume
! 					 * message
! 					 */
! 					if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
  					{
  						SetRecoveryPause(true);
  						recoveryPausesHere();
--- 6701,6707 ----
  				 */
  				if (recoveryStopsHere(record, &recoveryApply))
  				{
! 					if (recoveryPauseAtTarget)
  					{
  						SetRecoveryPause(true);
  						recoveryPausesHere();
*************** StartupXLOG(void)
*** 6737,6752 ****
  				/*
  				 * Update shared replayEndRecPtr before replaying this record,
  				 * so that XLogFlush will update minRecoveryPoint correctly.
  				 */
  				SpinLockAcquire(&xlogctl->info_lck);
  				xlogctl->replayEndRecPtr = EndRecPtr;
  				recoveryPause = xlogctl->recoveryPause;
  				SpinLockRelease(&xlogctl->info_lck);
  
! 				/*
! 				 * Pause only if users can connect to send a resume message
! 				 */
! 				if (recoveryPause && standbyState == STANDBY_SNAPSHOT_READY)
  					recoveryPausesHere();
  
  				/*
--- 6737,6751 ----
  				/*
  				 * Update shared replayEndRecPtr before replaying this record,
  				 * so that XLogFlush will update minRecoveryPoint correctly.
+ 				 *
+ 				 * While we have the lock, also check for a pause request.
  				 */
  				SpinLockAcquire(&xlogctl->info_lck);
  				xlogctl->replayEndRecPtr = EndRecPtr;
  				recoveryPause = xlogctl->recoveryPause;
  				SpinLockRelease(&xlogctl->info_lck);
  
! 				if (recoveryPause)
  					recoveryPausesHere();
  
  				/*
On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> But the key is, the database was not actually consistent at that
>>> point, and so opening hot standby was a dangerous thing to do.
>>>
>>> The bug that allowed the database to open early (the original topic of
>>> this email chain) was masking this secondary issue.
>
>> Could you check whether the attached patch fixes the behaviour?
>
> Yeah, I had just come to the same conclusion: the bug is not with
> Heikki's fix, but with the pause logic. The comment says that it
> shouldn't pause unless users can connect to un-pause, but the actual
> implementation of that test is several bricks shy of a load.
>
> Your patch is better, but it's still missing a bet: what we need to be
> sure of is not merely that we *could* have told the postmaster to start
> hot standby, but that we *actually have done so*. Otherwise, it's
> flow-of-control-dependent whether it works or not; we could fail if the
> main loop hasn't gotten to CheckRecoveryConsistency since the conditions
> became true. Looking at the code in CheckRecoveryConsistency, testing
> LocalHotStandbyActive seems appropriate.

Good point. My patch wasn't intended as something final; I just wanted confirmation that it achieves what Jeff wants, because I at least - you possibly as well? - misunderstood him earlier. I would probably have missed the interaction above anyway ;)

> I also thought it was pretty dangerous that this absolutely critical
> test was not placed in recoveryPausesHere() itself, rather than relying
> on the call sites to remember to do it.

Absolutely agreed.

> So the upshot is that I propose a patch more like the attached.

Without having run anything so far, it looks good to me.

Ok, night now,

Andres

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
>> So the upshot is that I propose a patch more like the attached.

> Without having run anything so far, it looks good to me.

BTW, while on the theme of the pause feature being several bricks shy of a load, it looks to me like the place where it was added to the replay loop was less than sane as well. Presumably the purpose of a pause is to let you stop application of the WAL at exactly the current spot; but you can *not* do that midway through application of the record, and where it is is effectively that. As soon as we've updated xlogctl->replayEndRecPtr, we're committed to replaying the record, because we can't guarantee that the controlfile minRecoveryPoint doesn't get pushed up to that point by buffer flush activity. So an abort here could leave the database in an unrestartable condition.

I guess the idea of putting it there was to save one spinlock acquire, but I'm having a bit of a hard time believing that one spinlock acquire per WAL record means much. Still, we could possibly preserve that attribute by moving the pause down to just after the update of xlogctl->recoveryLastRecPtr.

Thoughts?

			regards, tom lane
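To spell out the ordering being proposed here, a minimal standalone sketch of one replay-loop iteration (simplified types, no real locking, not the actual StartupXLOG code): the pause check runs only after the record has been fully applied and recoveryLastRecPtr advanced, so a pause can never strand recovery between "committed to replay" and "replayed".

#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;   /* simplified; 9.2 uses a two-part struct */

/* Stand-ins for the shared-memory state discussed in the thread. */
static struct
{
    XLogRecPtr replayEndRecPtr;    /* record we are committed to replaying */
    XLogRecPtr recoveryLastRecPtr; /* last record fully replayed */
    bool       recoveryPause;      /* set by pg_xlog_replay_pause() */
} xlogctl = {0, 0, true};

static bool LocalHotStandbyActive = true;   /* assume hot standby is up */

static void recoveryPausesHere(void)
{
    /* Pointless to pause unless users can connect to send a resume message. */
    if (!LocalHotStandbyActive)
        return;
    printf("recovery has paused (resume with pg_xlog_replay_resume())\n");
    /* the real code waits here until the shared flag is cleared */
}

static void replay_one_record(XLogRecPtr EndRecPtr)
{
    /*
     * 1. Advertise the end of the record we are about to replay.  From here
     *    on we are committed: minRecoveryPoint may be pushed up to this
     *    location by buffer flushes, so we must not stop before finishing.
     */
    xlogctl.replayEndRecPtr = EndRecPtr;      /* under info_lck in real code */

    /* 2. Apply the WAL record itself (rm_redo in the real loop). */
    printf("applying record ending at %" PRIX64 "\n", EndRecPtr);

    /* 3. The record is now fully replayed. */
    xlogctl.recoveryLastRecPtr = EndRecPtr;   /* under info_lck in real code */

    /* 4. Only now honor a pause request, i.e. at a record boundary. */
    if (xlogctl.recoveryPause)
        recoveryPausesHere();
}

int main(void)
{
    replay_one_record(0x10CD8B7F0);   /* 1/CD8B7F0 */
    return 0;
}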
On 5 December 2012 00:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> So apparently this is something we broke since Nov 18. Don't know what
>> yet --- any thoughts?
>
> Further experimentation shows that reverting commit
> ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's
> something wrong/incomplete about that fix.
>
> This is a bit urgent since we now have to consider whether to withdraw
> 9.2.2 and issue a hasty 9.2.3. Do we have a regression here since
> 9.2.1, and if so how bad is it?

I'll look at this now.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
So what status are we on? Are we going to release 9.2.2 as it is? Or withdraw the current 9.2.2?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
>>> So the upshot is that I propose a patch more like the attached.
>
>> Without having run anything so far, it looks good to me.
>
> BTW, while on the theme of the pause feature being several bricks shy of
> a load, it looks to me like the place where it was added to the replay
> loop was less than sane as well. Presumably the purpose of a pause is
> to let you stop application of the WAL at exactly the current spot;
> but you can *not* do that midway through application of the record,
> and where it is is effectively that. As soon as we've updated
> xlogctl->replayEndRecPtr, we're committed to replaying the record,
> because we can't guarantee that the controlfile minRecoveryPoint
> doesn't get pushed up to that point by buffer flush activity.
> So an abort here could leave the database in an unrestartable condition.
>
> I guess the idea of putting it there was to save one spinlock acquire,
> but I'm having a bit of a hard time believing that one spinlock acquire
> per WAL record means much. Still, we could possibly preserve that
> attribute by moving the pause down to just after the update of
> xlogctl->recoveryLastRecPtr.
>
> Thoughts?
>
> 			regards, tom lane
On 2012-12-05 19:06:55 +0900, Tatsuo Ishii wrote:
> So what status are we on? Are we going to release 9.2.2 as it is?
> Or withdraw the current 9.2.2?

Releasing as-is sounds good. As Tom wrote upthread:

On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
> This is not a regression because the pause logic is broken this same
> way since 9.1. So I no longer think that we need a rewrap.

Imo the bug isn't all that critical; it causes confusion but no data corruption or such.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 December 2012 02:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> But the key is, the database was not actually consistent at that
>>> point, and so opening hot standby was a dangerous thing to do.
>>>
>>> The bug that allowed the database to open early (the original topic of
>>> this email chain) was masking this secondary issue.
>
>> Could you check whether the attached patch fixes the behaviour?
>
> Yeah, I had just come to the same conclusion: the bug is not with
> Heikki's fix, but with the pause logic. The comment says that it
> shouldn't pause unless users can connect to un-pause, but the actual
> implementation of that test is several bricks shy of a load.

OK, I've had a look at this now.

Andres correctly identified the bug above, which is that the backup end is noted by the XLOG_BACKUP_END record. recoveryStopsHere() was not changed when that record type was added, so it ignores the situation that we are still waiting for the end of the backup, and stops anyway. So I concur with Jeff that there is a bug and think that Andres has provided the clue to a fix. I'll patch that now. The aboriginal bug extends back to 9.0.

Pause has got nothing at all to do with this, even if there are other problems there.

> Your patch is better, but it's still missing a bet: what we need to be
> sure of is not merely that we *could* have told the postmaster to start
> hot standby, but that we *actually have done so*. Otherwise, it's
> flow-of-control-dependent whether it works or not; we could fail if the
> main loop hasn't gotten to CheckRecoveryConsistency since the conditions
> became true. Looking at the code in CheckRecoveryConsistency, testing
> LocalHotStandbyActive seems appropriate.
>
> I also thought it was pretty dangerous that this absolutely critical
> test was not placed in recoveryPausesHere() itself, rather than relying
> on the call sites to remember to do it.
>
> So the upshot is that I propose a patch more like the attached.

I've reworked the pause logic along the lines you suggest. Attached here for further discussion.

> This is not a regression because the pause logic is broken this same
> way since 9.1. So I no longer think that we need a rewrap.

I think we do need a rewrap, since the bug is not in the pause logic.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[Attachment: patch]
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-05 19:06:55 +0900, Tatsuo Ishii wrote:
>> So what status are we on? Are we going to release 9.2.2 as it is?
>> Or withdraw the current 9.2.2?

> Releasing as-is sounds good. As Tom wrote upthread:

> On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
>> This is not a regression because the pause logic is broken this same
>> way since 9.1. So I no longer think that we need a rewrap.

> Imo the bug isn't all that critical; it causes confusion but no data
> corruption or such.

The real reason it's not critical is that this is a "don't do that" case anyway. Unpatched, the system hangs up, but if it were patched you'd just get a failure complaining "requested stop before consistent state reached" or whatever the wording is. Since we've seen only one report about that since 9.1 came out, the error isn't being made often enough to justify a panic re-release. We'll just fix it and move on.

			regards, tom lane
On 2012-12-05 13:34:05 +0000, Simon Riggs wrote:
> On 5 December 2012 02:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>>> But the key is, the database was not actually consistent at that
>>>> point, and so opening hot standby was a dangerous thing to do.
>>>>
>>>> The bug that allowed the database to open early (the original topic of
>>>> this email chain) was masking this secondary issue.
>>
>>> Could you check whether the attached patch fixes the behaviour?
>>
>> Yeah, I had just come to the same conclusion: the bug is not with
>> Heikki's fix, but with the pause logic. The comment says that it
>> shouldn't pause unless users can connect to un-pause, but the actual
>> implementation of that test is several bricks shy of a load.
>
> OK, I've had a look at this now.
>
> Andres correctly identified the bug above, which is that the backup
> end is noted by the XLOG_BACKUP_END record. recoveryStopsHere() was
> not changed when that record type was added, so it ignores the
> situation that we are still waiting for the end of the backup, and
> stops anyway. So I concur with Jeff that there is a bug and think that
> Andres has provided the clue to a fix. I'll patch that now. The
> aboriginal bug extends back to 9.0.

I think we may have multiple issues here... recoveryStopsHere() doesn't need to check for backupEndRequired itself - it's handled in StartupXLOG() itself, just below "Complain if we did not roll forward far enough", where it actually already checks for backupEndRequired.

> Pause has got nothing at all to do with this, even if there are other
> problems there.

I think the problem of not erroring out properly as described in CAMkU=1wo9+TvTy-w9UkSEgwc49kXuu=V=8cEdyaH8CBFyUVeww@mail.gmail.com and shown in https://docs.google.com/file/d/0Bzqrh1SO9FcEaTQ2QXhFdDZYaUE/edit?pli=1 can be attributed to the premature recoveryPausesHere() in the if() below recoveryStopsHere() in StartupXLOG().

>> Your patch is better, but it's still missing a bet: what we need to be
>> sure of is not merely that we *could* have told the postmaster to start
>> hot standby, but that we *actually have done so*. Otherwise, it's
>> flow-of-control-dependent whether it works or not; we could fail if the
>> main loop hasn't gotten to CheckRecoveryConsistency since the conditions
>> became true. Looking at the code in CheckRecoveryConsistency, testing
>> LocalHotStandbyActive seems appropriate.
>>
>> I also thought it was pretty dangerous that this absolutely critical
>> test was not placed in recoveryPausesHere() itself, rather than relying
>> on the call sites to remember to do it.
>>
>> So the upshot is that I propose a patch more like the attached.
>
> I've reworked the pause logic along the lines you suggest. Attached here
> for further discussion.

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -5008,6 +5008,12 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
>  static void
>  recoveryPausesHere(void)
>  {
> +	/*
> +	 * Pause only if users can connect to send a resume message
> +	 */
> +	if (!LocalHotStandbyActive)
> +		return;
> +
>  	ereport(LOG,
>  			(errmsg("recovery has paused"),
>  			 errhint("Execute pg_xlog_replay_resume() to continue.")));
> @@ -5783,10 +5789,11 @@ StartupXLOG(void)
>  				if (recoveryStopsHere(record, &recoveryApply))
>  				{
>  					/*
> -					 * Pause only if users can connect to send a resume
> -					 * message
> +					 * We've reached stop point, but not yet applied last
> +					 * record. Pause BEFORE final apply, if requested, but
> +					 * only if users can connect to send a resume message
>  					 */
> -					if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
> +					if (recoveryPauseAtTarget && !recoveryApply)
>  					{
>  						SetRecoveryPause(true);
>  						recoveryPausesHere();

I personally would find it easier to read if we did a SetRecoveryPause(true) regardless of !recoveryApply and just made the recoveryPausesHere() conditional.

> @@ -5820,28 +5827,26 @@ StartupXLOG(void)
>  				}
>
>  				/*
> +				 * If we are attempting to enter Hot Standby mode, check
> +				 * for pauses and process incoming transactionids.
> +				 */
> +				if (standbyState >= STANDBY_INITIALIZED)
> +				{
> +					if (recoveryPause)
> +						recoveryPausesHere();
> +
> +					if (TransactionIdIsValid(record->xl_xid))
> +						RecordKnownAssignedTransactionIds(record->xl_xid);
> +				}
> +
> +				/*

It feels wrong to do a RecordKnownAssignedTransactionIds() before setting the replayEndRecPtr. I don't think it's really hurtful, but...

>  				 * Update shared replayEndRecPtr before replaying this record,
>  				 * so that XLogFlush will update minRecoveryPoint correctly.
>  				 */
>  				SpinLockAcquire(&xlogctl->info_lck);
>  				xlogctl->replayEndRecPtr = EndRecPtr;
> -				recoveryPause = xlogctl->recoveryPause;
>  				SpinLockRelease(&xlogctl->info_lck);
>
> -				/*
> -				 * Pause only if users can connect to send a resume message
> -				 */
> -				if (recoveryPause && standbyState == STANDBY_SNAPSHOT_READY)
> -					recoveryPausesHere();
> -
> -				/*
> -				 * If we are attempting to enter Hot Standby mode, process
> -				 * XIDs we see
> -				 */
> -				if (standbyState >= STANDBY_INITIALIZED &&
> -					TransactionIdIsValid(record->xl_xid))
> -					RecordKnownAssignedTransactionIds(record->xl_xid);
> -
>  				/* Now apply the WAL record itself */
>  				RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
>
> @@ -5875,6 +5880,7 @@ StartupXLOG(void)
>  				 */
>  				SpinLockAcquire(&xlogctl->info_lck);
>  				xlogctl->recoveryLastRecPtr = EndRecPtr;
> +				recoveryPause = xlogctl->recoveryPause;
>  				SpinLockRelease(&xlogctl->info_lck);
>
>  				LastRec = ReadRecPtr;

I think it would be better to set recoveryPause before the if (recoveryPause) above, otherwise we may need to wait for new records to stream in. That will only change when we see the log message about waiting, but still...

> @@ -5883,6 +5889,17 @@ StartupXLOG(void)
>  		} while (record != NULL && recoveryContinue);
>
>  		/*
> +		 * We've reached stop point, but not yet applied last
> +		 * record. Pause AFTER final apply, if requested, but
> +		 * only if users can connect to send a resume message
> +		 */
> +		if (reachedStopPoint && recoveryPauseAtTarget && recoveryApply)
> +		{
> +			SetRecoveryPause(true);
> +			recoveryPausesHere();
> +		}
> +
> +		/*

I find the above comment a bit misleading, because by now we have in fact applied the last record... What about: "When we reached the stop point we hadn't yet applied the last record, but pause *after stop* was requested, so pause now."

Independent of this patch, I am slightly confused about the whole stop logic. Isn't the idea that you can stop/start/stop/start/... recovery? Because if !recoveryApply we break out of the whole recovery loop and are done with things.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 December 2012 13:34, Simon Riggs <simon@2ndquadrant.com> wrote:
> The aboriginal bug extends back to 9.0.

I don't see any bug in 9.0 and 9.1, just 9.2+.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-05 14:33:36 +0000, Simon Riggs wrote:
> On 5 December 2012 13:34, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The aboriginal bug extends back to 9.0.
>
> I don't see any bug in 9.0 and 9.1, just 9.2+.

Well, the pausing logic is clearly broken in 9.1 as well, isn't it? I.e. you will get:

LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.

instead of:

FATAL:  requested recovery stop point is before consistent recovery point

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 December 2012 14:33, Andres Freund <andres@2ndquadrant.com> wrote:
> Independent of this patch, I am slightly confused about the whole stop
> logic. Isn't the idea that you can stop/start/stop/start/... recovery?
> Because if !recoveryApply we break out of the whole recovery loop and
> are done with things.

You can, but by shutting down the server, updating the recovery target, then restarting the server. That's not a beautiful design; it's just waiting for someone to make recovery targets changeable, which is best done when recovery.conf parameters are in postgresql.conf.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-05 13:34:05 +0000, Simon Riggs wrote:
>> @@ -5883,6 +5889,17 @@ StartupXLOG(void)
>>  		} while (record != NULL && recoveryContinue);
>>
>>  		/*
>> +		 * We've reached stop point, but not yet applied last
>> +		 * record. Pause AFTER final apply, if requested, but
>> +		 * only if users can connect to send a resume message
>> +		 */
>> +		if (reachedStopPoint && recoveryPauseAtTarget && recoveryApply)
>> +		{
>> +			SetRecoveryPause(true);
>> +			recoveryPausesHere();
>> +		}
>> +
>> +		/*

> I find the above comment a bit misleading, because by now we have in fact
> applied the last record...

I'd go further than that: a pause after we've exited the loop is completely stupid. The only reason for pausing is to let the user choose whether to continue applying WAL or not. If you've already made that choice, you might as well let the system come up fully.

But I can't make any sense of the rest of this patch, because it seems to be randomly rearranging a whole lot of stuff that's unrelated to pausing. If you think all these changes are in fact necessary, could you break it down a little more?

			regards, tom lane
On 2012-12-05 11:11:23 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2012-12-05 13:34:05 +0000, Simon Riggs wrote:
>>> @@ -5883,6 +5889,17 @@ StartupXLOG(void)
>>>  		} while (record != NULL && recoveryContinue);
>>>
>>>  		/*
>>> +		 * We've reached stop point, but not yet applied last
>>> +		 * record. Pause AFTER final apply, if requested, but
>>> +		 * only if users can connect to send a resume message
>>> +		 */
>>> +		if (reachedStopPoint && recoveryPauseAtTarget && recoveryApply)
>>> +		{
>>> +			SetRecoveryPause(true);
>>> +			recoveryPausesHere();
>>> +		}
>>> +
>>> +		/*
>
>> I find the above comment a bit misleading, because by now we have in fact
>> applied the last record...
>
> I'd go further than that: a pause after we've exited the loop is
> completely stupid. The only reason for pausing is to let the user
> choose whether to continue applying WAL or not. If you've already
> made that choice, you might as well let the system come up fully.

Yes, that puzzles me as well.

Basically the whole logic around recoveryApply seems to be broken currently, because if recoveryApply=false we currently don't pause at all; we jump out of the apply loop with the break. I guess that's what Simon tried to address with the patch, but the new behaviour seems to be strange as well.

> But I can't make any sense of the rest of this patch, because it seems
> to be randomly rearranging a whole lot of stuff that's unrelated to
> pausing. If you think all these changes are in fact necessary, could
> you break it down a little more?

I think part of the !recoveryApply changes makes sense, but it needs more...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> Basically the whole logic around recoveryApply seems to be broken
> currently, because if recoveryApply=false we currently don't pause at
> all; we jump out of the apply loop with the break.

Huh? That break is after the pause:

				/*
				 * Have we reached our recovery target?
				 */
				if (recoveryStopsHere(record, &recoveryApply))
				{
					/*
					 * Pause only if users can connect to send a resume
					 * message
					 */
					if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
					{
						SetRecoveryPause(true);
						recoveryPausesHere();
					}
					reachedStopPoint = true;	/* see below */
					recoveryContinue = false;
					if (!recoveryApply)
						break;
				}

The point of recoveryApply is that the stop can be defined as occurring either before or after the current WAL record. However, I do see your point, which is that if the stop is defined to be after the current WAL record then we probably should apply same before pausing. Otherwise the pause is a bit useless, since the user can't see the state he's being asked to approve.

The real question here probably needs to be "what is the point of recoveryPauseAtTarget in the first place?". I find it hard to envision what's the point of pausing unless the user has an opportunity to make a decision about whether to continue applying WAL. As Simon mentioned, we seem to be lacking some infrastructure that would let the user adjust the recovery_target parameters before resuming WAL processing. But, assuming for the moment that our workaround for that is "shutdown the server, adjust recovery.conf, and restart", is the pause placed in a useful spot for that?

BTW, could we make this more convenient by letting recoveryPausesHere() reread recovery.conf? Also, shouldn't the code re-evaluate recoveryStopsHere() after that?

			regards, tom lane
On 2012-12-05 11:40:16 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Basically the whole logic around recoveryApply seems to be broken
> > currently. Because if recoveryApply=false we currently don't pause at
> > all because we jump out of the apply loop with the break.
>
> Huh? That break is after the pause:
>
>             /*
>              * Have we reached our recovery target?
>              */
>             if (recoveryStopsHere(record, &recoveryApply))
>             {
>                 /*
>                  * Pause only if users can connect to send a resume
>                  * message
>                  */
>                 if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
>                 {
>                     SetRecoveryPause(true);
>                     recoveryPausesHere();
>                 }
>                 reachedStopPoint = true;    /* see below */
>                 recoveryContinue = false;
>                 if (!recoveryApply)
>                     break;
>             }

Oh, yea. I mixed what I read in the patch and what exists today in my mind, sorry for that.

But it still seems rather strange that we break out of the loop depending on recovery_target_inclusive which is what recoveryApply basically is set to. We do:

    ereport(LOG,
            (errmsg("recovery has paused"),
             errhint("Execute pg_xlog_replay_resume() to continue.")));

if we reach the target, so jumping out of the loop seems really counterintuitive to me.

> The point of recoveryApply is that the stop can be defined as occurring
> either before or after the current WAL record. However, I do see your
> point, which is that if the stop is defined to be after the current WAL
> record then we probably should apply same before pausing. Otherwise
> the pause is a bit useless since the user can't see the state he's being
> asked to approve.

Yes. Seems easy enough if we do the SetRecoveryPause() up there and just fall through to the pause later in the loop.

> The real question here probably needs to be "what is the point of
> recoveryPauseAtTarget in the first place?". I find it hard to envision
> what's the point of pausing unless the user has an opportunity to
> make a decision about whether to continue applying WAL.

You can do SELECT pg_xlog_replay_pause|resume(); but that obviously sucks without further support...

> As Simon
> mentioned, we seem to be lacking some infrastructure that would let
> the user adjust the recovery_target parameters before resuming WAL
> processing. But, assuming for the moment that our workaround for
> that is "shutdown the server, adjust recovery.conf, and restart",
> is the pause placed in a useful spot for that?

I actually think Simon's proposed pause location makes more sense if that's the only mode we support, but if so it probably should be the one for inclusive/non-inclusive replay.

> BTW, could we make this more convenient by letting recoveryPausesHere()
> reread recovery.conf? Also, shouldn't the code re-evaluate
> recoveryStopsHere() after that?

At least it seems like something we could do without introducing more functions (i.e. possibly back-branch fit). Given that pauses are controlled via SQL functions, setting the target via an SQL function as well seems kinda fitting otherwise.

Greetings,

Andres

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-05 18:08:01 +0100, Andres Freund wrote:
> On 2012-12-05 11:40:16 -0500, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > Basically the whole logic around recoveryApply seems to be broken
> > > currently. Because if recoveryApply=false we currently don't pause at
> > > all because we jump out of the apply loop with the break.
> >
> > Huh? That break is after the pause:
> >
> >             /*
> >              * Have we reached our recovery target?
> >              */
> >             if (recoveryStopsHere(record, &recoveryApply))
> >             {
> >                 /*
> >                  * Pause only if users can connect to send a resume
> >                  * message
> >                  */
> >                 if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
> >                 {
> >                     SetRecoveryPause(true);
> >                     recoveryPausesHere();
> >                 }
> >                 reachedStopPoint = true;    /* see below */
> >                 recoveryContinue = false;
> >                 if (!recoveryApply)
> >                     break;
> >             }
>
> Oh, yea. I mixed what I read in the patch and what exists today in my
> mind, sorry for that.
>
> But it still seems rather strange that we break out of the loop
> depending on recovery_target_inclusive which is what recoveryApply
> basically is set to.

That part is obviously bullshit as we also set recoveryContinue = false which means we will jump out of the loop later. Only after waiting for another record though ...

Man, that file is confusing.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
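To make the rearrangement discussed above concrete, here is a minimal sketch (an illustration, not any of the posted patches) in which reaching the stop point only requests the pause, and the pause for an inclusive stop is taken after the record has been applied, so the user inspects the state they are actually being asked to approve. Names are taken from the code quoted in this thread, using the LocalHotStandbyActive test suggested later on; the exact placement inside the StartupXLOG() loop is an assumption.

            if (recoveryStopsHere(record, &recoveryApply))
            {
                /* request a pause, but only if users can connect to resume */
                if (recoveryPauseAtTarget && LocalHotStandbyActive)
                    SetRecoveryPause(true);

                reachedStopPoint = true;    /* see below */
                recoveryContinue = false;

                if (!recoveryApply)
                {
                    /* stopping before this record: pause now, then leave the loop */
                    if (recoveryPauseAtTarget && LocalHotStandbyActive)
                        recoveryPausesHere();
                    break;
                }
                /* else fall through: apply this record, then pause below */
            }

            /* ... rm_redo() applies the record here ... */

            /* for an inclusive stop, pause only once the record is applied */
            if (reachedStopPoint && recoveryPauseAtTarget && LocalHotStandbyActive)
                recoveryPausesHere();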
On 5 December 2012 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real question here probably needs to be "what is the point of
> recoveryPauseAtTarget in the first place?". I find it hard to envision
> what's the point of pausing unless the user has an opportunity to
> make a decision about whether to continue applying WAL. As Simon
> mentioned, we seem to be lacking some infrastructure that would let
> the user adjust the recovery_target parameters before resuming WAL
> processing. But, assuming for the moment that our workaround for
> that is "shutdown the server, adjust recovery.conf, and restart",
> is the pause placed in a useful spot for that?

> BTW, could we make this more convenient by letting recoveryPausesHere()
> reread recovery.conf? Also, shouldn't the code re-evaluate
> recoveryStopsHere() after that?

The recovery target and the consistency point are in some ways in conflict. If the recovery target is before the consistency point there is no point in stopping there, whether or not we pause. What we should do is say "recovery target reached, yet recovery not yet consistent, continuing".

So ISTM that we should make recoveryStopsHere() return false while we are inconsistent. Problems solved.

We can re-read parameters after we wake from a pause. Presumably only in HEAD, or do you mean in 9.2/9.1 also?

My earlier patch to rearrange pause logic did a few things, nothing very dramatic:

* Put the logic for whether we're consistent inside recoveryPausesHere() as requested
* rearrange logic so setting EndPtr is the very last thing we do before redo for the current record
* Fixes the bug that we don't pause in the correct place if recoveryApply = true

The comment on that new pause location was slightly wrong.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 December 2012 17:17, Simon Riggs <simon@2ndquadrant.com> wrote: > The recovery target and the consistency point are in some ways in > conflict. If the recovery target is before the consistency point there > is no point in stopping there, whether or not we pause. What we should > do is say "recovery target reached, yet recovery not yet consistent, > continuing". > So ISTM that we should make recoveryStopsHere() return false while we > are inconsistent. Problems solved. Patch -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
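The attached patch is not reproduced here, but the proposal amounts to roughly the following early exit at the top of recoveryStopsHere(); the function signature and the reachedConsistency flag are recalled from the 9.2-era xlog.c and should be treated as assumptions rather than a quote of the patch.

    static bool
    recoveryStopsHere(XLogRecord *record, bool *includeThis)
    {
        /* never stop (or pause) before a consistent state has been reached */
        if (!reachedConsistency)
            return false;

        /* ... existing recovery-target checks follow unchanged ... */
    }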
On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
> On 5 December 2012 17:17, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> > The recovery target and the consistency point are in some ways in
> > conflict. If the recovery target is before the consistency point there
> > is no point in stopping there, whether or not we pause. What we should
> > do is say "recovery target reached, yet recovery not yet consistent,
> > continuing".
>
> > So ISTM that we should make recoveryStopsHere() return false while we
> > are inconsistent. Problems solved.

I prefer the previous (fixed) behaviour where we error out if we reach a recovery target before we are consistent:

    /*
     * Complain if we did not roll forward far enough to render the backup
     * dump consistent. Note: it is indeed okay to look at the local variable
     * minRecoveryPoint here, even though ControlFile->minRecoveryPoint might
     * be further ahead --- ControlFile->minRecoveryPoint cannot have been
     * advanced beyond the WAL we processed.
     */
    if (InRecovery &&
        (XLByteLT(EndOfLog, minRecoveryPoint) ||
         !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
    {
        if (reachedStopPoint)
        {
            /* stopped because of stop request */
            ereport(FATAL,
                    (errmsg("requested recovery stop point is before consistent recovery point")));
        }

        /*
         * Ran off end of WAL before reaching end-of-backup WAL record, or
         * minRecoveryPoint. That's usually a bad sign, indicating that you
         * tried to recover from an online backup but never called
         * pg_stop_backup(), or you didn't archive all the WAL up to that
         * point. However, this also happens in crash recovery, if the system
         * crashes while an online backup is in progress. We must not treat
         * that as an error, or the database will refuse to start up.
         */
        if (InArchiveRecovery || ControlFile->backupEndRequired)
        {
            if (ControlFile->backupEndRequired)
                ereport(FATAL,
                        (errmsg("WAL ends before end of online backup"),
                         errhint("All WAL generated while online backup was taken must be available at recovery.")));
            else if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
                ereport(FATAL,
                        (errmsg("WAL ends before end of online backup"),
                         errhint("Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.")));
            else
                ereport(FATAL,
                        (errmsg("WAL ends before consistent recovery point")));
        }
    }

Seems to be good enough to me. Once the pause logic is fixed obviously.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2012-12-05 17:24:42 +0000, Simon Riggs wrote: >> So ISTM that we should make recoveryStopsHere() return false while we >> are inconsistent. Problems solved. > I prefer the previous (fixed) behaviour where we error out if we reach a > recovery target before we are consistent: I agree. Silently ignoring the user's specification is not good. (I'm not totally sure about ignoring the pause spec, either, but there is no good reason to change the established behavior for the recovery target spec.) regards, tom lane
I wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote: >>> So ISTM that we should make recoveryStopsHere() return false while we >>> are inconsistent. Problems solved. >> I prefer the previous (fixed) behaviour where we error out if we reach a >> recovery target before we are consistent: > I agree. Silently ignoring the user's specification is not good. > (I'm not totally sure about ignoring the pause spec, either, but > there is no good reason to change the established behavior for > the recovery target spec.) On further thought, it seems like recovery_pause_at_target is rather misdesigned anyway, and taking recovery target parameters from recovery.conf is an obsolete API that was designed in a world before hot standby. What I suggest people really want, if they're trying to interactively determine how far to roll forward, is this: (1) A recovery.conf parameter that specifies "pause when hot standby opens up" (that is, as soon as we have consistency). (2) A SQL command/function that releases the pause mode *and* specifies a new target stop point (ie, an interactive way of setting the recovery target parameters). The startup process then rolls forward to that target and pauses again. (3) A SQL command/function that releases the pause mode and specifies coming up normally, ie not following the archived WAL any further (I imagine this would force a timeline switch). The existing "pause now" function could still fit into this framework; but it seems to me to have mighty limited usefulness, considering the speed of WAL replay versus human reaction time. regards, tom lane
On Wed, Dec 5, 2012 at 8:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real question here probably needs to be "what is the point of
> recoveryPauseAtTarget in the first place?". I find it hard to envision
> what's the point of pausing unless the user has an opportunity to
> make a decision about whether to continue applying WAL.

Right now if I'm doing a PITR and want to look around before blessing the restore, I have to:

0) restore from the base backup.
1) change pg_hba to lock out everyone else
2) do the restore to my best guess of what was just before the tragic DML.
3) Look around and see if I guessed correctly.
4a) If I replayed too far, blow everything away and start over.
4b) If I replayed not far enough, and opened the database normally upon completion so I could look around, then blow everything away and start over.
4c) If I replayed not far enough and opened the database in hot standby, I don't have to blow everything away, I just have to shut down the server, change the restore point forward, and restart it.
5) change pg_hba back to normal and restart the server.

It would be nice if I could execute 4c without a shutdown/restart, but certainly a shutdown/restart is better than blowing everything away and starting over, like you have to do once the restore picks a new time line. So while the current pause behavior is incomplete, it is still useful.

It would also be nice if only the superuser is allowed to connect to the hot standby when pause_at_recovery_target=true, until after pg_xlog_replay_resume() is called. That way I could skip steps 1 and 5--steps which are very easy to forget to do.

Cheers,

Jeff
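As a concrete illustration of step 2 above, the recovery.conf for such an attempt might look roughly like this (placeholder values only; pause_at_recovery_target defaults to true and only takes effect when hot standby is enabled on the restored server):

    restore_command = 'cp /path/to/archivedir/%f %p'
    recovery_target_time = '2012-12-05 12:00:00'    # best guess at a point just before the bad DML
    pause_at_recovery_target = 'true'               # the default; pause at the target instead of promoting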
On 2012-12-05 13:48:53 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
> >>> So ISTM that we should make recoveryStopsHere() return false while we
> >>> are inconsistent. Problems solved.
>
> >> I prefer the previous (fixed) behaviour where we error out if we reach a
> >> recovery target before we are consistent:
>
> > I agree. Silently ignoring the user's specification is not good.
> > (I'm not totally sure about ignoring the pause spec, either, but
> > there is no good reason to change the established behavior for
> > the recovery target spec.)
>
> On further thought, it seems like recovery_pause_at_target is rather
> misdesigned anyway, and taking recovery target parameters from
> recovery.conf is an obsolete API that was designed in a world before hot
> standby. What I suggest people really want, if they're trying to
> interactively determine how far to roll forward, is this:
>
> (1) A recovery.conf parameter that specifies "pause when hot standby
> opens up" (that is, as soon as we have consistency).
>
> (2) A SQL command/function that releases the pause mode *and* specifies
> a new target stop point (ie, an interactive way of setting the recovery
> target parameters). The startup process then rolls forward to that
> target and pauses again.
>
> (3) A SQL command/function that releases the pause mode and specifies
> coming up normally, ie not following the archived WAL any further
> (I imagine this would force a timeline switch).

That sounds good. The multitude of options for 2) sounds a bit annoying, but I am not sure where to cut there.

> The existing "pause now" function could still fit into this framework;
> but it seems to me to have mighty limited usefulness, considering the
> speed of WAL replay versus human reaction time.

I think "pause now" is useful for independent purposes. You can use it while operating a normal standby to stop replay for some time if you need consistent data and then happily resume afterwards (if you have enough wal stored...).

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jeff Janes <jeff.janes@gmail.com> writes: > Right now if I'm doing a PITR and want to look around before blessing > the restore, I have to: > [ do painful stuff ] Yeah. The worst thing about this is the cost of stepping too far forward, but I doubt we can do much about that --- WAL isn't reversible and I can't see us making it so. What we can get rid of is the pain of shutting down to move the recovery target forward. Another thought here is that it would be good to have some kind of visibility of the last few potential stop points (timestamps/XIDs), so that if you do roll too far forward, you have some idea of what to try after you reset everything. A zero-order implementation of that would be to emit LOG messages as we replay each potential commit, but maybe we can do better. > I would also be nice if only the superuser is allowed to connect to > the hot standby when pause_at_recovery_target=true, until after > pg_xlog_replay_resume() is called. Uh, why? Other users won't be able to do anything except look around; they can't force the database to become read/write. I can't see that it's a good idea for recovery to play games with the pg_hba rules; too much chance of screwing things up for too little benefit. regards, tom lane
On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Andres Freund <andres@2ndquadrant.com> writes: >>> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote: >>>> So ISTM that we should make recoveryStopsHere() return false while we >>>> are inconsistent. Problems solved. > >>> I prefer the previous (fixed) behaviour where we error out if we reach a >>> recovery target before we are consistent: > >> I agree. Silently ignoring the user's specification is not good. >> (I'm not totally sure about ignoring the pause spec, either, but >> there is no good reason to change the established behavior for >> the recovery target spec.) > > On further thought, it seems like recovery_pause_at_target is rather > misdesigned anyway, and taking recovery target parameters from > recovery.conf is an obsolete API that was designed in a world before hot > standby. What I suggest people really want, if they're trying to > interactively determine how far to roll forward, is this: > > (1) A recovery.conf parameter that specifies "pause when hot standby > opens up" (that is, as soon as we have consistency). Agreed. > (2) A SQL command/function that releases the pause mode *and* specifies > a new target stop point (ie, an interactive way of setting the recovery > target parameters). The startup process then rolls forward to that > target and pauses again. That was my original patch, IIRC, with different functions for each target type, plus an advance N records function for use in debugging/automated testing. Having a single function would be possible, taking a single TEXT parameter that we parse just like the recovery.conf, taking semi-colons as newlines. > (3) A SQL command/function that releases the pause mode and specifies > coming up normally, ie not following the archived WAL any further > (I imagine this would force a timeline switch). > > The existing "pause now" function could still fit into this framework; > but it seems to me to have mighty limited usefulness, considering the > speed of WAL replay versus human reaction time. I think that was in there also. Can't remember why we didn't go for the full API last time. I'll have another go, in HEAD. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> On further thought, it seems like recovery_pause_at_target is rather >> misdesigned anyway, and taking recovery target parameters from >> recovery.conf is an obsolete API that was designed in a world before hot >> standby. What I suggest people really want, if they're trying to >> interactively determine how far to roll forward, is this: >> ... > Can't remember why we didn't go for the full API last time. I'll have > another go, in HEAD. That's fine, but the immediate question is what are we doing to fix the back branches. I think everyone is clear that we should be testing LocalHotStandbyActive rather than precursor conditions to see if a pause is allowed, but are we going to do anything more than that? The only other thing I really wanted to do is not have the in-loop pause occur after we've taken actions that are effectively part of the WAL apply step. I think ideally it should happen just before or after the recoveryStopsHere stanza. Last night I worried about adding an extra spinlock acquire to make that work, but today I wonder if we couldn't get away with just a naked if (xlogctl->recoveryPause) recoveryPausesHere(); The argument for this is that although we might fetch a slightly stale value of the shared variable, it can't be very stale --- certainly no older than the spinlock acquisition near the bottom of the previous iteration of the loop. And this is a highly asynchronous feature anyhow, so fuzziness of plus or minus one WAL record in when the pause gets honored is not going to be detectable. Hence an extra spinlock acquisition is not worth the cost. regards, tom lane
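To make the trade-off concrete, the two variants being weighed look roughly like the sketch below; the info_lck spinlock and the volatile-pointer idiom match how other XLogCtl fields are accessed in the 9.2-era xlog.c, but treat the exact names and placement as assumptions rather than a committed change.

    /* spinlock-protected read: always-fresh value, one extra lock/unlock per record */
    {
        /* use volatile pointer to prevent code rearrangement */
        volatile XLogCtlData *xlogctl = XLogCtl;
        bool        pausing;

        SpinLockAcquire(&xlogctl->info_lck);
        pausing = xlogctl->recoveryPause;
        SpinLockRelease(&xlogctl->info_lck);

        if (pausing)
            recoveryPausesHere();
    }

    /* naked read: may see a value at most one loop iteration stale, no lock cost */
    if (xlogctl->recoveryPause)
        recoveryPausesHere();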
On 5 December 2012 21:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> On further thought, it seems like recovery_pause_at_target is rather >>> misdesigned anyway, and taking recovery target parameters from >>> recovery.conf is an obsolete API that was designed in a world before hot >>> standby. What I suggest people really want, if they're trying to >>> interactively determine how far to roll forward, is this: >>> ... > >> Can't remember why we didn't go for the full API last time. I'll have >> another go, in HEAD. > > That's fine, but the immediate question is what are we doing to fix > the back branches. I think everyone is clear that we should be testing > LocalHotStandbyActive rather than precursor conditions to see if a pause > is allowed, but are we going to do anything more than that? No > The only other thing I really wanted to do is not have the in-loop pause > occur after we've taken actions that are effectively part of the WAL > apply step. I think ideally it should happen just before or after the > recoveryStopsHere stanza. Last night I worried about adding an extra > spinlock acquire to make that work, but today I wonder if we couldn't > get away with just a naked > > if (xlogctl->recoveryPause) > recoveryPausesHere(); > > The argument for this is that although we might fetch a slightly stale > value of the shared variable, it can't be very stale --- certainly no > older than the spinlock acquisition near the bottom of the previous > iteration of the loop. And this is a highly asynchronous feature > anyhow, so fuzziness of plus or minus one WAL record in when the pause > gets honored is not going to be detectable. Hence an extra spinlock > acquisition is not worth the cost. Yep, thats fine. Are you doing this or do you want me to? Don't mind either way. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > Yep, thats fine. > Are you doing this or do you want me to? Don't mind either way. I've got a patch for most of it already, so happy to do it. regards, tom lane
On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> On further thought, it seems like recovery_pause_at_target is rather
> >> misdesigned anyway, and taking recovery target parameters from
> >> recovery.conf is an obsolete API that was designed in a world before hot
> >> standby. What I suggest people really want, if they're trying to
> >> interactively determine how far to roll forward, is this:
> >> ...
>
> > Can't remember why we didn't go for the full API last time. I'll have
> > another go, in HEAD.
>
> That's fine, but the immediate question is what are we doing to fix
> the back branches. I think everyone is clear that we should be testing
> LocalHotStandbyActive rather than precursor conditions to see if a pause
> is allowed, but are we going to do anything more than that?

I'd like the inclusive/non-inclusive stops to retain some resemblance of sanity.

Raw patch including your earlier LocalHotStandbyActive one attached.

> The only other thing I really wanted to do is not have the in-loop pause
> occur after we've taken actions that are effectively part of the WAL
> apply step. I think ideally it should happen just before or after the
> recoveryStopsHere stanza. Last night I worried about adding an extra
> spinlock acquire to make that work, but today I wonder if we couldn't
> get away with just a naked
>
>     if (xlogctl->recoveryPause)
>         recoveryPausesHere();
>
> The argument for this is that although we might fetch a slightly stale
> value of the shared variable, it can't be very stale --- certainly no
> older than the spinlock acquisition near the bottom of the previous
> iteration of the loop. And this is a highly asynchronous feature
> anyhow, so fuzziness of plus or minus one WAL record in when the pause
> gets honored is not going to be detectable. Hence an extra spinlock
> acquisition is not worth the cost.

Seems safe enough to me. But I am not sure it's necessary: if we move the recoveryPausesHere() down after replaying the record, fetching it inside the spinlock taken for recoveryLastRecPtr seems natural enough.

Greetings,

Andres Freund
On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The argument for this is that although we might fetch a slightly stale > value of the shared variable, it can't be very stale --- certainly no > older than the spinlock acquisition near the bottom of the previous > iteration of the loop. And this is a highly asynchronous feature > anyhow, so fuzziness of plus or minus one WAL record in when the pause > gets honored is not going to be detectable. Hence an extra spinlock > acquisition is not worth the cost. I wonder whether the cost of an extra spinlock acquire/release cycle is really noticeable here. That can get expensive in a hurry when lots of processes are contending the spinlock ... but I think that shouldn't be the case here; most of the accesses will be coming from the startup process. Of course atomic operations are much more expensive than ordinary CPU instructions even under the best of circumstances, but is that really material here? I'm just wondering whether this is premature optimization that's going to potentially bite us later in the form of subtle, hard-to-reproduce bugs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The argument for this is that although we might fetch a slightly stale >> value of the shared variable, it can't be very stale --- certainly no >> older than the spinlock acquisition near the bottom of the previous >> iteration of the loop. And this is a highly asynchronous feature >> anyhow, so fuzziness of plus or minus one WAL record in when the pause >> gets honored is not going to be detectable. Hence an extra spinlock >> acquisition is not worth the cost. > I wonder whether the cost of an extra spinlock acquire/release cycle > is really noticeable here. That can get expensive in a hurry when > lots of processes are contending the spinlock ... but I think that > shouldn't be the case here; most of the accesses will be coming from > the startup process. Of course atomic operations are much more > expensive than ordinary CPU instructions even under the best of > circumstances, but is that really material here? I'm just wondering > whether this is premature optimization that's going to potentially > bite us later in the form of subtle, hard-to-reproduce bugs. I have the same doubt about whether it's really significant. However, I think it's much more likely that we'd find out it is significant than that we'd find out anybody could detect the lack of a memory barrier there. regards, tom lane
Andres Freund <andres@anarazel.de> writes:
> On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
>> That's fine, but the immediate question is what are we doing to fix
>> the back branches. I think everyone is clear that we should be testing
>> LocalHotStandbyActive rather than precursor conditions to see if a pause
>> is allowed, but are we going to do anything more than that?

> I'd like the inclusive/non-inclusive stops to retain some resemblance of
> sanity.
> Raw patch including your earlier LocalHotStandbyActive one attached.

I looked at this but couldn't get excited about using it. There were some obvious bugs ("if (!recoveryStopsHere)"?) but the real problem is that I think we're going to end up reworking the interaction between recoveryPausesHere and the recoveryStopsHere stanza quite a bit. In particular, we should expect that we're going to need to respond to a changed recovery target after any pause. So placing a call of recoveryPausesHere down at the loop bottom where the action is already predetermined seems like the wrong thing. I'm not entirely sure what a clean design would look like, but that's not it. I'll leave it to Simon to think about what we want to do there next.

            regards, tom lane
On 5 December 2012 22:23, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The argument for this is that although we might fetch a slightly stale >>> value of the shared variable, it can't be very stale --- certainly no >>> older than the spinlock acquisition near the bottom of the previous >>> iteration of the loop. And this is a highly asynchronous feature >>> anyhow, so fuzziness of plus or minus one WAL record in when the pause >>> gets honored is not going to be detectable. Hence an extra spinlock >>> acquisition is not worth the cost. > >> I wonder whether the cost of an extra spinlock acquire/release cycle >> is really noticeable here. That can get expensive in a hurry when >> lots of processes are contending the spinlock ... but I think that >> shouldn't be the case here; most of the accesses will be coming from >> the startup process. Of course atomic operations are much more >> expensive than ordinary CPU instructions even under the best of >> circumstances, but is that really material here? I'm just wondering >> whether this is premature optimization that's going to potentially >> bite us later in the form of subtle, hard-to-reproduce bugs. > > I have the same doubt about whether it's really significant. However, > I think it's much more likely that we'd find out it is significant than > that we'd find out anybody could detect the lack of a memory barrier > there. Agreed. And any future logic to stop at a specific point will be exactly precise because the decision and action will be taken in same process. Patch looks good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-05 18:35:47 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
> >> That's fine, but the immediate question is what are we doing to fix
> >> the back branches. I think everyone is clear that we should be testing
> >> LocalHotStandbyActive rather than precursor conditions to see if a pause
> >> is allowed, but are we going to do anything more than that?
>
> > I'd like the inclusive/non-inclusive stops to retain some resemblance of
> > sanity.
> > Raw patch including your earlier LocalHotStandbyActive one attached.
>
> I looked at this but couldn't get excited about using it. There were
> some obvious bugs ("if (!recoveryStopsHere)"?) but the real problem is
> that I think we're going to end up reworking the interaction between
> recoveryPausesHere and the recoveryStopsHere stanza quite a bit.
> In particular, we should expect that we're going to need to respond
> to a changed recovery target after any pause. So placing a call of
> recoveryPausesHere down at the loop bottom where the action is already
> predetermined seems like the wrong thing. I'm not entirely sure what
> a clean design would look like, but that's not it. I'll leave it to
> Simon to think about what we want to do there next.

Yes, the patch obviously wasn't ready for anything. Just seemed easier that way than trying to describe what I think is sensible.

What I dislike with what you committed is that the state you're investigating during the pause isn't the one you're going to end up with if recoveryApply == true. That seems dangerous to me, even if it's going to be reworked in HEAD.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 5, 2012 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> Right now if I'm doing a PITR and want to look around before blessing >> the restore, I have to: >> [ do painful stuff ] > > Yeah. The worst thing about this is the cost of stepping too far > forward, but I doubt we can do much about that --- WAL isn't reversible > and I can't see us making it so. What we can get rid of is the pain > of shutting down to move the recovery target forward. > > Another thought here is that it would be good to have some kind of > visibility of the last few potential stop points (timestamps/XIDs), > so that if you do roll too far forward, you have some idea of what > to try after you reset everything. A zero-order implementation of > that would be to emit LOG messages as we replay each potential > commit, but maybe we can do better. probably embellishments on xlogdump or xlogreader would be the way to go. > >> I would also be nice if only the superuser is allowed to connect to >> the hot standby when pause_at_recovery_target=true, until after >> pg_xlog_replay_resume() is called. > > Uh, why? Other users won't be able to do anything except look around; On some systems, 95% of users never do anything (that the database knows about) except look around. But I think it would be unfortunate to accidentally show them data that will soon be revoked. > they can't force the database to become read/write. I can't see that > it's a good idea for recovery to play games with the pg_hba rules; > too much chance of screwing things up for too little benefit. OK. I don't know at all what is involved in implementing such a thing. But a DBA in the middle of a rather arcane but urgent task has a pretty high chance of screwing things up, too. Cheers, Jeff
Andres Freund <andres@2ndquadrant.com> writes:
> What I dislike with what you committed is that the state you're
> investigating during the pause isn't the one you're going to end up
> with if recoveryApply == true. That seems dangerous to me, even if it's
> going to be reworked in HEAD.

Agreed, but it's been like that since the pause feature was created, and we've not heard complaints (which TBH makes me wonder how much use the feature has actually seen). I'm hesitant to change the semantics in the back branches more than necessary to avoid clear-showstopper cases.

            regards, tom lane