Thread: PITR potentially broken in 9.2

PITR potentially broken in 9.2

From: Jeff Janes

Doing PITR in 9.2.1, the system claims that it reached a consistent
recovery state immediately after redo starts.
This leads to various mysterious failures, when it should instead
throw a "requested recovery stop point is before consistent recovery
point" error.
(If you are unlucky, I think it might even silently start up in a
corrupt state.)

This seems to have been introduced in:
commit 8366c7803ec3d0591cf2d1226fea1fee947d56c3
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Wed Jan 25 18:02:04 2012 +0000

    Allow pg_basebackup from standby node with safety checking.
    Base backup follows recommended procedure, plus goes to great
    lengths to ensure that partial page writes are avoided.

    Jun Ishizuka and Fujii Masao, with minor modifications


the backup file:

START WAL LOCATION: 1/CD89E48 (file 00000001000000010000000C)
STOP WAL LOCATION: 1/1AFA11A0 (file 00000001000000010000001A)
CHECKPOINT LOCATION: 1/188D8120
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-11-27 09:40:13 PST
LABEL: label
STOP TIME: 2012-11-27 09:42:10 PST

(The file 00000001000000010000000C was archived at 9:37.)

recovery.conf:
restore_command = 'cp /tmp/archivedir/%f %p'
recovery_target_time = '2012-11-27 09:38:00 PST'


Log file:

22110  2012-11-27 09:49:15.220 PST LOG:  database system was
interrupted; last known up at 2012-11-27 09:40:13 PST
22110  2012-11-27 09:49:15.235 PST LOG:  starting point-in-time
recovery to 2012-11-27 09:38:00-08
22110  2012-11-27 09:49:15.271 PST LOG:  restored log file
"000000010000000100000018" from archive
22110  2012-11-27 09:49:15.367 PST LOG:  restored log file
"00000001000000010000000C" from archive
22110  2012-11-27 09:49:15.372 PST LOG:  redo starts at 1/CD89E48
22110  2012-11-27 09:49:15.374 PST LOG:  consistent recovery state
reached at 1/CD8B7F0
22110  2012-11-27 09:49:15.490 PST LOG:  restored log file
"00000001000000010000000D" from archive
22110  2012-11-27 09:49:15.775 PST LOG:  restored log file
"00000001000000010000000E" from archive
22110  2012-11-27 09:49:16.078 PST LOG:  restored log file
"00000001000000010000000F" from archive
22110  2012-11-27 09:49:16.345 PST LOG:  restored log file
"000000010000000100000010" from archive
22110  2012-11-27 09:49:16.533 PST LOG:  recovery stopping before
commit of transaction 951967, time 2012-11-27 09:38:00.000689-08
22110  2012-11-27 09:49:16.533 PST LOG:  redo done at 1/10F41900
22110  2012-11-27 09:49:16.533 PST LOG:  last completed transaction
was at log time 2012-11-27 09:37:59.998496-08
22110  2012-11-27 09:49:16.537 PST LOG:  selected new timeline ID: 2
22110  2012-11-27 09:49:16.584 PST LOG:  archive recovery complete
22113  2012-11-27 09:49:16.599 PST LOG:  checkpoint starting:
end-of-recovery immediate wait
22113  2012-11-27 09:49:17.815 PST LOG:  checkpoint complete: wrote
8336 buffers (12.7%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=0.097 s, sync=1.115 s, total=1.230 s; sync files=14,
longest=0.578 s, average=0.079 s
22110  2012-11-27 09:49:17.929 PST FATAL:  could not access status of
transaction 1014015
22110  2012-11-27 09:49:17.929 PST DETAIL:  Could not read from file
"pg_clog/0000" at offset 245760: Success.
22109  2012-11-27 09:49:17.932 PST LOG:  startup process (PID 22110)
exited with exit code 1
22109  2012-11-27 09:49:17.932 PST LOG:  terminating any other active
server processes

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Noah Misch

On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote:
> Doing PITR in 9.2.1, the system claims that it reached a consistent
> recovery state immediately after redo starts.
> This leads to it various mysterious failures, when it should instead
> throw a "requested recovery stop point is before consistent recovery
> point" error.
> (If you are unlucky, I think it might even silently start up in a
> corrupt state.)

I observed a similar problem with 9.2.  Despite a restore_command that failed
every time, startup from a hot backup completed.  At the time, I suspected a
mistake on my part.

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-11-27 10:08:12 -0800, Jeff Janes wrote:
> Doing PITR in 9.2.1, the system claims that it reached a consistent
> recovery state immediately after redo starts.
> This leads to it various mysterious failures, when it should instead
> throw a "requested recovery stop point is before consistent recovery
> point" error.
> (If you are unlucky, I think it might even silently start up in a
> corrupt state.)
>
> This seems to have been introduced in:
> commit 8366c7803ec3d0591cf2d1226fea1fee947d56c3
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   Wed Jan 25 18:02:04 2012 +0000
>
>     Allow pg_basebackup from standby node with safety checking.
>     Base backup follows recommended procedure, plus goes to great
>     lengths to ensure that partial page writes are avoided.
>
>     Jun Ishizuka and Fujii Masao, with minor modifications
>
>
> the backup file:
>
> START WAL LOCATION: 1/CD89E48 (file 00000001000000010000000C)
> STOP WAL LOCATION: 1/1AFA11A0 (file 00000001000000010000001A)
> CHECKPOINT LOCATION: 1/188D8120
> BACKUP METHOD: pg_start_backup
> BACKUP FROM: master
> START TIME: 2012-11-27 09:40:13 PST
> LABEL: label
> STOP TIME: 2012-11-27 09:42:10 PST
>
> (The file 00000001000000010000000C was archived at 9:37.)
>
> recovery.conf:
> restore_command = 'cp /tmp/archivedir/%f %p'
> recovery_target_time = '2012-11-27 09:38:00 PST'
>
>
> Log file:
>
> 22110  2012-11-27 09:49:15.220 PST LOG:  database system was
> interrupted; last known up at 2012-11-27 09:40:13 PST
> 22110  2012-11-27 09:49:15.235 PST LOG:  starting point-in-time
> recovery to 2012-11-27 09:38:00-08
> 22110  2012-11-27 09:49:15.271 PST LOG:  restored log file
> "000000010000000100000018" from archive
> 22110  2012-11-27 09:49:15.367 PST LOG:  restored log file
> "00000001000000010000000C" from archive
> 22110  2012-11-27 09:49:15.372 PST LOG:  redo starts at 1/CD89E48

Hm. Are you sure it's actually reading your backup file? It's hard to say
without DEBUG1 output, but I would tentatively say it's not reading it at
all, because the "redo starts at ..." message indicates it's not using
the checkpoint location from the backup file.

Can you reproduce the issue? If so, can you give an exact guide? If not,
do you still have the datadir et al. from above?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From: Heikki Linnakangas

On 28.11.2012 06:27, Noah Misch wrote:
> On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote:
>> Doing PITR in 9.2.1, the system claims that it reached a consistent
>> recovery state immediately after redo starts.
>> This leads to it various mysterious failures, when it should instead
>> throw a "requested recovery stop point is before consistent recovery
>> point" error.
>> (If you are unlucky, I think it might even silently start up in a
>> corrupt state.)
>
> I observed a similar problem with 9.2.  Despite a restore_command that failed
> every time, startup from a hot backup completed.  At the time, I suspected a
> mistake on my part.

I believe this was caused by this little typo/thinko:

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6763,7 +6763,7 @@ StartupXLOG(void)
>                  /* Pop the error context stack */
>                  error_context_stack = errcontext.previous;
>
> -                if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
> +                if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
>                      XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
>                  {
>                      /*

Normally, backupEndPoint is invalid, and we rely on seeing an
end-of-backup WAL record to mark the location. backupEndPoint is only
set when restoring from a backup that was taken from a standby, but
thanks to the above typo, recovery incorrectly treats an ordinary backup
as if the end-of-backup point has already been reached.

Fixed, thanks for the report!

- Heikki

Re: PITR potentially broken in 9.2

From: Heikki Linnakangas

On 28.11.2012 15:26, Andres Freund wrote:
> Hm. Are you sure its actually reading your backup file? Its hard to say
> without DEBUG1 output but I would tentatively say its not reading it at
> all because the the "redo starts at ..." message indicates its not using
> the checkpoint location from the backup file.

By backup file, you mean the backup history file? Since 9.0, recovery
does not read the backup history file; it's for informational/debugging
purposes only. All the information recovery needs is in the
backup_label, and an end-of-backup WAL record marks the location where
pg_stop_backup() was called, i.e. how far the WAL must be replayed for
the backup to be consistent.
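
For reference, a minimal sketch of the pg_start_backup/pg_stop_backup
procedure being described here, with placeholder paths:

psql -c "SELECT pg_start_backup('label');"   # writes backup_label into the data directory
cp -rp /path/to/data /path/to/backup         # copy the cluster while backup mode is active
psql -c "SELECT pg_stop_backup();"           # emits the end-of-backup WAL record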

> Can you reproduce the issue? If so, can you give an exact guide? If not,
> do you still have the datadir et al. from above?

I just committed a fix for this, but if you can, it would still be nice
if you could double-check that it now really works.

- Heikki

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-11-28 15:37:38 +0200, Heikki Linnakangas wrote:
> On 28.11.2012 15:26, Andres Freund wrote:
> >Hm. Are you sure its actually reading your backup file? Its hard to say
> >without DEBUG1 output but I would tentatively say its not reading it at
> >all because the the "redo starts at ..." message indicates its not using
> >the checkpoint location from the backup file.
>
> By backup file, you mean the backup history file? Since 9.0, recovery does
> not read the backup history file, it's for informational/debugging purposes
> only. All the information recovery needs is in the backup_label, and an
> end-of-backup WAL record marks the location where pg_stop_backup() was
> called, ie. how far the WAL must be replayed for the backup to be
> consistent.
>

I mean the label read by read_backup_label(). Jeff's mail indicated it
had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48.

> >Can you reproduce the issue? If so, can you give an exact guide? If not,
> >do you still have the datadir et al. from above?

> I just committed a fix for this, but if you can, it would still be nice if
> you could double-check that it now really works.

Yuck. Too bad that that got in.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From: Heikki Linnakangas

On 28.11.2012 15:47, Andres Freund wrote:
> I mean the label read by read_backup_label(). Jeff's mail indicated it
> had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48.

That's correct. The checkpoint was at 1/188D8120, but its redo pointer
was earlier, at 1/CD89E48, so that's where redo had to start.

- Heikki

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-11-28 16:34:55 +0200, Heikki Linnakangas wrote:
> On 28.11.2012 15:47, Andres Freund wrote:
> >I mean the label read by read_backup_label(). Jeff's mail indicated it
> >had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48.
>
> That's correct. The checkpoint was at 1/188D8120, but it's redo pointer was
> earlier, at 1/CD89E48, so that's where redo had to start.

Heh. I just compared 18 with CD and didn't notice that the two are
different lengths...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From: Tom Lane

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 28.11.2012 06:27, Noah Misch wrote:
>> I observed a similar problem with 9.2.  Despite a restore_command that failed
>> every time, startup from a hot backup completed.  At the time, I suspected a
>> mistake on my part.

> I believe this was caused by this little typo/thinko:

Is this related at all to the problem discussed over at
http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
?  The conclusion-so-far in that thread seems to be that an error
ought to be thrown for recovery_target_time earlier than the
backup stop time, but one is not getting thrown.

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Jeff Janes

On Wed, Nov 28, 2012 at 5:37 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 28.11.2012 15:26, Andres Freund wrote:
>> Can you reproduce the issue? If so, can you give an exact guide? If not,
>> do you still have the datadir et al. from above?

Yes, it is reliable enough to be used for "git bisect".

rm /tmp/archivedir/0000000*
initdb -D /tmp/data
## edit postgresql.conf to set up archiving etc. and set checkpoint_segments to 60
pg_ctl -D /tmp/data -l /tmp/data/logfile_master start -w
createdb
pgbench -i -s 10
pgbench -T 36000000 &
sleep 120
psql -c "SELECT pg_start_backup('label');"
cp -rp /tmp/data/ /tmp/data_slave
sleep 120
psql -c "SELECT pg_stop_backup();"
rm /tmp/data_slave/pg_xlog/0*
rm /tmp/data_slave/postmaster.*
rm /tmp/data_slave/logfile_master
cp src/backend/access/transam/recovery.conf.sample /tmp/data_slave/recovery.conf
## edit /tmp/data_slave/recovery.conf to set up the restore command and stop point
cp -rpi /tmp/data_slave /tmp/data_slave2
pg_ctl -D /tmp/data_slave2/ start -o "--port=9876"

At some point, kill the pgbench:
pg_ctl -D /tmp/data stop -m fast

I run the master with fsync off, otherwise it takes too long to
accumulate archived log files.
The checkpoint associated with pg_start_backup takes ~2.5 minutes, so
for the PITR end time, pick a time that is 1.25 minutes before the time
reported in the backup history or backup_label file.
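
As a concrete example, the recovery.conf from my first mail is the sort of
thing that goes in at that step (the target time is picked relative to the
backup_label times, as just described):

restore_command = 'cp /tmp/archivedir/%f %p'
recovery_target_time = '2012-11-27 09:38:00 PST'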

I copy data_slave to data_slave2 so that I can try different things
without having to restart the whole process from the beginning.



> I just committed a fix for this, but if you can, it would still be nice if
> you could double-check that it now really works.

Thanks.  In REL9_2_STABLE, it now correctly gives the  "requested
recovery stop point is before consistent recovery point" error.


Also, if recovery is started with hot_standby=on and with no
recovery_target_time, in patched REL9_2_STABLE the database becomes
"ready to accept read only connections" at the appropriate time, once
the end-of-backup WAL record has been replayed.  In 9.2.0 and 9.2.1, it
instead opened for read-only connections at the point where the
end-of-checkpoint record (the checkpoint associated with
pg_start_backup) had been replayed, which I think is too early.

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Jeff Janes

On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 28.11.2012 06:27, Noah Misch wrote:
>>> I observed a similar problem with 9.2.  Despite a restore_command that failed
>>> every time, startup from a hot backup completed.  At the time, I suspected a
>>> mistake on my part.
>
>> I believe this was caused by this little typo/thinko:
>
> Is this related at all to the problem discussed over at
> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
> ?  The conclusion-so-far in that thread seems to be that an error
> ought to be thrown for recovery_target_time earlier than the
> backup stop time, but one is not getting thrown.

It is not directly related.  That thread was about 9.1.6.

In the newly fixed REL9_2_STABLE, that problem still shows up the same as
it does in 9.1.6.

(In 9.2.1, recovery sometimes blows up before that particular problem
could be detected, which is what led me here in the first place--that
is the extent of the relationship, AFAIK.)

To see this one, follow the instructions in my previous email, but set
recovery_target_time to a time just after the end of the
pg_start_backup checkpoint, rather than just before it, and turn on
hot_standby.

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Tom Lane

Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is this related at all to the problem discussed over at
>> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
>> ?  The conclusion-so-far in that thread seems to be that an error
>> ought to be thrown for recovery_target_time earlier than the
>> backup stop time, but one is not getting thrown.

> It is not directly related.  That thread was about 9.1.6.

> In the newly fixed 9_2_STABLE, that problem still shows up the same as
> it does in 9.1.6.

> (In 9.2.1, recovery sometimes blows up before that particular problem
> could be detected, which is what lead me here in the first place--that
> is the extent of the relationship AFAIK)

> To see this one, follow the instructions in my previous email, but set
> recovery_target_time to a time just after the end of the
> pg_start_backup checkpoint, rather than just before it, and turn on
> hot_standby

I tried to reproduce this as per your directions, and see no problem in
HEAD.  Recovery advances to the specified stop time, hot standby mode
wakes up, and it pauses waiting for me to do pg_xlog_replay_resume().
But I can connect to the standby and do that.  So I'm unsure what the
problem is.

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Jeff Janes

On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Is this related at all to the problem discussed over at
>>> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
>>> ?  The conclusion-so-far in that thread seems to be that an error
>>> ought to be thrown for recovery_target_time earlier than the
>>> backup stop time, but one is not getting thrown.
>
>> It is not directly related.  That thread was about 9.1.6.
>
>> In the newly fixed 9_2_STABLE, that problem still shows up the same as
>> it does in 9.1.6.
>
>> (In 9.2.1, recovery sometimes blows up before that particular problem
>> could be detected, which is what lead me here in the first place--that
>> is the extent of the relationship AFAIK)
>
>> To see this one, follow the instructions in my previous email, but set
>> recovery_target_time to a time just after the end of the
>> pg_start_backup checkpoint, rather than just before it, and turn on
>> hot_standby
>
> I tried to reproduce this as per your directions, and see no problem in
> HEAD.  Recovery advances to the specified stop time, hot standby mode
> wakes up,

Hot standby should only wake up once recovery has proceeded beyond the
pg_stop_backup() point.

If the specified stop point is after pg_start_backup() finishes, but
before pg_stop_backup(), then hot standby should not start up (and
with the newest HEAD, in my hands, it does not).  Are you sure you set
the stop time to just after pg_start_backup, not to just after
pg_stop_backup?

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Tom Lane

Jeff Janes <jeff.janes@gmail.com> writes:
> On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> In the newly fixed 9_2_STABLE, that problem still shows up the same as
>>> it does in 9.1.6.

>> I tried to reproduce this as per your directions, and see no problem in
>> HEAD.  Recovery advances to the specified stop time, hot standby mode
>> wakes up,

> Hot standby should only wake up once recovery has proceeded beyond the
> pg_stop_backup() point.

> If the specified stop point is after pg_start_backup() finishes, but
> before pg_stop_backup(), then hot standby should not start up (and
> with the newest HEAD, in my hands, it does not).  Are you sure you set
> the stop time to just after pg_start_backup, not to just after
> pg_stop_backup?

I'm confused.  Are you now saying that this problem only exists in
9.1.x?  I tested current HEAD because you indicated the problem was
still there.

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Jeff Janes

On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Jeff Janes <jeff.janes@gmail.com> writes:
>>>> In the newly fixed 9_2_STABLE, that problem still shows up the same as
>>>> it does in 9.1.6.
>
>>> I tried to reproduce this as per your directions, and see no problem in
>>> HEAD.  Recovery advances to the specified stop time, hot standby mode
>>> wakes up,
>
>> Hot standby should only wake up once recovery has proceeded beyond the
>> pg_stop_backup() point.
>
>> If the specified stop point is after pg_start_backup() finishes, but
>> before pg_stop_backup(), then hot standby should not start up (and
>> with the newest HEAD, in my hands, it does not).  Are you sure you set
>> the stop time to just after pg_start_backup, not to just after
>> pg_stop_backup?
>
> I'm confused.  Are you now saying that this problem only exists in
> 9.1.x?  I tested current HEAD because you indicated the problem was
> still there.

No, I'm saying the problem exists both in 9.1.x and in hypothetical
9.2.2 and in hypothetical 9.3, but not in 9.2.[01] because in those it
is masked by that other problem which has just been fixed.

I'll try it again in b1346822f3048ede254647f3a46 just to be sure, but
I'm already fairly sure.

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Tom Lane

Jeff Janes <jeff.janes@gmail.com> writes:
> On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm confused.  Are you now saying that this problem only exists in
>> 9.1.x?  I tested current HEAD because you indicated the problem was
>> still there.

> No, I'm saying the problem exists both in 9.1.x and in hypothetical
> 9.2.2 and in hypothetical 9.3, but not in 9.2.[01] because in those it
> is masked by that other problem which has just been fixed.

I'm still confused.  I've now tried this in both HEAD and 9.1 branch
tip, and I do not see any misbehavior.  If I set recovery_target_time to
before the pg_stop_backup time, I get "FATAL:  requested recovery stop
point is before consistent recovery point" which is what I expect; and
if I set it to after the pg_stop_backup time, it starts up as expected.
So if there's a remaining unfixed bug here, I don't understand what
that is.

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Jeff Janes

On Sun, Dec 2, 2012 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm confused.  Are you now saying that this problem only exists in
>>> 9.1.x?  I tested current HEAD because you indicated the problem was
>>> still there.
>
>> No, I'm saying the problem exists both in 9.1.x and in hypothetical
>> 9.2.2 and in hypothetical 9.3, but not in 9.2.[01] because in those it
>> is masked by that other problem which has just been fixed.
>
> I'm still confused.  I've now tried this in both HEAD and 9.1 branch
> tip, and I do not see any misbehavior.  If I set recovery_target_time to
> before the pg_stop_backup time, I get "FATAL:  requested recovery stop
> point is before consistent recovery point" which is what I expect; and
> if I set it to after the pg_stop_backup time, it starts up as expected.
> So if there's a remaining unfixed bug here, I don't understand what
> that is.

I've reproduced it again using the just-tagged 9.2.2, and uploaded a
135MB tarball of /tmp/data_slave2 and /tmp/archivedir to Google
Drive.  The data directory contains the recovery.conf, which is set to
end recovery between the two critical time points.

https://docs.google.com/open?id=0Bzqrh1SO9FcES181YXRVdU5NSlk

This is the command line I use to start recovery, and the resulting log output.

https://docs.google.com/open?id=0Bzqrh1SO9FcEaTQ2QXhFdDZYaUE

I can't connect to the standby to execute pg_xlog_replay_resume() because:
FATAL:  the database system is starting up
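
For reference, the attempt looks roughly like this (port 9876 as in my
repro script):

psql -p 9876 -c "SELECT pg_xlog_replay_resume();"
psql: FATAL:  the database system is starting up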

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Tom Lane

Jeff Janes <jeff.janes@gmail.com> writes:
> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google
> drive.  The data directory contains the recovery.conf which is set to
> end recovery between the two critical time points.

Hmmm ... I can reproduce this with current 9.2 branch tip.  However,
more or less by accident I first tried it with a 9.2-branch postmaster
from a couple weeks ago, and it works as expected with that: the log
output looks like

LOG:  restored log file "00000001000000000000001B" from archive
LOG:  restored log file "00000001000000000000001C" from archive
LOG:  restored log file "00000001000000000000001D" from archive
LOG:  database system is ready to accept read only connections
LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.

and I can connect and do the pg_xlog_replay_resume() thing.
Note the "ready to accept read only connections" line, which
does not show up with branch tip.

So apparently this is something we broke since Nov 18.  Don't know what
yet --- any thoughts?  Also, I am still not seeing what the connection
is to the original report against 9.1.6.

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Tom Lane

I wrote:
> So apparently this is something we broke since Nov 18.  Don't know what
> yet --- any thoughts?

Further experimentation shows that reverting commit
ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
something wrong/incomplete about that fix.

This is a bit urgent since we now have to consider whether to withdraw
9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
9.2.1, and if so how bad is it?
        regards, tom lane



Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-12-04 19:35:48 -0500, Tom Lane wrote:
> I wrote:
> > So apparently this is something we broke since Nov 18.  Don't know what
> > yet --- any thoughts?
>
> Further experimentation shows that reverting commit
> ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
> something wrong/incomplete about that fix.

ISTM that the code should check ControlFile->backupEndRequired, not just
check for an invalid backupEndPoint. I haven't looked into the specific
issue though.

> This is a bit urgent since we now have to consider whether to withdraw
> 9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
> 9.2.1, and if so how bad is it?

Not sure.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PITR potentially broken in 9.2

From: Jeff Janes

On Tue, Dec 4, 2012 at 4:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
>> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google
>> drive.  The data directory contains the recovery.conf which is set to
>> end recovery between the two critical time points.
>
> Hmmm ... I can reproduce this with current 9.2 branch tip.  However,
> more or less by accident I first tried it with a 9.2-branch postmaster
> from a couple weeks ago, and it works as expected with that: the log
> output looks like
>
> LOG:  restored log file "00000001000000000000001B" from archive
> LOG:  restored log file "00000001000000000000001C" from archive
> LOG:  restored log file "00000001000000000000001D" from archive
> LOG:  database system is ready to accept read only connections
> LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
> LOG:  recovery has paused
> HINT:  Execute pg_xlog_replay_resume() to continue.
>
> and I can connect and do the pg_xlog_replay_resume() thing.

But the key is, the database was not actually consistent at that
point, and so opening hot standby was a dangerous thing to do.

The bug that allowed the database to open early (the original topic of
this email chain) was masking this secondary issue.

> So apparently this is something we broke since Nov 18.  Don't know what
> yet --- any thoughts?  Also, I am still not seeing what the connection
> is to the original report against 9.1.6.

The behavior that we both see in 9.2.2, where it waits for a
pg_xlog_replay_resume() that cannot be delivered because the database
is not yet open, is the same thing I'm seeing in 9.1.6.  I'll see if I
can repeat it in 9.1.7 and post the tarball of the data directory.

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-12-04 19:20:44 -0500, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > I've reproduced it again using the just-tagged 9.2.2, and uploaded a
> > 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google
> > drive.  The data directory contains the recovery.conf which is set to
> > end recovery between the two critical time points.
>
> Hmmm ... I can reproduce this with current 9.2 branch tip.  However,
> more or less by accident I first tried it with a 9.2-branch postmaster
> from a couple weeks ago, and it works as expected with that: the log
> output looks like
>
> LOG:  restored log file "00000001000000000000001B" from archive
> LOG:  restored log file "00000001000000000000001C" from archive
> LOG:  restored log file "00000001000000000000001D" from archive
> LOG:  database system is ready to accept read only connections
> LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
> LOG:  recovery has paused
> HINT:  Execute pg_xlog_replay_resume() to continue.
>
> and I can connect and do the pg_xlog_replay_resume() thing.
> Note the "ready to accept read only connections" line, which
> does not show up with branch tip.
>
> So apparently this is something we broke since Nov 18.  Don't know what
> yet --- any thoughts?  Also, I am still not seeing what the connection
> is to the original report against 9.1.6.

Maybe I am blind, but, looking at the backup label:

START WAL LOCATION: 0/110006A8 (file 000000010000000000000011)
CHECKPOINT LOCATION: 0/1D776B50
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-12-02 15:08:52 PST
LABEL: label

This was a straight archive-based hot backup with manual
pg_start/stop_backup, which means we ought to find an XLOG_BACKUP_END
record. Before that we shouldn't be consistent.

Satoshi's version of xlogdump outputs this (besides heaps of errors ...):
~/bin/xlogdump-9.2 ~/tmp/recover/tmp/archivedir/0000000100000000000000* 2>&1|grep "backup end"
[cur:0/22C361E0, xid:0, rmid:0(XLOG), len:8/40, prev:0/22C361B0] backup end: started at 0/110006A8.

But again, according to xlogdump:
[cur:0/1D861CD8, xid:305610, rmid:1(Transaction), len:12/44, prev:0/1D861C80] compact commit at 2012-12-03 00:08:54 CET

So the target xid commits far before the backup end, so it seems absolutely
correct not to permit access yet.

So maybe I am (again) missing something here, but everything looks fine.

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From: Jeff Janes

On Tue, Dec 4, 2012 at 4:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> So apparently this is something we broke since Nov 18.  Don't know what
>> yet --- any thoughts?
>
> Further experimentation shows that reverting commit
> ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
> something wrong/incomplete about that fix.

I can't independently vouch for the correctness of that fix, but I can
vouch that there is so far no evidence that it is incorrect.

It is re-revealing an undesirable (but safe, as far as we know)
behavior that is present in 9.1.x but which was temporarily hidden by
a corruption-risk bug in 9.2.0 and 9.2.1.

>
> This is a bit urgent since we now have to consider whether to withdraw
> 9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
> 9.2.1, and if so how bad is it?

I don't think this is urgent.  The error-message issue in 9.1.6 and
9.2.2 is merely annoying, while the early-opening one in 9.2.0 and
9.2.1 seems fundamentally unsafe.

Cheers,

Jeff



Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-12-04 18:05:15 -0800, Jeff Janes wrote:
> On Tue, Dec 4, 2012 at 4:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Jeff Janes <jeff.janes@gmail.com> writes:
> >> I've reproduced it again using the just-tagged 9.2.2, and uploaded a
> >> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google
> >> drive.  The data directory contains the recovery.conf which is set to
> >> end recovery between the two critical time points.
> >
> > Hmmm ... I can reproduce this with current 9.2 branch tip.  However,
> > more or less by accident I first tried it with a 9.2-branch postmaster
> > from a couple weeks ago, and it works as expected with that: the log
> > output looks like
> >
> > LOG:  restored log file "00000001000000000000001B" from archive
> > LOG:  restored log file "00000001000000000000001C" from archive
> > LOG:  restored log file "00000001000000000000001D" from archive
> > LOG:  database system is ready to accept read only connections
> > LOG:  recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
> > LOG:  recovery has paused
> > HINT:  Execute pg_xlog_replay_resume() to continue.
> >
> > and I can connect and do the pg_xlog_replay_resume() thing.
>
> But the key is, the database was not actually consistent at that
> point, and so opening hot standby was a dangerous thing to do.
>
> The bug that allowed the database to open early (the original topic if
> this email chain) was masking this secondary issue.
>
> > So apparently this is something we broke since Nov 18.  Don't know what
> > yet --- any thoughts?  Also, I am still not seeing what the connection
> > is to the original report against 9.1.6.
>
> The behavior that we both see in 9.2.2, where it waits for a
> pg_xlog_replay_resume() that cannot be delivered because the database
> is not yet open, is the same thing I'm seeing in 9.1.6.  I'll see if I
> can repeat it in 9.1.7 and post the tarball of the data directory.

Could you check whether the attached patch fixes the behaviour?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: PITR potentially broken in 9.2

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
>> But the key is, the database was not actually consistent at that
>> point, and so opening hot standby was a dangerous thing to do.
>>
>> The bug that allowed the database to open early (the original topic if
>> this email chain) was masking this secondary issue.

> Could you check whether the attached patch fixes the behaviour?

Yeah, I had just come to the same conclusion: the bug is not with
Heikki's fix, but with the pause logic.  The comment says that it
shouldn't pause unless users can connect to un-pause, but the actual
implementation of that test is several bricks shy of a load.

Your patch is better, but it's still missing a bet: what we need to be
sure of is not merely that we *could* have told the postmaster to start
hot standby, but that we *actually have done so*.  Otherwise, it's
flow-of-control-dependent whether it works or not; we could fail if the
main loop hasn't gotten to CheckRecoveryConsistency since the conditions
became true.  Looking at the code in CheckRecoveryConsistency, testing
LocalHotStandbyActive seems appropriate.

I also thought it was pretty dangerous that this absolutely critical
test was not placed in recoveryPausesHere() itself, rather than relying
on the call sites to remember to do it.

So the upshot is that I propose a patch more like the attached.

This is not a regression because the pause logic is broken this same
way since 9.1.  So I no longer think that we need a rewrap.

            regards, tom lane

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 50e2b22..593dcee 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** recoveryStopsHere(XLogRecord *record, bo
*** 5923,5928 ****
--- 5923,5932 ----
  static void
  recoveryPausesHere(void)
  {
+     /* Don't pause unless users can connect! */
+     if (!LocalHotStandbyActive)
+         return;
+
      ereport(LOG,
              (errmsg("recovery has paused"),
               errhint("Execute pg_xlog_replay_resume() to continue.")));
*************** StartupXLOG(void)
*** 6697,6707 ****
                   */
                  if (recoveryStopsHere(record, &recoveryApply))
                  {
!                     /*
!                      * Pause only if users can connect to send a resume
!                      * message
!                      */
!                     if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
                      {
                          SetRecoveryPause(true);
                          recoveryPausesHere();
--- 6701,6707 ----
                   */
                  if (recoveryStopsHere(record, &recoveryApply))
                  {
!                     if (recoveryPauseAtTarget)
                      {
                          SetRecoveryPause(true);
                          recoveryPausesHere();
*************** StartupXLOG(void)
*** 6737,6752 ****
                  /*
                   * Update shared replayEndRecPtr before replaying this record,
                   * so that XLogFlush will update minRecoveryPoint correctly.
                   */
                  SpinLockAcquire(&xlogctl->info_lck);
                  xlogctl->replayEndRecPtr = EndRecPtr;
                  recoveryPause = xlogctl->recoveryPause;
                  SpinLockRelease(&xlogctl->info_lck);

!                 /*
!                  * Pause only if users can connect to send a resume message
!                  */
!                 if (recoveryPause && standbyState == STANDBY_SNAPSHOT_READY)
                      recoveryPausesHere();

                  /*
--- 6737,6751 ----
                  /*
                   * Update shared replayEndRecPtr before replaying this record,
                   * so that XLogFlush will update minRecoveryPoint correctly.
+                  *
+                  * While we have the lock, also check for a pause request.
                   */
                  SpinLockAcquire(&xlogctl->info_lck);
                  xlogctl->replayEndRecPtr = EndRecPtr;
                  recoveryPause = xlogctl->recoveryPause;
                  SpinLockRelease(&xlogctl->info_lck);

!                 if (recoveryPause)
                      recoveryPausesHere();

                  /*

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> >> But the key is, the database was not actually consistent at that
> >> point, and so opening hot standby was a dangerous thing to do.
> >>
> >> The bug that allowed the database to open early (the original topic if
> >> this email chain) was masking this secondary issue.
>
> > Could you check whether the attached patch fixes the behaviour?
>
> Yeah, I had just come to the same conclusion: the bug is not with
> Heikki's fix, but with the pause logic.  The comment says that it
> shouldn't pause unless users can connect to un-pause, but the actual
> implementation of that test is several bricks shy of a load.
>
> Your patch is better, but it's still missing a bet: what we need to be
> sure of is not merely that we *could* have told the postmaster to start
> hot standby, but that we *actually have done so*.  Otherwise, it's
> flow-of-control-dependent whether it works or not; we could fail if the
> main loop hasn't gotten to CheckRecoveryConsistency since the conditions
> became true.  Looking at the code in CheckRecoveryConsistency, testing
> LocalHotStandbyActive seems appropriate.

Good point.

My patch wasn't intended as something final; I just wanted confirmation
that it achieves what Jeff wants, because I at least - you possibly as
well? - misunderstood him earlier. I would probably have missed the
interaction above anyway ;)

> I also thought it was pretty dangerous that this absolutely critical
> test was not placed in recoveryPausesHere() itself, rather than relying
> on the call sites to remember to do it.

Absolutely agreed.

> So the upshot is that I propose a patch more like the attached.

Without having run anything so far it looks good to me.

Ok, night now,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
>> So the upshot is that I propose a patch more like the attached.

> Without having run anything so far it looks good to me.

BTW, while on the theme of the pause feature being several bricks shy of
a load, it looks to me like the place where it was added to the replay
loop was less than sane as well.  Presumably the purpose of a pause is
to let you stop application of the WAL at exactly the current spot;
but you can *not* do that midway through application of the record,
and where it is now is effectively that.  As soon as we've updated
xlogctl->replayEndRecPtr, we're committed to replay the record,
because we can't guarantee that the controlfile minRecoveryPoint
doesn't get pushed up to that point by buffer flush activity.
So an abort here could leave the database in an unrestartable condition.

I guess the idea of putting it there was to save one spinlock acquire,
but I'm having a bit of a hard time believing that one spinlock acquire
per WAL record means much.  Still we could possibly preserve that
attribute by moving the pause down to just after the update of
xlogctl->recoveryLastRecPtr.

Thoughts?

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Simon Riggs

On 5 December 2012 00:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> So apparently this is something we broke since Nov 18.  Don't know what
>> yet --- any thoughts?
>
> Further experimentation shows that reverting commit
> ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
> something wrong/incomplete about that fix.
>
> This is a bit urgent since we now have to consider whether to withdraw
> 9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
> 9.2.1, and if so how bad is it?

I'll look at this now.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PITR potentially broken in 9.2

From: Tatsuo Ishii

So what's the current status? Are we going to release 9.2.2 as it is,
or withdraw the current 9.2.2?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
>>> So the upshot is that I propose a patch more like the attached.
>
>> Without having run anything so far it looks good to me.
>
> BTW, while on the theme of the pause feature being several bricks shy of
> a load, it looks to me like the place that it was added to the replay
> loop was less than sane as well.  Presumably the purpose of a pause is
> to let you stop application of the WAL at exactly the current spot;
> but you can *not* do that midway through application of the record,
> and where it is is effectively that.  As soon as we've updated
> xlogctl->replayEndRecPtr, we're committed to replay the record,
> because we can't guarantee that the controlfile minRecoveryPoint
> doesn't get pushed up to that point by buffer flush activity.
> So an abort here could leave the database in an unrestartable condition.
>
> I guess the idea of putting it there was to save one spinlock acquire,
> but I'm having a bit of a hard time believing that one spinlock acquire
> per WAL record means much.  Still we could possibly preserve that
> attribute by moving the pause down to just after the update of
> xlogctl->recoveryLastRecPtr.
>
> Thoughts?
>
>             regards, tom lane

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-12-05 19:06:55 +0900, Tatsuo Ishii wrote:
> So what status are we on? Are we going to release 9.2.2 as it is?
> Or withdraw current 9.2.2?

Releasing as-is sounds good. As Tom wrote upthread:

On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
> This is not a regression because the pause logic is broken this same
> way since 9.1.  So I no longer think that we need a rewrap.

IMO the bug isn't all that critical; it causes confusion but no data corruption
or such.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From: Simon Riggs

On 5 December 2012 02:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> But the key is, the database was not actually consistent at that
>>> point, and so opening hot standby was a dangerous thing to do.
>>>
>>> The bug that allowed the database to open early (the original topic if
>>> this email chain) was masking this secondary issue.
>
>> Could you check whether the attached patch fixes the behaviour?
>
> Yeah, I had just come to the same conclusion: the bug is not with
> Heikki's fix, but with the pause logic.  The comment says that it
> shouldn't pause unless users can connect to un-pause, but the actual
> implementation of that test is several bricks shy of a load.

OK, I've had a look at this now.

Andres correctly identified the bug above, which was that the backup
end is noted by the XLOG_BACKUP_END record. recoveryStopsHere() was
not changed when that record type was added, so it ignores the
situation where we are waiting for the end of backup and stops
anyway. So I concur with Jeff that there is a bug, and I think that
Andres has provided the clue to a fix. I'll patch that now. Aboriginal
bug extends back to 9.0.

Pause has got nothing at all to do with this, even if there are other
problems there.

> Your patch is better, but it's still missing a bet: what we need to be
> sure of is not merely that we *could* have told the postmaster to start
> hot standby, but that we *actually have done so*.  Otherwise, it's
> flow-of-control-dependent whether it works or not; we could fail if the
> main loop hasn't gotten to CheckRecoveryConsistency since the conditions
> became true.  Looking at the code in CheckRecoveryConsistency, testing
> LocalHotStandbyActive seems appropriate.
>
> I also thought it was pretty dangerous that this absolutely critical
> test was not placed in recoveryPausesHere() itself, rather than relying
> on the call sites to remember to do it.
>
> So the upshot is that I propose a patch more like the attached.

I've reworked pause logic along the lines you suggest. Attached here
for further discussion.

> This is not a regression because the pause logic is broken this same
> way since 9.1.  So I no longer think that we need a rewrap.

I think we do need a rewrap, since the bug is not in the pause logic.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: PITR potentially broken in 9.2

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-05 19:06:55 +0900, Tatsuo Ishii wrote:
>> So what status are we on? Are we going to release 9.2.2 as it is?
>> Or withdraw current 9.2.2?

> Releasing as-is sounds good. As Tom wrote upthread:

> On 2012-12-04 21:27:34 -0500, Tom Lane wrote:
>> This is not a regression because the pause logic is broken this same
>> way since 9.1.  So I no longer think that we need a rewrap.

> Imo the bug isn't all that critical, it cause confusion but no data corruption
> or such.

The real reason it's not critical is that this is a "don't do that" case
anyway.  Unpatched, the system hangs up, but if it were patched you'd
just get a failure complaining "requested stop before consistent state
reached" or whatever the wording is.  Since we've seen only one report
about that since 9.1 came out, the error isn't being made often enough
to justify a panic re-release.  We'll just fix it and move on.

            regards, tom lane

Re: PITR potentially broken in 9.2

From: Andres Freund

On 2012-12-05 13:34:05 +0000, Simon Riggs wrote:
> On 5 December 2012 02:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >>> But the key is, the database was not actually consistent at that
> >>> point, and so opening hot standby was a dangerous thing to do.
> >>>
> >>> The bug that allowed the database to open early (the original topic if
> >>> this email chain) was masking this secondary issue.
> >
> >> Could you check whether the attached patch fixes the behaviour?
> >
> > Yeah, I had just come to the same conclusion: the bug is not with
> > Heikki's fix, but with the pause logic.  The comment says that it
> > shouldn't pause unless users can connect to un-pause, but the actual
> > implementation of that test is several bricks shy of a load.
>
> OK, I've had a look at this now.
>
> Andres correctly identified the bug above, which was that the backup
> end is noted by the XLOG_BACKUP_END record. recoveryStopsHere() was
> not changed when that record type was added, so it ignores the
> situation that we are waiting for end of backup and yet it stop
> anyway. So I concur with Jeff that there is a bug and think that
> Andres has provided the clue to a fix. I'll patch that now. Aboriginal
> bug extends back to 9.0.

I think we may have multiple issues here...

RecoveryStopsHere doesn't need to check for backupEndRequired itself -
it's handled in StartupXLOG itself, just below "Complain if we did not
roll forward far enough" where it actually already checks for
backupEndRequired.

> Pause has got nothing at all to do with this, even if there are other
> problems there.

I think the problem of not erroring out properly as described in
CAMkU=1wo9+TvTy-w9UkSEgwc49kXuu=V=8cEdyaH8CBFyUVeww@mail.gmail.com and
shown in
https://docs.google.com/file/d/0Bzqrh1SO9FcEaTQ2QXhFdDZYaUE/edit?pli=1
can be attributed to the premature recoveryPausesHere() in the if()
below recoveryStopsHere() in StartupXLOG().

> > Your patch is better, but it's still missing a bet: what we need to be
> > sure of is not merely that we *could* have told the postmaster to start
> > hot standby, but that we *actually have done so*.  Otherwise, it's
> > flow-of-control-dependent whether it works or not; we could fail if the
> > main loop hasn't gotten to CheckRecoveryConsistency since the conditions
> > became true.  Looking at the code in CheckRecoveryConsistency, testing
> > LocalHotStandbyActive seems appropriate.
> >
> > I also thought it was pretty dangerous that this absolutely critical
> > test was not placed in recoveryPausesHere() itself, rather than relying
> > on the call sites to remember to do it.
> >
> > So the upshot is that I propose a patch more like the attached.
>
> I've reworked pause logic along the lines you suggest. Attached here
> for further discussion.

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -5008,6 +5008,12 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
>  static void
>  recoveryPausesHere(void)
>  {
> +    /*
> +     * Pause only if users can connect to send a resume message
> +     */
> +    if (!LocalHotStandbyActive)
> +        return;
> +
>      ereport(LOG,
>              (errmsg("recovery has paused"),
>               errhint("Execute pg_xlog_replay_resume() to continue.")));
> @@ -5783,10 +5789,11 @@ StartupXLOG(void)
>                  if (recoveryStopsHere(record, &recoveryApply))
>                  {
>                      /*
> -                     * Pause only if users can connect to send a resume
> -                     * message
> +                     * We've reached stop point, but not yet applied last
> +                     * record. Pause BEFORE final apply, if requested, but
> +                     * only if users can connect to send a resume message
>                       */
> -                    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
> +                    if (recoveryPauseAtTarget && !recoveryApply)
>                      {
>                          SetRecoveryPause(true);
>                          recoveryPausesHere();

I personally would find it easier to read if we did a
SetRecoveryPause(true) regardless of !recoveryApply and just made the
recoveryPausesHere() conditional.

> @@ -5820,28 +5827,26 @@ StartupXLOG(void)
>                  }
>
>                  /*
> +                 * If we are attempting to enter Hot Standby mode, check
> +                 * for pauses and process incoming transactionids.
> +                 */
> +                if (standbyState >= STANDBY_INITIALIZED)
> +                {
> +                    if (recoveryPause)
> +                        recoveryPausesHere();
> +
> +                    if (TransactionIdIsValid(record->xl_xid))
> +                        RecordKnownAssignedTransactionIds(record->xl_xid);
> +                }
> +
> +                /*

It feels wrong to do a RecordKnownAssignedTransactionIds before setting
the replayEndRecPtr. I don't think it's really hurtful, but...


>                   * Update shared replayEndRecPtr before replaying this record,
>                   * so that XLogFlush will update minRecoveryPoint correctly.
>                   */
>                  SpinLockAcquire(&xlogctl->info_lck);
>                  xlogctl->replayEndRecPtr = EndRecPtr;
> -                recoveryPause = xlogctl->recoveryPause;
>                  SpinLockRelease(&xlogctl->info_lck);
>
> -                /*
> -                 * Pause only if users can connect to send a resume message
> -                 */
> -                if (recoveryPause && standbyState == STANDBY_SNAPSHOT_READY)
> -                    recoveryPausesHere();
> -
> -                /*
> -                 * If we are attempting to enter Hot Standby mode, process
> -                 * XIDs we see
> -                 */
> -                if (standbyState >= STANDBY_INITIALIZED &&
> -                    TransactionIdIsValid(record->xl_xid))
> -                    RecordKnownAssignedTransactionIds(record->xl_xid);
> -
>                  /* Now apply the WAL record itself */
>                  RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
>
> @@ -5875,6 +5880,7 @@ StartupXLOG(void)
>                   */
>                  SpinLockAcquire(&xlogctl->info_lck);
>                  xlogctl->recoveryLastRecPtr = EndRecPtr;
> +                recoveryPause = xlogctl->recoveryPause;
>                  SpinLockRelease(&xlogctl->info_lck);
>
>                  LastRec = ReadRecPtr;

I think it would be better to set recoveryPause before the if
(recoveryPause) above, otherwise we may need to wait for new records to
stream in. That will only change when we see the log message about
waiting, but still...

> @@ -5883,6 +5889,17 @@ StartupXLOG(void)
>              } while (record != NULL && recoveryContinue);
>
>              /*
> +             * We've reached stop point, but not yet applied last
> +             * record. Pause AFTER final apply, if requested, but
> +             * only if users can connect to send a resume message
> +             */
> +            if (reachedStopPoint && recoveryPauseAtTarget && recoveryApply)
> +            {
> +                SetRecoveryPause(true);
> +                recoveryPausesHere();
> +            }
> +
> +            /*

I find the above comment a bit misleading, because by now we have in fact
applied the last record... What about:

When we reached the stop point we had not yet applied the last record,
but a pause *after* the stop was requested, so pause now.



Independent of this patch, I am slightly confused about the whole stop
logic. Isn't the idea that you can stop/start/stop/start/... recovery?
Because if !recoveryApply we break out of the whole recovery loop and
are done with things.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 13:34, Simon Riggs <simon@2ndquadrant.com> wrote:

> Aboriginal bug extends back to 9.0.

I don't see any bug in 9.0 and 9.1, just 9.2+

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 14:33:36 +0000, Simon Riggs wrote:
> On 5 December 2012 13:34, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> > Aboriginal bug extends back to 9.0.
>
> I don't see any bug in 9.0 and 9.1, just 9.2+

Well, the pausing logic is clearly broken in 9.1 as well, isn't it?
I.e. you will get:
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.

Instead of

FATAL: requested recovery stop point is before consistent recovery point

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 14:33, Andres Freund <andres@2ndquadrant.com> wrote:

> Independent of this patch, I am slightly confused about the whole stop
> logic. Isn't the idea that you can stop/start/stop/start/... recovery?
> Because if !recoveryApply we break out of the whole recovery loop and
> are done with things.

You can, but only by shutting down the server, updating the recovery
target, then restarting the server.

That's not a beautiful design; it's just waiting for someone to make
recovery targets changeable, which is best done once the recovery.conf
parameters are in postgresql.conf.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-05 13:34:05 +0000, Simon Riggs wrote:
>> @@ -5883,6 +5889,17 @@ StartupXLOG(void)
>> } while (record != NULL && recoveryContinue);
>>
>> /*
>> +             * We've reached stop point, but not yet applied last
>> +             * record. Pause AFTER final apply, if requested, but
>> +             * only if users can connect to send a resume message
>> +             */
>> +            if (reachedStopPoint && recoveryPauseAtTarget && recoveryApply)
>> +            {
>> +                SetRecoveryPause(true);
>> +                recoveryPausesHere();
>> +            }
>> +
>> +            /*

> I find the above comment a bit misleading because by now we have in fact
> applied the last record...

I'd go further than that: a pause after we've exited the loop is
completely stupid.  The only reason for pausing is to let the user
choose whether to continue applying WAL or not.  If you've already
made that choice, you might as well let the system come up fully.

But I can't make any sense of the rest of this patch, because it seems
to be randomly rearranging a whole lot of stuff that's unrelated to
pausing.  If you think all these changes are in fact necessary, could
you break it down a little more?

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 11:11:23 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2012-12-05 13:34:05 +0000, Simon Riggs wrote:
> >> @@ -5883,6 +5889,17 @@ StartupXLOG(void)
> >> } while (record != NULL && recoveryContinue);
> >>
> >> /*
> >> +             * We've reached stop point, but not yet applied last
> >> +             * record. Pause AFTER final apply, if requested, but
> >> +             * only if users can connect to send a resume message
> >> +             */
> >> +            if (reachedStopPoint && recoveryPauseAtTarget && recoveryApply)
> >> +            {
> >> +                SetRecoveryPause(true);
> >> +                recoveryPausesHere();
> >> +            }
> >> +
> >> +            /*
>
> > I find the above comment a bit misleading because by now we have in fact
> > applied the last record...
>
> I'd go further than that: a pause after we've exited the loop is
> completely stupid.  The only reason for pausing is to let the user
> choose whether to continue applying WAL or not.  If you've already
> made that choice, you might as well let the system come up fully.

Yes, that puzzles me as well.

Basically the whole logic around recoveryApply seems to be broken
currently, because if recoveryApply=false we currently don't pause at
all: we jump out of the apply loop with the break. I guess that's
what Simon tried to address with the patch. But the new behaviour seems
to be strange as well.

> But I can't make any sense of the rest of this patch, because it seems
> to be randomly rearranging a whole lot of stuff that's unrelated to
> pausing.  If you think all these changes are in fact necessary, could
> you break it down a little more?

I think part of the !recoveryApply changes makes sense, but it needs more...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> Basically the whole logic around recoveryApply seems to be broken
> currently, because if recoveryApply=false we currently don't pause at
> all: we jump out of the apply loop with the break.

Huh?  That break is after the pause:

                /*
                 * Have we reached our recovery target?
                 */
                if (recoveryStopsHere(record, &recoveryApply))
                {
                    /*
                     * Pause only if users can connect to send a resume
                     * message
                     */
                    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
                    {
                        SetRecoveryPause(true);
                        recoveryPausesHere();
                    }
                    reachedStopPoint = true;    /* see below */
                    recoveryContinue = false;
                    if (!recoveryApply)
                        break;
                }

The point of recoveryApply is that the stop can be defined as occurring
either before or after the current WAL record.  However, I do see your
point, which is that if the stop is defined to be after the current WAL
record then we probably should apply same before pausing.  Otherwise
the pause is a bit useless since the user can't see the state he's being
asked to approve.

The real question here probably needs to be "what is the point of
recoveryPauseAtTarget in the first place?".  I find it hard to envision
what's the point of pausing unless the user has an opportunity to
make a decision about whether to continue applying WAL.  As Simon
mentioned, we seem to be lacking some infrastructure that would let
the user adjust the recovery_target parameters before resuming WAL
processing.  But, assuming for the moment that our workaround for
that is "shutdown the server, adjust recovery.conf, and restart",
is the pause placed in a useful spot for that?

BTW, could we make this more convenient by letting recoveryPausesHere()
reread recovery.conf?  Also, shouldn't the code re-evaluate
recoveryStopsHere() after that?

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 11:40:16 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Basically the whole logic around recoveryApply seems to be broken
> > currently, because if recoveryApply=false we currently don't pause at
> > all: we jump out of the apply loop with the break.
>
> Huh?  That break is after the pause:
>
>                 /*
>                  * Have we reached our recovery target?
>                  */
>                 if (recoveryStopsHere(record, &recoveryApply))
>                 {
>                     /*
>                      * Pause only if users can connect to send a resume
>                      * message
>                      */
>                     if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
>                     {
>                         SetRecoveryPause(true);
>                         recoveryPausesHere();
>                     }
>                     reachedStopPoint = true;    /* see below */
>                     recoveryContinue = false;
>                     if (!recoveryApply)
>                         break;
>                 }

Oh, yea. I mixed what I read in the patch and what exists today in my
mind, sorry for that.

But it still seems rather strange that we break out of the loop
depending on recovery_target_inclusive which is what recoveryApply
basically is set to.

We do:
    ereport(LOG,
            (errmsg("recovery has paused"),
             errhint("Execute pg_xlog_replay_resume() to continue.")));

if we reach the target, so jumping out of the loop seems really counter
intuitive to me.

> The point of recoveryApply is that the stop can be defined as occurring
> either before or after the current WAL record.  However, I do see your
> point, which is that if the stop is defined to be after the current WAL
> record then we probably should apply same before pausing.  Otherwise
> the pause is a bit useless since the user can't see the state he's being
> asked to approve.

Yes. Seems easy enough if we do the SetRecoveryPause() up there and just
fall through to the pause later in the loop.
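Very roughly (untested, and glossing over the !recoveryApply break):

    /* at the recovery-target check: only request the pause */
    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
        SetRecoveryPause(true);

    /* ... rm_redo() applies the record ... */

    /* further down in the loop: honour any pending pause request */
    if (recoveryPause)
        recoveryPausesHere();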

> The real question here probably needs to be "what is the point of
> recoveryPauseAtTarget in the first place?".  I find it hard to envision
> what's the point of pausing unless the user has an opportunity to
> make a decision about whether to continue applying WAL.

You can do SELECT pg_xlog_replay_pause|resume(); but that obviously
sucks without further support...

> As Simon
> mentioned, we seem to be lacking some infrastructure that would let
> the user adjust the recovery_target parameters before resuming WAL
> processing.  But, assuming for the moment that our workaround for
> that is "shutdown the server, adjust recovery.conf, and restart",
> is the pause placed in a useful spot for that?

I actually think Simon's proposed pause location makes more sense if
that's the only mode we support, but if so it probably should be the one
for inclusive/non-inclusive replay.

> BTW, could we make this more convenient by letting recoveryPausesHere()
> reread recovery.conf?  Also, shouldn't the code re-evaluate
> recoveryStopsHere() after that?

At least it seems like something we could do without introducing more
functions (i.e. possibly fit for the back branches). Otherwise, given that
pauses are controlled via SQL functions, setting the target via an SQL
function as well seems kinda fitting.

Greetings,

Andres

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 18:08:01 +0100, Andres Freund wrote:
> On 2012-12-05 11:40:16 -0500, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > Basically the whole logic around recoveryApply seems to be broken
> > > currently, because if recoveryApply=false we currently don't pause at
> > > all: we jump out of the apply loop with the break.
> >
> > Huh?  That break is after the pause:
> >
> >                 /*
> >                  * Have we reached our recovery target?
> >                  */
> >                 if (recoveryStopsHere(record, &recoveryApply))
> >                 {
> >                     /*
> >                      * Pause only if users can connect to send a resume
> >                      * message
> >                      */
> >                     if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
> >                     {
> >                         SetRecoveryPause(true);
> >                         recoveryPausesHere();
> >                     }
> >                     reachedStopPoint = true;    /* see below */
> >                     recoveryContinue = false;
> >                     if (!recoveryApply)
> >                         break;
> >                 }
>
> Oh, yea. I mixed what I read in the patch and what exists today in my
> mind, sorry for that.
>
> But it still seems rather strange that we break out of the loop
> depending on recovery_target_inclusive which is what recoveryApply
> basically is set to.

That part is obviously bullshit, as we also set recoveryContinue = false,
which means we will jump out of the loop later anyway, though only after
waiting for another record ...

Man, that file is confusing.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The real question here probably needs to be "what is the point of
> recoveryPauseAtTarget in the first place?".  I find it hard to envision
> what's the point of pausing unless the user has an opportunity to
> make a decision about whether to continue applying WAL.  As Simon
> mentioned, we seem to be lacking some infrastructure that would let
> the user adjust the recovery_target parameters before resuming WAL
> processing.  But, assuming for the moment that our workaround for
> that is "shutdown the server, adjust recovery.conf, and restart",
> is the pause placed in a useful spot for that?

> BTW, could we make this more convenient by letting recoveryPausesHere()
> reread recovery.conf?  Also, shouldn't the code re-evaluate
> recoveryStopsHere() after that?

The recovery target and the consistency point are in some ways in
conflict. If the recovery target is before the consistency point there
is no point in stopping there, whether or not we pause. What we should
do is say "recovery target reached, yet recovery not yet consistent,
continuing".
So ISTM that we should make recoveryStopsHere() return false while we
are inconsistent. Problems solved.
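I.e. an early exit at the top of recoveryStopsHere(), roughly (untested,
and assuming the existing reachedConsistency flag is the right thing to
test):

    static bool
    recoveryStopsHere(XLogRecord *record, bool *includeThis)
    {
        /* never stop before we have reached a consistent state */
        if (!reachedConsistency)
            return false;
        ...
    }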

We can re-read parameters after we wake from a pause. Presumably only
in HEAD, or do you mean in 9.2/9.1 also?

My earlier patch to rearrange the pause logic did a few things, nothing
very dramatic:
* put the logic for whether we're consistent inside
recoveryPausesHere(), as requested
* rearranged the logic so that setting EndPtr is the very last thing we do
before redo for the current record
* fixed the bug that we don't pause in the correct place if recoveryApply = true
The comment on that new pause location was slightly wrong, though.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 17:17, Simon Riggs <simon@2ndquadrant.com> wrote:

> The recovery target and the consistency point are in some ways in
> conflict. If the recovery target is before the consistency point there
> is no point in stopping there, whether or not we pause. What we should
> do is say "recovery target reached, yet recovery not yet consistent,
> continuing".
> So ISTM that we should make recoveryStopsHere() return false while we
> are inconsistent. Problems solved.

Patch

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
> On 5 December 2012 17:17, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> > The recovery target and the consistency point are in some ways in
> > conflict. If the recovery target is before the consistency point there
> > is no point in stopping there, whether or not we pause. What we should
> > do is say "recovery target reached, yet recovery not yet consistent,
> > continuing".
> > So ISTM that we should make recoveryStopsHere() return false while we
> > are inconsistent. Problems solved.

I prefer the previous (fixed) behaviour where we error out if we reach a
recovery target before we are consistent:

    /*
     * Complain if we did not roll forward far enough to render the backup
     * dump consistent.  Note: it is indeed okay to look at the local variable
     * minRecoveryPoint here, even though ControlFile->minRecoveryPoint might
     * be further ahead --- ControlFile->minRecoveryPoint cannot have been
     * advanced beyond the WAL we processed.
     */
    if (InRecovery &&
        (XLByteLT(EndOfLog, minRecoveryPoint) ||
         !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
    {
        if (reachedStopPoint)
        {
            /* stopped because of stop request */
            ereport(FATAL,
                    (errmsg("requested recovery stop point is before consistent recovery point")));
        }

        /*
         * Ran off end of WAL before reaching end-of-backup WAL record, or
         * minRecoveryPoint. That's usually a bad sign, indicating that you
         * tried to recover from an online backup but never called
         * pg_stop_backup(), or you didn't archive all the WAL up to that
         * point. However, this also happens in crash recovery, if the system
         * crashes while an online backup is in progress. We must not treat
         * that as an error, or the database will refuse to start up.
         */
        if (InArchiveRecovery || ControlFile->backupEndRequired)
        {
            if (ControlFile->backupEndRequired)
                ereport(FATAL,
                        (errmsg("WAL ends before end of online backup"),
                         errhint("All WAL generated while online backup was taken must be available at recovery.")));
            else if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
                ereport(FATAL,
                        (errmsg("WAL ends before end of online backup"),
                         errhint("Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.")));
            else
                ereport(FATAL,
                      (errmsg("WAL ends before consistent recovery point")));
        }
    }

Seems good enough to me, once the pause logic is fixed, obviously.


Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
>> So ISTM that we should make recoveryStopsHere() return false while we
>> are inconsistent. Problems solved.

> I prefer the previous (fixed) behaviour where we error out if we reach a
> recovery target before we are consistent:

I agree.  Silently ignoring the user's specification is not good.
(I'm not totally sure about ignoring the pause spec, either, but
there is no good reason to change the established behavior for
the recovery target spec.)

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
I wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
>>> So ISTM that we should make recoveryStopsHere() return false while we
>>> are inconsistent. Problems solved.

>> I prefer the previous (fixed) behaviour where we error out if we reach a
>> recovery target before we are consistent:

> I agree.  Silently ignoring the user's specification is not good.
> (I'm not totally sure about ignoring the pause spec, either, but
> there is no good reason to change the established behavior for
> the recovery target spec.)

On further thought, it seems like recovery_pause_at_target is rather
misdesigned anyway, and taking recovery target parameters from
recovery.conf is an obsolete API that was designed in a world before hot
standby.  What I suggest people really want, if they're trying to
interactively determine how far to roll forward, is this:

(1) A recovery.conf parameter that specifies "pause when hot standby
opens up" (that is, as soon as we have consistency).

(2) A SQL command/function that releases the pause mode *and* specifies
a new target stop point (ie, an interactive way of setting the recovery
target parameters).  The startup process then rolls forward to that
target and pauses again.

(3) A SQL command/function that releases the pause mode and specifies
coming up normally, ie not following the archived WAL any further
(I imagine this would force a timeline switch).

The existing "pause now" function could still fit into this framework;
but it seems to me to have mighty limited usefulness, considering the
speed of WAL replay versus human reaction time.
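
For (2), the sort of thing I have in mind, purely as an illustrative
sketch (neither this SQL-callable function nor the SetRecoveryTargetTime()
helper exists today):

    Datum
    pg_xlog_replay_resume_at(PG_FUNCTION_ARGS)
    {
        TimestampTz target = PG_GETARG_TIMESTAMPTZ(0);

        if (!RecoveryInProgress())
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("recovery is not in progress")));

        SetRecoveryTargetTime(target);   /* assumed new shmem setter */
        SetRecoveryPause(false);         /* release the pause */

        PG_RETURN_VOID();
    }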

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Jeff Janes
Date:
On Wed, Dec 5, 2012 at 8:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The real question here probably needs to be "what is the point of
> recoveryPauseAtTarget in the first place?".  I find it hard to envision
> what's the point of pausing unless the user has an opportunity to
> make a decision about whether to continue applying WAL.

Right now if I'm doing a PITR and want to look around before blessing
the restore, I have to:

0) restore from the base backup.

1) change pg_hba to lock out everyone else

2) do the restore to my best guess of what was just before the tragic DML.

3) Look around and see if I guessed correctly.

4a) If I replayed too far, blow everything away and start over.

4b) If I replayed not far enough, and opened the database normally
upon completion so I could look around, then blow everything away and
start over.

4c) If I replayed not far enough and opened the database in hot
standby, I don't have to blow everything away, I just have to shut
down the server, change the restore point forward, and restart it.

5) change pg_hba back to normal and restart the server.

It would be nice if I could execute 4c without a shutdown/restart, but
certainly a shutdown/restart is better than blowing everything away
and starting over, like you have to do once the restore picks a new
timeline.  So while the current pause behavior is incomplete, it is
still useful.

It would also be nice if only the superuser is allowed to connect to
the hot standby when pause_at_recovery_target=true, until after
pg_xlog_replay_resume() is called.  That way I could skip steps 1 and
5--steps which are very easy to forget to do.

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 13:48:53 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
> >>> So ISTM that we should make recoveryStopsHere() return false while we
> >>> are inconsistent. Problems solved.
>
> >> I prefer the previous (fixed) behaviour where we error out if we reach a
> >> recovery target before we are consistent:
>
> > I agree.  Silently ignoring the user's specification is not good.
> > (I'm not totally sure about ignoring the pause spec, either, but
> > there is no good reason to change the established behavior for
> > the recovery target spec.)
>
> On further thought, it seems like recovery_pause_at_target is rather
> misdesigned anyway, and taking recovery target parameters from
> recovery.conf is an obsolete API that was designed in a world before hot
> standby.  What I suggest people really want, if they're trying to
> interactively determine how far to roll forward, is this:
>
> (1) A recovery.conf parameter that specifies "pause when hot standby
> opens up" (that is, as soon as we have consistency).

> (2) A SQL command/function that releases the pause mode *and* specifies
> a new target stop point (ie, an interactive way of setting the recovery
> target parameters).  The startup process then rolls forward to that
> target and pauses again.
>
> (3) A SQL command/function that releases the pause mode and specifies
> coming up normally, ie not following the archived WAL any further
> (I imagine this would force a timeline switch).

That sounds good. The multitude of options for 2) sounds a bit annoying,
but I am not sure where to cut there.

>
> The existing "pause now" function could still fit into this framework;
> but it seems to me to have mighty limited usefulness, considering the
> speed of WAL replay versus human reaction time.

I think "pause now" is useful for independent purposes. You can use it
while operating a normal standby to stop replay for some time if you
need consistent data, and then happily resume afterwards (if you have
enough WAL stored...).

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> Right now if I'm doing a PITR and want to look around before blessing
> the restore, I have to:
> [ do painful stuff ]

Yeah.  The worst thing about this is the cost of stepping too far
forward, but I doubt we can do much about that --- WAL isn't reversible
and I can't see us making it so.  What we can get rid of is the pain
of shutting down to move the recovery target forward.

Another thought here is that it would be good to have some kind of
visibility of the last few potential stop points (timestamps/XIDs),
so that if you do roll too far forward, you have some idea of what
to try after you reset everything.  A zero-order implementation of
that would be to emit LOG messages as we replay each potential
commit, but maybe we can do better.
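
For the zero-order version, something purely illustrative (the variable
names here are placeholders):

    /* somewhere in the commit-redo path */
    ereport(LOG,
            (errmsg("replaying commit of transaction %u, time %s",
                    xid, timestamptz_to_str(commit_time))));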

> It would also be nice if only the superuser is allowed to connect to
> the hot standby when pause_at_recovery_target=true, until after
> pg_xlog_replay_resume() is called.

Uh, why?  Other users won't be able to do anything except look around;
they can't force the database to become read/write.  I can't see that
it's a good idea for recovery to play games with the pg_hba rules;
too much chance of screwing things up for too little benefit.

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2012-12-05 17:24:42 +0000, Simon Riggs wrote:
>>>> So ISTM that we should make recoveryStopsHere() return false while we
>>>> are inconsistent. Problems solved.
>
>>> I prefer the previous (fixed) behaviour where we error out if we reach a
>>> recovery target before we are consistent:
>
>> I agree.  Silently ignoring the user's specification is not good.
>> (I'm not totally sure about ignoring the pause spec, either, but
>> there is no good reason to change the established behavior for
>> the recovery target spec.)
>
> On further thought, it seems like recovery_pause_at_target is rather
> misdesigned anyway, and taking recovery target parameters from
> recovery.conf is an obsolete API that was designed in a world before hot
> standby.  What I suggest people really want, if they're trying to
> interactively determine how far to roll forward, is this:
>
> (1) A recovery.conf parameter that specifies "pause when hot standby
> opens up" (that is, as soon as we have consistency).

Agreed.

> (2) A SQL command/function that releases the pause mode *and* specifies
> a new target stop point (ie, an interactive way of setting the recovery
> target parameters).  The startup process then rolls forward to that
> target and pauses again.

That was my original patch, IIRC, with different functions for each
target type, plus an advance N records function for use in
debugging/automated testing.

Having a single function would be possible, taking a single TEXT
parameter that we parse just like recovery.conf, treating semicolons
as newlines.

> (3) A SQL command/function that releases the pause mode and specifies
> coming up normally, ie not following the archived WAL any further
> (I imagine this would force a timeline switch).
>
> The existing "pause now" function could still fit into this framework;
> but it seems to me to have mighty limited usefulness, considering the
> speed of WAL replay versus human reaction time.

I think that was in there also.

Can't remember why we didn't go for the full API last time. I'll have
another go, in HEAD.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On further thought, it seems like recovery_pause_at_target is rather
>> misdesigned anyway, and taking recovery target parameters from
>> recovery.conf is an obsolete API that was designed in a world before hot
>> standby.  What I suggest people really want, if they're trying to
>> interactively determine how far to roll forward, is this:
>> ...

> Can't remember why we didn't go for the full API last time. I'll have
> another go, in HEAD.

That's fine, but the immediate question is what are we doing to fix
the back branches.  I think everyone is clear that we should be testing
LocalHotStandbyActive rather than precursor conditions to see if a pause
is allowed, but are we going to do anything more than that?

The only other thing I really wanted to do is not have the in-loop pause
occur after we've taken actions that are effectively part of the WAL
apply step.  I think ideally it should happen just before or after the
recoveryStopsHere stanza.  Last night I worried about adding an extra
spinlock acquire to make that work, but today I wonder if we couldn't
get away with just a naked

    if (xlogctl->recoveryPause)
        recoveryPausesHere();

The argument for this is that although we might fetch a slightly stale
value of the shared variable, it can't be very stale --- certainly no
older than the spinlock acquisition near the bottom of the previous
iteration of the loop.  And this is a highly asynchronous feature
anyhow, so fuzziness of plus or minus one WAL record in when the pause
gets honored is not going to be detectable.  Hence an extra spinlock
acquisition is not worth the cost.

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 21:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> On further thought, it seems like recovery_pause_at_target is rather
>>> misdesigned anyway, and taking recovery target parameters from
>>> recovery.conf is an obsolete API that was designed in a world before hot
>>> standby.  What I suggest people really want, if they're trying to
>>> interactively determine how far to roll forward, is this:
>>> ...
>
>> Can't remember why we didn't go for the full API last time. I'll have
>> another go, in HEAD.
>
> That's fine, but the immediate question is what are we doing to fix
> the back branches.  I think everyone is clear that we should be testing
> LocalHotStandbyActive rather than precursor conditions to see if a pause
> is allowed, but are we going to do anything more than that?

No

> The only other thing I really wanted to do is not have the in-loop pause
> occur after we've taken actions that are effectively part of the WAL
> apply step.  I think ideally it should happen just before or after the
> recoveryStopsHere stanza.  Last night I worried about adding an extra
> spinlock acquire to make that work, but today I wonder if we couldn't
> get away with just a naked
>
>         if (xlogctl->recoveryPause)
>                 recoveryPausesHere();
>
> The argument for this is that although we might fetch a slightly stale
> value of the shared variable, it can't be very stale --- certainly no
> older than the spinlock acquisition near the bottom of the previous
> iteration of the loop.  And this is a highly asynchronous feature
> anyhow, so fuzziness of plus or minus one WAL record in when the pause
> gets honored is not going to be detectable.  Hence an extra spinlock
> acquisition is not worth the cost.

Yep, that's fine.

Are you doing this or do you want me to? Don't mind either way.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Yep, that's fine.

> Are you doing this or do you want me to? Don't mind either way.

I've got a patch for most of it already, so happy to do it.

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 5 December 2012 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> On further thought, it seems like recovery_pause_at_target is rather
> >> misdesigned anyway, and taking recovery target parameters from
> >> recovery.conf is an obsolete API that was designed in a world before hot
> >> standby.  What I suggest people really want, if they're trying to
> >> interactively determine how far to roll forward, is this:
> >> ...
>
> > Can't remember why we didn't go for the full API last time. I'll have
> > another go, in HEAD.
>
> That's fine, but the immediate question is what are we doing to fix
> the back branches.  I think everyone is clear that we should be testing
> LocalHotStandbyActive rather than precursor conditions to see if a pause
> is allowed, but are we going to do anything more than that?

I'd like inclusive/non-inclusive stops to have some resemblance of
sanity.

Raw patch including your earlier LocalHotStandbyActive one attached.

> The only other thing I really wanted to do is not have the in-loop pause
> occur after we've taken actions that are effectively part of the WAL
> apply step.  I think ideally it should happen just before or after the
> recoveryStopsHere stanza.  Last night I worried about adding an extra
> spinlock acquire to make that work, but today I wonder if we couldn't
> get away with just a naked
>
>     if (xlogctl->recoveryPause)
>         recoveryPausesHere();

> The argument for this is that although we might fetch a slightly stale
> value of the shared variable, it can't be very stale --- certainly no
> older than the spinlock acquisition near the bottom of the previous
> iteration of the loop.  And this is a highly asynchronous feature
> anyhow, so fuzziness of plus or minus one WAL record in when the pause
> gets honored is not going to be detectable.  Hence an extra spinlock
> acquisition is not worth the cost.

Seems safe enough to me.  But I am not sure it's necessary: if we move
the recoveryPausesHere() down to after replaying the record, fetching the
flag inside the spinlock we already take for recoveryLastRecPtr seems
natural enough.
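I.e. roughly:

    SpinLockAcquire(&xlogctl->info_lck);
    xlogctl->recoveryLastRecPtr = EndRecPtr;
    recoveryPause = xlogctl->recoveryPause;
    SpinLockRelease(&xlogctl->info_lck);

    if (recoveryPause && standbyState == STANDBY_SNAPSHOT_READY)
        recoveryPausesHere();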

Greetings,

Andres Freund

Attachment

Re: PITR potentially broken in 9.2

From
Robert Haas
Date:
On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The argument for this is that although we might fetch a slightly stale
> value of the shared variable, it can't be very stale --- certainly no
> older than the spinlock acquisition near the bottom of the previous
> iteration of the loop.  And this is a highly asynchronous feature
> anyhow, so fuzziness of plus or minus one WAL record in when the pause
> gets honored is not going to be detectable.  Hence an extra spinlock
> acquisition is not worth the cost.

I wonder whether the cost of an extra spinlock acquire/release cycle
is really noticeable here.  That can get expensive in a hurry when
lots of processes are contending the spinlock ... but I think that
shouldn't be the case here; most of the accesses will be coming from
the startup process.  Of course atomic operations are much more
expensive than ordinary CPU instructions even under the best of
circumstances, but is that really material here?  I'm just wondering
whether this is premature optimization that's going to potentially
bite us later in the form of subtle, hard-to-reproduce bugs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The argument for this is that although we might fetch a slightly stale
>> value of the shared variable, it can't be very stale --- certainly no
>> older than the spinlock acquisition near the bottom of the previous
>> iteration of the loop.  And this is a highly asynchronous feature
>> anyhow, so fuzziness of plus or minus one WAL record in when the pause
>> gets honored is not going to be detectable.  Hence an extra spinlock
>> acquisition is not worth the cost.

> I wonder whether the cost of an extra spinlock acquire/release cycle
> is really noticeable here.  That can get expensive in a hurry when
> lots of processes are contending the spinlock ... but I think that
> shouldn't be the case here; most of the accesses will be coming from
> the startup process.  Of course atomic operations are much more
> expensive than ordinary CPU instructions even under the best of
> circumstances, but is that really material here?  I'm just wondering
> whether this is premature optimization that's going to potentially
> bite us later in the form of subtle, hard-to-reproduce bugs.

I have the same doubt about whether it's really significant.  However,
I think it's much more likely that we'd find out it is significant than
that we'd find out anybody could detect the lack of a memory barrier
there.

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
>> That's fine, but the immediate question is what are we doing to fix
>> the back branches.  I think everyone is clear that we should be testing
>> LocalHotStandbyActive rather than precursor conditions to see if a pause
>> is allowed, but are we going to do anything more than that?

> I'd like inclusive/non-inclusive stops to have some resemblance of
> sanity.
> Raw patch including your earlier LocalHotStandbyActive one attached.

I looked at this but couldn't get excited about using it.  There were
some obvious bugs ("if (!recoveryStopsHere)"?) but the real problem is
that I think we're going to end up reworking the interaction between
recoveryPausesHere and the recoveryStopsHere stanza quite a bit.
In particular, we should expect that we're going to need to respond
to a changed recovery target after any pause.  So placing a call of
recoveryPausesHere down at the loop bottom where the action is already
predetermined seems like the wrong thing.  I'm not entirely sure what
a clean design would look like, but that's not it.  I'll leave it to
Simon to think about what we want to do there next.

            regards, tom lane

Re: PITR potentially broken in 9.2

From
Simon Riggs
Date:
On 5 December 2012 22:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The argument for this is that although we might fetch a slightly stale
>>> value of the shared variable, it can't be very stale --- certainly no
>>> older than the spinlock acquisition near the bottom of the previous
>>> iteration of the loop.  And this is a highly asynchronous feature
>>> anyhow, so fuzziness of plus or minus one WAL record in when the pause
>>> gets honored is not going to be detectable.  Hence an extra spinlock
>>> acquisition is not worth the cost.
>
>> I wonder whether the cost of an extra spinlock acquire/release cycle
>> is really noticeable here.  That can get expensive in a hurry when
>> lots of processes are contending the spinlock ... but I think that
>> shouldn't be the case here; most of the accesses will be coming from
>> the startup process.  Of course atomic operations are much more
>> expensive than ordinary CPU instructions even under the best of
>> circumstances, but is that really material here?  I'm just wondering
>> whether this is premature optimization that's going to potentially
>> bite us later in the form of subtle, hard-to-reproduce bugs.
>
> I have the same doubt about whether it's really significant.  However,
> I think it's much more likely that we'd find out it is significant than
> that we'd find out anybody could detect the lack of a memory barrier
> there.

Agreed. And any future logic to stop at a specific point will be
exactly precise, because the decision and action will be taken in the
same process.

Patch looks good.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Andres Freund
Date:
On 2012-12-05 18:35:47 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
> >> That's fine, but the immediate question is what are we doing to fix
> >> the back branches.  I think everyone is clear that we should be testing
> >> LocalHotStandbyActive rather than precursor conditions to see if a pause
> >> is allowed, but are we going to do anything more than that?
>
> > I'd like inclusive/non-inclusive stops to have some resemblance of
> > sanity.
> > Raw patch including your earlier LocalHotStandbyActive one attached.
>
> I looked at this but couldn't get excited about using it. There were
> some obvious bugs ("if (!recoveryStopsHere)"?) but the real problem is
> that I think we're going to end up reworking the interaction between
> recoveryPausesHere and the recoveryStopsHere stanza quite a bit.
> In particular, we should expect that we're going to need to respond
> to a changed recovery target after any pause.  So placing a call of
> recoveryPausesHere down at the loop bottom where the action is already
> predetermined seems like the wrong thing.  I'm not entirely sure what
> a clean design would look like, but that's not it.  I'll leave it to
> Simon to think about what we want to do there next.

Yes, the patch obviously wasn't ready for anything. It just seemed easier
that way than trying to describe what I think is sensible.

What I dislike about what you committed is that the state you're
investigating during the pause isn't the one you're going to end up
in when recoveryApply == true. That seems dangerous to me, even if it's
going to be reworked in HEAD.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PITR potentially broken in 9.2

From
Jeff Janes
Date:
On Wed, Dec 5, 2012 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> Right now if I'm doing a PITR and want to look around before blessing
>> the restore, I have to:
>> [ do painful stuff ]
>
> Yeah.  The worst thing about this is the cost of stepping too far
> forward, but I doubt we can do much about that --- WAL isn't reversible
> and I can't see us making it so.  What we can get rid of is the pain
> of shutting down to move the recovery target forward.
>
> Another thought here is that it would be good to have some kind of
> visibility of the last few potential stop points (timestamps/XIDs),
> so that if you do roll too far forward, you have some idea of what
> to try after you reset everything.  A zero-order implementation of
> that would be to emit LOG messages as we replay each potential
> commit, but maybe we can do better.

Probably embellishments on xlogdump or xlogreader would be the way to go.

>
>> It would also be nice if only the superuser is allowed to connect to
>> the hot standby when pause_at_recovery_target=true, until after
>> pg_xlog_replay_resume() is called.
>
> Uh, why?  Other users won't be able to do anything except look around;

On some systems, 95% of users never do anything (that the database
knows about) except look around.  But I think it would be unfortunate
to accidentally show them data that will soon be revoked.

> they can't force the database to become read/write.  I can't see that
> it's a good idea for recovery to play games with the pg_hba rules;
> too much chance of screwing things up for too little benefit.

OK.  I don't know at all what is involved in implementing such a
thing.  But a DBA in the middle of a rather arcane but urgent task has
a pretty high chance of screwing things up, too.

Cheers,

Jeff

Re: PITR potentially broken in 9.2

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> What I dislike about what you committed is that the state you're
> investigating during the pause isn't the one you're going to end up
> in when recoveryApply == true. That seems dangerous to me, even if it's
> going to be reworked in HEAD.

Agreed, but it's been like that since the pause feature was created, and
we've not heard complaints (which TBH makes me wonder how much use the
feature has actually seen).  I'm hesitant to change the semantics in the
back branches more than necessary to avoid clear-showstopper cases.

            regards, tom lane