Thread: Confusing recovery message when target not hit

Confusing recovery message when target not hit

From
Thom Brown
Date:
Hi all,

When recovery_target_time is set, but recovery finishes before it reaches that time, it outputs "before 2000-01-01 00:00:00+00" to the .history file.  This is because it uses recoveryStopTime, which is initialised to 0, but is never set, and is then passed to timestamptz_to_str, which gives it this date output.

A similar problem exists for recovery_target_xid.  When recovery finishes before reaching the specified xid, it outputs "before transaction 0" to the .history file, which is also confusing.

Could we produce something more meaningful?  I've attached a patch which changes it to say 'recovery reached consistency before recovery target time of "<recovery_target_time>"' and 'recovery reached consistency before recovery target xid of "<recovery_target_xid>"'.

It may be the wrong way of going about it, but you get the idea of what I'm suggesting we output instead.

Thom
Attachment

Re: Confusing recovery message when target not hit

From
Michael Paquier
Date:
On Sat, Jun 11, 2016 at 9:44 AM, Thom Brown <thom@linux.com> wrote:
> It may be the wrong way of going about it, but you get the idea of what I'm
> suggesting we output instead.

Surely things could be better. So +1 to be more verbose here.

+            if (recoveryStopTime == 0)
+                snprintf(reason, sizeof(reason),
+                        "recovery reached consistency before recovery
target time of \"%s\"\n",
+                        timestamptz_to_str(recoveryTargetTime));
"Reaching consistency" is not exact for here. I'd rather say "finished
recovery without reaching target blah"

+            if (recoveryStopXid == 0)
Checking for InvalidTransactionId is better here.

And it would be good to initialize recoveryStopTime and
recoveryStopXid as those are set only when a recovery target is
reached.
-- 
Michael



Re: Confusing recovery message when target not hit

From
David Steele
Date:
On 6/11/16 8:22 AM, Michael Paquier wrote:
> On Sat, Jun 11, 2016 at 9:44 AM, Thom Brown <thom@linux.com> wrote:
>> It may be the wrong way of going about it, but you get the idea of what I'm
>> suggesting we output instead.
> 
> Surely things could be better. So +1 to be more verbose here.
> 
> +            if (recoveryStopTime == 0)
> +                snprintf(reason, sizeof(reason),
> +                        "recovery reached consistency before recovery
> target time of \"%s\"\n",
> +                        timestamptz_to_str(recoveryTargetTime));
> "Reaching consistency" is not exact for here. I'd rather say "finished
> recovery without reaching target blah"
> 
> +            if (recoveryStopXid == 0)
> Checking for InvalidTransactionId is better here.
> 
> And it would be good to initialize recoveryStopTime and
> recoveryStopXid as those are set only when a recovery target is
> reached.

+1 for Michael's wording.  It's not very clear in the logs right now if
a recovery target was missed.  That makes it very difficult for the user
to determine if PITR worked or not.

-- 
-David
david@pgmasters.net



Re: Confusing recovery message when target not hit

From
Michael Paquier
Date:
On Sun, Jun 12, 2016 at 12:46 AM, David Steele <david@pgmasters.net> wrote:
> On 6/11/16 8:22 AM, Michael Paquier wrote:
>> On Sat, Jun 11, 2016 at 9:44 AM, Thom Brown <thom@linux.com> wrote:
>>> It may be the wrong way of going about it, but you get the idea of what I'm
>>> suggesting we output instead.
>>
>> Surely things could be better. So +1 to be more verbose here.
>>
>> +            if (recoveryStopTime == 0)
>> +                snprintf(reason, sizeof(reason),
>> +                        "recovery reached consistency before recovery
>> target time of \"%s\"\n",
>> +                        timestamptz_to_str(recoveryTargetTime));
>> "Reaching consistency" is not exact for here. I'd rather say "finished
>> recovery without reaching target blah"
>>
>> +            if (recoveryStopXid == 0)
>> Checking for InvalidTransactionId is better here.
>>
>> And it would be good to initialize recoveryStopTime and
>> recoveryStopXid as those are set only when a recovery target is
>> reached.
>
> +1 for Michael's wording.  It's not very clear in the logs right now if
> a recovery target was missed.  That makes it very difficult for the user
> to determine if PITR worked or not.

By the way, we surely want to have the same logic for a recovery
target name. It could be possible to reach the end of recovery without
reaching the goal of recovery.conf. It would be good to be verbose in
this case as well by checking for recoveryStopName[0] = '\0'.
-- 
Michael



Re: Confusing recovery message when target not hit

From
Thom Brown
Date:
On 11 June 2016 at 13:22, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Jun 11, 2016 at 9:44 AM, Thom Brown <thom@linux.com> wrote:
> It may be the wrong way of going about it, but you get the idea of what I'm
> suggesting we output instead.

Surely things could be better. So +1 to be more verbose here.

+            if (recoveryStopTime == 0)
+                snprintf(reason, sizeof(reason),
+                        "recovery reached consistency before recovery
target time of \"%s\"\n",
+                        timestamptz_to_str(recoveryTargetTime));
"Reaching consistency" is not exact for here. I'd rather say "finished
recovery without reaching target blah"

Yeah, sounds fine.
 

+            if (recoveryStopXid == 0)
Checking for InvalidTransactionId is better here.

Agreed.
 
And it would be good to initialize recoveryStopTime and
recoveryStopXid as those are set only when a recovery target is
reached.

Aren't those already set by recoveryStopsBefore()?

Revised patch attached, with new wording and covering recovery target name case.
 
Thom
Attachment

Re: Confusing recovery message when target not hit

From
Michael Paquier
Date:
On Sun, Jun 12, 2016 at 7:52 PM, Thom Brown <thom@linux.com> wrote:
> Aren't those already set by recoveryStopsBefore()?

It is possible to exit the main redo loop if a NULL record is found
after calling ReadRecord, in which case those would not be set, no?
-- 
Michael



Re: Confusing recovery message when target not hit

From
Thom Brown
Date:
On 12 June 2016 at 12:51, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sun, Jun 12, 2016 at 7:52 PM, Thom Brown <thom@linux.com> wrote:
> Aren't those already set by recoveryStopsBefore()?

It is possible to exit the main redo loop if a NULL record is found
after calling ReadRecord, in which case those would not be set, no?

I'm apprehensive about initialising those values myself as I don't want to set them at a point where they may potentially already be set.

And I've noticed that I didn't re-read my own output messages, so I've corrected them in the attached patch.

Thom
Attachment

Re: Confusing recovery message when target not hit

From
Michael Paquier
Date:
On Mon, Jun 13, 2016 at 9:53 AM, Thom Brown <thom@linux.com> wrote:
> On 12 June 2016 at 12:51, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> On Sun, Jun 12, 2016 at 7:52 PM, Thom Brown <thom@linux.com> wrote:
>> > Aren't those already set by recoveryStopsBefore()?
>>
>> It is possible to exit the main redo loop if a NULL record is found
>> after calling ReadRecord, in which case those would not be set, no?
>
>
> I'm apprehensive about initialising those values myself as I don't want to
> set them at a point where they may potentially already be set.

As your patch relies on checks on the variables holding the recovery
stop information as not being set, initializing them before entering
in the REDO phase (say just before 6435:xlog.c on HEAD) is the safest
thing to do IMO.
-- 
Michael