Thread: WAL Restore process during recovery

WAL Restore process during recovery

From

Simon Riggs

Date:

15 January 2012, 13:06:27

WALRestore process asynchronously executes restore_command while
recovery continues working.

Overlaps downloading of next WAL file to reduce time delays in file
based archive recovery.

Handles cases of file-only and streaming/file correctly.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

walrestore_process.v1.patch

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

17 January 2012, 02:52:48

On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> WALRestore process asynchronously executes restore_command while
> recovery continues working.
>
> Overlaps downloading of next WAL file to reduce time delays in file
> based archive recovery.
>
> Handles cases of file-only and streaming/file correctly.

Though I've not reviewed the patch deeply yet, I observed the following
two problems when I tested the patch.

When I set up streaming replication + archive (i.e., restore_command is set)
and started the standby, I got the following error:
   FATAL:  all AuxiliaryProcs are in use   LOG:  walrestore process (PID 18839) exited with exit code 1

When I started an archive recovery without setting restore_command,
it successfully finished.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: WAL Restore process during recovery

From

Simon Riggs

Date:

17 January 2012, 06:39:55

On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> WALRestore process asynchronously executes restore_command while
>> recovery continues working.
>>
>> Overlaps downloading of next WAL file to reduce time delays in file
>> based archive recovery.
>>
>> Handles cases of file-only and streaming/file correctly.
>
> Though I've not reviewed the patch deeply yet, I observed the following
> two problems when I tested the patch.
>
> When I set up streaming replication + archive (i.e., restore_command is set)
> and started the standby, I got the following error:
>
>    FATAL:  all AuxiliaryProcs are in use
>    LOG:  walrestore process (PID 18839) exited with exit code 1
>
> When I started an archive recovery without setting restore_command,
> it successfully finished.

Oh, I did have NUM_AUXILIARY_PROCS increased at one point, but I
"realised" it wasn't needed and removed it. Will change that. Thanks.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: WAL Restore process during recovery

From

Simon Riggs

Date:

19 January 2012, 15:17:36

On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> WALRestore process asynchronously executes restore_command while
>> recovery continues working.
>>
>> Overlaps downloading of next WAL file to reduce time delays in file
>> based archive recovery.
>>
>> Handles cases of file-only and streaming/file correctly.
>
> Though I've not reviewed the patch deeply yet, I observed the following
> two problems when I tested the patch.
>
> When I set up streaming replication + archive (i.e., restore_command is set)
> and started the standby, I got the following error:
>
>    FATAL:  all AuxiliaryProcs are in use
>    LOG:  walrestore process (PID 18839) exited with exit code 1

Fixed and better documented.

> When I started an archive recovery without setting restore_command,
> it successfully finished.

Not sure exactly what you mean, but I fixed a bug that might be
something you're seeing.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

walrestore_process.v2.patch

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

19 January 2012, 23:43:40

On Fri, Jan 20, 2012 at 4:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> WALRestore process asynchronously executes restore_command while
>>> recovery continues working.
>>>
>>> Overlaps downloading of next WAL file to reduce time delays in file
>>> based archive recovery.
>>>
>>> Handles cases of file-only and streaming/file correctly.
>>
>> Though I've not reviewed the patch deeply yet, I observed the following
>> two problems when I tested the patch.
>>
>> When I set up streaming replication + archive (i.e., restore_command is set)
>> and started the standby, I got the following error:
>>
>>    FATAL:  all AuxiliaryProcs are in use
>>    LOG:  walrestore process (PID 18839) exited with exit code 1
>
> Fixed and better documented.
>
>> When I started an archive recovery without setting restore_command,
>> it successfully finished.
>
> Not sure exactly what you mean, but I fixed a bug that might be
> something you're seeing.

Thanks!

But you forgot to include walrestore.c and .h in the patch. Can you submit
the updated version of the patch?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: WAL Restore process during recovery

From

Simon Riggs

Date:

20 January 2012, 06:39:27

On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Requested update


--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

walrestore_process.v2.patch

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

20 January 2012, 06:51:17

On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Requested update

Thanks! Will review.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

23 January 2012, 08:24:09

On Fri, Jan 20, 2012 at 7:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> Requested update
>
> Thanks! Will review.

In StartChildProcess(), the code which emits an error when fork of walrestore
fails is required.

In reaper(), the following comment needs to be updated because an unexpected
exit (including FATAL) is treated as a crash in the patch.
/* * Was it the wal restore? If exit status is zero (normal) or one * (FATAL exit), we assume everything
isall right just like normal * backends. */ if (pid == WalRestorePID)

Why does walrestore need to be invoked even when restore_command is
not specified? It seems to be useless. We invoke walreceiver only when
primary_conninfo is specified now. Similarly we should invoke walrestore
only when restore_command is specified?

When I set up the file-based log-shipping environment using pg_standby,
ran "pgbench -i -s2", waited for walrestore to restore at least one WAL
file, and created the trigger file, then I encounterd the following error in
the standby.
sby LOG: startup process requests 000000010000000000000003 from archive trigger file found: smart failover sby
LOG: startup process sees last file was 000000010000000000000003 sby FATAL: could not rename file
"pg_xlog/RECOVERYXLOG"to

"pg_xlog/000000010000000000000003": No such file or directory sby LOG: startup process (PID 11079) exited with exit
code1 sby LOG: terminating any other active server processes

When I set up streaming replication with setting restore_command,
I got the following messages repeatedly. The WAL file name was always
"000000000000000000000000".
sby1 LOG: walrestore checking for next file to restore sby1 LOG: restore of 000000000000000000000000 is already
complete,so sleep

In PostmasterStateMachine(), the following comment needs to mention WALRestore.
* PM_WAIT_READONLY state ends when we have no regular backends that * have been started during recovery. We
killthe startup and * walreceiver processes and transition to PM_WAIT_BACKENDS. Ideally,

In walrestore.c, the following comments seem to be incorrect. At least
an unexpected
exit of WALRestore doesn't start a recovery cycle in the patch.
* If the WAL restore exits unexpectedly, the postmaster treats
that the same * as a backend crash: shared memory may be corrupted, so remaining backends * should be killed by
SIGQUITand then a recovery cycle started.

In walrestore.c
+ * Main entry point for walrestore process
+ *
+ * This is invoked from BootstrapMain, which has already created the basic
+ * execution environment, but not enabled signals yet.

BootstrapMain() doesn't exist, and it should be changed to
AuxiliaryProcessMain().
This is not a fault of the patch. There are the same typos in
bgwriter.c, walwriter.c
and checkpointer.c

In walrestore.c
+ * SIGUSR1 is presently unused; keep it spare in case someday we want this
+ * process to participate in ProcSignal signalling.

The above comment is incorrect because SIGUSR1 is presently used.

+ /*
+ * From here on, elog(ERROR) should end with exit(1), not send
+ * control back to the sigsetjmp block above
+ */
+ ExitOnAnyError = true;

The above is not required because sigsetjmp is not used in walrestore.c

+ /* Normal exit from the walwriter is here */
+ proc_exit(0); /* done */

Typo: s/walwriter/walrestore

I've not reviewed the patch enough yet. Will review the patch tomorrow again.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: WAL Restore process during recovery

From

Simon Riggs

Date:

23 January 2012, 09:04:30

On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

> I've not reviewed the patch enough yet. Will review the patch tomorrow again.

Thanks very much. I'm sure that's enough to keep me busy a few days.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: WAL Restore process during recovery

From

Simon Riggs

Date:

23 January 2012, 11:23:45

On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

> In StartChildProcess(), the code which emits an error when fork of walrestore
> fails is required.

OK

> In reaper(), the following comment needs to be updated because an unexpected
> exit (including FATAL) is treated as a crash in the patch.
>
>                /*
>                 * Was it the wal restore?  If exit status is zero (normal) or one
>                 * (FATAL exit), we assume everything is all right just like normal
>                 * backends.
>                 */
>                if (pid == WalRestorePID)

OK

> Why does walrestore need to be invoked even when restore_command is
> not specified? It seems to be useless. We invoke walreceiver only when
> primary_conninfo is specified now. Similarly we should invoke walrestore
> only when restore_command is specified?

walreceiver is shutdown and restarted in case of failed connection.
That never happens with walrestore because the command is run each
time - when we issue system(3) a new process is forked to run the
command. So there is no specific cleanup to perform and so no reason
for a managed cleanup process.

So I can't see a specific reason to change that. Do you think it makes
a difference?


> When I set up the file-based log-shipping environment using pg_standby,
> ran "pgbench -i -s2", waited for walrestore to restore at least one WAL
> file, and created the trigger file, then I encounterd the following error in
> the standby.
>
>    sby LOG:  startup process requests 000000010000000000000003 from archive
>    trigger file found: smart failover
>    sby LOG:  startup process sees last file was 000000010000000000000003
>    sby FATAL:  could not rename file "pg_xlog/RECOVERYXLOG" to
> "pg_xlog/000000010000000000000003": No such file or directory
>    sby LOG:  startup process (PID 11079) exited with exit code 1
>    sby LOG:  terminating any other active server processes

Will look further.

> When I set up streaming replication with setting restore_command,
> I got the following messages repeatedly. The WAL file name was always
> "000000000000000000000000".

Will look further.

>    sby1 LOG:  walrestore checking for next file to restore
>    sby1 LOG:  restore of 000000000000000000000000 is already complete, so sleep

Will look further.

> In PostmasterStateMachine(), the following comment needs to mention WALRestore.
>
>                 * PM_WAIT_READONLY state ends when we have no regular backends that
>                 * have been started during recovery.  We kill the startup and
>                 * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,

OK

> In walrestore.c, the following comments seem to be incorrect. At least
> an unexpected
> exit of WALRestore doesn't start a recovery cycle in the patch.
>
>     * If the WAL restore exits unexpectedly, the postmaster treats
> that the same
>     * as a backend crash: shared memory may be corrupted, so remaining backends
>     * should be killed by SIGQUIT and then a recovery cycle started.

Yes it does...

> In walrestore.c
> + * Main entry point for walrestore process
> + *
> + * This is invoked from BootstrapMain, which has already created the basic
> + * execution environment, but not enabled signals yet.
>
> BootstrapMain() doesn't exist, and it should be changed to
> AuxiliaryProcessMain().
> This is not a fault of the patch. There are the same typos in
> bgwriter.c, walwriter.c
> and checkpointer.c

OK, will fix.

> In walrestore.c
> +        * SIGUSR1 is presently unused; keep it spare in case someday we want this
> +        * process to participate in ProcSignal signalling.
>
> The above comment is incorrect because SIGUSR1 is presently used.

OK

> +                       /*
> +                        * From here on, elog(ERROR) should end with exit(1), not send
> +                        * control back to the sigsetjmp block above
> +                        */
> +                       ExitOnAnyError = true;
>
> The above is not required because sigsetjmp is not used in walrestore.c

OK

> +                       /* Normal exit from the walwriter is here */
> +                       proc_exit(0);           /* done */
>
> Typo: s/walwriter/walrestore

OK


Cleaned up the points noted, new patch attached in case you wish to
review further.

Still has bug, so still with me to fix.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

walrestore_process.v3.patch

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

24 January 2012, 05:44:03

On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Why does walrestore need to be invoked even when restore_command is
>> not specified? It seems to be useless. We invoke walreceiver only when
>> primary_conninfo is specified now. Similarly we should invoke walrestore
>> only when restore_command is specified?
>
> walreceiver is shutdown and restarted in case of failed connection.
> That never happens with walrestore because the command is run each
> time - when we issue system(3) a new process is forked to run the
> command. So there is no specific cleanup to perform and so no reason
> for a managed cleanup process.
>
> So I can't see a specific reason to change that. Do you think it makes
> a difference?

Yes. When restore_command is not specified in recovery.conf, walrestore
process doesn't do any useful activity and just wastes CPU cycle. Which
might be harmless for a functionality of recovery, but ISTM it's better not
to start up walrestore in that case to avoid the waste of cycle.

> Cleaned up the points noted, new patch attached in case you wish to
> review further.
>
> Still has bug, so still with me to fix.

Thanks! Will review further.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: WAL Restore process during recovery

From

Simon Riggs

Date:

24 January 2012, 05:49:36

On Tue, Jan 24, 2012 at 9:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Why does walrestore need to be invoked even when restore_command is
>>> not specified? It seems to be useless. We invoke walreceiver only when
>>> primary_conninfo is specified now. Similarly we should invoke walrestore
>>> only when restore_command is specified?
>>
>> walreceiver is shutdown and restarted in case of failed connection.
>> That never happens with walrestore because the command is run each
>> time - when we issue system(3) a new process is forked to run the
>> command. So there is no specific cleanup to perform and so no reason
>> for a managed cleanup process.
>>
>> So I can't see a specific reason to change that. Do you think it makes
>> a difference?
>
> Yes. When restore_command is not specified in recovery.conf, walrestore
> process doesn't do any useful activity and just wastes CPU cycle. Which
> might be harmless for a functionality of recovery, but ISTM it's better not
> to start up walrestore in that case to avoid the waste of cycle.

It just sleeps on a latch when it has nothing to do, so no wasted cycles.

Asking the postmaster seemed the easier option, I guess I could have
chosen the other way also.

I'll look at this when this is the last thing left to resolve to see
if that improves things.


>> Cleaned up the points noted, new patch attached in case you wish to
>> review further.
>>
>> Still has bug, so still with me to fix.
>
> Thanks! Will review further.

Much appreciated.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

24 January 2012, 06:19:49

On Tue, Jan 24, 2012 at 6:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 24, 2012 at 9:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> Why does walrestore need to be invoked even when restore_command is
>>>> not specified? It seems to be useless. We invoke walreceiver only when
>>>> primary_conninfo is specified now. Similarly we should invoke walrestore
>>>> only when restore_command is specified?
>>>
>>> walreceiver is shutdown and restarted in case of failed connection.
>>> That never happens with walrestore because the command is run each
>>> time - when we issue system(3) a new process is forked to run the
>>> command. So there is no specific cleanup to perform and so no reason
>>> for a managed cleanup process.
>>>
>>> So I can't see a specific reason to change that. Do you think it makes
>>> a difference?
>>
>> Yes. When restore_command is not specified in recovery.conf, walrestore
>> process doesn't do any useful activity and just wastes CPU cycle. Which
>> might be harmless for a functionality of recovery, but ISTM it's better not
>> to start up walrestore in that case to avoid the waste of cycle.
>
> It just sleeps on a latch when it has nothing to do, so no wasted cycles.

Right, since walrestore process wakes up just every 10 seconds, a waste of
cycle is low. But what I feel uncomfortable is that walrestore process has
nothing to do *from start to end*, when restore_command is not specified,
but it's started up. I guess that many people would get surprised at that.
Of course, if restore_command can be changed without restarting the server,
I agree with you because walrestore process might do an useful activity
later. But currently not.

> Asking the postmaster seemed the easier option, I guess I could have
> chosen the other way also.
>
> I'll look at this when this is the last thing left to resolve to see
> if that improves things.

Okay.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: WAL Restore process during recovery

From

Fujii Masao

Date:

26 January 2012, 00:22:45

On Tue, Jan 24, 2012 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Cleaned up the points noted, new patch attached in case you wish to
>> review further.
>>
>> Still has bug, so still with me to fix.
>
> Thanks! Will review further.

v3 patch contains lots of unrelated code changes like the following.

-   <structfield>pid</structfield> column of the
+   <structfield>procpid</structfield> column of the

You seem to have failed to extract the patch from your repository correctly.
So I used v2 patch for the review. Sorry if I comment the things which you've
already fixed in v3 patch.

Here are the comments. They are almost not serious problems.

+/*
+ * GUC parameters
+ */
+int    WalRestoreDelay = 10000;

You forget to change guc.c to define wal_restore_delay as a GUC parameter?
Or just that source code comment is incorrect?

+        elog(FATAL, "recovery_restore_command is too long");

Typo: s/recovery_restore_command/restore_command

+    InitLatch(&walrstr->WALRestoreLatch); /* initialize latch used in main loop */

That latch is shared one. OwnLatch() should be called instead of InitLatch()?
If yes, it's better to call DisownLatch() when walrestore exits. Though skipping
DisownLatch() would be harmless because the latch is never owned by new
process after walrestore exits.

+    {
+        /* use volatile pointer to prevent code rearrangement */
+        volatile WalRestoreData *walrstr = WalRstr;
+
+        nextFileTli = walrstr->nextFileTli;

The declaration of "walrstr" is not required here because it's already done
at the head of WalRestoreNextFile().

+    if (restoredFromArchive)
+    {
+        /* use volatile pointer to prevent code rearrangement */
+        volatile WalRestoreData *walrstr = WalRstr;

Same as above.

+#define TMPRECOVERYXLOG    "RECOVERYXLOG"

ISTM that it's better to move this definition to an include file and we should
use it in all the places where the fixed value "RECOVERYXLOG" is still used.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center