Thread: WAL Restore process during recovery
WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > WALRestore process asynchronously executes restore_command while > recovery continues working. > > Overlaps downloading of next WAL file to reduce time delays in file > based archive recovery. > > Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 When I started an archive recovery without setting restore_command, it successfully finished. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> WALRestore process asynchronously executes restore_command while >> recovery continues working. >> >> Overlaps downloading of next WAL file to reduce time delays in file >> based archive recovery. >> >> Handles cases of file-only and streaming/file correctly. > > Though I've not reviewed the patch deeply yet, I observed the following > two problems when I tested the patch. > > When I set up streaming replication + archive (i.e., restore_command is set) > and started the standby, I got the following error: > > FATAL: all AuxiliaryProcs are in use > LOG: walrestore process (PID 18839) exited with exit code 1 > > When I started an archive recovery without setting restore_command, > it successfully finished. Oh, I did have NUM_AUXILIARY_PROCS increased at one point, but I "realised" it wasn't needed and removed it. Will change that. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> WALRestore process asynchronously executes restore_command while >> recovery continues working. >> >> Overlaps downloading of next WAL file to reduce time delays in file >> based archive recovery. >> >> Handles cases of file-only and streaming/file correctly. > > Though I've not reviewed the patch deeply yet, I observed the following > two problems when I tested the patch. > > When I set up streaming replication + archive (i.e., restore_command is set) > and started the standby, I got the following error: > > FATAL: all AuxiliaryProcs are in use > LOG: walrestore process (PID 18839) exited with exit code 1 Fixed and better documented. > When I started an archive recovery without setting restore_command, > it successfully finished. Not sure exactly what you mean, but I fixed a bug that might be something you're seeing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Jan 20, 2012 at 4:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> WALRestore process asynchronously executes restore_command while >>> recovery continues working. >>> >>> Overlaps downloading of next WAL file to reduce time delays in file >>> based archive recovery. >>> >>> Handles cases of file-only and streaming/file correctly. >> >> Though I've not reviewed the patch deeply yet, I observed the following >> two problems when I tested the patch. >> >> When I set up streaming replication + archive (i.e., restore_command is set) >> and started the standby, I got the following error: >> >> FATAL: all AuxiliaryProcs are in use >> LOG: walrestore process (PID 18839) exited with exit code 1 > > Fixed and better documented. > >> When I started an archive recovery without setting restore_command, >> it successfully finished. > > Not sure exactly what you mean, but I fixed a bug that might be > something you're seeing. Thanks! But you forgot to include walrestore.c and .h in the patch. Can you submit the updated version of the patch? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: Requested update -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > > Requested update Thanks! Will review. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Jan 20, 2012 at 7:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> Requested update > > Thanks! Will review. In StartChildProcess(), the code which emits an error when fork of walrestore fails is required. In reaper(), the following comment needs to be updated because an unexpected exit (including FATAL) is treated as a crash in the patch. /* * Was it the wal restore? If exit status is zero (normal) or one * (FATAL exit), we assume everything isall right just like normal * backends. */ if (pid == WalRestorePID) Why does walrestore need to be invoked even when restore_command is not specified? It seems to be useless. We invoke walreceiver only when primary_conninfo is specified now. Similarly we should invoke walrestore only when restore_command is specified? When I set up the file-based log-shipping environment using pg_standby, ran "pgbench -i -s2", waited for walrestore to restore at least one WAL file, and created the trigger file, then I encounterd the following error in the standby. sby LOG: startup process requests 000000010000000000000003 from archive trigger file found: smart failover sby LOG: startup process sees last file was 000000010000000000000003 sby FATAL: could not rename file "pg_xlog/RECOVERYXLOG"to "pg_xlog/000000010000000000000003": No such file or directory sby LOG: startup process (PID 11079) exited with exit code1 sby LOG: terminating any other active server processes When I set up streaming replication with setting restore_command, I got the following messages repeatedly. The WAL file name was always "000000000000000000000000". sby1 LOG: walrestore checking for next file to restore sby1 LOG: restore of 000000000000000000000000 is already complete,so sleep In PostmasterStateMachine(), the following comment needs to mention WALRestore. * PM_WAIT_READONLY state ends when we have no regular backends that * have been started during recovery. We killthe startup and * walreceiver processes and transition to PM_WAIT_BACKENDS. Ideally, In walrestore.c, the following comments seem to be incorrect. At least an unexpected exit of WALRestore doesn't start a recovery cycle in the patch. * If the WAL restore exits unexpectedly, the postmaster treats that the same * as a backend crash: shared memory may be corrupted, so remaining backends * should be killed by SIGQUITand then a recovery cycle started. In walrestore.c + * Main entry point for walrestore process + * + * This is invoked from BootstrapMain, which has already created the basic + * execution environment, but not enabled signals yet. BootstrapMain() doesn't exist, and it should be changed to AuxiliaryProcessMain(). This is not a fault of the patch. There are the same typos in bgwriter.c, walwriter.c and checkpointer.c In walrestore.c + * SIGUSR1 is presently unused; keep it spare in case someday we want this + * process to participate in ProcSignal signalling. The above comment is incorrect because SIGUSR1 is presently used. + /* + * From here on, elog(ERROR) should end with exit(1), not send + * control back to the sigsetjmp block above + */ + ExitOnAnyError = true; The above is not required because sigsetjmp is not used in walrestore.c + /* Normal exit from the walwriter is here */ + proc_exit(0); /* done */ Typo: s/walwriter/walrestore I've not reviewed the patch enough yet. Will review the patch tomorrow again. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > I've not reviewed the patch enough yet. Will review the patch tomorrow again. Thanks very much. I'm sure that's enough to keep me busy a few days. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > In StartChildProcess(), the code which emits an error when fork of walrestore > fails is required. OK > In reaper(), the following comment needs to be updated because an unexpected > exit (including FATAL) is treated as a crash in the patch. > > /* > * Was it the wal restore? If exit status is zero (normal) or one > * (FATAL exit), we assume everything is all right just like normal > * backends. > */ > if (pid == WalRestorePID) OK > Why does walrestore need to be invoked even when restore_command is > not specified? It seems to be useless. We invoke walreceiver only when > primary_conninfo is specified now. Similarly we should invoke walrestore > only when restore_command is specified? walreceiver is shutdown and restarted in case of failed connection. That never happens with walrestore because the command is run each time - when we issue system(3) a new process is forked to run the command. So there is no specific cleanup to perform and so no reason for a managed cleanup process. So I can't see a specific reason to change that. Do you think it makes a difference? > When I set up the file-based log-shipping environment using pg_standby, > ran "pgbench -i -s2", waited for walrestore to restore at least one WAL > file, and created the trigger file, then I encounterd the following error in > the standby. > > sby LOG: startup process requests 000000010000000000000003 from archive > trigger file found: smart failover > sby LOG: startup process sees last file was 000000010000000000000003 > sby FATAL: could not rename file "pg_xlog/RECOVERYXLOG" to > "pg_xlog/000000010000000000000003": No such file or directory > sby LOG: startup process (PID 11079) exited with exit code 1 > sby LOG: terminating any other active server processes Will look further. > When I set up streaming replication with setting restore_command, > I got the following messages repeatedly. The WAL file name was always > "000000000000000000000000". Will look further. > sby1 LOG: walrestore checking for next file to restore > sby1 LOG: restore of 000000000000000000000000 is already complete, so sleep Will look further. > In PostmasterStateMachine(), the following comment needs to mention WALRestore. > > * PM_WAIT_READONLY state ends when we have no regular backends that > * have been started during recovery. We kill the startup and > * walreceiver processes and transition to PM_WAIT_BACKENDS. Ideally, OK > In walrestore.c, the following comments seem to be incorrect. At least > an unexpected > exit of WALRestore doesn't start a recovery cycle in the patch. > > * If the WAL restore exits unexpectedly, the postmaster treats > that the same > * as a backend crash: shared memory may be corrupted, so remaining backends > * should be killed by SIGQUIT and then a recovery cycle started. Yes it does... > In walrestore.c > + * Main entry point for walrestore process > + * > + * This is invoked from BootstrapMain, which has already created the basic > + * execution environment, but not enabled signals yet. > > BootstrapMain() doesn't exist, and it should be changed to > AuxiliaryProcessMain(). > This is not a fault of the patch. There are the same typos in > bgwriter.c, walwriter.c > and checkpointer.c OK, will fix. > In walrestore.c > + * SIGUSR1 is presently unused; keep it spare in case someday we want this > + * process to participate in ProcSignal signalling. > > The above comment is incorrect because SIGUSR1 is presently used. OK > + /* > + * From here on, elog(ERROR) should end with exit(1), not send > + * control back to the sigsetjmp block above > + */ > + ExitOnAnyError = true; > > The above is not required because sigsetjmp is not used in walrestore.c OK > + /* Normal exit from the walwriter is here */ > + proc_exit(0); /* done */ > > Typo: s/walwriter/walrestore OK Cleaned up the points noted, new patch attached in case you wish to review further. Still has bug, so still with me to fix. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Why does walrestore need to be invoked even when restore_command is >> not specified? It seems to be useless. We invoke walreceiver only when >> primary_conninfo is specified now. Similarly we should invoke walrestore >> only when restore_command is specified? > > walreceiver is shutdown and restarted in case of failed connection. > That never happens with walrestore because the command is run each > time - when we issue system(3) a new process is forked to run the > command. So there is no specific cleanup to perform and so no reason > for a managed cleanup process. > > So I can't see a specific reason to change that. Do you think it makes > a difference? Yes. When restore_command is not specified in recovery.conf, walrestore process doesn't do any useful activity and just wastes CPU cycle. Which might be harmless for a functionality of recovery, but ISTM it's better not to start up walrestore in that case to avoid the waste of cycle. > Cleaned up the points noted, new patch attached in case you wish to > review further. > > Still has bug, so still with me to fix. Thanks! Will review further. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 24, 2012 at 9:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> Why does walrestore need to be invoked even when restore_command is >>> not specified? It seems to be useless. We invoke walreceiver only when >>> primary_conninfo is specified now. Similarly we should invoke walrestore >>> only when restore_command is specified? >> >> walreceiver is shutdown and restarted in case of failed connection. >> That never happens with walrestore because the command is run each >> time - when we issue system(3) a new process is forked to run the >> command. So there is no specific cleanup to perform and so no reason >> for a managed cleanup process. >> >> So I can't see a specific reason to change that. Do you think it makes >> a difference? > > Yes. When restore_command is not specified in recovery.conf, walrestore > process doesn't do any useful activity and just wastes CPU cycle. Which > might be harmless for a functionality of recovery, but ISTM it's better not > to start up walrestore in that case to avoid the waste of cycle. It just sleeps on a latch when it has nothing to do, so no wasted cycles. Asking the postmaster seemed the easier option, I guess I could have chosen the other way also. I'll look at this when this is the last thing left to resolve to see if that improves things. >> Cleaned up the points noted, new patch attached in case you wish to >> review further. >> >> Still has bug, so still with me to fix. > > Thanks! Will review further. Much appreciated. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 24, 2012 at 6:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Jan 24, 2012 at 9:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> Why does walrestore need to be invoked even when restore_command is >>>> not specified? It seems to be useless. We invoke walreceiver only when >>>> primary_conninfo is specified now. Similarly we should invoke walrestore >>>> only when restore_command is specified? >>> >>> walreceiver is shutdown and restarted in case of failed connection. >>> That never happens with walrestore because the command is run each >>> time - when we issue system(3) a new process is forked to run the >>> command. So there is no specific cleanup to perform and so no reason >>> for a managed cleanup process. >>> >>> So I can't see a specific reason to change that. Do you think it makes >>> a difference? >> >> Yes. When restore_command is not specified in recovery.conf, walrestore >> process doesn't do any useful activity and just wastes CPU cycle. Which >> might be harmless for a functionality of recovery, but ISTM it's better not >> to start up walrestore in that case to avoid the waste of cycle. > > It just sleeps on a latch when it has nothing to do, so no wasted cycles. Right, since walrestore process wakes up just every 10 seconds, a waste of cycle is low. But what I feel uncomfortable is that walrestore process has nothing to do *from start to end*, when restore_command is not specified, but it's started up. I guess that many people would get surprised at that. Of course, if restore_command can be changed without restarting the server, I agree with you because walrestore process might do an useful activity later. But currently not. > Asking the postmaster seemed the easier option, I guess I could have > chosen the other way also. > > I'll look at this when this is the last thing left to resolve to see > if that improves things. Okay. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 24, 2012 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Cleaned up the points noted, new patch attached in case you wish to >> review further. >> >> Still has bug, so still with me to fix. > > Thanks! Will review further. v3 patch contains lots of unrelated code changes like the following. - <structfield>pid</structfield> column of the + <structfield>procpid</structfield> column of the You seem to have failed to extract the patch from your repository correctly. So I used v2 patch for the review. Sorry if I comment the things which you've already fixed in v3 patch. Here are the comments. They are almost not serious problems. +/* + * GUC parameters + */ +int WalRestoreDelay = 10000; You forget to change guc.c to define wal_restore_delay as a GUC parameter? Or just that source code comment is incorrect? + elog(FATAL, "recovery_restore_command is too long"); Typo: s/recovery_restore_command/restore_command + InitLatch(&walrstr->WALRestoreLatch); /* initialize latch used in main loop */ That latch is shared one. OwnLatch() should be called instead of InitLatch()? If yes, it's better to call DisownLatch() when walrestore exits. Though skipping DisownLatch() would be harmless because the latch is never owned by new process after walrestore exits. + { + /* use volatile pointer to prevent code rearrangement */ + volatile WalRestoreData *walrstr = WalRstr; + + nextFileTli = walrstr->nextFileTli; The declaration of "walrstr" is not required here because it's already done at the head of WalRestoreNextFile(). + if (restoredFromArchive) + { + /* use volatile pointer to prevent code rearrangement */ + volatile WalRestoreData *walrstr = WalRstr; Same as above. +#define TMPRECOVERYXLOG "RECOVERYXLOG" ISTM that it's better to move this definition to an include file and we should use it in all the places where the fixed value "RECOVERYXLOG" is still used. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center