Thread: Split xlog.c
Hi, xlog.c is very large. We've split off some functions from it over the years, but it's still large and it keeps growing. Attached is a proposal to split functions related to WAL replay, standby mode, fetching files from archive, computing the recovery target and so on, to new source file called xlogrecovery.c. That's a fairly clean split. StartupXLOG() stays in xlog.c, but much of the code from it has been moved to new functions InitWalRecovery(), PerformWalRecovery() and EndWalRecovery(). The general idea is that xlog.c is still responsible for orchestrating the servers startup, but xlogrecovery.c is responsible for figuring out whether WAL recovery is needed, performing it, and deciding when it can stop. There's surely more refactoring we could do. xlog.c has a lot of global variables, with similar names but slightly different meanings for example. (Quick: what's the difference between InRedo, InRecovery, InArchiveRecovery, and RecoveryInProgress()? I have to go check the code every time to remind myself). But this patch tries to just move source code around for clarity. There are small changes in the order that some of things are done in StartupXLOG(), for readability. I tried to be careful and check that the changes are safe, but a second pair of eyes would be appreciated on that. - Heikki
Attachment
Hi, On 2021-06-16 16:30:45 +0300, Heikki Linnakangas wrote: > xlog.c is very large. We've split off some functions from it over the years, > but it's still large and it keeps growing. > > Attached is a proposal to split functions related to WAL replay, standby > mode, fetching files from archive, computing the recovery target and so on, > to new source file called xlogrecovery.c. Wohoo! I think this is desperately needed. I personally am more concerned about the size of StartupXLOG() etc than the size of xlog.c itself, but since both reasonably are done at the same time... > That's a fairly clean split. StartupXLOG() stays in xlog.c, but much of the > code from it has been moved to new functions InitWalRecovery(), > PerformWalRecovery() and EndWalRecovery(). The general idea is that xlog.c is > still responsible for orchestrating the servers startup, but xlogrecovery.c > is responsible for figuring out whether WAL recovery is needed, performing > it, and deciding when it can stop. For some reason "recovery" bothers me a tiny bit, even though it's obviously already in use. Using "apply", or "replay" seems more descriptive to me, but whatever. > There's surely more refactoring we could do. xlog.c has a lot of global > variables, with similar names but slightly different meanings for example. > (Quick: what's the difference between InRedo, InRecovery, InArchiveRecovery, > and RecoveryInProgress()? I have to go check the code every time to remind > myself). But this patch tries to just move source code around for clarity. Agreed, it's quite chaotic. I think a good initial step to clean up that mess would be to just collect the relevant variables into one or two structs. > There are small changes in the order that some of things are done in > StartupXLOG(), for readability. I tried to be careful and check that the > changes are safe, but a second pair of eyes would be appreciated on that. I think it might be worth trying to break this into a bit more incremental changes - it's a huge commit and mixing code movement with code changes makes it really hard to review the non-movement portion. > +void > +PerformWalRecovery(void) > +{ > + > + if (record != NULL) > + { > + ErrorContextCallback errcallback; > + TimestampTz xtime; > + PGRUsage ru0; > + XLogRecPtr ReadRecPtr; > + XLogRecPtr EndRecPtr; > + > + pg_rusage_init(&ru0); > + > + InRedo = true; > + > + /* Initialize resource managers */ > + for (rmid = 0; rmid <= RM_MAX_ID; rmid++) > + { > + if (RmgrTable[rmid].rm_startup != NULL) > + RmgrTable[rmid].rm_startup(); > + } > + > + ereport(LOG, > + (errmsg("redo starts at %X/%X", > + LSN_FORMAT_ARGS(xlogreader->ReadRecPtr)))); > + > + /* > + * main redo apply loop > + */ > + do > + { If we're refactoring all of this, can we move the apply-one-record part into its own function as well? Happy to do that as a followup or precursor patch too. The per-record logic has grown complicated enough to make that quite worthwhile imo - and imo most of the time one either is interested in the per-record work, or in the rest of the StartupXLog/PerformWalRecovery logic. Greetings, Andres Freund
On 17/06/2021 02:00, Andres Freund wrote: > On 2021-06-16 16:30:45 +0300, Heikki Linnakangas wrote: >> That's a fairly clean split. StartupXLOG() stays in xlog.c, but much of the >> code from it has been moved to new functions InitWalRecovery(), >> PerformWalRecovery() and EndWalRecovery(). The general idea is that xlog.c is >> still responsible for orchestrating the servers startup, but xlogrecovery.c >> is responsible for figuring out whether WAL recovery is needed, performing >> it, and deciding when it can stop. > > For some reason "recovery" bothers me a tiny bit, even though it's obviously > already in use. Using "apply", or "replay" seems more descriptive to me, but > whatever. I think of "recovery" as a broader term than applying or replaying. Replaying the WAL records is one part of recovery. But yeah, the difference is not well-defined and we tend to use those terms interchangeably. >> There's surely more refactoring we could do. xlog.c has a lot of global >> variables, with similar names but slightly different meanings for example. >> (Quick: what's the difference between InRedo, InRecovery, InArchiveRecovery, >> and RecoveryInProgress()? I have to go check the code every time to remind >> myself). But this patch tries to just move source code around for clarity. > > Agreed, it's quite chaotic. I think a good initial step to clean up that mess > would be to just collect the relevant variables into one or two structs. Not a bad idea. >> There are small changes in the order that some of things are done in >> StartupXLOG(), for readability. I tried to be careful and check that the >> changes are safe, but a second pair of eyes would be appreciated on that. > > I think it might be worth trying to break this into a bit more incremental > changes - it's a huge commit and mixing code movement with code changes makes > it really hard to review the non-movement portion. Fair. Attached is a new patch set which contains a few smaller commits that reorder things in xlog.c, and then the big commit that moves things to xlogrecovery.c. > If we're refactoring all of this, can we move the apply-one-record part into > its own function as well? Happy to do that as a followup or precursor patch > too. The per-record logic has grown complicated enough to make that quite > worthwhile imo - and imo most of the time one either is interested in the > per-record work, or in the rest of the StartupXLog/PerformWalRecovery logic. Added a commit to do that, as a follow-up. Yeah, I agree that makes sense. - Heikki
Attachment
- 0001-Don-t-use-O_SYNC-or-similar-when-opening-signal-file.patch
- 0002-Remove-unnecessary-restoredFromArchive-global-variab.patch
- 0003-Extract-code-to-get-reason-that-recovery-was-stopped.patch
- 0004-Move-InRecovery-and-standbyState-global-vars-to-xlog.patch
- 0005-Move-code-around-in-StartupXLOG.patch
- 0006-Split-xlog.c-into-xlog.c-and-xlogrecovery.c.patch
- 0007-Move-code-to-apply-one-WAL-record-to-a-subroutine.patch
On Tue, Jun 22, 2021 at 2:37 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 17/06/2021 02:00, Andres Freund wrote: > > On 2021-06-16 16:30:45 +0300, Heikki Linnakangas wrote: > >> That's a fairly clean split. StartupXLOG() stays in xlog.c, but much of the > >> code from it has been moved to new functions InitWalRecovery(), > >> PerformWalRecovery() and EndWalRecovery(). The general idea is that xlog.c is > >> still responsible for orchestrating the servers startup, but xlogrecovery.c > >> is responsible for figuring out whether WAL recovery is needed, performing > >> it, and deciding when it can stop. > > > > For some reason "recovery" bothers me a tiny bit, even though it's obviously > > already in use. Using "apply", or "replay" seems more descriptive to me, but > > whatever. > > I think of "recovery" as a broader term than applying or replaying. > Replaying the WAL records is one part of recovery. But yeah, the > difference is not well-defined and we tend to use those terms > interchangeably. > > >> There's surely more refactoring we could do. xlog.c has a lot of global > >> variables, with similar names but slightly different meanings for example. > >> (Quick: what's the difference between InRedo, InRecovery, InArchiveRecovery, > >> and RecoveryInProgress()? I have to go check the code every time to remind > >> myself). But this patch tries to just move source code around for clarity. > > > > Agreed, it's quite chaotic. I think a good initial step to clean up that mess > > would be to just collect the relevant variables into one or two structs. > > Not a bad idea. > > >> There are small changes in the order that some of things are done in > >> StartupXLOG(), for readability. I tried to be careful and check that the > >> changes are safe, but a second pair of eyes would be appreciated on that. > > > > I think it might be worth trying to break this into a bit more incremental > > changes - it's a huge commit and mixing code movement with code changes makes > > it really hard to review the non-movement portion. > > Fair. Attached is a new patch set which contains a few smaller commits > that reorder things in xlog.c, and then the big commit that moves things > to xlogrecovery.c. > > > If we're refactoring all of this, can we move the apply-one-record part into > > its own function as well? Happy to do that as a followup or precursor patch > > too. The per-record logic has grown complicated enough to make that quite > > worthwhile imo - and imo most of the time one either is interested in the > > per-record work, or in the rest of the StartupXLog/PerformWalRecovery logic. > > Added a commit to do that, as a follow-up. Yeah, I agree that makes sense. The patch does not apply on Head anymore, could you rebase and post a patch. I'm changing the status to "Waiting for Author". Regards, Vignesh
On 15/07/2021 15:19, vignesh C wrote: > The patch does not apply on Head anymore, could you rebase and post a > patch. I'm changing the status to "Waiting for Author". Here's a rebase. - Heikki
Attachment
- 0001-Don-t-use-O_SYNC-or-similar-when-opening-signal-file.patch
- 0002-Remove-unnecessary-restoredFromArchive-global-variab.patch
- 0003-Extract-code-to-get-reason-that-recovery-was-stopped.patch
- 0004-Move-InRecovery-and-standbyState-global-vars-to-xlog.patch
- 0005-Move-code-around-in-StartupXLOG.patch
- 0006-Split-xlog.c-into-xlog.c-and-xlogrecovery.c.patch
- 0007-Move-code-to-apply-one-WAL-record-to-a-subroutine.patch
Hi, I think it'd make sense to apply the first few patches now, they seem uncontroversial and simple enough. On 2021-07-31 00:33:34 +0300, Heikki Linnakangas wrote: > From 0cfb852e320bd8fe83c588d25306d5b4c57b9da6 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 21 Jun 2021 22:14:58 +0300 > Subject: [PATCH 1/7] Don't use O_SYNC or similar when opening signal file to > fsync it. +1 > From 83f00e90bb818ed21bb14580f19f58c4ade87ef7 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed, 9 Jun 2021 12:05:53 +0300 > Subject: [PATCH 2/7] Remove unnecessary 'restoredFromArchive' global variable. > > It might've been useful for debugging purposes, but meh. There's > 'readSource' which does almost the same thing. +1 > From ec53470c8d271c01b8d2e12b92863501c3a9b4cf Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 21 Jun 2021 16:12:50 +0300 > Subject: [PATCH 3/7] Extract code to get reason that recovery was stopped to a > function. +1 > +/* > + * Create a comment for the history file to explain why and where > + * timeline changed. > + */ > +static char * > +getRecoveryStopReason(void) > +{ > + char reason[200]; > + > + if (recoveryTarget == RECOVERY_TARGET_XID) > + snprintf(reason, sizeof(reason), > + "%s transaction %u", > + recoveryStopAfter ? "after" : "before", > + recoveryStopXid); > + else if (recoveryTarget == RECOVERY_TARGET_TIME) > + snprintf(reason, sizeof(reason), > + "%s %s\n", > + recoveryStopAfter ? "after" : "before", > + timestamptz_to_str(recoveryStopTime)); > + else if (recoveryTarget == RECOVERY_TARGET_LSN) > + snprintf(reason, sizeof(reason), > + "%s LSN %X/%X\n", > + recoveryStopAfter ? "after" : "before", > + LSN_FORMAT_ARGS(recoveryStopLSN)); > + else if (recoveryTarget == RECOVERY_TARGET_NAME) > + snprintf(reason, sizeof(reason), > + "at restore point \"%s\"", > + recoveryStopName); > + else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE) > + snprintf(reason, sizeof(reason), "reached consistency"); > + else > + snprintf(reason, sizeof(reason), "no recovery target specified"); > + > + return pstrdup(reason); > +} I guess it would make sense to change this over to a switch at some point, so we can get warnings if a new type of target is added... > From 70f688f9576b7939d18321444fd59c51c402ce23 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 21 Jun 2021 21:25:37 +0300 > Subject: [PATCH 4/7] Move InRecovery and standbyState global vars to > xlogutils.c. > > They are used in code that is sometimes called from a redo routine, > so xlogutils.c seems more appropriate. That's where we have other helper > functions used by redo routines. FWIW, with some compilers on some linux distributions there is an efficiency difference between accessing a variable (or calling a function) defined in the current translation unit or a separate one (with the separate TU going through the GOT). I don't think it's a problem here, but it's worth keeping in mind while moving things around. We should probably adjust our compiler settings to address that at some point :( > From da11050ca890ce0311d9e97d2832a6a61bc43e10 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Fri, 18 Jun 2021 12:15:04 +0300 > Subject: [PATCH 5/7] Move code around in StartupXLOG(). > > This is the order that things will happen with the next commit, this > makes it more explicit. To aid review, I added "BEGIN/END function" > comments to mark which blocks of code are moved to separate functions in > in the next commit. > --- > src/backend/access/transam/xlog.c | 605 ++++++++++++++++-------------- > 1 file changed, 315 insertions(+), 290 deletions(-) > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > index efb3ca273ed..b9d96d6de26 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -882,7 +882,6 @@ static MemoryContext walDebugCxt = NULL; > > static void readRecoverySignalFile(void); > static void validateRecoveryParameters(void); > -static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); > static bool recoveryStopsBefore(XLogReaderState *record); > static bool recoveryStopsAfter(XLogReaderState *record); > static char *getRecoveryStopReason(void); > @@ -5592,111 +5591,6 @@ validateRecoveryParameters(void) > } > } > > -/* > - * Exit archive-recovery state > - */ > -static void > -exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) > -{ I don't really understand the motivation for this part of the change? This kind of seems to run counter to the stated goals of the patch series? Seems like it'd need a different commit message at last? > + /*---- BEGIN FreeWalRecovery ----*/ > + > /* Shut down xlogreader */ > if (readFile >= 0) > { FWIW, FreeWalRecovery() for something that closes and unlinks files among other things doesn't seem like a great name. Greetings, Andres Freund
On 31/07/2021 02:11, Andres Freund wrote: > Hi, > > I think it'd make sense to apply the first few patches now, they seem > uncontroversial and simple enough. Pushed those, thanks! >> From da11050ca890ce0311d9e97d2832a6a61bc43e10 Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Fri, 18 Jun 2021 12:15:04 +0300 >> Subject: [PATCH 5/7] Move code around in StartupXLOG(). >> >> This is the order that things will happen with the next commit, this >> makes it more explicit. To aid review, I added "BEGIN/END function" >> comments to mark which blocks of code are moved to separate functions in >> in the next commit. > >> --- >> src/backend/access/transam/xlog.c | 605 ++++++++++++++++-------------- >> 1 file changed, 315 insertions(+), 290 deletions(-) >> >> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c >> index efb3ca273ed..b9d96d6de26 100644 >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -882,7 +882,6 @@ static MemoryContext walDebugCxt = NULL; >> >> static void readRecoverySignalFile(void); >> static void validateRecoveryParameters(void); >> -static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); >> static bool recoveryStopsBefore(XLogReaderState *record); >> static bool recoveryStopsAfter(XLogReaderState *record); >> static char *getRecoveryStopReason(void); >> @@ -5592,111 +5591,6 @@ validateRecoveryParameters(void) >> } >> } >> >> -/* >> - * Exit archive-recovery state >> - */ >> -static void >> -exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) >> -{ > > I don't really understand the motivation for this part of the change? This > kind of seems to run counter to the stated goals of the patch series? Seems > like it'd need a different commit message at last? Hmm. Some parts of exitArchiveRecovery are being moved into xlogrecovery.c, so it becomes smaller than before. Maybe there's still enough code left there that a separate function makes sense. I'll try that differently. >> + /*---- BEGIN FreeWalRecovery ----*/ >> + >> /* Shut down xlogreader */ >> if (readFile >= 0) >> { > > FWIW, FreeWalRecovery() for something that closes and unlinks files among > other things doesn't seem like a great name. Rename to CloseWalRecovery(), maybe? I'll try that. - Heikki
On 31/07/2021 10:54, Heikki Linnakangas wrote: > On 31/07/2021 02:11, Andres Freund wrote: >>> @@ -5592,111 +5591,6 @@ validateRecoveryParameters(void) >>> } >>> } >>> >>> -/* >>> - * Exit archive-recovery state >>> - */ >>> -static void >>> -exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) >>> -{ >> >> I don't really understand the motivation for this part of the change? This >> kind of seems to run counter to the stated goals of the patch series? Seems >> like it'd need a different commit message at last? > > Hmm. Some parts of exitArchiveRecovery are being moved into > xlogrecovery.c, so it becomes smaller than before. Maybe there's still > enough code left there that a separate function makes sense. I'll try > that differently. So, my issue with exitArchiveRecovery() was that after this refactoring, the function didn't really exit archive recovery anymore. InArchiveRecovery flag is cleared earlier already, in xlogrecovery.c. I renamed exitArchiveRecovery() to XLogInitNewTimeline(), and moved the unlinking of the signal files into the caller. The function now only initializes the first WAL segment on the new timeline, and the new name reflects that. I'm pretty happy with this now. >>> + /*---- BEGIN FreeWalRecovery ----*/ >>> + >>> /* Shut down xlogreader */ >>> if (readFile >= 0) >>> { >> >> FWIW, FreeWalRecovery() for something that closes and unlinks files among >> other things doesn't seem like a great name. > > Rename to CloseWalRecovery(), maybe? I'll try that. I renamed it to ShutdownWalRecovery(). I also refactored the FinishWalRecovery() function so that instead of having a dozen output pointer parameters, it returns a struct with all the return values. New patch set attached. - Heikki
Attachment
After applying 0001 and 0002 I got a bunch of compile problems: In file included from /pgsql/source/master/src/include/postgres.h:46, from /pgsql/source/master/src/backend/access/transam/xlog.c:39: /pgsql/source/master/src/backend/access/transam/xlog.c: In function 'StartupXLOG': /pgsql/source/master/src/backend/access/transam/xlog.c:5310:10: error: 'lastPageBeginPtr' undeclared (first use in this function) Assert(lastPageBeginPtr == EndOfLog); ^~~~~~~~~~~~~~~~ /pgsql/source/master/src/include/c.h:848:9: note: in definition of macro 'Assert' if (!(condition)) \ ^~~~~~~~~ /pgsql/source/master/src/backend/access/transam/xlog.c:5310:10: note: each undeclared identifier is reported only once foreach function it appears in Assert(lastPageBeginPtr == EndOfLog); ^~~~~~~~~~~~~~~~ /pgsql/source/master/src/include/c.h:848:9: note: in definition of macro 'Assert' if (!(condition)) \ ^~~~~~~~~ make[4]: *** [../../../../src/Makefile.global:938: xlog.o] Error 1 /pgsql/source/master/src/backend/access/transam/xlog.c:5310:10: error: use of undeclared identifier 'lastPageBeginPtr' Assert(lastPageBeginPtr == EndOfLog); ^ 1 error generated. make[4]: *** [../../../../src/Makefile.global:1070: xlog.bc] Error 1 make[4]: Target 'all' not remade because of errors. make[3]: *** [/pgsql/source/master/src/backend/common.mk:39: transam-recursive] Error 2 make[3]: Target 'all' not remade because of errors. make[2]: *** [/pgsql/source/master/src/backend/common.mk:39: access-recursive] Error 2 make[2]: Target 'install' not remade because of errors. make[1]: *** [Makefile:42: install-backend-recurse] Error 2 make[1]: Target 'install' not remade because of errors. make: *** [GNUmakefile:11: install-src-recurse] Error 2 make: Target 'install' not remade because of errors. /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c: In function 'apw_load_buffers': /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c:301:9: warning: implicit declaration of function 'AllocateFile'; didyou mean 'load_file'? [-Wimplicit-function-declaration] file = AllocateFile(AUTOPREWARM_FILE, "r"); ^~~~~~~~~~~~ load_file /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c:301:7: warning: assignment to 'FILE *' {aka 'struct _IO_FILE *'} from'int' makes pointer from integer without a cast [-Wint-conversion] file = AllocateFile(AUTOPREWARM_FILE, "r"); ^ /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c:342:2: warning: implicit declaration of function 'FreeFile' [-Wimplicit-function-declaration] FreeFile(file); ^~~~~~~~ /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c: In function 'apw_dump_now': /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c:630:7: warning: assignment to 'FILE *' {aka 'struct _IO_FILE *'} from'int' makes pointer from integer without a cast [-Wint-conversion] file = AllocateFile(transient_dump_file_path, "w"); ^ /pgsql/source/master/contrib/pg_prewarm/autoprewarm.c:694:9: warning: implicit declaration of function 'durable_rename';did you mean 'errtablecolname'? [-Wimplicit-function-declaration] (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR); ^~~~~~~~~~~~~~ errtablecolname -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
On 31/07/2021 22:33, Alvaro Herrera wrote: > After applying 0001 and 0002 I got a bunch of compile problems: Ah sorry, I had assertions disabled and didn't notice. Fixed version attached. - Heikki
Attachment
On 01/08/2021 12:49, Heikki Linnakangas wrote: > On 31/07/2021 22:33, Alvaro Herrera wrote: >> After applying 0001 and 0002 I got a bunch of compile problems: > > Ah sorry, I had assertions disabled and didn't notice. Fixed version > attached. Here is another rebase. - Heikki
Attachment
Hello. At Thu, 16 Sep 2021 11:23:46 +0300, Heikki Linnakangas <hlinnaka@iki.fi> wrote in > Here is another rebase. I have several comments on this. 0001: I understand this is almost simple relocation of code fragments. But it seems introducing some behavioral changes. PublishStartProcessInformation() was changed to be called while crash recovery or on standalone server. Maybe it is harmless and might be more consistent, so I'm fine with it. Another call to ResetUnloggedRelations is added before redo start, that seems fine. recoveryStopReason is always acquired but it is used only after archive recovery. I'm not sure about reason for the variable to live in that wide context. Couldn't we remove the variable then call getRecoveryStopReason() directly at the required place? 0002: heapam.c, clog.c, twophase.c, dbcommands.c doesn't need xlogrecvoer.h. > XLogRecCtl "Rec" looks like Record. Couldn't we use "Rcv", "Recov" or just "Recovery" instead? > TimeLineID PrevTimeLineID; > TransactionId oldestActiveXID; > bool promoted = false; > EndOfWalRecoveryInfo *endofwal; > bool haveTblspcMap; This is just a matter of taste but the "endofwal" looks somewhat alien in the variables. xlog.c: +void +SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr) Isn't this a function of xlogrecovery.c? Or rather isn't minRecoveryPoint-related stuff of xlogrecovery.c? 0003; Just looks fine. I might want to remove the parameter xlogreader from ApplyWalRecord, but that seems cause more harm than good. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Sep 17, 2021 at 12:10:17PM +0900, Kyotaro Horiguchi wrote: > Hello. > > At Thu, 16 Sep 2021 11:23:46 +0300, Heikki Linnakangas <hlinnaka@iki.fi> wrote in > > Here is another rebase. > > I have several comments on this. > Hi Heikki, Are we waiting a rebased version? Currently this does not apply to head. I'll mark this as WoA and move it to necxt CF. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On Thu, Sep 16, 2021 at 4:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Here is another rebase. Like probably everyone else who has an opinion on the topic, I like the idea of splitting xlog.c. I don't have a fully formed opinion on the changes yet, but it seems to be a surprisingly equal split, which seems good. Since I just spent a bunch of time being frustrated by ThisTimeLineID, I'm pleased to see that the giant amount of code that moves to xlogrecovery.c apparently ends up not needing that global variable, which I think is excellent. Perhaps the amount of code that needs that global variable can be further reduced in the future, maybe even to zero. I think that the small reorderings that you mention in your original post are the scary part: if we do stuff in a different order, maybe things won't work. In the rest of this email I'm going to try to go through and analyze that. I think it might have been a bit easier if you'd outlined the things you moved and the reasons why you thought that was OK; as it is, I have to reverse-engineer it. But I'd like to see this go forward, either as-is or with whatever modifications seem to be needed, so I'm going to give it a try. - RelationCacheInitFileRemove() moves later. The code over which it moves seems to include sanity checks and initializations of various bits of in-memory state, but nothing that touches anything on disk. Therefore I don't see how this can break anything. I also agree that the new placement of the call is more logical than the old one, since in the current code it's kind of in the middle of a bunch of things that, as your patch highlights, are really all about initializing WAL recovery, and this is a separate kind of a thing. Post-patch, it ends up near where we initialize a bunch of other subsystems. Cool. - Some logic to (a) sanity-check the control file's REDO pointer, (b) set InRecovery = true, and (c) update various bits of control file state in memory has been moved substantially earlier. The actual update of the control file on disk stays where it was before. At least on first reading, I don't really like this. On the one hand, I don't see a reason why it's necessary prerequisite for splitting xlog.c. On the other hand, it seems a bit dangerous. There's now ~8 calls to functions in other modules between the time you change things in memory and the time that you call UpdateControlFile(). Perhaps none of those functions can call anything that might in turn call UpdateControlFile() but I don't know why we should take the chance. Is there some advantage to having the in-memory state out of sync with the on-disk state across all that code? - Renaming backup_label and tablespace_map to .old is now done slightly earlier, just before pg_reset_all() and adjusting our notion of the minimum recovery point rather than just after. Seems OK. - The rm_startup() functions are now called later, only once we're sure that we have a WAL record to apply. Seems fine; slightly more efficient. Looks like the functions in question are just arranging to set up private memory contexts for the AMs that want them for WAL replay, so they won't care if we skip that in some corner cases where there's nothing to replay. - ResetUnloggedRelations(UNLOGGED_RELATION_INIT) is moved later. We'll now do a few minor bookkeeping tasks like setting EndOfLog and EndOfLogTLI first, and we'll also now check whether we reached the minimum recovery point OK before doing this. This appears to me to be a clear improvement, since checking whether the minimum recovery point has been reached is fast, and resetting unlogged relations might be slow, and is pointless if we're just going to error out. - The recoveryWakeupLatch is disowned far later than before. I can't see why this would hurt anything, but my first inclination was to prefer the existing placement of the call. We're only going to wait on the latch while applying WAL, and the existing code seems to release it fairly promptly after it's done applying WAL, which seems to make sense. On the other hand, I can see that your intent was (I believe, anyway) to group it together with shutting down the xlog reader and removing RECOVERYXLOG and RECOVERYHISTORY, and there doesn't seem to be anything wrong with that idea. - The code to clear InArchiveRecovery and close the WAL segment we had open moves earlier. I think it might be possible to fail Assert(InArchiveRecovery), because where you've moved this code, we haven't yet verified that we reached the minimum recovery point. See the comment which begins "It's possible that archive recovery was requested, but we don't know how far we need to replay the WAL before we reach consistency." What if we reach that point, then fail the big hairy if-test and don't set InArchiveRecovery = true? In that case, we can still do it later, in ReadRecord. But maybe that will never happen. Actually it's not entirely clear to me that the assertion is bulletproof even where it is right now, but moving it earlier makes me even less confident. Possibly I just don't understand this well enough. It's a little tempting, too, to see if you could somehow consolidate the two places that do if (readFile >= 0) { close(readFile); readFile = -1 } down to one. - getRecoveryStopReason() is now called earlier than before, and is now called whether or not ArchiveRecoveryRequested. This seems to just move the point of initialization further from the point of use to no real advantage, and I also think that the function is only designed to do something useful for archive recovery, so calling it in other cases just seems confusing. - RECOVERYXLOG and RECOVERYHISTORY are now removed later than before. It's now the last thing that happens before we enabled WAL writes. Doesn't seem like it should hurt anything. - The "archive recovery complete" message is now logged after rather than before writing and archiving a timeline history file. I think that's likely an improvement. That's all I have on 0001. Is this kind of review helpful? Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
> On 5 Oct 2021, at 03:09, Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > Are we waiting a rebased version? Currently this does not apply to head. > I'll mark this as WoA and move it to necxt CF. This patch still doesn't apply, exacerbated by the recent ThisTimelineID changes in xlog.c. I'm marking this Returned with Feedback, please feel free to open a new entry when you have a rebase addressing Kyotaro's and Robert's reviews. -- Daniel Gustafsson https://vmware.com/
Here's a new version. It includes two new smaller commits, before the main refactoring: 1. Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD. I moved the code to set that flag from AdvanceXLInsertBuffer() into CreateOverwriteContrecordRecord(). That avoids the need for accessing the global variable in AdvanceXLInsertBuffer(), which is nice with this patch set because I moved the global variables into xlogrecord.c. For comparison, when we are writing a continuation record, the XLP_FIRST_IS_CONTRECORD flag is also set by the caller, CopyXLogRecordToWAL(), not AdvanceXLInsertBuffer() itself. So I think this is marginally more clear anyway. 2. Use correct WAL position in error message on invalid XLOG page header. This is the thing that Robert pointed out in the "xlog.c: removing ReadRecPtr and EndRecPtr" thread. I needed to make the change for the refactoring anyway, but since it's a minor bug fix, it seemed better to extract it to a separate commit, after all. Responses to Robert's comments below: On 20/10/2021 22:06, Robert Haas wrote: > - Some logic to (a) sanity-check the control file's REDO pointer, (b) > set InRecovery = true, and (c) update various bits of control file > state in memory has been moved substantially earlier. The actual > update of the control file on disk stays where it was before. At least > on first reading, I don't really like this. On the one hand, I don't > see a reason why it's necessary prerequisite for splitting xlog.c. On > the other hand, it seems a bit dangerous. The new contents of the control file are determined by the checkpoint record, presence of backup label file, and whether we're doing archive recovery. We have that information at hand in InitWalRecovery(), whereas the caller doesn't know or care whether a backup label file was present, for example. That's why I wanted to move that logic to InitWalRecovery(). However, I was afraid of moving the actual call to UpdateControlFile() there. That would be a bigger behavioral change. What if initializing one of the subsystems fails? Currently, the control file is left unchanged, but if we called UpdateControlFile() earlier, then it would be modified already. > There's now ~8 calls to functions in other modules between the time > you change things in memory and the time that you call > UpdateControlFile(). Perhaps none of those functions can call > anything that might in turn call UpdateControlFile() but I don't know > why we should take the chance. Is there some advantage to having the > in-memory state out of sync with the on-disk state across all that > code? The functions that get called in between don't call UpdateControlFile() and don't affect what gets written there. It would be pretty questionable if they did, even on master. But for the sake of the argument, let's see what would happen if they did: master: The later call to UpdateControlFile() writes out the same values again. Unless the changed field was one of the following: 'state', 'checkPoint', 'checkPointCopy', 'minRecoveryPoint', 'minRecoveryPointTLI', 'backupStartPoint', 'backupEndRequired' or 'time'. If it was one of those, then it may be overwritten with the values deduced from the starting checkpoint. After these patches: The later call to UpdateControlFile() writes out the same values again, even if it was one of those fields. Seems like a wash to me. It's hard to tell which behavior would be the correct one. On 'master', InRecovery might or might not already be set when we call those functions. It is already set if there was a backup label file, but if we're doing recover for any other reason, it's set only later. That's pretty sloppy. We check InRecovery in various assertions, and it affects whether UpdateMinRecoveryPoint() updates the control file or not, among other things. With these patches, InRecovery is always set at that point (or not, if recovery is not needed). That's a bit besides the point here, but it highlights that the current coding isn't very robust either if those startup functions tried to modify the control file. I think these patches make it a little better, or at least not worse. > - The code to clear InArchiveRecovery and close the WAL segment we had > open moves earlier. I think it might be possible to fail > Assert(InArchiveRecovery), because where you've moved this code, we > haven't yet verified that we reached the minimum recovery point. See > the comment which begins "It's possible that archive recovery was > requested, but we don't know how far we need to replay the WAL before > we reach consistency." What if we reach that point, then fail the big > hairy if-test and don't set InArchiveRecovery = true? In that case, we > can still do it later, in ReadRecord. But maybe that will never > happen. Actually it's not entirely clear to me that the assertion is > bulletproof even where it is right now, but moving it earlier makes me > even less confident. Possibly I just don't understand this well > enough. Hmm, yeah, this logic is hairy. I tried to find a case where that assertion would fail but couldn't find one. I believe it's correct, but we could probably make it more clear. In a nutshell, PerformWalRecovery() will never return, if (ArchiveRecoveryRequested && !InArchiveRecovery). Why? There are two ways that PerformWalRecovery() can return: 1. After reaching end of WAL. ReadRecord() will always always set InArchiveRecovery in that case, if ArchiveRecoveryRequested was set. It won't return NULL without doing that. 2. We reached the requested recovery target point. There's a check for that case in PerformWalRecovery(), it will throw an "ERROR: requested recovery stop point is before consistent recovery point" if that happens before InArchiveRecovery is set. Because reachedConsistency isn't set until crash recovery is finished. That said, independently of this patch series, perhaps that assertion should be changed into something like this: if (ArchiveRecoveryRequested) { - Assert(InArchiveRecovery); + /* + * If archive recovery was requested, we should not finish + * recovery before starting archive recovery. + * + * There are other checks for this in PerformWalRecovery() so + * this shouldn't happen, but let's be safe. + */ + if (!InArchiveRecovery) + elog(ERROR, "archive recovery was requested, but recovery finished before it started"); > It's a little tempting, too, to see if you could somehow consolidate > the two places that do if (readFile >= 0) { close(readFile); readFile > = -1 } down to one. Yeah, I thought about that, but couldn't find a nice way to do it. > - getRecoveryStopReason() is now called earlier than before, and is > now called whether or not ArchiveRecoveryRequested. This seems to > just move the point of initialization further from the point of use > to no real advantage, and I also think that the function is only > designed to do something useful for archive recovery, so calling it > in other cases just seems confusing. On the other hand, it's now closer to the actual end-of-recovery. The idea here is that it seems natural to return the reason that recovery ended along with all the other end-of-recovery information, in the same EndOfWalRecoveryInfo struct. Kyotaro commented on the same thing and suggested keeping the call getRecoveryStopReason() where it was. That'd require exposing getRecoveryStopReason() from xlogrecovery.c. Which isn't a big deal, we could do it, but in general I tried to minimize the surface area between xlog.c and xlogrecovery.c. If getRecoveryStopReason() was a separate function, should standby_signal_file_found and recovery_signal_file_found also be separate functions? I'd prefer to gather all the end-of-recovery information into one struct. > That's all I have on 0001. Is this kind of review helpful? Yes, very helpful, thank you! - Heikki
Attachment
On 17/09/2021 06:10, Kyotaro Horiguchi wrote: > recoveryStopReason is always acquired but it is used only after > archive recovery. I'm not sure about reason for the variable to > live in that wide context. Couldn't we remove the variable then > call getRecoveryStopReason() directly at the required place? Robert commented on the same thing, see my reply there. > 0002: > > heapam.c, clog.c, twophase.c, dbcommands.c doesn't need xlogrecvoer.h. Cleaned that up in v7, thanks! >> XLogRecCtl > > "Rec" looks like Record. Couldn't we use "Rcv", "Recov" or just > "Recovery" instead? I never made that association before, but now I cannot unsee it :-). I changed it to XLogRecoveryCtl. >> TimeLineID PrevTimeLineID; >> TransactionId oldestActiveXID; >> bool promoted = false; >> EndOfWalRecoveryInfo *endofwal; >> bool haveTblspcMap; > > This is just a matter of taste but the "endofwal" looks somewhat > alien in the variables. Changed to "endOfRecoveryInfo". > > xlog.c: > +void > +SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr) > > Isn't this a function of xlogrecovery.c? Or rather isn't > minRecoveryPoint-related stuff of xlogrecovery.c? Updating the control file is xlog.c's responsibility. There are two different minRecoveryPoints: 1. xlogrecovery.c has a copy of the minRecoveryPoint from the control file, so that it knows when we have reached consistency. 2. xlog.c is responsible for updating the minRecoveryPoint in the control file, after consistency has been reached. SwitchIntoArchiveRecovery() is called on the transition. - Heikki
On 23/11/2021 01:10, Heikki Linnakangas wrote: > Here's a new version. And here's another rebase, now that Robert got rid of ReadRecPtr and EndRecPtr. - Heikki
Attachment
On Wed, Nov 24, 2021 at 12:16 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > And here's another rebase, now that Robert got rid of ReadRecPtr and > EndRecPtr. In general, I think 0001 is a good idea, but the comment that says "Set the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag on the page header" seems to me to be telling the reader about what's already obvious instead of explaining to them the thing they might have missed. GetXLogBuffer() says that it's only safe to use if you hold a WAL insertion lock and don't go backwards, and here you don't hold a WAL insertion lock and I guess you're not going backwards only because you're staying in exactly the same place? It seems to me that the only reason this is safe is because, at the time this is called, only the startup process is able to write WAL, and therefore the race condition that would otherwise exist does not. Even then, I wonder what keeps the buffer from being flushed after we return from XLogInsert() and before we set the bit, and if the answer is that nothing prevents that, whether that's OK. It might be good to talk about these issues too. Just to be clear, I'm not saying that I think the code is broken. But I am concerned about someone using this as precedent for code that runs in some other place, which would be highly likely to be broken, and the way to avoid that is for the comment to explain the tricky points. Also, you've named the parameter to this new function so that it's exactly the same as the global variable. I do approve of trying to pass the value as a parameter instead of relying on a global variable, and I wonder if you could find a way to remove the global variable entirely. But if not, I think the function parameter and the global variable should have different names, because otherwise it's easy for anyone reading the code to get confused about which one is being referenced in any particular spot, and it's also hard to grep. -- Robert Haas EDB: http://www.enterprisedb.com
On 24/11/2021 21:44, Robert Haas wrote: > On Wed, Nov 24, 2021 at 12:16 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> And here's another rebase, now that Robert got rid of ReadRecPtr and >> EndRecPtr. > > In general, I think 0001 is a good idea, but the comment that says > "Set the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag on the page header" > seems to me to be telling the reader about what's already obvious > instead of explaining to them the thing they might have missed. > GetXLogBuffer() says that it's only safe to use if you hold a WAL > insertion lock and don't go backwards, and here you don't hold a WAL > insertion lock and I guess you're not going backwards only because > you're staying in exactly the same place? It seems to me that the only > reason this is safe is because, at the time this is called, only the > startup process is able to write WAL, and therefore the race condition > that would otherwise exist does not. Yeah, its correctness depends on the fact that no other backend is allows to write WAL. > Even then, I wonder what keeps > the buffer from being flushed after we return from XLogInsert() and > before we set the bit, and if the answer is that nothing prevents > that, whether that's OK. It might be good to talk about these issues > too. Hmm. We don't advance LogwrtRqst.Write, so I think a concurrent XLogFlush() would not flush the page. But I agree, that's more accidental than by design and we should be more explicit about it. I changed the code so that it sets the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag in the page header first, and inserts the record only after that. That way, you don't "go backwards". I also added more sanity checks to verify that the record really is inserted where we expect. > Also, you've named the parameter to this new function so that it's > exactly the same as the global variable. I do approve of trying to > pass the value as a parameter instead of relying on a global variable, > and I wonder if you could find a way to remove the global variable > entirely. But if not, I think the function parameter and the global > variable should have different names, because otherwise it's easy for > anyone reading the code to get confused about which one is being > referenced in any particular spot, and it's also hard to grep. Renamed the parameter to 'pagePtr', that describes pretty well what it's used for in the function. Attached is a new patch set. It includes these changes to CreateOverwriteContrecordRecord(), and also a bunch of other small changes: - I moved the code to redo some XLOG record types from xlog_redo() to a new function in xlogrecovery.c. This got rid of the HandleBackupEndRecord() callback function I had to add before. This change is in a separate commit, for easier review. It might make sense to introduce a new rmgr for those record types, but didn't do that for now. - I reordered many of the functions in xlogrecord.c, to group together functions that are used in the initialization, and functions that are called for each WAL record. - Improved comments here and there. - I renamed checkXLogConsistency() to verifyBackupPageConsistency(). I think it describes the function better. There are a bunch of other functions with check* prefix like CheckRecoveryConsistency, CheckTimeLineSwitch, CheckForStandbyTrigger that check for various conditions, so using "check" to mean "verify" here was a bit confusing. I think this is ready for commit now. I'm going to wait a day or two to give everyone a chance to review these latest changes, and then push. - Heikki
Attachment
On 17/12/2021 13:10, Heikki Linnakangas wrote: > I think this is ready for commit now. I'm going to wait a day or two to > give everyone a chance to review these latest changes, and then push. In last round of review, I spotted one bug: I had mixed up the meaning of EndOfLogTLI. It is the TLI in the *filename* of the WAL segment that we read the last record from, which can be different from the TLI that the last record is actually on. All existing tests were passing with that bug, so I added a test case to cover that case. So here's one more set of patches with that fixed, which I plan to commit shortly. - Heikki
Attachment
On Tue, Jan 25, 2022 at 12:12:40PM +0200, Heikki Linnakangas wrote: > In last round of review, I spotted one bug: I had mixed up the meaning of > EndOfLogTLI. It is the TLI in the *filename* of the WAL segment that we read > the last record from, which can be different from the TLI that the last > record is actually on. All existing tests were passing with that bug, so I > added a test case to cover that case. FYI, this overlaps with a different patch sent recently, as of this thread: https://www.postgresql.org/message-id/CAAJ_b94Vjt5cXGza_1MkjLQWciNdEemsmiWuQj0d=M7JfjAa1g@mail.gmail.com -- Michael
Attachment
On 27/01/2022 08:34, Michael Paquier wrote: > On Tue, Jan 25, 2022 at 12:12:40PM +0200, Heikki Linnakangas wrote: >> In last round of review, I spotted one bug: I had mixed up the meaning of >> EndOfLogTLI. It is the TLI in the *filename* of the WAL segment that we read >> the last record from, which can be different from the TLI that the last >> record is actually on. All existing tests were passing with that bug, so I >> added a test case to cover that case. > > FYI, this overlaps with a different patch sent recently, as of this > thread: > https://www.postgresql.org/message-id/CAAJ_b94Vjt5cXGza_1MkjLQWciNdEemsmiWuQj0d=M7JfjAa1g@mail.gmail.com Thanks, I pushed this new test case now. With the rest of the patches, I'm seeing a mysterious failure in cirrus CI, on macOS on the 027_stream_regress.pl test. It doesn't make much sense to me, but I'm investigating that now. - Heikki
On 14/02/2022 11:36, Heikki Linnakangas wrote: > On 27/01/2022 08:34, Michael Paquier wrote: >> On Tue, Jan 25, 2022 at 12:12:40PM +0200, Heikki Linnakangas wrote: >>> In last round of review, I spotted one bug: I had mixed up the meaning of >>> EndOfLogTLI. It is the TLI in the *filename* of the WAL segment that we read >>> the last record from, which can be different from the TLI that the last >>> record is actually on. All existing tests were passing with that bug, so I >>> added a test case to cover that case. >> >> FYI, this overlaps with a different patch sent recently, as of this >> thread: >> https://www.postgresql.org/message-id/CAAJ_b94Vjt5cXGza_1MkjLQWciNdEemsmiWuQj0d=M7JfjAa1g@mail.gmail.com > > Thanks, I pushed this new test case now. > > With the rest of the patches, I'm seeing a mysterious failure in cirrus > CI, on macOS on the 027_stream_regress.pl test. It doesn't make much > sense to me, but I'm investigating that now. Fixed that, and pushed. Thanks everyone for the reviews! - Heikki