Re: Split xlog.c - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Split xlog.c
Date
Msg-id 20210616230022.waxyeunbkzr4u36i@alap3.anarazel.de
Whole thread Raw
In response to Split xlog.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Split xlog.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hi,

On 2021-06-16 16:30:45 +0300, Heikki Linnakangas wrote:
> xlog.c is very large. We've split off some functions from it over the years,
> but it's still large and it keeps growing.
>
> Attached is a proposal to split functions related to WAL replay, standby
> mode, fetching files from archive, computing the recovery target and so on,
> to new source file called xlogrecovery.c.

Wohoo!

I think this is desperately needed. I personally am more concerned about
the size of StartupXLOG() etc than the size of xlog.c itself, but since
both reasonably are done at the same time...


> That's a fairly clean split.  StartupXLOG() stays in xlog.c, but much of the
> code from it has been moved to new functions InitWalRecovery(),
> PerformWalRecovery() and EndWalRecovery(). The general idea is that xlog.c is
> still responsible for orchestrating the servers startup, but xlogrecovery.c
> is responsible for figuring out whether WAL recovery is needed, performing
> it, and deciding when it can stop.

For some reason "recovery" bothers me a tiny bit, even though it's obviously
already in use. Using "apply", or "replay" seems more descriptive to me, but
whatever.


> There's surely more refactoring we could do. xlog.c has a lot of global
> variables, with similar names but slightly different meanings for example.
> (Quick: what's the difference between InRedo, InRecovery, InArchiveRecovery,
> and RecoveryInProgress()? I have to go check the code every time to remind
> myself). But this patch tries to just move source code around for clarity.

Agreed, it's quite chaotic. I think a good initial step to clean up that mess
would be to just collect the relevant variables into one or two structs.


> There are small changes in the order that some of things are done in
> StartupXLOG(), for readability. I tried to be careful and check that the
> changes are safe, but a second pair of eyes would be appreciated on that.

I think it might be worth trying to break this into a bit more incremental
changes - it's a huge commit and mixing code movement with code changes makes
it really hard to review the non-movement portion.

> +void
> +PerformWalRecovery(void)
> +{

> +
> +    if (record != NULL)
> +    {
> +        ErrorContextCallback errcallback;
> +        TimestampTz xtime;
> +        PGRUsage    ru0;
> +        XLogRecPtr    ReadRecPtr;
> +        XLogRecPtr    EndRecPtr;
> +
> +        pg_rusage_init(&ru0);
> +
> +        InRedo = true;
> +
> +        /* Initialize resource managers */
> +        for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
> +        {
> +            if (RmgrTable[rmid].rm_startup != NULL)
> +                RmgrTable[rmid].rm_startup();
> +        }
> +
> +        ereport(LOG,
> +                (errmsg("redo starts at %X/%X",
> +                        LSN_FORMAT_ARGS(xlogreader->ReadRecPtr))));
> +
> +        /*
> +         * main redo apply loop
> +         */
> +        do
> +        {

If we're refactoring all of this, can we move the apply-one-record part into
its own function as well? Happy to do that as a followup or precursor patch
too. The per-record logic has grown complicated enough to make that quite
worthwhile imo - and imo most of the time one either is interested in the
per-record work, or in the rest of the StartupXLog/PerformWalRecovery logic.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Outdated replication protocol error?
Next
From: Andres Freund
Date:
Subject: Re: Outdated replication protocol error?