Thread: Refactor StartupXLOG?

Refactor StartupXLOG?

From
Thomas Munro
Date:
Hi hackers

StartupXLOG is 1549 lines of code.  Its unwieldy size came up in a
discussion in an IRC channel where some PG hackers lurk and I decided
to see how it might be chopped up into subroutines with a clear
purpose and reasonable size, as an exercise.

I came up with the following on a first pass, though obviously you
could go a lot further:

  StartupXLOG -- reduced to 651 lines
   -> ReportControlFileState() -- 40 lines to log recovery startup message
   -> ReportRecoveryTarget() -- 32 lines to log recovery target message
   -> ProcessBackupLabel() -- 90 lines to read backup label's checkpoint
   -> CleanUpTablespaceMap() -- 33 lines of file cleanup

   -> BeginRedo() -- 191 lines for redo setup
       -> BeginHotStandby() -- 74 lines for hot standby initialisation
   -> ReplayRedo() -- 260 lines for the main redo loop
   -> FinishRedo() -- 63 lines for redo completion

   -> AssignNewTimeLine() -- 66 lines
   -> CleanUpOldTimelineSegments() -- 74 lines of file cleanup
   -> RecoveryCheckpoint() -- 71 lines

Unfortunately the attached sketch patch (not tested much) is totally
unreadable as a result of moving so many things around.  It would
probably need to be done in a series of small well tested readable
patches moving one thing out at a time.  Perhaps with an extra
context/state object to cut down on wide argument lists.

What would the appetite be for that kind of refactoring work,
considering the increased burden on committers who have to backpatch
bug fixes?  Is it a project goal to reduce the size of large
complicated functions like StartupXLOG and heap_update?  It seems like
a good way for new players to learn how they work.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Refactor StartupXLOG?

From
Heikki Linnakangas
Date:
On 09/24/2016 05:01 AM, Thomas Munro wrote:
> What would the appetite be for that kind of refactoring work,
> considering the increased burden on committers who have to backpatch
> bug fixes?  Is it a project goal to reduce the size of large
> complicated functions like StartupXLOG and heap_update?  It seems like
> a good way for new players to learn how they work.

+1. Yes, it does increase the burden of backpatching, but I think it'd 
still be very much worth it.

A couple of little details that caught my eye at a quick read:

>     /* Try to find a backup label. */
>     if (read_backup_label(&checkPointLoc, &backupEndRequired,
>                           &backupFromStandby))
>     {
>         wasShutdown = ProcessBackupLabel(xlogreader, &checkPoint, checkPointLoc,
>                                          &haveTblspcMap);
>
>         /* set flag to delete it later */
>         haveBackupLabel = true;
>     }
>     else
>     {
>         /* Clean up any orphaned tablespace map files with no backup label. */
>         CleanUpTablespaceMap();
> ...

This is a bit asymmetric: In the true-branch, ProcessBackupLabel() reads 
the tablespace map, and sets InArchiveRecovery and StandbyMode, but in 
the false-branch, StartupXLog() calls CleanupTablespaceMap() and sets 
those variables directly.

For functions like BeginRedo, BeginHotStandby, ReplayRedo, etc., I think 
it'd be better to have the "if (InRecovery)" checks in the caller, 
rather than in the functions.

- Heikki



Re: Refactor StartupXLOG?

From
Thomas Munro
Date:
On Sat, Sep 24, 2016 at 8:24 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 09/24/2016 05:01 AM, Thomas Munro wrote:
>>
>> What would the appetite be for that kind of refactoring work,
>> considering the increased burden on committers who have to backpatch
>> bug fixes?  Is it a project goal to reduce the size of large
>> complicated functions like StartupXLOG and heap_update?  It seems like
>> a good way for new players to learn how they work.
>
>
> +1. Yes, it does increase the burden of backpatching, but I think it'd still
> be very much worth it.

Cool.

> A couple of little details that caught my eye at a quick read:
>
>>         /* Try to find a backup label. */
>>         if (read_backup_label(&checkPointLoc, &backupEndRequired,
>>                                                   &backupFromStandby))
>>         {
>>                 wasShutdown = ProcessBackupLabel(xlogreader, &checkPoint,
>> checkPointLoc,
>>
>> &haveTblspcMap);
>>
>>                 /* set flag to delete it later */
>>                 haveBackupLabel = true;
>>         }
>>         else
>>         {
>>                 /* Clean up any orphaned tablespace map files with no
>> backup label. */
>>                 CleanUpTablespaceMap();
>> ...
>
>
> This is a bit asymmetric: In the true-branch, ProcessBackupLabel() reads the
> tablespace map, and sets InArchiveRecovery and StandbyMode, but in the
> false-branch, StartupXLog() calls CleanupTablespaceMap() and sets those
> variables directly.

Right.  I need to move all or some of the other branch out to its own
function too.

> For functions like BeginRedo, BeginHotStandby, ReplayRedo, etc., I think
> it'd be better to have the "if (InRecovery)" checks in the caller, rather
> than in the functions.

Yeah.  I was thinking that someone might value the preservation of
indention level, since that might make small localised bug fixes
easier to backport to the monolithic StartupXLOG.  Plain old
otherwise-redundant curly braces would achieve that.  Or maybe it's
better not to worry about preserving that.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Refactor StartupXLOG?

From
Michael Paquier
Date:
On Sat, Sep 24, 2016 at 11:01 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> What would the appetite be for that kind of refactoring work,
> considering the increased burden on committers who have to backpatch
> bug fixes? Is it a project goal to reduce the size of large
> complicated functions like StartupXLOG and heap_update?  It seems like
> a good way for new players to learn how they work.

A lot of appetite. The size of xlog.c is out of control, so something
that would be really cool to see is spliiting the whole logic of
xlog.c into more independent files, for example low-level file only
operations could go into xlogfile.c, backup code paths in
xlogbackup.c, etc. This would make necessary to expose some of the
shared-memory structures now at the top of xlog.c like XLogCtl but I
think that would be really worth it at the end, and closer to the
things like xloginsert.c and xlogarchive.c that began such a move.
-- 
Michael