Re: parallelizing the archiver - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: parallelizing the archiver
Date
Msg-id BE2F4111-B79C-418B-9A10-6FC9C31A6730@amazon.com
Whole thread Raw
In response to Re: parallelizing the archiver  (Magnus Hagander <magnus@hagander.net>)
Responses Re: parallelizing the archiver
List pgsql-hackers
On 10/22/21, 7:43 AM, "Magnus Hagander" <magnus@hagander.net> wrote:
> I still like the idea of loading the library via a special
> parameter, archive_library or such.

My first attempt [0] added a GUC like this, so I can speak to some of
the interesting design decisions that follow.

The simplest thing we could do would be to add the archive_library GUC
and to load that just like the library is at the end of
shared_preload_libraries.  This would mean that the archive library
could be specified in either GUC, and there would effectively be no
difference between the two.

The next thing we could consider doing is adding a new boolean called
process_archive_library_in_progress, which would be analogous to
process_shared_preload_libraries_in_progress.  If a library is loaded
from the archive_library GUC, its _PG_init() will be called with
process_archive_library_in_progress set.  This also means that if a
library is specified in both shared_preload_libraries and
archive_library, we'd call its _PG_init() twice.  The library could
then branch based on whether
process_shared_preload_libraries_in_progress or
process_archive_library_in_progress was set.

Another approach would be to add a new initialization function (e.g.,
PG_archive_init()) that would be used if the library is being loaded
from archive_library.  Like before, you can use the library for both
shared_preload_libraries and archive_library, but your initialization
logic would be expected to go in separate functions.  However, there
still wouldn't be anything forcing that.  A library could still break
the rules and do everything in _PG_init() and be loaded via
shared_preload_libraries.

One more thing we could do is to discover the relevant symbols for
archiving in library loading function.  Rather than expecting the
initialization function to set the hook correctly, we'd just look up
the _PG_archive() function during loading.  Again, a library could
probably still break the rules and do everything in
_PG_init()/shared_preload_libraries, but there would at least be a
nicer interface available.

I believe the main drawbacks of going down this path are the
additional complexity in the backend and the slippery slope of adding
all kinds of new GUCs in the future.  My original patch also tried to
do something similar for some other shell command GUCs
(archive_cleanup_command, restore_command, and recovery_end_command).
While I'm going to try to keep this focused on archive_command for
now, presumably we'd eventually want the ability to use hooks for all
of these things.  I don't know if we really want to incur a new GUC
for every single one of these.  To be clear, I'm not against adding a
GUC if it seems like the right thing to do.  I just want to make sure
we are aware of the tradeoffs compared to a simple
shared_preload_libraries approach with its terrible UX.

> Wouldn't that be better as a callback into the module? So that
> shell_archive would implement the check for XLogArchiveCommand. Then
> another third party module can make it's own decision on what to
> check. And PGarchive would then be a struct that holds a function
> pointer to the archive command and another function pointer to the
> isenabled command? (I think having a struct for it would be useful
> regardless -- for possible future extensions with more API points).

+1.  This crossed my mind, too.  I'll add this in the next revision.

Nathan

[0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com


pgsql-hackers by date:

Previous
From: Japin Li
Date:
Subject: Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber
Next
From: "Bossart, Nathan"
Date:
Subject: Re: Make mesage at end-of-recovery less scary.