Re: parallelizing the archiver - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: parallelizing the archiver
Msg-id 350D9FFB-A933-4EEE-8E21-746A21492623@amazon.com
In response to Re: parallelizing the archiver  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On 10/19/21, 6:39 AM, "David Steele" <david@pgmasters.net> wrote:
> On 10/19/21 8:50 AM, Robert Haas wrote:
>> I am not quite sure why we wouldn't just compile the functions into
>> the server. Function pointers can point to core functions as surely
>> as loadable modules. The present design isn't too congenial to that
>> because it's relying on the shared library loading mechanism to wire
>> the thing in place - but there's no reason it has to be that way.
>> Logical decoding plugins don't work that way, for example. We could
>> still have a GUC, say call it archive_method, that selects the module
>> -- with 'shell' being a builtin method, and others being loadable as
>> modules. If you set archive_method='shell' then you enable this
>> module, and it has its own GUC, say call it archive_command, to
>> configure the behavior.
>>
>> An advantage of this approach is that it's perfectly
>> backward-compatible. I understand that archive_command is a hateful
>> thing to many people here, but software has to serve the user base,
>> not just the developers. Lots of people use archive_command and rely
>> on it -- and are not interested in installing yet another piece of
>> out-of-core software to do what $OTHERDB has built in.
>
> +1 to all of this, certainly for the time being. The archive_command
> mechanism is not great, but it is simple, and this part is not really
> what makes writing a good archive command hard.
>
> I had also originally envisioned this as a default extension in core, but
> having the default 'shell' method built-in is certainly simpler.

I have no problem building it this way.  It's certainly better for
backward compatibility, which I think everyone here feels is
important.

Robert's proposed design is a bit more like my original proof-of-
concept [0].  There, I added an archive_library GUC which was
basically an extension of shared_preload_libraries (which creates some
interesting problems in the library loading logic).  You could only
set one of archive_command or archive_library at any given time.  When
the archive_library was set, we ran that library's _PG_init() just
like we do for any other library, and then we set the archiver
function pointer to the library's _PG_archive() function.
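
To make the shape of that proof-of-concept concrete, here is a minimal
sketch, not the actual patch.  The callback signature, the
ArchiveLibrary struct (a stand-in for a dynamically loaded library),
and the helper names are all assumptions made for illustration; only
_PG_init(), _PG_archive(), archive_command, and archive_library come
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; the real signatures may well differ. */
typedef void (*init_fn)(void);
typedef bool (*archive_fn)(const char *file, const char *path);

/* Stand-in for a loaded library exposing _PG_init() and _PG_archive(). */
typedef struct ArchiveLibrary
{
    init_fn     pg_init;
    archive_fn  pg_archive;
} ArchiveLibrary;

/* Archiver function pointer; NULL means "fall back to archive_command". */
static archive_fn archiver_hook = NULL;

/* Only one of archive_command or archive_library may be set at a time. */
static bool
settings_are_valid(const char *archive_command, const char *archive_library)
{
    bool has_cmd = archive_command != NULL && archive_command[0] != '\0';
    bool has_lib = archive_library != NULL && archive_library[0] != '\0';

    return !(has_cmd && has_lib);
}

/* Run the library's init function, then point the archiver at it. */
static void
install_archive_library(const ArchiveLibrary *lib)
{
    assert(lib != NULL && lib->pg_archive != NULL);
    if (lib->pg_init != NULL)
        lib->pg_init();
    archiver_hook = lib->pg_archive;
}

/* Placeholder for the existing shell path; it does not run a command. */
static bool
run_archive_command(const char *file, const char *path)
{
    return file != NULL && path != NULL;
}

/* The branch that the function-pointer-only design would remove. */
static bool
archive_one_file(const char *file, const char *path)
{
    if (archiver_hook != NULL)
        return archiver_hook(file, path);
    return run_archive_command(file, path);
}

/* A toy "library" used to exercise the sketch. */
static int demo_init_calls = 0;
static int demo_archived = 0;

static void
demo_init(void)
{
    demo_init_calls++;
}

static bool
demo_archive(const char *file, const char *path)
{
    (void) path;
    demo_archived++;
    return file != NULL;
}

static const ArchiveLibrary demo_library = {demo_init, demo_archive};
```

In this sketch, calling install_archive_library(&demo_library) and
then archive_one_file() routes the file through demo_archive(); with
no library installed, the same call falls through to the shell
placeholder.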

IIUC the main difference between this design and what Robert proposes
is that we'd also move the existing archive_command stuff somewhere
else and then access it via the archiver function pointer.  I think
that is clearly better than branching based on whether archive_command
or archive_library is set.  (BTW I'm not wedded to these GUCs.  If
folks would rather create something like the archive_method GUC, I
think that would work just as well.)
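
As a rough illustration of the no-branching variant, the sketch below
resolves a method name to a function pointer once, with 'shell' as a
built-in entry.  Everything here is hypothetical: the table, the
lookup_archive_method() helper, and the example_module entry are
invented for this example, and a real implementation would load
non-builtin methods from a shared library rather than list them in a
static table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical archiver callback signature. */
typedef bool (*archive_fn)(const char *file, const char *path);

static int shell_calls = 0;
static int module_calls = 0;

/* Built-in method: the existing archive_command logic would move here. */
static bool
shell_archive(const char *file, const char *path)
{
    (void) file;
    (void) path;
    shell_calls++;              /* placeholder; no command is executed */
    return true;
}

/* Stand-in for a method provided by a loadable module. */
static bool
example_module_archive(const char *file, const char *path)
{
    (void) file;
    (void) path;
    module_calls++;
    return true;
}

typedef struct ArchiveMethod
{
    const char *name;
    archive_fn  archive;
} ArchiveMethod;

static const ArchiveMethod archive_methods[] = {
    {"shell", shell_archive},
    {"example_module", example_module_archive},
};

/* Map a method name to its callback; NULL if the name is unknown. */
static archive_fn
lookup_archive_method(const char *name)
{
    size_t n = sizeof(archive_methods) / sizeof(archive_methods[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(archive_methods[i].name, name) == 0)
            return archive_methods[i].archive;
    }
    return NULL;
}

/* The archiver always goes through the pointer, whatever is behind it. */
static bool
archive_file(archive_fn method, const char *file, const char *path)
{
    assert(method != NULL);
    return method(file, path);
}
```

The archiver would call lookup_archive_method() once with the
configured name and then use archive_file() for every segment, so the
shell path and module path share the same call site.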

My original proof-of-concept also attempted to handle a bunch of other
shell command GUCs, but perhaps I'd better keep this focused on
archive_command for now.  What we do here could serve as an example of
how to adjust the other shell command GUCs later on.  I'll go ahead
and rework my patch to look more like what is being discussed here,
although I expect the exact design for the interface will continue to
evolve based on the feedback in this thread.

Nathan

[0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com

