Re: parallelizing the archiver - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: parallelizing the archiver
Date
Msg-id CABUevEyqLAwRbWOb7_cSrpRZPknF6t4H5Oq7CPdkABp5ECw0CA@mail.gmail.com
Whole thread Raw
In response to Re: parallelizing the archiver  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: parallelizing the archiver
List pgsql-hackers


On Thu, Oct 21, 2021 at 9:51 PM Bossart, Nathan <bossartn@amazon.com> wrote:
On 10/20/21, 3:23 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Alright, I reworked the patch a bit to maintain backward
> compatibility.  My initial intent for 0001 was to just do a clean
> refactor to move the shell archiving stuff to its own file.  However,
> after I did that, I realized that adding the hook wouldn't be too much
> more work, so I did that as well.  This seems to be enough to support
> custom archiving modules.  I included a basic example of such a module
> in 0002.  0002 is included primarily for demonstration purposes.

It looks like the FreeBSD build is failing because sys/wait.h is
missing.  Here is an attempt at fixing that.

I still like the idea of loading the library via a special parameter, archive_library or such.

One reason for that is that adding/removing modules in shared_preload_libraries has a terrible UX in that you have to replace the whole thing. This makes it much more complex to deal with when different modules just want to add to it.

E.g. my awsome backup program could set archive_library='my_awesome_backups', and know it didn't break anything else. but it couldn't set  shared_preload_libraries='my_awesome_bacukps', because then it might break a bunch of other modules that used to be there. So it has to go try to parse the whole config and figure out where to make such modifications.

Now, this could *also* be solved by allowing shared_preload_library to be a "list" instead of a string, and allow postgresql.conf to accept syntax like shared_preload_libraries+='my_awesome_backups'.

But without that level fo functionality available, I think a separate parameter for the archive library would be a good thing.

Other than that:
+
+/*
+ * Is WAL archiving configured?  For consistency with previous releases, this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly configured.
+ */
+#define XLogArchivingConfigured() \
+       (PG_archive && (PG_archive != shell_archive || XLogArchiveCommand[0] != '\0'))


Wouldn't that be better as a callback into the module? So that shell_archive would implement the check for XLogArchiveCommand. Then another third party module can make it's own decision on what to check. And PGarchive would then be a struct that holds a function pointer to the archive command and another function pointer to the isenabled command? (I think having a struct for it would be useful regardless -- for possible future extensions with more API points).

--

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: parallelizing the archiver
Next
From: Tom Lane
Date:
Subject: Re: Experimenting with hash tables inside pg_dump