Re: parallelizing the archiver - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: parallelizing the archiver
Date
Msg-id 20211005022041.GL20998@tamriel.snowman.net
In response to Re: parallelizing the archiver  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: parallelizing the archiver  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 10/1/21, 12:08 PM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote:
> > On 30 Sep 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote:
> >> Of course, there are drawbacks to using an extension.  Besides the
> >> obvious added complexity of building an extension in C versus writing
> >> a shell command, the patch disallows changing the libraries without
> >> restarting the server.  Also, the patch makes no effort to simplify
> >> error handling, memory management, etc.  This is left as an exercise
> >> for the extension author.
> > I think the real problem with an extension is quite different from
> > the ones mentioned above.
> > There are many archive tools that already feature parallel
> > archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman
> > and others. These tools by far outweigh tools that don't look into
> > archive_status to parallelize archival.
> > And we are going to ask them to also add a C extension without any
> > feasible benefit to the user. You only get some restrictions, like a
> > system restart to enable the shared library.
> >
> > I think we need a design that legalises the already existing
> > de facto standard features in archive tools. Or, even better, enables
> > these tools to be more efficient, reliable, etc. Either way, we will
> > create legacy code from scratch.
>
> My proposal wouldn't require any changes to any of these utilities.
> This design just adds a new mechanism that would allow end users to
> set up archiving a different way with less overhead in hopes that it
> will help them keep up.  I suspect a lot of work has been put into the
> archive tools you mentioned to make sure they can keep up with high
> rates of WAL generation, so I'm skeptical that anything we do here
> will really benefit them all that much.  Ideally, we'd do something
> that improves matters for everyone, though.  I'm open to suggestions.

This is something we've contemplated quite a bit, and the last thing
that I'd want to have is a requirement to configure a whole bunch of
additional parameters to enable this.  Why do we need to have so many
new GUCs?  I would have thought we'd probably be able to get away with
just having the appropriate hooks and then telling folks to load the
extension in shared_preload_libraries.

As for the hooks themselves, I'd certainly hope that they'd be designed
to handle batches of WAL rather than individual ones as that's long been
one of the main issues with the existing archive command approach.  I
appreciate that maybe that's less of an issue with a shared library but
it's still something to consider.
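
To make the batching point concrete, here is a rough, standalone sketch
of what a batch-oriented hook might look like.  Every name in it
(archive_batch_hook_type, archive_batch_hook, archive_ready_files,
archive_one_file, example_batch_hook) is hypothetical and comes from
neither PostgreSQL nor the patch set; it only illustrates handing the
hook all ready files in one call, with a per-file fallback when no hook
is installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch only: none of these names exist in PostgreSQL
 * or in the patch set under discussion.
 *
 * A batch hook is handed every WAL file that is currently ready and
 * returns how many files, counted from the front of the list, it
 * archived successfully.
 */
typedef size_t (*archive_batch_hook_type) (const char *const *wal_paths,
                                           size_t nfiles);

/* Stays NULL unless a preloaded extension installs a hook. */
static archive_batch_hook_type archive_batch_hook = NULL;

/* Stand-in for the existing one-file-at-a-time path. */
static bool
archive_one_file(const char *path)
{
    printf("archiving %s individually\n", path);
    return true;
}

/*
 * One archiver cycle: pass the whole batch to the hook if there is
 * one, otherwise fall back to archiving files one by one.  Returns the
 * number of files archived from the front of the list.
 */
static size_t
archive_ready_files(const char *const *wal_paths, size_t nfiles)
{
    size_t      done;

    assert(wal_paths != NULL || nfiles == 0);

    if (archive_batch_hook != NULL)
    {
        done = archive_batch_hook(wal_paths, nfiles);
        /* Never trust the hook to report more than it was given. */
        return done > nfiles ? nfiles : done;
    }

    for (done = 0; done < nfiles; done++)
    {
        if (!archive_one_file(wal_paths[done]))
            break;
    }
    return done;
}

/* What an extension might install: one call covers the whole batch. */
static size_t
example_batch_hook(const char *const *wal_paths, size_t nfiles)
{
    printf("archiving %zu files in one call, starting with %s\n",
           nfiles, nfiles > 0 ? wal_paths[0] : "(none)");
    return nfiles;
}
```

Returning a count from the front of the list is just one possible
contract; it keeps WAL ordering intact when a batch only partly
succeeds, but other designs are certainly possible.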

Admittedly, I haven't looked in depth at this patch set and am just
going off of the description of it provided in the thread, so perhaps
I missed something.

Thanks,

Stephen
