Re: Would it be possible to have parallel archiving? - Mailing list pgsql-hackers

From David Steele
Subject Re: Would it be possible to have parallel archiving?
Msg-id 78844d75-3de2-b074-74ee-a4a12b2d5881@pgmasters.net
In response to Re: Would it be possible to have parallel archiving?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Would it be possible to have parallel archiving?
List pgsql-hackers
On 8/28/18 8:32 AM, Stephen Frost wrote:
>
> * hubert depesz lubaczewski (depesz@depesz.com) wrote:
>> I'm in a situation where we quite often generate more WAL than we can
>> archive. The thing is, archiving takes a long(ish) time: it's a
>> multi-step process and includes talking to remote servers over the network.
>>
>> I have tested that, simply by running the archiving in parallel, I can
>> easily get 2-3 times higher throughput.
>>
>> But I'd prefer PostgreSQL to keep track of what is archived and what
>> is not, so I can't do the parallelization on my own.
>>
>> So, the question is: is it technically possible to have parallel
>> archiving, and would anyone be willing to work on it? (Sorry, my
>> C skills are basically none, so I can't realistically hack it myself.)
>
> Not entirely sure what the concern is around "postgresql knowing what is
> archived", but pgbackrest already does exactly this parallel archiving
> for environments where the WAL volume is larger than a single thread can
> handle, and we've been rewriting it in C specifically to make it fast
> enough to keep PG up to date on what has already been pushed.

To be clear, pgBackRest uses the .ready files in archive_status to
parallelize archiving but still reports completion to PostgreSQL via
the archive_command mechanism.  We do not rename .ready files to .done
ourselves.
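
For anyone curious how the pieces fit together, here is a minimal
sketch of the look-ahead idea (Python for brevity; this is not
pgBackRest's code, and the paths, worker count, and parallel_push.py
name are all hypothetical).  The wrapper is installed as
archive_command, pre-pushes any .ready segments it finds in parallel,
and returns success to PostgreSQL only once the requested segment has
been safely stored, so PostgreSQL's own .ready/.done bookkeeping stays
authoritative:

#!/usr/bin/env python3
# Hypothetical look-ahead parallel archiver, e.g.:
#   archive_command = 'parallel_push.py %p %f'
import os
import shutil
import sys
from concurrent.futures import ThreadPoolExecutor

PG_WAL = "/var/lib/postgresql/data/pg_wal"    # assumed data layout
STATUS_DIR = os.path.join(PG_WAL, "archive_status")
ARCHIVE = "/mnt/archive"                      # stand-in for a real repository

def push(segment):
    # Copy one segment to the archive; a real tool would compress,
    # checksum, and ship it over the network instead.
    dst = os.path.join(ARCHIVE, segment)
    if not os.path.exists(dst):               # idempotent: skip if already pushed
        shutil.copy(os.path.join(PG_WAL, segment), dst + ".tmp")
        os.rename(dst + ".tmp", dst)          # atomic rename marks it durable

def ready_segments():
    # Segments PostgreSQL has marked .ready but not yet acknowledged.
    return [f[:-len(".ready")]
            for f in os.listdir(STATUS_DIR) if f.endswith(".ready")]

def main():
    requested = sys.argv[2]                   # %f: the segment PG asked about
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Push the requested segment plus any queued .ready segments;
        # an exception here yields a nonzero exit, so PostgreSQL retries.
        list(pool.map(push, {requested} | set(ready_segments())))
    sys.exit(0)                               # requested segment is archived

if __name__ == "__main__":
    main()

The point of the look-ahead is that by the time PostgreSQL invokes
archive_command for a segment, it has usually already been pushed, so
the call degenerates to a quick existence check.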

However, we have optimized the C code to provide ~200
notifications/second, i.e. 3.2GB/s of WAL transfer at the default 16MB
segment size, which has been enough to keep up with the workloads we
have seen.  The larger WAL segment sizes allowed in PG11 (up to 1GB)
would theoretically raise this to 200GB/s, though in practice the CPU
needed for compression becomes a major bottleneck well before that, not
to mention the network, etc.
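
For reference, the arithmetic behind those figures, as a quick sanity
check (assuming the 16MB default segment size and PG11's 1GB maximum):

# throughput = notification rate x WAL segment size
segments_per_sec = 200
print(segments_per_sec * 16, "MB/s")     # 3200 MB/s ~= 3.2 GB/s at 16MB
print(segments_per_sec * 1024, "MB/s")   # 204800 MB/s = 200 GB/s at 1GB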

Regards,
--
-David
david@pgmasters.net

