On 8/28/18 8:32 AM, Stephen Frost wrote:
>
> * hubert depesz lubaczewski (depesz@depesz.com) wrote:
>> I'm in a situation where we quite often generate more WAL than we can
>> archive. The thing is - archiving takes long(ish) time but it's
>> multi-step process and includes talking to remote servers over network.
>>
>> I tested that simply by running archiving in parallel I can easily get
>> 2-3 times higher throughput.
>>
>> But - I'd prefer to keep postgresql knowing what is archived, and what
>> not, so I can't do the parallelization on my own.
>>
>> So, the question is: is it technically possible to have parallel
>> archivization, and would anyone be willing to work on it (sorry, my
>> c skills are basically none, so I can't realistically hack it myself)
>
> Not entirely sure what the concern is around "postgresql knowing what is
> archived", but pgbackrest already does exactly this parallel archiving
> for environments where the WAL volume is larger than a single thread can
> handle, and we've been rewriting it in C specifically to make it fast
> enough to be able to keep PG up-to-date regarding what's been pushed
> already.

To be clear, pgBackRest uses the .ready files in archive_status to
parallelize archiving, but it still notifies PostgreSQL of completion
via the archive_command mechanism. We do not rename .ready files to
.done ourselves.
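
For anyone who wants to try it, a rough sketch of the configuration for
asynchronous, parallel archive-push is below (the stanza name "main",
the paths, and the process count are only illustrative and will differ
per installation):

    # postgresql.conf
    archive_mode = on
    archive_command = 'pgbackrest --stanza=main archive-push %p'

    # pgbackrest.conf
    [global]
    spool-path=/var/spool/pgbackrest
    archive-async=y

    [global:archive-push]
    process-max=4

    [main]
    pg1-path=/var/lib/postgresql/11/main

With archive-async enabled, archive-push looks at the .ready files in
archive_status, pushes up to process-max segments in parallel, and each
segment is still acknowledged back through archive_command, so
PostgreSQL keeps an accurate record of what has been archived.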
However, we have optimized the C code to provide ~200
notifications/second (about 3.2GB/s of WAL transfer at the default 16MB
segment size), which is enough to keep up with the workloads we have
seen. Larger WAL segment sizes in PG11 (up to 1GB) would theoretically
raise that to 200GB/s, though in practice the CPU needed for
compression becomes a major bottleneck well before then, not to mention
the network, etc.
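
For reference, the arithmetic behind those numbers:

    200 WAL/s * 16MB (default wal_segment_size)          = ~3.2GB/s
    200 WAL/s * 1GB  (PG11 maximum, initdb --wal-segsize) = ~200GB/s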
Regards,
--
-David
david@pgmasters.net