Thread: Would it be possible to have parallel archiving?

Would it be possible to have parallel archiving?

From: hubert depesz lubaczewski

Hi,
I'm in a situation where we quite often generate more WAL than we can
archive. The thing is - archiving takes a long(ish) time, as it's a
multi-step process and includes talking to remote servers over the
network.

I tested this, and simply by running archiving in parallel I can
easily get 2-3 times higher throughput.

But - I'd prefer that PostgreSQL keep track of what is archived and
what is not, so I can't do the parallelization on my own.

So, the question is: is it technically possible to have parallel
archiving, and would anyone be willing to work on it? (Sorry, my C
skills are basically none, so I can't realistically hack it myself.)

Best regards,

depesz



Re: Would it be possible to have parallel archiving?

From: Alexander Kukushkin

Hi,

There is the archive_status directory in pg_wal, and if there are
files with the suffix ".ready" there, you can archive not only the
file that was requested but quite a few more, as long as ".ready"
files are available. After that you have to rename ".ready" to
".done". Postgres will not call archive_command for files which are
already marked as ".done".

I think most of the good backup tools are already doing that - for
example, pgBackRest, wal-e, and wal-g (just naming the tools I have
worked with).

Regards,
--
Alexander Kukushkin


Re: Would it be possible to have parallel archiving?

From: hubert depesz lubaczewski

On Tue, Aug 28, 2018 at 08:33:11AM +0200, Alexander Kukushkin wrote:
> There is the archive_status directory in pg_wal, and if there are
> files with the suffix ".ready" there, you can archive not only the
> file that was requested but quite a few more, as long as ".ready"
> files are available. After that you have to rename ".ready" to
> ".done". Postgres will not call archive_command for files which are
> already marked as ".done".

Ah, I was missing this information. Thanks a lot.

Best regards,

depesz



Re: Would it be possible to have parallel archiving?

From: Stephen Frost

Greetings,

* hubert depesz lubaczewski (depesz@depesz.com) wrote:
> I'm in a situation where we quite often generate more WAL than we can
> archive. The thing is - archiving takes a long(ish) time, as it's a
> multi-step process and includes talking to remote servers over the
> network.
>
> I tested this, and simply by running archiving in parallel I can
> easily get 2-3 times higher throughput.
>
> But - I'd prefer that PostgreSQL keep track of what is archived and
> what is not, so I can't do the parallelization on my own.
>
> So, the question is: is it technically possible to have parallel
> archiving, and would anyone be willing to work on it? (Sorry, my C
> skills are basically none, so I can't realistically hack it myself.)

Not entirely sure what the concern is around "postgresql knowing what is
archived", but pgbackrest already does exactly this parallel archiving
for environments where the WAL volume is larger than a single thread can
handle, and we've been rewriting it in C specifically to make it fast
enough to be able to keep PG up-to-date regarding what's been pushed
already.

Happy to discuss it further, as well as other related topics and how
backup software could be given better APIs to tell PG what's been
archived, etc.

Thanks!

Stephen


Re: Would it be possible to have parallel archiving?

From: David Steele

On 8/28/18 8:32 AM, Stephen Frost wrote:
>
> * hubert depesz lubaczewski (depesz@depesz.com) wrote:
>> I'm in a situation where we quite often generate more WAL than we can
>> archive. The thing is - archiving takes a long(ish) time, as it's a
>> multi-step process and includes talking to remote servers over the
>> network.
>>
>> I tested this, and simply by running archiving in parallel I can
>> easily get 2-3 times higher throughput.
>>
>> But - I'd prefer that PostgreSQL keep track of what is archived and
>> what is not, so I can't do the parallelization on my own.
>>
>> So, the question is: is it technically possible to have parallel
>> archiving, and would anyone be willing to work on it? (Sorry, my C
>> skills are basically none, so I can't realistically hack it myself.)
>
> Not entirely sure what the concern is around "postgresql knowing what is
> archived", but pgbackrest already does exactly this parallel archiving
> for environments where the WAL volume is larger than a single thread can
> handle, and we've been rewriting it in C specifically to make it fast
> enough to be able to keep PG up-to-date regarding what's been pushed
> already.

To be clear, pgBackRest uses the .ready files in archive_status to
parallelize archiving but still notifies PostgreSQL of completion via
the archive_command mechanism.  We do not modify .ready files to .done
directly.
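
For illustration, a rough sketch of the acknowledgment side of that
design (not pgBackRest's actual code; the spool layout and file names
here are invented): the command PostgreSQL calls only consults a local
acknowledgment directory, and never touches archive_status itself. A
background uploader (not shown) would scan archive_status for .ready
segments, push them in parallel, and write the .ok files.

    # Sketch: "push in parallel, acknowledge via archive_command".
    # Invented layout: the background uploader drops <segment>.ok
    # into ACK_DIR once a segment is durably archived.
    import os
    import sys
    import time

    ACK_DIR = '/var/spool/wal_ack'   # written by the uploader process

    def archive_command(segment, timeout=60.0):
        # Return 0 only when the segment is known to be archived;
        # PostgreSQL itself then renames .ready to .done.
        ack = os.path.join(ACK_DIR, segment + '.ok')
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if os.path.exists(ack):
                os.unlink(ack)       # consume the acknowledgment
                return 0
            time.sleep(0.1)
        return 1                     # not yet archived; PG will retry

    if __name__ == '__main__':
        sys.exit(archive_command(sys.argv[1]))   # called with %f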

However, we have optimized the C code to provide ~200
notifications/second (3.2GB/s of WAL transfer at the default 16MB
segment size), which is enough to keep up with the workloads we have
seen.  Larger WAL segment sizes in PG11 will theoretically increase
this to 200GB/s, though in practice the CPU needed for compression
will become a major bottleneck, not to mention the network, etc.
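
The arithmetic behind those figures, for reference (16MB is the
default wal_segment_size; PG11 allows configuring it up to 1GB):

    # 200 archive_command notifications/second times segment size:
    per_sec = 200
    print(per_sec * 16 * 2**20 / 2**30)   # 3.125, i.e. ~3.2GB/s at 16MB
    print(per_sec * 2**30 / 2**30)        # 200GB/s at 1GB segments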

Regards,
--
-David
david@pgmasters.net



Re: Would it be possible to have parallel archiving?

From: Stephen Frost

Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 8/28/18 8:32 AM, Stephen Frost wrote:
> >
> > * hubert depesz lubaczewski (depesz@depesz.com) wrote:
> >> I'm in a situation where we quite often generate more WAL than we can
> >> archive. The thing is - archiving takes a long(ish) time, as it's a
> >> multi-step process and includes talking to remote servers over the
> >> network.
> >>
> >> I tested this, and simply by running archiving in parallel I can
> >> easily get 2-3 times higher throughput.
> >>
> >> But - I'd prefer that PostgreSQL keep track of what is archived and
> >> what is not, so I can't do the parallelization on my own.
> >>
> >> So, the question is: is it technically possible to have parallel
> >> archiving, and would anyone be willing to work on it? (Sorry, my C
> >> skills are basically none, so I can't realistically hack it myself.)
> >
> > Not entirely sure what the concern is around "postgresql knowing what is
> > archived", but pgbackrest already does exactly this parallel archiving
> > for environments where the WAL volume is larger than a single thread can
> > handle, and we've been rewriting it in C specifically to make it fast
> > enough to be able to keep PG up-to-date regarding what's been pushed
> > already.
>
> To be clear, pgBackRest uses the .ready files in archive_status to
> parallelize archiving but still notifies PostgreSQL of completion via
> the archive_command mechanism.  We do not modify .ready files to .done
> directly.

Right, we don't recommend mucking around with that directory of files.
Even if that works today (which you'd need to test extensively...),
there's no guarantee that it'll work and do what you want in the
future...

> However, we have optimized the C code to provide ~200
> notifications/second (3.2GB/s of WAL transfer at the default 16MB
> segment size), which is enough to keep up with the workloads we have
> seen.  Larger WAL segment sizes in PG11 will theoretically increase
> this to 200GB/s, though in practice the CPU needed for compression
> will become a major bottleneck, not to mention the network, etc.

Agreed.

Thanks!

Stephen


Re: Would it be possible to have parallel archiving?

From: Andrey Borodin


> On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * David Steele (david@pgmasters.net) wrote:
>> On 8/28/18 8:32 AM, Stephen Frost wrote:
>> To be clear, pgBackRest uses the .ready files in archive_status to
>> parallelize archiving but still notifies PostgreSQL of completion via
>> the archive_command mechanism.  We do not modify .ready files to .done
>> directly.
>
> Right, we don't recommend mucking around with that directory of files.
> Even if that works today (which you'd need to test extensively...),
> there's no guarantee that it'll work and do what you want in the
> future...

WAL-G modifies archive_status files.
This path was chosen to limit the state preserved between WAL-G runs
(archiving to S3) and to push archiving performance further.
Indeed, it was very hard to test. Also, this makes it impossible to
use two archiving systems simultaneously during a transition period.

Best regards, Andrey Borodin.

Re: Would it be possible to have parallel archiving?

From: Stephen Frost

Greetings,

* Andrey Borodin (x4mmm@yandex-team.ru) wrote:
> > On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote:
> > * David Steele (david@pgmasters.net) wrote:
> >> On 8/28/18 8:32 AM, Stephen Frost wrote:
> >> To be clear, pgBackRest uses the .ready files in archive_status to
> >> parallelize archiving but still notifies PostgreSQL of completion via
> >> the archive_command mechanism.  We do not modify .ready files to .done
> >> directly.
> >
> > Right, we don't recommend mucking around with that directory of files.
> > Even if that works today (which you'd need to test extensively...),
> > there's no guarantee that it'll work and do what you want in the
> > future...
> WAL-G modifies archive_status files.

Frankly, I've heard far too many concerns and issues with WAL-G to
consider anything it does at all sensible.

> This path was chosen to limit the state preserved between WAL-G runs
> (archiving to S3) and to push archiving performance further.

I still don't think it's a good idea and I specifically recommend
against making changes to the archive status files- those are clearly
owned and managed by PG and should not be whacked around by external
processes.

> Indeed, it was very hard to test. Also, this makes it impossible to
> use two archiving systems simultaneously during a transition period.

The testing in WAL-G seems to be rather lacking from what I've seen.

It's not clear to me what you're suggesting wrt running two archiving
systems simultaneously during a transition period, but certainly
whacking the archive status files around doesn't make it easier to
have multiple archive commands trying to simultaneously archive WAL
files.

Thanks!

Stephen


Re: Would it be possible to have parallel archiving?

From: Andrey Borodin

> On 28 Aug 2018, at 17:07, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Andrey Borodin (x4mmm@yandex-team.ru) wrote:
>>> On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote:
>>> * David Steele (david@pgmasters.net) wrote:
>>>> On 8/28/18 8:32 AM, Stephen Frost wrote:
>>>> To be clear, pgBackRest uses the .ready files in archive_status to
>>>> parallelize archiving but still notifies PostgreSQL of completion via
>>>> the archive_command mechanism.  We do not modify .ready files to .done
>>>> directly.
>>>
>>> Right, we don't recommend mucking around with that directory of files.
>>> Even if that works today (which you'd need to test extensively...),
>>> there's no guarantee that it'll work and do what you want in the
>>> future...
>> WAL-G modifies archive_status files.
>
> Frankly, I've heard far too many concerns and issues with WAL-G to
> consider anything it does at all sensible.
Umm.. very interesting. What kind of issues? There are a few on the
GitHub repo, and all of them will be addressed. Do you have some other
reports? Can you share them?
I'm open to discussing any concerns.

>
>> This path was chosen to limit the state preserved between WAL-G runs
>> (archiving to S3) and to push archiving performance further.
>
> I still don't think it's a good idea and I specifically recommend
> against making changes to the archive status files- those are clearly
> owned and managed by PG and should not be whacked around by external
> processes.
If you do not write to archive_status, you basically have two options:
1. On every archive_command call, recheck that the file being archived
is identical to the file that is already archived. This hurts
performance. (A sketch of this option follows below.)
2. Hope that the files match. This adds no safety compared to whacking
archive_status, and is just as prone to core changes as writes are.

Well, PostgreSQL clearly has a problem here that could be solved by a
good parallel archiving API. Anything else is whacking around; just
reading archive_status is no better than reading and writing.
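
A sketch of what option 1 amounts to (the ARCHIVE directory is a local
stand-in for S3 or similar; in reality the fetch and upload steps are
network round trips, which is exactly the cost being objected to):

    # Sketch of option 1: verify against the archive on every call.
    import hashlib
    import os
    import shutil

    ARCHIVE = '/mnt/wal_archive'   # stand-in for the remote archive

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return h.hexdigest()

    def fetch_archived_sha256(segment):
        # Stands in for a slow round trip to the remote archive.
        path = os.path.join(ARCHIVE, segment)
        return sha256_of(path) if os.path.exists(path) else None

    def archive_command(wal_path, segment):
        remote = fetch_archived_sha256(segment)   # the expensive check
        if remote is None:
            shutil.copy(wal_path, os.path.join(ARCHIVE, segment))
            return 0
        # Already archived: succeed only if the contents really match.
        return 0 if remote == sha256_of(wal_path) else 1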

>
>> Indeed, it was very hard to test. Also, this makes it impossible to
>> use two archiving systems simultaneously during a transition period.
>
> The testing in WAL-G seems to be rather lacking from what I've seen.
Indeed, WAL-G still lacks automated integration tests; I hope that
some dockerized tests will be added soon.
For now I'm doing automated QA in Yandex infrastructure.

Best regards, Andrey Borodin.

Re: Would it be possible to have parallel archiving?

From: Stephen Frost

Greetings,

* Andrey Borodin (x4mmm@yandex-team.ru) wrote:
> > On 28 Aug 2018, at 17:07, Stephen Frost <sfrost@snowman.net> wrote:
> > I still don't think it's a good idea and I specifically recommend
> > against making changes to the archive status files- those are clearly
> > owned and managed by PG and should not be whacked around by external
> > processes.
> If you do not write to archive_status, you basically have two options:
> 1. On every archive_command call, recheck that the file being archived
> is identical to the file that is already archived. This hurts
> performance.

It's absolutely important to make sure that the files PG is asking to
archive have actually been archived, yes.

> 2. Hope that the files match. This adds no safety compared to whacking
> archive_status, and is just as prone to core changes as writes are.

This blindly assumes that PG won't care about some other process
whacking around the archive status files, and I don't think that's a
good assumption to be operating under - certainly not under the claim
that it's simply a 'performance' improvement.

> Well, PostgreSQL clearly has a problem here that could be solved by a
> good parallel archiving API. Anything else is whacking around; just
> reading archive_status is no better than reading and writing.

Pushing files which are indicated by archive status as being ready is
absolutely an entirely different thing from whacking around the status
files themselves which PG is managing itself.

Thanks!

Stephen


Re: Would it be possible to have parallel archiving?

From: Andrey Borodin

> On 28 Aug 2018, at 17:41, Stephen Frost <sfrost@snowman.net> wrote:
>
> Pushing files which are indicated by archive status as being ready is
> absolutely an entirely different thing from whacking around the status
> files themselves which PG is managing itself.
I disagree.
Returning an archive_command exit code based on a prior read of
archive_status is not safe.
"Absolutely an entirely different thing" is speculation, just like
"jumping out of the 5th floor is safer than jumping out of the 10th".
If the archive is not to be monitored properly - do not whack
archive_status at all. pgBackRest is no safer than WAL-G in this
respect. They are prone to the same conditions; changing the behavior
of archive_status will affect them both.
I'm aware of the issue and monitor PG changes in this area. I do not
pretend that there cannot be any problem at all and that this method
will stay safe forever. But for now it is.

Best regards, Andrey Borodin.

Re: Would it be possible to have parallel archiving?

From: David Steele

On 8/28/18 4:34 PM, Andrey Borodin wrote:
>>
>> I still don't think it's a good idea and I specifically recommend
>> against making changes to the archive status files- those are clearly
>> owned and managed by PG and should not be whacked around by external
>> processes.
> If you do not write to archive_status, you basically have two options:
> 1. On every archive_command call, recheck that the file being archived
> is identical to the file that is already archived. This hurts
> performance.
> 2. Hope that the files match. This adds no safety compared to whacking
> archive_status, and is just as prone to core changes as writes are.

Another option is to maintain the state of what has been safely archived
(and what has errored) locally.  This allows pgBackRest to rapidly
return the status to Postgres without rechecking against the repository,
which as you note would be very slow.

This allows more than one archive_command to be safely run since all
archive commands must succeed before Postgres will mark the segment as done.
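
As a rough illustration of the local-state idea (a sketch only, with
an invented state-file layout - this is not how pgBackRest actually
stores its state):

    # Sketch: record archive status locally so archive_command can
    # answer without a round trip to the repository.
    import os

    STATE_DIR = '/var/spool/archive_state'   # invented location

    def record(segment, ok, detail=''):
        # One tiny file per segment: <segment>.ok or <segment>.error.
        name = segment + ('.ok' if ok else '.error')
        path = os.path.join(STATE_DIR, name)
        tmp = path + '.tmp'
        with open(tmp, 'w') as f:
            f.write(detail)
            f.flush()
            os.fsync(f.fileno())  # make the status crash-safe
        os.replace(tmp, path)     # atomic publish

    def already_archived(segment):
        return os.path.exists(os.path.join(STATE_DIR, segment + '.ok'))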

It's true that reading archive_status is susceptible to core changes,
but the less interaction the better, I think.

Regards,
-- 
-David
david@pgmasters.net


Re: Would it be possible to have parallel archiving?

From: Stephen Frost

Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 8/28/18 4:34 PM, Andrey Borodin wrote:
> >>
> >> I still don't think it's a good idea and I specifically recommend
> >> against making changes to the archive status files- those are clearly
> >> owned and managed by PG and should not be whacked around by external
> >> processes.
> > If you do not write to archive_status, you basically have two options:
> > 1. On every archive_command call, recheck that the file being archived
> > is identical to the file that is already archived. This hurts
> > performance.
> > 2. Hope that the files match. This adds no safety compared to whacking
> > archive_status, and is just as prone to core changes as writes are.
>
> Another option is to maintain the state of what has been safely archived
> (and what has errored) locally.  This allows pgBackRest to rapidly
> return the status to Postgres without rechecking against the repository,
> which as you note would be very slow.
>
> This allows more than one archive_command to be safely run since all
> archive commands must succeed before Postgres will mark the segment as done.
>
> It's true that reading archive_status is susceptible to core changes,
> but the less interaction the better, I think.

Absolutely.  External processes shouldn't be changing the files written
out and managed by the core system.  pgbackrest is *much* safer than
alternatives which hack around with files inside of PGDATA.  We've been
working to move things forward to the point where pgbackrest can be run
as a user who doesn't even have access to modify those files (which has
now landed in PG11), and for good reason - it's outright dangerous to
do.

Thanks!

Stephen


Re: Would it be possible to have parallel archiving?

From: Michael Paquier

On Tue, Aug 28, 2018 at 05:20:10PM -0400, Stephen Frost wrote:
> Absolutely.  External processes shouldn't be changing the files written
> out and managed by the core system.

+1.  Archive commands should not directly mess with the contents the
backend is managing.
--
Michael
