Thread: Would it be possible to have parallel archiving?
Hi, I'm in a situation where we quite often generate more WAL than we can archive. The thing is - archiving takes a long(ish) time, but it's a multi-step process and includes talking to remote servers over the network. I tested that simply by running archiving in parallel I can easily get 2-3 times higher throughput. But - I'd prefer to keep PostgreSQL knowing what is archived and what is not, so I can't do the parallelization on my own. So, the question is: is it technically possible to have parallel archiving, and would anyone be willing to work on it? (Sorry, my C skills are basically none, so I can't realistically hack it myself.) Best regards, depesz
Hi, There is the archive_status directory in pg_wal, and if there are files with the suffix ".ready", you can archive not only the file which was requested, but quite a few more, as long as there are ".ready" files available. After that you have to rename ".ready" to ".done". Postgres will not call archive_command for files which are already marked as ".done". I think most of the good backup tools are already doing that - for example, pgBackRest, wal-e, wal-g (just naming the tools I was working with). Regards, -- Alexander Kukushkin
On Tue, Aug 28, 2018 at 08:33:11AM +0200, Alexander Kukushkin wrote: > There is the archive_status directory in pg_wal, and if there are > files with the suffix ".ready", you can archive not only the file which > was requested, but quite a few more, as long as there are ".ready" files > available. After that you have to rename ".ready" to ".done". Postgres > will not call archive_command for files which are already marked as > ".done". Ah, I was missing this information. Thanks a lot. Best regards, depesz
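As a rough illustration of the mechanism Alexander describes - a minimal Python sketch, not taken from any of the tools named above. The upload_to_remote() function, the paths, and the worker count are placeholder assumptions, and as later messages in this thread point out, renaming the status files from outside Postgres carries risk:

    import os
    from concurrent.futures import ThreadPoolExecutor

    # Assumed data directory layout; adjust to the actual cluster.
    PG_WAL = "/var/lib/postgresql/data/pg_wal"
    STATUS = os.path.join(PG_WAL, "archive_status")

    def upload_to_remote(path):
        """Placeholder for the real transfer step (scp, S3 put, ...)."""
        raise NotImplementedError

    def archive_segment(seg):
        upload_to_remote(os.path.join(PG_WAL, seg))
        # Renaming .ready to .done tells Postgres not to request this
        # segment via archive_command again.
        os.rename(os.path.join(STATUS, seg + ".ready"),
                  os.path.join(STATUS, seg + ".done"))

    def archive_all_ready(max_workers=4):
        # Every "<segment>.ready" file marks a segment awaiting archiving.
        ready = sorted(f[:-len(".ready")] for f in os.listdir(STATUS)
                       if f.endswith(".ready"))
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            list(pool.map(archive_segment, ready))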
Greetings, * hubert depesz lubaczewski (depesz@depesz.com) wrote: > I'm in a situation where we quite often generate more WAL than we can > archive. The thing is - archiving takes a long(ish) time, but it's a > multi-step process and includes talking to remote servers over the network. > > I tested that simply by running archiving in parallel I can easily get > 2-3 times higher throughput. > > But - I'd prefer to keep PostgreSQL knowing what is archived and what > is not, so I can't do the parallelization on my own. > > So, the question is: is it technically possible to have parallel > archiving, and would anyone be willing to work on it? (Sorry, my > C skills are basically none, so I can't realistically hack it myself.) Not entirely sure what the concern is around "PostgreSQL knowing what is archived", but pgBackRest already does exactly this parallel archiving for environments where the WAL volume is larger than a single thread can handle, and we've been rewriting it in C specifically to make it fast enough to keep PG up to date regarding what's been pushed already. Happy to discuss it further, as well as other related topics, like how backup software could be given better APIs to tell PG what's been archived. Thanks! Stephen
On 8/28/18 8:32 AM, Stephen Frost wrote: > > * hubert depesz lubaczewski (depesz@depesz.com) wrote: >> I'm in a situation where we quite often generate more WAL than we can >> archive. The thing is - archiving takes a long(ish) time, but it's a >> multi-step process and includes talking to remote servers over the network. >> >> I tested that simply by running archiving in parallel I can easily get >> 2-3 times higher throughput. >> >> But - I'd prefer to keep PostgreSQL knowing what is archived and what >> is not, so I can't do the parallelization on my own. >> >> So, the question is: is it technically possible to have parallel >> archiving, and would anyone be willing to work on it? (Sorry, my >> C skills are basically none, so I can't realistically hack it myself.) > > Not entirely sure what the concern is around "PostgreSQL knowing what is > archived", but pgBackRest already does exactly this parallel archiving > for environments where the WAL volume is larger than a single thread can > handle, and we've been rewriting it in C specifically to make it fast > enough to keep PG up to date regarding what's been pushed already. To be clear, pgBackRest uses the .ready files in archive_status to parallelize archiving but still notifies PostgreSQL of completion via the archive_command mechanism. We do not modify .ready files to .done directly. However, we have optimized the C code to provide ~200 notifications/second (3.2 GB/s of WAL transfer), which is enough to keep up with the workloads we have seen. Larger WAL segment sizes in PG11 will theoretically increase this to 200 GB/s, though in practice the CPU needed for compression will become a major bottleneck, not to mention network, etc. Regards, -- -David david@pgmasters.net
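For reference, the arithmetic behind those figures: at the default 16 MB WAL segment size, ~200 notifications/second moves 200 x 16 MB = 3.2 GB/s, and at the 1 GB maximum segment size that initdb accepts as of PG11, the same notification rate would correspond to 200 x 1 GB = 200 GB/s.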
Greetings, * David Steele (david@pgmasters.net) wrote: > On 8/28/18 8:32 AM, Stephen Frost wrote: > > * hubert depesz lubaczewski (depesz@depesz.com) wrote: > >> I'm in a situation where we quite often generate more WAL than we can > >> archive. The thing is - archiving takes a long(ish) time, but it's a > >> multi-step process and includes talking to remote servers over the network. > >> > >> I tested that simply by running archiving in parallel I can easily get > >> 2-3 times higher throughput. > >> > >> But - I'd prefer to keep PostgreSQL knowing what is archived and what > >> is not, so I can't do the parallelization on my own. > >> > >> So, the question is: is it technically possible to have parallel > >> archiving, and would anyone be willing to work on it? (Sorry, my > >> C skills are basically none, so I can't realistically hack it myself.) > > > > Not entirely sure what the concern is around "PostgreSQL knowing what is > > archived", but pgBackRest already does exactly this parallel archiving > > for environments where the WAL volume is larger than a single thread can > > handle, and we've been rewriting it in C specifically to make it fast > > enough to keep PG up to date regarding what's been pushed already. > > To be clear, pgBackRest uses the .ready files in archive_status to > parallelize archiving but still notifies PostgreSQL of completion via > the archive_command mechanism. We do not modify .ready files to .done > directly. Right, we don't recommend mucking around with that directory of files. Even if that works today (which you'd need to test extensively...), there's no guarantee that it'll work and do what you want in the future... > However, we have optimized the C code to provide ~200 > notifications/second (3.2 GB/s of WAL transfer), which is enough to keep > up with the workloads we have seen. Larger WAL segment sizes in PG11 > will theoretically increase this to 200 GB/s, though in practice the CPU > needed for compression will become a major bottleneck, not to mention > network, etc. Agreed. Thanks! Stephen
> On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote:
> * David Steele (david@pgmasters.net) wrote:
>> On 8/28/18 8:32 AM, Stephen Frost wrote:
>> To be clear, pgBackRest uses the .ready files in archive_status to
>> parallelize archiving but still notifies PostgreSQL of completion via
>> the archive_command mechanism. We do not modify .ready files to .done
>> directly.
> Right, we don't recommend mucking around with that directory of files.
> Even if that works today (which you'd need to test extensively...),
> there's no guarantee that it'll work and do what you want in the
> future...
WAL-G modifies archive_status files. This path was chosen to limit the state preserved between WAL-G runs (archiving to S3) and to push archiving performance further. Indeed, it was very hard to test. Also, this makes it impossible to use two archiving systems simultaneously during a transition period.
Best regards, Andrey Borodin.
Greetings, * Andrey Borodin (x4mmm@yandex-team.ru) wrote: > > On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote: > > * David Steele (david@pgmasters.net) wrote: > >> On 8/28/18 8:32 AM, Stephen Frost wrote: > >> To be clear, pgBackRest uses the .ready files in archive_status to > >> parallelize archiving but still notifies PostgreSQL of completion via > >> the archive_command mechanism. We do not modify .ready files to .done > >> directly. > > > > Right, we don't recommend mucking around with that directory of files. > > Even if that works today (which you'd need to test extensively...), > > there's no guarantee that it'll work and do what you want in the > > future... > WAL-G modifies archive_status files. Frankly, I've heard far too many concerns and issues with WAL-G to consider anything it does at all sensible. > This path was chosen to limit the state preserved between WAL-G runs (archiving to S3) and to push archiving performance further. I still don't think it's a good idea, and I specifically recommend against making changes to the archive status files - those are clearly owned and managed by PG and should not be whacked around by external processes. > Indeed, it was very hard to test. Also, this makes it impossible to use two archiving systems simultaneously during a transition period. The testing in WAL-G seems to be rather lacking from what I've seen. It's not clear to me what you're suggesting wrt 'two archiving systems simultaneously during a transition period', but certainly whacking the archive status files around doesn't make it easier to have multiple archive commands trying to simultaneously archive WAL files. Thanks! Stephen
> On 28 Aug 2018, at 17:07, Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Andrey Borodin (x4mmm@yandex-team.ru) wrote: >>> On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote: >>> * David Steele (david@pgmasters.net) wrote: >>>> On 8/28/18 8:32 AM, Stephen Frost wrote: >>>> To be clear, pgBackRest uses the .ready files in archive_status to >>>> parallelize archiving but still notifies PostgreSQL of completion via >>>> the archive_command mechanism. We do not modify .ready files to .done >>>> directly. >>> >>> Right, we don't recommend mucking around with that directory of files. >>> Even if that works today (which you'd need to test extensively...), >>> there's no guarantee that it'll work and do what you want in the >>> future... >> WAL-G modifies archive_status files. > > Frankly, I've heard far too many concerns and issues with WAL-G to > consider anything it does at all sensible. Umm... very interesting. What kind of issues? There are a few on the GitHub repo, and all of them will be addressed. Do you have some other reports? Can you share them? I'm open to discussing any concerns. > >> This path was chosen to limit the state preserved between WAL-G runs (archiving to S3) and to push archiving performance further. > > I still don't think it's a good idea, and I specifically recommend > against making changes to the archive status files - those are clearly > owned and managed by PG and should not be whacked around by external > processes. If you do not write to archive_status, you basically have two options: 1. On every archive_command call, recheck that the file being archived is identical to the file already in the archive. This hurts performance. 2. Hope that the files match. This adds no safety compared to whacking archive_status, and it is just as prone to core changes as writes are. Well, PostgreSQL clearly has a problem which could be solved by a good parallel archiving API. Anything else is whacking around; merely reading archive_status is no better than reading and writing it. > >> Indeed, it was very hard to test. Also, this makes it impossible to use two archiving systems simultaneously during a transition period. > > The testing in WAL-G seems to be rather lacking from what I've seen. Indeed, WAL-G still lacks automated integration tests; I hope that some dockerized tests will be added soon. For now I'm doing automated QA in Yandex infrastructure. Best regards, Andrey Borodin.
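To illustrate option 1 from the message above - a hypothetical Python sketch of an idempotent archive_command that rechecks the archive before uploading. The fetch_remote_sha256() and upload() helpers are placeholder assumptions, not the API of any tool discussed here:

    import hashlib
    import sys

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def fetch_remote_sha256(segment_name):
        """Placeholder: checksum of the archived copy, or None if absent."""
        raise NotImplementedError

    def upload(path, segment_name):
        """Placeholder for the real transfer step (scp, S3 put, ...)."""
        raise NotImplementedError

    def archive(local_path, segment_name):
        remote = fetch_remote_sha256(segment_name)
        if remote == sha256_of(local_path):
            return 0  # already archived with identical contents
        if remote is not None:
            return 1  # same name, different contents: refuse to overwrite
        upload(local_path, segment_name)
        return 0

    if __name__ == "__main__":
        # e.g. archive_command = 'python3 archive.py "%p" "%f"'
        sys.exit(archive(sys.argv[1], sys.argv[2]))

This is exactly the recheck that hurts performance: every invocation pays a round trip to the archive before it can return.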
Greetings, * Andrey Borodin (x4mmm@yandex-team.ru) wrote: > > On 28 Aug 2018, at 17:07, Stephen Frost <sfrost@snowman.net> wrote: > > I still don't think it's a good idea, and I specifically recommend > > against making changes to the archive status files - those are clearly > > owned and managed by PG and should not be whacked around by external > > processes. > If you do not write to archive_status, you basically have two options: 1. On every archive_command call, recheck that the file being archived is identical to the file already in the archive. This hurts performance. It's absolutely important to make sure that the files PG is asking to archive have actually been archived, yes. > 2. Hope that the files match. This adds no safety compared to whacking archive_status, and it is just as prone to core changes as writes are. This blindly assumes that PG won't care about some other process whacking around archive status files, and I don't think that's a good assumption to be operating under - certainly not under the claim that it's simply a 'performance' improvement. > Well, PostgreSQL clearly has a problem which could be solved by a good parallel archiving API. Anything else is whacking around; merely reading archive_status is no better than reading and writing it. Pushing files which are indicated by archive status as being ready is absolutely an entirely different thing from whacking around the status files themselves, which PG is managing itself. Thanks! Stephen
> On 28 Aug 2018, at 17:41, Stephen Frost <sfrost@snowman.net> wrote: > > Pushing files which are indicated by archive status as being ready is > absolutely an entirely different thing from whacking around the status > files themselves, which PG is managing itself. I disagree. Returning an archive_command exit code based on a prior read of archive_status is not safe either. "Absolutely an entirely different thing" is speculation, just like "jumping out of the 5th floor is safer than jumping out of the 10th". If the archive cannot be monitored properly, do not whack archive_status at all. pgBackRest is no safer than WAL-G in this aspect: they are prone to the same conditions, and changing the behavior of archive_status will affect them both. I'm aware of the issue and monitor PG changes in this aspect. I do not pretend that there cannot be any problem at all and that this method will stay safe forever. But for now it is. Best regards, Andrey Borodin.
On 8/28/18 4:34 PM, Andrey Borodin wrote: >> >> I still don't think it's a good idea, and I specifically recommend >> against making changes to the archive status files - those are clearly >> owned and managed by PG and should not be whacked around by external >> processes. > If you do not write to archive_status, you basically have two options: > 1. On every archive_command call, recheck that the file being archived is identical to the file already in the archive. This hurts performance. > 2. Hope that the files match. This adds no safety compared to whacking archive_status, and it is just as prone to core changes as writes are. Another option is to maintain the state of what has been safely archived (and what has errored) locally. This allows pgBackRest to rapidly return the status to Postgres without rechecking against the repository, which, as you note, would be very slow. It also allows more than one archive_command to be run safely, since all archive commands must succeed before Postgres will mark the segment as done. It's true that reading archive_status is susceptible to core changes, but the less interaction the better, I think. Regards, -- -David david@pgmasters.net
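A minimal sketch of the local-state idea David describes (an illustration under assumed names only - pgBackRest's actual implementation is in C and differs). The premise is that a separate process pushes .ready segments in parallel and records results in a local manifest, so the archive_command wrapper can answer Postgres from that manifest without touching the repository; Postgres retries on any non-zero exit. The manifest path and format are assumptions:

    import json
    import os
    import sys

    # Assumed manifest maintained by a separate parallel pusher:
    # maps WAL segment name -> "ok" or "error".
    MANIFEST = "/var/spool/pg_archive/manifest.json"

    def archive_status(segment_name):
        """Return 0 only when local state says the segment was safely
        pushed; any non-zero exit makes Postgres retry later, so a
        not-yet-pushed segment is simply reported as a failure."""
        if not os.path.exists(MANIFEST):
            return 1
        with open(MANIFEST) as f:
            manifest = json.load(f)
        return 0 if manifest.get(segment_name) == "ok" else 1

    if __name__ == "__main__":
        # e.g. archive_command = 'python3 archive_status.py "%f"'
        sys.exit(archive_status(sys.argv[1]))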
Greetings, * David Steele (david@pgmasters.net) wrote: > On 8/28/18 4:34 PM, Andrey Borodin wrote: > >> I still don't think it's a good idea, and I specifically recommend > >> against making changes to the archive status files - those are clearly > >> owned and managed by PG and should not be whacked around by external > >> processes. > > If you do not write to archive_status, you basically have two options: > > 1. On every archive_command call, recheck that the file being archived is identical to the file already in the archive. This hurts performance. > > 2. Hope that the files match. This adds no safety compared to whacking archive_status, and it is just as prone to core changes as writes are. > > Another option is to maintain the state of what has been safely archived > (and what has errored) locally. This allows pgBackRest to rapidly > return the status to Postgres without rechecking against the repository, > which, as you note, would be very slow. > > It also allows more than one archive_command to be run safely, since all > archive commands must succeed before Postgres will mark the segment as done. > > It's true that reading archive_status is susceptible to core changes, but > the less interaction the better, I think. Absolutely. External processes shouldn't be changing the files written out and managed by the core system. pgBackRest is *much* safer than alternatives which hack around with files inside of PGDATA. We've been working to move things forward to the point where pgBackRest can be run as a user who doesn't even have access to modify those files (which has now landed in PG11), and for good reason: it's outright dangerous to do. Thanks! Stephen
On Tue, Aug 28, 2018 at 05:20:10PM -0400, Stephen Frost wrote: > Absolutely. External processes shouldn't be changing the files written > out and managed by the core system. +1. Archiving commands should not mess directly with the contents the backend is managing. -- Michael