Re: Would it be possible to have parallel archiving? - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: Would it be possible to have parallel archiving?
Msg-id D8EBD385-D0D8-4997-BD2D-DD2B99248B39@yandex-team.ru
In response to Re: Would it be possible to have parallel archiving?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Would it be possible to have parallel archiving?
Re: Would it be possible to have parallel archiving?
List pgsql-hackers

> On 28 Aug 2018, at 17:07, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Andrey Borodin (x4mmm@yandex-team.ru) wrote:
>>> On 28 Aug 2018, at 14:08, Stephen Frost <sfrost@snowman.net> wrote:
>>> * David Steele (david@pgmasters.net) wrote:
>>>> On 8/28/18 8:32 AM, Stephen Frost wrote:
>>>> To be clear, pgBackRest uses the .ready files in archive_status to
>>>> parallelize archiving but still notifies PostgreSQL of completion via
>>>> the archive_command mechanism.  We do not modify .ready files to .done
>>>> directly.
>>>
>>> Right, we don't recommend mucking around with that directory of files.
>>> Even if that works today (which you'd need to test extensively...),
>>> there's no guarantee that it'll work and do what you want in the
>>> future...
>> WAL-G modifies archive_status files.
>
> Frankly, I've heard far too many concerns and issues with WAL-G to
> consider anything it does at all sensible.
Umm... very interesting. What kind of issues? There are a few in the GitHub repo, and all of them will be addressed. Do you
have some other reports? Can you share them?
I'm open to discussing any concerns.

>
>> This path was chosen to limit state preserved between WAL-G runs (archiving to S3) and further push archiving performance.
>
> I still don't think it's a good idea and I specifically recommend
> against making changes to the archive status files- those are clearly
> owned and managed by PG and should not be whacked around by external
> processes.
If you do not write to archive_status, you basically have two options:
1. On every archive_command call, recheck that the file being archived is identical to the copy that is already in the
archive. This hurts performance.
2. Hope that the files match. This does not add any safety compared to whacking archive_status. This approach is just as
exposed to core changes as writes are.
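
To make option 1 concrete, here is a minimal sketch of the recheck. It is an illustration only, not how WAL-G or pgBackRest does it: a local directory stands in for the real archive (S3 in WAL-G's case), and the names archive_segment and file_digest are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_segment(src: Path, archive_dir: Path) -> str:
    """Copy src into archive_dir, rechecking any copy that already exists.

    Hypothetical helper: archive_dir is a local stand-in for remote storage.
    """
    dst = archive_dir / src.name
    if dst.exists():
        # Option 1: the segment was archived earlier, so both copies must be
        # read in full and compared. This is the extra cost on every call.
        if file_digest(src) == file_digest(dst):
            return "already-archived"
        raise RuntimeError(f"{src.name}: archived copy differs from local file")
    # Write to a temporary name first so a partial copy is never visible
    # under the final segment name.
    tmp = dst.with_name(dst.name + ".tmp")
    shutil.copyfile(src, tmp)
    tmp.replace(dst)
    return "archived"
```

The second call for the same segment has to read both copies completely, and that is the performance cost I mean. With remote storage the comparison would also mean a download or a stored checksum lookup.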

Well, PostgreSQL clearly has a problem that could be solved by a good parallel archiving API. Anything else is whacking
around; just reading archive_status is no better than reading and writing it.
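
For reference, the read-only side looks roughly like the sketch below. It assumes only the marker convention discussed in this thread (a `<segment>.ready` file per segment waiting to be archived, `.done` once archived); the function name ready_segments is hypothetical, and the directory is passed in rather than hard-coded.

```python
from pathlib import Path
from typing import List


def ready_segments(archive_status_dir: Path) -> List[str]:
    """List names that have a .ready marker, sorted by name.

    Read-only: nothing in archive_status is created, renamed, or removed.
    """
    suffix = ".ready"
    return sorted(
        p.name[: -len(suffix)] for p in archive_status_dir.glob("*" + suffix)
    )
```

Even this read-only scan depends on the naming and semantics of files in archive_status, which is an internal detail that core may change.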

>
>> Indeed, it was very hard to test. Also, this makes it impossible to use two archiving systems simultaneously during a transition period.
>
> The testing in WAL-G seems to be rather lacking from what I've seen.
Indeed, WAL-G still lacks automated integration tests; I hope that some dockerized tests will be added soon.
For now I'm doing automated QA in Yandex infrastructure.

Best regards, Andrey Borodin.
