Re: Concurrency issue in pg_rewind - Mailing list pgsql-hackers

From Alexey Kondratov
Subject Re: Concurrency issue in pg_rewind
Msg-id 30ec75b9bd9bfab1e83e7168dc6d6ddc@postgrespro.ru
In response to Re: Concurrency issue in pg_rewind  (Alexander Kukushkin <cyberdemn@gmail.com>)
List pgsql-hackers
On 2020-09-17 15:27, Alexander Kukushkin wrote:
> On Thu, 17 Sep 2020 at 14:04, Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
> 
>> With --restore-target-wal, pg_rewind tries to call restore_command on
>> its own, and this can happen at two stages:
>> 
>> 1) When pg_rewind is trying to find the last checkpoint preceding the
>> divergence point. At this stage the file map is not yet initialized.
>> Thus, all WAL segments fetched at this stage will be present in the
>> file map created later.
> 
> Nope, it will fetch the files you requested, and in addition it
> will leave a child process running in the background which does
> the prefetch (manipulating files in pg_wal/.wal-g/...)
> 
>> 
>> 2) When it creates the data pages map. It has to traverse WAL from the
>> last common checkpoint to the final shutdown point in order to find
>> all modified pages on the target. At this stage pg_rewind only updates
>> info about data segments in the file map. So I see a minor problem:
>> WAL segments fetched at this stage would not be deleted, since they
>> are absent from the file map.
>> 
>> Anyway, pg_rewind deletes neither WAL segments nor any other files
>> in the middle of file map creation, so I cannot imagine how it could
>> get into the same trouble on its own.
> 
> When pg_rewind was creating the map, some temporary files were there,
> because the forked child process of wal-g was still running.
> When the wal-g child process exits, it removes some of these files.
> Specifically, it was trying to prefetch 0000008400000A7600000024 into
> pg_wal/.wal-g/prefetch/running/0000008400000A7600000024, but
> apparently the file wasn't available on S3 and the prefetch failed,
> so the empty file was removed.
> 

I do understand how you got into this problem with wal-g. This part of
my answer was about bare Postgres and pg_rewind. My point was that,
from my perspective, pg_rewind with --restore-target-wal cannot get
into the same trouble on its own, without the 'help' of third-party
tools like wal-g.
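
To make the race concrete, here is a minimal Python sketch. It is hypothetical and is not pg_rewind's code (pg_rewind is written in C). build_file_map() walks a directory once and stands in for file map creation. simulate_prefetch_race() fakes the wal-g child process by temporarily patching os.lstat, so that an empty file under pg_wal/.wal-g/prefetch/running/ is removed right after it has been listed. Both function names and the segment file names are made up for the example. The walk deliberately skips entries that vanish between listing and lstat(). That is one possible way to tolerate such a remover, not necessarily what pg_rewind does.

```python
import os
import stat
import tempfile
from typing import Dict


def build_file_map(root: str) -> Dict[str, str]:
    """Walk `root` once, mapping each relative path to "dir", "file" or "symlink".

    Hypothetical sketch: entries that vanish between being listed and being
    lstat()ed (e.g. removed by a concurrent prefetch process) are skipped
    instead of aborting the whole walk.
    """
    file_map: Dict[str, str] = {}
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            names = os.listdir(current)
        except FileNotFoundError:
            continue  # the directory itself disappeared after being listed
        for name in names:
            path = os.path.join(current, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # listed a moment ago, removed concurrently since
            rel = os.path.relpath(path, root)
            if stat.S_ISDIR(st.st_mode):
                file_map[rel] = "dir"
                pending.append(path)
            elif stat.S_ISLNK(st.st_mode):
                file_map[rel] = "symlink"
            else:
                file_map[rel] = "file"
    return file_map


def simulate_prefetch_race() -> Dict[str, str]:
    """Build a map of a toy data directory while a fake prefetcher deletes its
    temporary file mid-walk, then 'fetch' one more segment after the map exists."""
    with tempfile.TemporaryDirectory() as datadir:
        running = os.path.join(datadir, "pg_wal", ".wal-g", "prefetch", "running")
        os.makedirs(running)
        tmp_file = os.path.join(running, "000000010000000000000002")
        open(tmp_file, "w").close()  # empty placeholder, as a failed prefetch leaves it
        open(os.path.join(datadir, "pg_wal", "000000010000000000000001"), "w").close()

        real_lstat = os.lstat

        def racing_lstat(path, *args, **kwargs):
            # Simulated wal-g child: removes its temp file right after it was listed.
            if path == tmp_file and os.path.exists(path):
                os.remove(path)
            return real_lstat(path, *args, **kwargs)

        os.lstat = racing_lstat  # patched only for the duration of the walk
        try:
            file_map = build_file_map(datadir)
        finally:
            os.lstat = real_lstat

        # A segment fetched after the map was built (stage 2) is not in the map.
        open(os.path.join(datadir, "pg_wal", "000000010000000000000003"), "w").close()
        return file_map
```

In the resulting map, the removed prefetch file simply drops out instead of causing an error. The segment created after the walk is also absent, which matches the stage 2 observation that WAL fetched after map creation is not covered by the file map.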


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
