Re: reorder pg_rewind control file sync - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: reorder pg_rewind control file sync
Msg-id alpine.DEB.2.21.1903251013160.6866@lancre
In response to Re: reorder pg_rewind control file sync  (Michael Paquier <michael@paquier.xyz>)
Responses Re: reorder pg_rewind control file sync
List pgsql-hackers
Hello Michaël,

>> The attached patch reorders the cluster fsyncing and control file changes in
>> "pg_rewind" so that the latter is done after all data are committed to disk,
>> so as to reflect the actual cluster status, similarly to what is done by
>> "pg_checksums", per discussion in the thread about offline enabling of
>> checksums:
>
> It would be an interesting property to see that it is possible to
> retry a rewind of a node which has been partially rewound already,
> but the operation failed in the middle.

Yes. I understand that the question is whether the warning in the pg_rewind
documentation can be partially lifted. The short answer is that it is not
obvious.
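
To make the ordering under discussion concrete, here is a toy sketch in Python. It is not pg_rewind's code: the function names, the "control" file name and the temp-file-plus-rename trick are all assumptions made for illustration, and it assumes a POSIX system where a directory can be opened and fsynced. The only point is the sequence: data first, sync, control file last.

```python
import os


def fsync_dir(path):
    """Flush a directory entry table to stable storage (POSIX assumption)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_then_update_control(target_dir, data_files, new_state):
    """Hypothetical helper: make data durable before touching the control file."""
    # 1. Write and fsync every data file (stand-in for the copied blocks).
    for name, payload in data_files.items():
        with open(os.path.join(target_dir, name), "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())

    # 2. Sync the directory so the new file entries themselves are durable.
    fsync_dir(target_dir)

    # 3. Only now record the new state. Writing a temp file and renaming it
    #    is a choice made for this sketch, not a claim about pg_control.
    control = os.path.join(target_dir, "control")
    tmp = control + ".tmp"
    with open(tmp, "wb") as f:
        f.write(new_state)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, control)
    fsync_dir(target_dir)
```

With this order, a crash at any point before step 3 leaves the old control file in place, so the recorded status never claims more than what is on disk.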

> Because that's the real deal here: as long as we know that its control 
> file is in its previous state, we can rely on it for retrying the 
> operation.  Logically, I think that it should work, because we would 
> still try to fetch the same blocks from the source server since WAL has 
> forked by looking at the records of the target up from the last 
> checkpoint before WAL has forked up to the last shutdown checkpoint, and 
> the operation is lossy by design when it comes to dealing with file
> differences.
>
> Have you tried to see if pg_rewind is able to repeat its operation for
> specific scenarios?

I have run the regression tests. I'm not sure which scenarios are
covered there, but probably not an interruption in the middle of an fsync,
especially if fsync is usually disabled to ease the tests :-)

> One is for example a database created on the promoted standby, used as a 
> source, and a second, different database created on the primary after 
> the standby has been promoted.  You could make the tool exit() before 
> the rewind finishes, just before updating the control file, and see if 
> the operation is repeatable. Interrupting the tool would be fine as
> well, though less controllable.
>
> It would be good to mention in the patch why the order matters.

Yep. This requires a careful analysis of pg_rewind's inner workings, which I
do not have time to do in the short term.
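
In the meantime, the shape of the suggested experiment can be modelled in miniature. The following Python sketch is a toy, with made-up names (`toy_rewind`, `SimulatedCrash`, the file names) and plain file overwrites standing in for block copies; it says nothing about whether the real pg_rewind is repeatable. It only shows the check one would want: stop just before the control file update, verify the control file is unchanged, rerun, and compare against an uninterrupted run.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for an exit() placed just before the control file update."""


def toy_rewind(target_dir, source_files, crash_before_control=False):
    # Overwrite target files with the source versions; repeating this is
    # harmless in the toy model because each write is a full overwrite.
    for name, payload in source_files.items():
        with open(os.path.join(target_dir, name), "wb") as f:
            f.write(payload)
    if crash_before_control:
        raise SimulatedCrash()
    # The control file is updated last.
    with open(os.path.join(target_dir, "control"), "wb") as f:
        f.write(b"rewound")


def snapshot(target_dir):
    """Return {file name: contents} for every file in the directory."""
    result = {}
    for name in sorted(os.listdir(target_dir)):
        with open(os.path.join(target_dir, name), "rb") as f:
            result[name] = f.read()
    return result


def run_retry_experiment():
    source = {"db_on_standby": b"new", "shared": b"source version"}
    with tempfile.TemporaryDirectory() as clean, \
            tempfile.TemporaryDirectory() as retried:
        # Both targets start in the same diverged state.
        for d in (clean, retried):
            with open(os.path.join(d, "control"), "wb") as f:
                f.write(b"diverged")
            with open(os.path.join(d, "shared"), "wb") as f:
                f.write(b"target version")

        toy_rewind(clean, source)  # uninterrupted reference run

        try:
            toy_rewind(retried, source, crash_before_control=True)
        except SimulatedCrash:
            pass
        control_after_crash = snapshot(retried)["control"]

        toy_rewind(retried, source)  # retry after the simulated crash
        return control_after_crash, snapshot(clean) == snapshot(retried)
```

`run_retry_experiment()` returns the control file contents seen after the simulated crash and whether the retried target ended up identical to the uninterrupted one. A real test would have to do the same comparison on actual clusters, with fsync enabled.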

-- 
Fabien.
