On Wed, Feb 27, 2019 at 11:50:17AM +0100, Fabien COELHO wrote:
>> Shouldn't be necessary - the control file fits into a single page, and
>> writes of that size ought to always be atomic. And I also think
>> introducing flock usage for this would be quite disproportional.

There are static assertions to make sure that the size of the control
file data never exceeds 512 bytes for this purpose.

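To illustrate the kind of check I mean, here is a minimal,
self-contained sketch. The struct and the DEMO_ names are hypothetical
stand-ins and not the definitions from the tree; only the 512-byte
limit comes from the above:

```c
#include <stdint.h>

/* Hypothetical stand-in for the control file contents (not the real struct). */
typedef struct DemoControlFileData
{
    uint64_t    system_identifier;
    uint32_t    version;
    uint32_t    state;
    uint64_t    checkpoint_lsn;
    uint32_t    data_checksum_version;
    uint32_t    crc;
} DemoControlFileData;

/* Assumed safe size for a single atomic write: one 512-byte sector. */
#define DEMO_CONTROL_MAX_SAFE_SIZE 512

/* Compilation fails if the struct ever grows past the safe size. */
_Static_assert(sizeof(DemoControlFileData) <= DEMO_CONTROL_MAX_SAFE_SIZE,
               "control file data is too large for an atomic write");
```
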
> Note that my concern is not about the page size, but rather that as more
> commands may change the cluster status by editing the control file, it would
> be better that a postmaster does not start while a pg_rewind or enable
> checksum or whatever is in progress, and currently there is a possible race
> condition between the read and write that can induce an issue, at least
> theoretically.

Something that I think we could live with instead is a special flag in
the control file to mark the postmaster as being in maintenance mode.
This would be useful to prevent the postmaster from starting if it sees
this flag in the control file, as well as to find out that a host has
crashed in the middle of a maintenance operation. We don't give this
assurance now when running pg_rewind, which is bad. That's also
separate from the checksum-related patches and pg_rewind.

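As a rough sketch of that idea, and nothing more than that: the struct,
the field and the function names below are all hypothetical, and a real
version would have to write and fsync the control file rather than flip
a flag in memory:

```c
#include <stdbool.h>

/* Hypothetical control file contents, reduced to the proposed flag. */
typedef struct DemoControlFile
{
    bool        in_maintenance; /* set while a maintenance tool is running */
} DemoControlFile;

/* Called by a tool (say, a rewind or checksum switch) before it starts. */
void
demo_begin_maintenance(DemoControlFile *cf)
{
    cf->in_maintenance = true;
}

/* Called by the same tool once it has finished cleanly. */
void
demo_end_maintenance(DemoControlFile *cf)
{
    cf->in_maintenance = false;
}

/* The postmaster would refuse to start while the flag is set. */
bool
demo_postmaster_may_start(const DemoControlFile *cf)
{
    return !cf->in_maintenance;
}
```

If the tool crashes between the begin and end calls, the flag stays set
on disk, which is how the next startup could notice the interrupted
operation.
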
flock() can be hard to live with for cross-platform compatibility, for
example on Windows (LockFileEx) or fancy platforms. And note that we
don't use it in the tree yet. flock() would also help in the first
case I am mentioning, but not in the second, as the lock is gone once
the process holding it dies.

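For reference, this is roughly what a non-blocking exclusive flock()
looks like on a POSIX system. The helper name is hypothetical and this
is only a sketch, with no Windows equivalent included:

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the file and try to take an exclusive lock
 * without blocking.  Returns the locked descriptor, or -1 if the file
 * cannot be opened or the lock is already held elsewhere.
 */
int
demo_lock_file(const char *path)
{
    int         fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0)
    {
        close(fd);
        return -1;
    }

    return fd;
}
```

The lock goes away as soon as the descriptor is closed, including when
the process dies, so nothing is left behind to tell that an operation
was interrupted.
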
--
Michael