On Sun, Oct 29, 2023 at 11:49:11AM +0100, Peter J. Holzer wrote:
> On 2023-10-29 10:11:07 +0100, Paul Förster wrote:
>> On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>>> I don't think so. AFAIK Replication keeps the data files in sync on a
>>> bit-for-bit level and turning on checksums changes the data layout.
>>> Running a cluster where one node has checksums and the other doesn't
>>> would result in a complete mess.
>>
>> I agree with the last sentence. This is why I asked if it is safe to
>> enable checksums on a replica, switch over and then do it again on the
>> ex primary, i.e. now new replica without doing a reinit.
>
> It *might* work if there are zero writes on the primary during the
> downtime of the replica (because those writes couldn't be replicated),
> but that seems hard to ensure. Even if you could get away with making
> the primary read-only (is this even possible?) I wouldn't have much
> confidence in the result and reinit the (new) replica anyway.

Hm? Page checksums are computed when a page is flushed to disk; we
don't set them for dirty buffers or for full-page writes included in
WAL. So it should be OK to do something like the following:
- Cleanly stop the standby.
- Run pg_checksums on the standby to enable checksums.
- Restart the standby.
- Let it catch up with the latest changes from the primary.
- Cleanly stop the primary, letting the shutdown checkpoint be
  replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary as a standby of the node you failed
  over to.
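
As a rough sketch only, the steps above could be scripted along these
lines. The data directory paths and the DRY_RUN switch are made up for
illustration, and the script pretends both data directories are
reachable from one shell, while in practice the two nodes usually live
on separate hosts. It also assumes v12 or newer (pg_checksums,
standby.signal) and that primary_conninfo on the previous primary
already points at the other node. By default it only prints the
commands.

```shell
#!/bin/sh
# Sketch, not a tested procedure. Paths are hypothetical placeholders.
set -e

STANDBY_DATA=${STANDBY_DATA:-/path/to/standby/data}
PRIMARY_DATA=${PRIMARY_DATA:-/path/to/primary/data}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print a command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

switchover_with_checksums() {
    # Cleanly stop the standby, enable checksums offline, restart it.
    run pg_ctl stop -D "$STANDBY_DATA" -m fast
    run pg_checksums --enable -D "$STANDBY_DATA"
    run pg_ctl start -D "$STANDBY_DATA"

    # Manual step: compare pg_current_wal_lsn() on the primary with
    # pg_last_wal_replay_lsn() on the standby before going on.
    echo "# wait until the standby has caught up with the primary"

    # Cleanly stop the primary so that the shutdown checkpoint reaches
    # the standby, then promote the standby.
    run pg_ctl stop -D "$PRIMARY_DATA" -m fast
    run pg_ctl promote -D "$STANDBY_DATA"

    # Enable checksums on the previous primary and bring it back as a
    # standby (assumes primary_conninfo is already configured there).
    run pg_checksums --enable -D "$PRIMARY_DATA"
    run touch "$PRIMARY_DATA/standby.signal"
    run pg_ctl start -D "$PRIMARY_DATA"
}

switchover_with_checksums
```

Run it once as-is to review the printed commands; with DRY_RUN=0 it
would execute them, stopping at the first failure because of set -e.
The catch-up check is deliberately left as a manual step.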
--
Michael