Thread: Enabling checksums on a streaming replica
I'm wondering about the validity of using the pg_checksums utility to enable checksum's on a streaming replication standby, and then promoting that standby as a way to enable checksums on existing clusters.
I've tested the process out, and it "works" (by works I mean doesn't blow up or log any errors). But this seems far enough outside of supported territory that I'm curious what others think.
The process is:
- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica
I've thrown load at it while the it was replicating from checksums off -> checksums on, promoted it, and verified the checksums after with pg_checksums.
Any thoughts on this approach?
Thanks,
Brad
Hello!
Very interesting idea, but what about full page writes, that comes from master?
Can they be a problem?
I'm wondering about the validity of using the pg_checksums utility to enable checksum's on a streaming replication standby, and then promoting that standby as a way to enable checksums on existing clusters.
I've tested the process out, and it "works" (by works I mean doesn't blow up or log any errors). But this seems far enough outside of supported territory that I'm curious what others think.
The process is:
- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica
I've thrown load at it while the it was replicating from checksums off -> checksums on, promoted it, and verified the checksums after with pg_checksums.
Any thoughts on this approach?
Thanks,
Brad
-- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Grigory Smolkin <g.smolkin@postgrespro.ru> wrote on 06/26/2019 12:37:10 PM:
> From: Grigory Smolkin <g.smolkin@postgrespro.ru>
> To: pgsql-general@lists.postgresql.org
> Date: 06/26/2019 12:37 PM
> Subject: [EXTERNAL] Re: Enabling checksums on a streaming replica
>
> Hello!
> Very interesting idea, but what about full page writes, that comes
> from master?
> Can they be a problem?
In testing, it doesn't appear to matter. I've ensured that I've generated some full page writes (confirmed via pg_waldump), and those apply fine.
The one thing I'm not sure of, when verifying checksums via pg_checksums I see blocks being skipped. I'm not sure what or why it is skipping blocks, and if that indicates a problem or not.
pg_checksums -c /data/10_repl/
Checksum scan completed
Files scanned: 1530
Blocks scanned: 100782
Blocks skipped: 3
Bad checksums: 0
Data checksum version: 1
Brad
On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote: > In testing, it doesn't appear to matter. I've ensured that I've generated > some full page writes (confirmed via pg_waldump), and those apply > fine. Full pages writes are first written from shared buffers to WAL, where their checksums does not actually apply. When the WAL records are read an applied, a full page image is recovered in shared buffers. The checksum of the page would get updated once the shared buffer page used is evicted and written back to disk. > The one thing I'm not sure of, when verifying checksums via pg_checksums I > see blocks being skipped. I'm not sure what or why it is skipping blocks, > and if that indicates a problem or not. Some files can get skipped entirely, but all the blocks of a file are basically checked. Well, except for new pages of course which have no checksums to look at yet. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> wrote on 06/27/2019 11:17:10 PM:
> From: Michael Paquier <michael@paquier.xyz>
> To: Brad Nicholson <bradn@ca.ibm.com>
> Cc: Grigory Smolkin <g.smolkin@postgrespro.ru>, pgsql-
> general@lists.postgresql.org
> Date: 06/27/2019 11:17 PM
> Subject: [EXTERNAL] Re: Re: Enabling checksums on a streaming replica
>
> On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote:
> > In testing, it doesn't appear to matter. I've ensured that I've generated
> > some full page writes (confirmed via pg_waldump), and those apply
> > fine.
>
> Full pages writes are first written from shared buffers to WAL, where
> their checksums does not actually apply. When the WAL records are
> read an applied, a full page image is recovered in shared buffers.
> The checksum of the page would get updated once the shared buffer page
> used is evicted and written back to disk.
>
So if all the checksums are being recalculated on the replica, this approach should be relatively safe, should it not?
Assuming pg_checksums is doing the right thing (and it looks to me like it should be).
Brad.
On Fri, Jun 28, 2019 at 07:43:30AM -0400, Brad Nicholson wrote: > So if all the checksums are being recalculated on the replica, this > approach should be relatively safe, should it not? Yep. > Assuming pg_checksums is doing the right thing (and it looks to me like it > should be). The constraint that a cluster needs to be cleanly shut down to be able to enable checksums with pg_checksums is the actual deal here. After that of course comes the WAL retention on the primary or in the WAL archives that a standby would need again to catch up while it was offline. -- Michael