Thread: Enabling checksums on a streaming replica

Enabling checksums on a streaming replica

From
"Brad Nicholson"
Date:

I'm wondering about the validity of using the pg_checksums utility to enable checksum's on a streaming replication standby, and then promoting that standby as a way to enable checksums on existing clusters.

I've tested the process out, and it "works" (by works I mean doesn't blow up or log any errors). But this seems far enough outside of supported territory that I'm curious what others think.

The process is:

- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica

I've thrown load at it while the it was replicating from checksums off -> checksums on, promoted it, and verified the checksums after with pg_checksums.

Any thoughts on this approach?


Thanks,
Brad

Re: Enabling checksums on a streaming replica

From
Grigory Smolkin
Date:

Hello!

Very interesting idea, but what about full page writes, that comes from master?
Can they be a problem?

On 6/26/19 6:32 PM, Brad Nicholson wrote:

I'm wondering about the validity of using the pg_checksums utility to enable checksum's on a streaming replication standby, and then promoting that standby as a way to enable checksums on existing clusters.

I've tested the process out, and it "works" (by works I mean doesn't blow up or log any errors). But this seems far enough outside of supported territory that I'm curious what others think.

The process is:

- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica

I've thrown load at it while the it was replicating from checksums off -> checksums on, promoted it, and verified the checksums after with pg_checksums.

Any thoughts on this approach?


Thanks,
Brad

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Re: Enabling checksums on a streaming replica

From
"Brad Nicholson"
Date:

Grigory Smolkin <g.smolkin@postgrespro.ru> wrote on 06/26/2019 12:37:10 PM:

> From: Grigory Smolkin <g.smolkin@postgrespro.ru>

> To: pgsql-general@lists.postgresql.org
> Date: 06/26/2019 12:37 PM
> Subject: [EXTERNAL] Re: Enabling checksums on a streaming replica
>
> Hello!

> Very interesting idea, but what about full page writes, that comes
> from master?
> Can they be a problem?



In testing, it doesn't appear to matter.  I've ensured that I've generated some full page writes (confirmed via pg_waldump), and those apply fine.

The one thing I'm not sure of, when verifying checksums via pg_checksums I see blocks being skipped.  I'm not sure what or why it is skipping blocks, and if that indicates a problem or not.


pg_checksums -c /data/10_repl/

Checksum scan completed
Files scanned:  1530
Blocks scanned: 100782
Blocks skipped: 3
Bad checksums:  0
Data checksum version: 1

Brad

Re: Re: Enabling checksums on a streaming replica

From
Michael Paquier
Date:
On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote:
> In testing, it doesn't appear to matter.  I've ensured that I've generated
> some full page writes (confirmed via pg_waldump), and those apply
> fine.

Full pages writes are first written from shared buffers to WAL, where
their checksums does not actually apply.  When the WAL records are
read an applied, a full page image is recovered in shared buffers.
The checksum of the page would get updated once the shared buffer page
used is evicted and written back to disk.

> The one thing I'm not sure of, when verifying checksums via pg_checksums I
> see blocks being skipped.  I'm not sure what or why it is skipping blocks,
> and if that indicates a problem or not.

Some files can get skipped entirely, but all the blocks of a file
are basically checked.  Well, except for new pages of course which
have no checksums to look at yet.
--
Michael

Attachment

Re: Re: Re: Enabling checksums on a streaming replica

From
"Brad Nicholson"
Date:

Michael Paquier <michael@paquier.xyz> wrote on 06/27/2019 11:17:10 PM:

> From: Michael Paquier <michael@paquier.xyz>

> To: Brad Nicholson <bradn@ca.ibm.com>
> Cc: Grigory Smolkin <g.smolkin@postgrespro.ru>, pgsql-
> general@lists.postgresql.org

> Date: 06/27/2019 11:17 PM
> Subject: [EXTERNAL] Re: Re: Enabling checksums on a streaming replica
>
> On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote:
> > In testing, it doesn't appear to matter.  I've ensured that I've generated
> > some full page writes (confirmed via pg_waldump), and those apply
> > fine.
>
> Full pages writes are first written from shared buffers to WAL, where
> their checksums does not actually apply.  When the WAL records are
> read an applied, a full page image is recovered in shared buffers.
> The checksum of the page would get updated once the shared buffer page
> used is evicted and written back to disk.
>


So if all the checksums are being recalculated on the replica, this approach should be relatively safe, should it not?

Assuming pg_checksums is doing the right thing (and it looks to me like it should be).


Brad.

Re: Re: Re: Enabling checksums on a streaming replica

From
Michael Paquier
Date:
On Fri, Jun 28, 2019 at 07:43:30AM -0400, Brad Nicholson wrote:
> So if all the checksums are being recalculated on the replica, this
> approach should be relatively safe, should it not?

Yep.

> Assuming pg_checksums is doing the right thing (and it looks to me like it
> should be).

The constraint that a cluster needs to be cleanly shut down to be able
to enable checksums with pg_checksums is the actual deal here.  After
that of course comes the WAL retention on the primary or in the WAL
archives that a standby would need again to catch up while it was
offline.
--
Michael

Attachment