Thread: Enabling checksums on a streaming replica

Enabling checksums on a streaming replica

From

"Brad Nicholson"

Date:

26 June 2019, 15:32:11

I'm wondering about the validity of using the pg_checksums utility to enable checksum's on a streaming replication standby, and then promoting that standby as a way to enable checksums on existing clusters.

I've tested the process out, and it "works" (by works I mean doesn't blow up or log any errors). But this seems far enough outside of supported territory that I'm curious what others think.

The process is:

- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica

I've thrown load at it while the it was replicating from checksums off -> checksums on, promoted it, and verified the checksums after with pg_checksums.

Any thoughts on this approach?

Thanks,
Brad

Re: Enabling checksums on a streaming replica

From

Grigory Smolkin

Date:

26 June 2019, 16:37:10

Hello!

Very interesting idea, but what about full page writes, that comes from master?
Can they be a problem?

On 6/26/19 6:32 PM, Brad Nicholson wrote:

I'm wondering about the validity of using the pg_checksums utility to enable checksum's on a streaming replication standby, and then promoting that standby as a way to enable checksums on existing clusters.

I've tested the process out, and it "works" (by works I mean doesn't blow up or log any errors). But this seems far enough outside of supported territory that I'm curious what others think.

The process is:

- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica

I've thrown load at it while the it was replicating from checksums off -> checksums on, promoted it, and verified the checksums after with pg_checksums.

Any thoughts on this approach?

Thanks,
Brad

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Re: Enabling checksums on a streaming replica

From

"Brad Nicholson"

Date:

26 June 2019, 18:08:19

Grigory Smolkin <g.smolkin@postgrespro.ru> wrote on 06/26/2019 12:37:10 PM: > From: Grigory Smolkin <g.smolkin@postgrespro.ru>
> To: pgsql-general@lists.postgresql.org
> Date: 06/26/2019 12:37 PM
> Subject: [EXTERNAL] Re: Enabling checksums on a streaming replica
> > Hello!
> Very interesting idea, but what about full page writes, that comes > from master? > Can they be a problem?

In testing, it doesn't appear to matter. I've ensured that I've generated some full page writes (confirmed via pg_waldump), and those apply fine.
The one thing I'm not sure of, when verifying checksums via pg_checksums I see blocks being skipped. I'm not sure what or why it is skipping blocks, and if that indicates a problem or not.

pg_checksums -c /data/10_repl/

Checksum scan completed
Files scanned: 1530
Blocks scanned: 100782
Blocks skipped: 3
Bad checksums: 0
Data checksum version: 1

Brad

Re: Re: Enabling checksums on a streaming replica

From

Michael Paquier

Date:

28 June 2019, 03:17:10

On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote:
> In testing, it doesn't appear to matter.  I've ensured that I've generated
> some full page writes (confirmed via pg_waldump), and those apply
> fine.

Full pages writes are first written from shared buffers to WAL, where
their checksums does not actually apply.  When the WAL records are
read an applied, a full page image is recovered in shared buffers.
The checksum of the page would get updated once the shared buffer page
used is evicted and written back to disk.

> The one thing I'm not sure of, when verifying checksums via pg_checksums I
> see blocks being skipped.  I'm not sure what or why it is skipping blocks,
> and if that indicates a problem or not.

Some files can get skipped entirely, but all the blocks of a file
are basically checked.  Well, except for new pages of course which
have no checksums to look at yet.
--
Michael

Attachment

signature.asc

Re: Re: Re: Enabling checksums on a streaming replica

From

"Brad Nicholson"

Date:

28 June 2019, 11:43:30

Michael Paquier <michael@paquier.xyz> wrote on 06/27/2019 11:17:10 PM: > From: Michael Paquier <michael@paquier.xyz>
> To: Brad Nicholson <bradn@ca.ibm.com>
> Cc: Grigory Smolkin <g.smolkin@postgrespro.ru>, pgsql- > general@lists.postgresql.org
> Date: 06/27/2019 11:17 PM
> Subject: [EXTERNAL] Re: Re: Enabling checksums on a streaming replica
> > On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote: > > In testing, it doesn't appear to matter. I've ensured that I've generated > > some full page writes (confirmed via pg_waldump), and those apply > > fine. > > Full pages writes are first written from shared buffers to WAL, where > their checksums does not actually apply. When the WAL records are > read an applied, a full page image is recovered in shared buffers. > The checksum of the page would get updated once the shared buffer page > used is evicted and written back to disk. >

So if all the checksums are being recalculated on the replica, this approach should be relatively safe, should it not?
Assuming pg_checksums is doing the right thing (and it looks to me like it should be).

Brad.

Re: Re: Re: Enabling checksums on a streaming replica

From

Michael Paquier

Date:

29 June 2019, 02:27:09

On Fri, Jun 28, 2019 at 07:43:30AM -0400, Brad Nicholson wrote:
> So if all the checksums are being recalculated on the replica, this
> approach should be relatively safe, should it not?

Yep.

> Assuming pg_checksums is doing the right thing (and it looks to me like it
> should be).

The constraint that a cluster needs to be cleanly shut down to be able
to enable checksums with pg_checksums is the actual deal here.  After
that of course comes the WAL retention on the primary or in the WAL
archives that a standby would need again to catch up while it was
offline.
--
Michael

Attachment

signature.asc