Thread: Re: pgsql: Validate page level checksums in base backups

Re: pgsql: Validate page level checksums in base backups

From: Andres Freund
On 2018-04-03 11:52:26 +0000, Magnus Hagander wrote:
> Validate page level checksums in base backups
> 
> When base backups are run over the replication protocol (for example
> using pg_basebackup), verify the checksums of all data blocks if
> checksums are enabled. If checksum failures are encountered, log them
> as warnings but don't abort the backup.
> 
> This becomes the default behaviour in pg_basebackup (provided checksums
> are enabled on the server), so add a switch (-k) to disable the checks
> if necessary.
> 
> Author: Michael Banck
> Reviewed-By: Magnus Hagander, David Steele
> Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de

Hm, something isn't quite right with the test:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2018-04-03%2016%3A10%3A01

# WARNING:  6 total checksum verification failures
# pg_basebackup: checksum error occured
# '
#     doesn't match '(?^s:^WARNING.*7 total checksum verification failures)'

Greetings,

Andres Freund


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander


On Tue, Apr 3, 2018 at 7:13 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-03 11:52:26 +0000, Magnus Hagander wrote:
> Validate page level checksums in base backups
>
> When base backups are run over the replication protocol (for example
> using pg_basebackup), verify the checksums of all data blocks if
> checksums are enabled. If checksum failures are encountered, log them
> as warnings but don't abort the backup.
>
> This becomes the default behaviour in pg_basebackup (provided checksums
> are enabled on the server), so add a switch (-k) to disable the checks
> if necessary.
>
> Author: Michael Banck
> Reviewed-By: Magnus Hagander, David Steele
> Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de

Hm, something isn't quite right with the test:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2018-04-03%2016%3A10%3A01

# WARNING:  6 total checksum verification failures
# pg_basebackup: checksum error occured
# '
#     doesn't match '(?^s:^WARNING.*7 total checksum verification failures)'

Yeah, there's clearly a second problem here.

I'm stuck in meetings most of tonight, so I might not get to it right away. If someone else figures it out in the meantime, please do; otherwise I'll get to it as soon as I can.


Re: pgsql: Validate page level checksums in base backups

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> Yeah, there's clearly a second problem here.

I think this test script is broken in many ways.

It's scribbling on the source cluster's disk files and assuming that that
translates one-for-one to what gets sent to the slave server --- but what
if some of the blocks that it modifies on-disk are resident in the
source's shared buffers?  I think you'd have to shut down the source and
then apply the corruption if you want stable results.

I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
since the way in which the corruption is induced is just guessing
as to where page boundaries are.

Also, scribbling on tables as sensitive as pg_class is just asking for
trouble IMO.  I don't see anything in this test, for example, that
prevents autovacuum from running and causing a PANIC before the test
can complete.  Even with AV off, there's a good chance that clobber-
cache-always animals will fall over because they do so many more
physical accesses to the system catalogs.  I'd suggest inducing the
corruption in some user table(s) that we can more tightly constrain
the source server's accesses to.

            regards, tom lane


Re: pgsql: Validate page level checksums in base backups

From: Peter Geoghegan
On Tue, Apr 3, 2018 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO.  I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete.

+1

> Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs.  I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.

I've simulated quite a few novel corruption scenarios using pg_hexedit
in the past year. It would be nice if pg_prewarm offered an easy way
to evict from shared_buffers for repeated testing. Obviously you can
accomplish the same thing in other ways, but that isn't ideal, and
particularly hinders automated testing.

-- 
Peter Geoghegan


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander


On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> Yeah, there's clearly a second problem here.

I think this test script is broken in many ways.

It's scribbling on the source cluster's disk files and assuming that that
translates one-for-one to what gets sent to the slave server --- but what
if some of the blocks that it modifies on-disk are resident in the
source's shared buffers?  I think you'd have to shut down the source and
then apply the corruption if you want stable results.

It doesn't actually use a slave server as part of the tests.

And base backups don't read from the source's shared buffers, but they *do* read from the kernel buffers.


I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
since the way in which the corruption is induced is just guessing
as to where page boundaries are.

Yeah, that might be a problem. Those should be calculated from the block size.


Also, scribbling on tables as sensitive as pg_class is just asking for
trouble IMO.  I don't see anything in this test, for example, that
prevents autovacuum from running and causing a PANIC before the test
can complete.  Even with AV off, there's a good chance that clobber-
cache-always animals will fall over because they do so many more
physical accesses to the system catalogs.  I'd suggest inducing the
corruption in some user table(s) that we can more tightly constrain
the source server's accesses to.

Yeah, that seems like a good idea. And probably also shut the server down while writing the corruption, just in case.

Will stick looking into that on my todo list for when I'm back, unless beaten to it. Michael, you want a stab at it?

//Magnus

Re: pgsql: Validate page level checksums in base backups

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's scribbling on the source cluster's disk files and assuming that that
>> translates one-for-one to what gets sent to the slave server --- but what
>> if some of the blocks that it modifies on-disk are resident in the
>> source's shared buffers?  I think you'd have to shut down the source and
>> then apply the corruption if you want stable results.

> It doesn't actually use a slave server as part of the tests.
> And base backups don't read from the source's shared buffers, but they
> *do* read from the kernel buffers.

Right, so the failure case occurs when the source server writes back a
dirty page from its shared buffers to disk, in between the test script's
corrupting that page on-disk and then trying to read it.

Again, you'd have several orders of magnitude better chance of getting
reproducible behavior if you were targeting something that wasn't a
system catalog.

            regards, tom lane


Re: pgsql: Validate page level checksums in base backups

From: Michael Banck
Hi,

On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > since the way in which the corruption is induced is just guessing
> > as to where page boundaries are.
> 
> Yeah, that might be a problem. Those should be calculated from the block
> size.
> 
> > Also, scribbling on tables as sensitive as pg_class is just asking for
> > trouble IMO.  I don't see anything in this test, for example, that
> > prevents autovacuum from running and causing a PANIC before the test
> > can complete.  Even with AV off, there's a good chance that clobber-
> > cache-always animals will fall over because they do so many more
> > physical accesses to the system catalogs.  I'd suggest inducing the
> > corruption in some user table(s) that we can more tightly constrain
> > the source server's accesses to.
> 
> Yeah, that seems like a good idea. And probably also shut the server down
> while writing the corruption, just in case.
> 
> Will stick looking into that on my todo list for when I'm back, unless
> beaten to it. Michael, you want a stab at it?

Attached is a patch which hopefully does that:

1. creates two user tables, one large enough for at least 6 blocks
(around 360kb), the other just one block.

2. stops the cluster before scribbling over its data and starts it
afterwards.

3. uses the blocksize (and the page header size) to determine offsets
for scribbling.

I've tested it with block sizes 8 and 32 now; the latter should make sure
that the first table is indeed large enough, but maybe something less
arbitrary than "10000 integers" should be used?
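
For illustration, the offset arithmetic in point 3 boils down to
something like this (a standalone C sketch with illustrative names; the
test itself is a Perl TAP script):

    /*
     * Corrupt a few bytes just past the page header of block "blkno",
     * deriving the file offset from the configured block size instead
     * of assuming 8 kB pages.
     */
    #include <stdio.h>

    static void
    corrupt_block(const char *relpath, long blkno,
                  long block_size, long pageheader_size)
    {
        FILE       *f = fopen(relpath, "r+b");

        if (f == NULL)
            return;
        /* seek to just past the header of the target block */
        fseek(f, blkno * block_size + pageheader_size, SEEK_SET);
        /* overwrite the start of the line pointer array so the page
         * checksum no longer matches */
        fwrite("\0\0\0\0", 1, 4, f);
        fclose(f);
    }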

Anyway, sorry for the hassle.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: pgsql: Validate page level checksums in base backups

From: David Steele
On 4/3/18 4:48 PM, Michael Banck wrote:
> 
> Attached is a patch which hopefully does that:
> 
> 1. creates two user tables, one large enough for at least 6 blocks
> (around 360kb), the other just one block.
> 
> 2. stops the cluster before scribbling over its data and starts it
> afterwards.
> 
> 3. uses the blocksize (and the page header size) to determine offsets
> for scribbling.

This patch looks reasonable to me.

> I've tested it with block sizes 8 and 32 now; the latter should make sure
> that the first table is indeed large enough, but maybe something less
> arbitrary than "10000 integers" should be used?

It might be quicker to just stop the cluster and then write out an 
arbitrary number of zero pages.  Zero pages are always considered valid 
so you can then corrupt whichever pages you want for testing.
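
Something like this, say (a minimal C sketch, assuming the cluster is
stopped and that BLCKSZ matches the server's build):

    /*
     * Append "count" all-zero pages to a relation file.  New (all-zero)
     * pages carry no checksum and are considered valid, so individual
     * pages can afterwards be corrupted deliberately for the tests.
     */
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192             /* must match the server's block size */

    static void
    append_zero_pages(const char *relpath, int count)
    {
        char        page[BLCKSZ];
        FILE       *f = fopen(relpath, "ab");
        int         i;

        if (f == NULL)
            return;
        memset(page, 0, sizeof(page));
        for (i = 0; i < count; i++)
            fwrite(page, 1, sizeof(page), f);
        fclose(f);
    }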

-- 
-David
david@pgmasters.net


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander
On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > since the way in which the corruption is induced is just guessing
> > as to where page boundaries are.
>
> Yeah, that might be a problem. Those should be calculated from the block
> size.
>
> > Also, scribbling on tables as sensitive as pg_class is just asking for
> > trouble IMO.  I don't see anything in this test, for example, that
> > prevents autovacuum from running and causing a PANIC before the test
> > can complete.  Even with AV off, there's a good chance that clobber-
> > cache-always animals will fall over because they do so many more
> > physical accesses to the system catalogs.  I'd suggest inducing the
> > corruption in some user table(s) that we can more tightly constrain
> > the source server's accesses to.
>
> Yeah, that seems like a good idea. And probably also shut the server down
> while writing the corruption, just in case.
>
> Will stick looking into that on my todo list for when I'm back, unless
> beaten to it. Michael, you want a stab at it?

Attached is a patch which hopefully does that:

1. creates two user tables, one large enough for at least 6 blocks
(around 360kb), the other just one block.

2. stops the cluster before scribbling over its data and starts it
afterwards.

3. uses the blocksize (and the page header size) to determine offsets
for scribbling.

I've tested it with block sizes 8 and 32 now; the latter should make sure
that the first table is indeed large enough, but maybe something less
arbitrary than "10000 integers" should be used?

Anyway, sorry for the hassle.

Applied, with the addition that I explicitly disabled autovacuum on those tables as well. 

We might want to enhance it further by calculating the figure 10,000 based on the block size, perhaps?
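
Back-of-the-envelope, assuming roughly 36 bytes per single-integer row
(24-byte tuple header plus the MAXALIGNed 4-byte integer, plus a 4-byte
line pointer; approximate figures, not something the test does today):

    #define BLCKSZ      8192        /* or 32768, per the server build */
    #define ROW_BYTES   36          /* rough per-row cost, see above */

    /* rows needed for the table to span at least 6 blocks */
    int         rows_needed = 6 * (BLCKSZ / ROW_BYTES);
                                    /* ~1360 at 8 kB, ~5460 at 32 kB */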

Re: pgsql: Validate page level checksums in base backups

From: Michael Banck
Hi,

On Wed, Apr 04, 2018 at 11:38:35AM +0200, Magnus Hagander wrote:
> On Tue, Apr 3, 2018 at 10:48 PM, Michael Banck <michael.banck@credativ.de>
> wrote:
> 
> > Hi,
> >
> > On Tue, Apr 03, 2018 at 08:48:08PM +0200, Magnus Hagander wrote:
> > > On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> > > > since the way in which the corruption is induced is just guessing
> > > > as to where page boundaries are.
> > >
> > > Yeah, that might be a problem. Those should be calculated from the block
> > > size.
> > >
> > > > Also, scribbling on tables as sensitive as pg_class is just asking for
> > > > trouble IMO.  I don't see anything in this test, for example, that
> > > > prevents autovacuum from running and causing a PANIC before the test
> > > > can complete.  Even with AV off, there's a good chance that clobber-
> > > > cache-always animals will fall over because they do so many more
> > > > physical accesses to the system catalogs.  I'd suggest inducing the
> > > > corruption in some user table(s) that we can more tightly constrain
> > > > the source server's accesses to.
> > >
> > > Yeah, that seems like a good idea. And probably also shut the server down
> > > while writing the corruption, just in case.
> > >
> > > Will stick looking into that on my todo list for when I'm back, unless
> > > beaten to it. Michael, you want a stab at it?
> >
> > Attached is a patch which hopefully does that:
> >
> > 1. creates two user tables, one large enough for at least 6 blocks
> > (around 360kb), the other just one block.
> >
> > 2. stops the cluster before scribbling over its data and starts it
> > afterwards.
> >
> > 3. uses the blocksize (and the page header size) to determine offsets
> > for scribbling.
> >
> > I've tested it with block sizes 8 and 32 now; the latter should make sure
> > that the first table is indeed large enough, but maybe something less
> > arbitrary than "10000 integers" should be used?
> >
> > Anyway, sorry for the hassle.
> >
> 
> Applied, with the addition that I explicitly disabled autovacuum on those
> tables as well.
 
Thanks! It looks like there have been no further buildfarm failures so far;
let's see how this goes.

> We might want to enhance it further by calculating the figure 10,000 based
> on blocksize perhaps?

10,000 was roughly twice the size needed for 32k block sizes. If there
are concerns that this might not be enough, I am happy to invest some
more time here (next week probably). However, the pg_basebackup
testsuite takes up 800+ MB to run, so I don't see an urgent need to
optimize away 50-100 KB (which everybody else clearly thought as well)
if we are talking about disk space overhead.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: pgsql: Validate page level checksums in base backups

From: Alvaro Herrera
Michael Banck wrote:

> However, the pg_basebackup testsuite takes up 800+ MB to run,

Uh, you're right.  This seems a bit over the top.  Can we reduce that
without losing coverage?  We've gone to great lengths to avoid large
amounts of data in tests elsewhere.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander


On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Michael Banck wrote:

> However, the pg_basebackup testsuite takes up 800+ MB to run,

Uh, you're right.  This seems a bit over the top.  Can we reduce that
without losing coverage?  We've gone to great lengths to avoid large
amounts of data in tests elsewhere.

That didn't come out of this patch, right? This is a pre-existing issue? 


Re: pgsql: Validate page level checksums in base backups

From: Michael Banck
Hi,

On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:
> > Michael Banck wrote:
> >
> > > However, the pg_basebackup testsuite takes up 800+ MB to run,
> >
> > Uh, you're right.  This seems a bit over the top.  Can we reduce that
> > without losing coverage?  We've gone to great lengths to avoid large
> > amounts of data in tests elsewhere.
> 
> That didn't come out of this patch, right? This is a pre-existing issue?

It was in that ballpark before, but this patch probably made
things worse as it adds four additional datadirs (around 40 MB each
here) and we are close to 1 GB now.

I haven't looked at the other testsuites, but if it is OK to clean up the
base backups after each set of tests succeeds, that would alleviate the
problem.

Otherwise, I had a quick look and there is no obvious outlier; the
pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
could be cut down somehow?) and the base backups are 22-40 MB each, and
there are around 20 of them, so that adds up to more than 750 MB.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander
On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:
> > Michael Banck wrote:
> >
> > > However, the pg_basebackup testsuite takes up 800+ MB to run,
> >
> > Uh, you're right.  This seems a bit over the top.  Can we reduce that
> > without losing coverage?  We've gone to great lengths to avoid large
> > amounts of data in tests elsewhere.
>
> That didn't come out of this patch, right? This is a pre-existing issue?

It was in that ballpark before, but this patch probably made
things worse as it adds four additional datadirs (around 40 MB each
here) and we are close to 1 GB now.

I haven't looked at the other testsuites, but if it is OK to clean up the
base backups after each set of tests succeeds, that would alleviate the
problem.

Otherwise, I had a quick look and there is no obvious outlier; the
pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
could be cut down somehow?) and the base backups are 22-40 MB each, and
there are around 20 of them, so that adds up to more than 750 MB.


It certainly seems reasonable to delete the base backups once they're made, after each step, rather than keeping them around forever.

Do we have a precedent somewhere for how we do this, or does our test framework already have a way to do it? How are all the actual data directories etc cleaned up?

Or should it just be a matter of sprinkling some unlink() calls throughout the test file? 


Re: pgsql: Validate page level checksums in base backups

From: Michael Banck
Hi,

On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck <michael.banck@credativ.de>
> wrote:
> > Otherwise, I had a quick look and there is no obvious outlier; the
> > pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> > could be cut down somehow?) and the base backups are 22-40 MB each, and
> > there is around 20 of them, so that adds up to more than 750 MB.
>
> It certainly seems reasonable to delete the base backups once they're made,
> after each step, rather than keeping them around forever.

I had a look at this and found a copy-pasto in one of the test cases
while testing, patch attached.

I've also attached a second patch (that applies on top of the first)
that removes the base backups once they are no longer needed (but see
below).
 
> Do we have a precedent somewhere for how we do this, or does our test
> framework already have a way to do it? How are all the actual data
> directories etc cleaned up?

They (and the base backups) are getting purged on success of the whole
testsuite. So to be clear - we are not leaving behind 1 GB of disk space
on success, but we use 1 GB of disk space during the test.
 
> Or should it just be a matter of sprinkling some unlink() calls throughout
> the test file?

I used rmtree() from File::Path (which is also used by PostgresNode to
clean up) to remove them during the run.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: pgsql: Validate page level checksums in base backups

From: Alvaro Herrera
Michael Banck wrote:
> Hi,

> > Do we have a precedent somewhere for how we do this, or does our test
> > framework already have a way to do it? How are all the actual data
> > directories etc cleaned up?
> 
> They (and the base backups) are getting purged on success of the whole
> testsuite. So to be clear - we are not leaving behind 1 GB of disk space
> on success, but we use 1 GB of disk space during the test.

Yeah, but it means you force the OS to keep 1 GB of useless dung in
cache.  I would hope that creating 40 MB, deleting those, then creating
further 40 MB (lather, rinse, repeat) would not overall evict 1 GB from
my OS cache.  Unless the OS keeps the unlinked files in cache, which
would be stupid.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander
On Thu, Apr 5, 2018 at 2:41 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck <michael.banck@credativ.de>
> wrote:
> > Otherwise, I had a quick look and there is no obvious outlier; the
> > pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> > could be cut down somehow?) and the base backups are 22-40 MB each, and
> > there is around 20 of them, so that adds up to more than 750 MB.
>
> It certainly seems reasonable to delete the base backups once they're made,
> after each step, rather than keeping them around forever.

I had a look at this and found a copy-pasto in one of the test cases
while testing, patch attached.

Cute. Applied.


I've also attached a second patch (that applies on top of the first)
that removes the base backups once they are no longer needed (but see
below).

> Do we have a precedent somewhere for how we do this, or does our test
> framework already have a way to do it? How are all the actual data
> directories etc cleaned up?

They (and the base backups) are getting purged on success of the whole
testsuite. So to be clear - we are not leaving behind 1 GB of disk space
on success, but we use 1 GB of disk space during the test.

> Or should it just be a matter of sprinkling some unlink() calls throughout
> the test file?

I used rmtree() from File::Path (which is also used by PostgresNode to
clean up) to remove them during the run.


This seems fine to me. Applied as well. 


Re: pgsql: Validate page level checksums in base backups

From: Tomas Vondra
Hi,

I think there's a bug in sendFile(). We do check checksums on all pages
that pass this LSN check:

    /*
     * Only check pages which have not been modified since the
     * start of the base backup. Otherwise, they might have been
     * written only halfway and the checksum would not be valid.
     * However, replaying WAL would reinstate the correct page in
     * this case.
     */
    if (PageGetLSN(page) < startptr)
    {
        ...
    }

Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
too, and we'll try to verify the checksum - but we actually do not set
checksums on empty pages.

So I think it should be something like this:

    if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
    {
        ...
    }

It might be worth verifying that the page is actually all-zeroes (and
not just has a corrupted pd_upper value). Not sure it's worth it.
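
Such a check could look roughly like this (just a sketch; the backend's
PageIsVerified() does a scan along these lines):

    #include <stdbool.h>

    /*
     * Return true if the BLCKSZ-byte page is entirely zero, so that a
     * page with a zeroed pd_upper but non-zero garbage elsewhere is
     * still reported as corrupted.
     */
    static bool
    page_is_all_zeroes(const char *page)
    {
        int         i;

        for (i = 0; i < BLCKSZ; i++)
        {
            if (page[i] != 0)
                return false;
        }
        return true;
    }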

I've found this by fairly trivial stress testing - running pgbench and
pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
With the proposed change I see no further failures.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Validate page level checksums in base backups

From: Tomas Vondra

On 04/10/2018 11:24 PM, Tomas Vondra wrote:
> Hi,
> 
> I think there's a bug in sendFile(). We do check checksums on all pages
> that pass this LSN check:
> 
>     /*
>      * Only check pages which have not been modified since the
>      * start of the base backup. Otherwise, they might have been
>      * written only halfway and the checksum would not be valid.
>      * However, replaying WAL would reinstate the correct page in
>      * this case.
>      */
>     if (PageGetLSN(page) < startptr)
>     {
>         ...
>     }
> 
> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
> too, and we'll try to verify the checksum - but we actually do not set
> checksums on empty pages.
> 
> So I think it should be something like this:
> 
>     if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
>     {
>         ...
>     }
> 
> It might be worth verifying that the page is actually all-zeroes (and
> not just has a corrupted pd_upper value). Not sure it's worth it.
> 
> I've found this by fairly trivial stress testing - running pgbench and
> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
> With the proposed change I see no further failures.
> 

BTW pg_verify_checksums needs the same fix.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Validate page level checksums in base backups

From: David Steele
On 4/10/18 5:24 PM, Tomas Vondra wrote:
> 
> I think there's a bug in sendFile(). We do check checksums on all pages
> that pass this LSN check:
> 
>      /*
>       * Only check pages which have not been modified since the
>       * start of the base backup. Otherwise, they might have been
>       * written only halfway and the checksum would not be valid.
>       * However, replaying WAL would reinstate the correct page in
>       * this case.
>       */
>      if (PageGetLSN(page) < startptr)
>      {
>          ...
>      }
> 
> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
> too, and we'll try to verify the checksum - but we actually do not set
> checksums on empty pages.
> 
> So I think it should be something like this:
> 
>      if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
>      {
>          ...
>      }
> 
> It might be worth verifying that the page is actually all-zeroes (and
> not just has a corrupted pd_upper value). Not sure it's worth it.
> 
> I've found this by fairly trivial stress testing - running pgbench and
> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
> With the proposed change I see no further failures.

Good catch, Tomas!

I should have seen this since I had the same issue when I implemented 
this feature in pgBackRest.

Anyway, I agree that your fix looks correct.

Thanks,
-- 
-David
david@pgmasters.net


Re: pgsql: Validate page level checksums in base backups

From: Michael Paquier
On Tue, Apr 10, 2018 at 11:35:21PM +0200, Tomas Vondra wrote:
> BTW pg_verify_checksums needs the same fix.

Yes, you are right here.  Just for the record: what needs to be done is
to check for PageIsNew() in scan_file() after reading an individual
block and before calling pg_checksum_page(); this applies even to what
HEAD has in pg_verify_checksums.c.
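
Roughly like this (a sketch of the shape of the fix, with names in the
style of pg_verify_checksums.c; not the committed code):

    char        buf[BLCKSZ];
    BlockNumber blkno;

    for (blkno = 0; read(fd, buf, BLCKSZ) == BLCKSZ; blkno++)
    {
        /* skip new (all-zero) pages: no checksum is ever set on them */
        if (PageIsNew((Page) buf))
            continue;
        if (pg_checksum_page(buf, blkno) != ((PageHeader) buf)->pd_checksum)
            badblocks++;
    }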

I am adding an open item for this one.
--
Michael


Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander


On Tue, Apr 10, 2018 at 11:45 PM, David Steele <david@pgmasters.net> wrote:
On 4/10/18 5:24 PM, Tomas Vondra wrote:

I think there's a bug in sendFile(). We do check checksums on all pages
that pass this LSN check:

     /*
      * Only check pages which have not been modified since the
      * start of the base backup. Otherwise, they might have been
      * written only halfway and the checksum would not be valid.
      * However, replaying WAL would reinstate the correct page in
      * this case.
      */
     if (PageGetLSN(page) < startptr)
     {
         ...
     }

Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
too, and we'll try to verify the checksum - but we actually do not set
checksums on empty pages.

So I think it should be something like this:

     if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
     {
         ...
     }

It might be worth verifying that the page is actually all-zeroes (and
not just has a corrupted pd_upper value). Not sure it's worth it.

I've found this by fairly trivial stress testing - running pgbench and
pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
With the proposed change I see no further failures.

Good catch, Tomas!

I should have seen this since I had the same issue when I implemented this feature in pgBackRest.

Anyway, I agree that your fix looks correct.

I've applied this fix, along with the same thing in pg_verify_checksums. 
