Thread: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Andres Freund
Date:
Hi,

We've had bugs in pg_upgrade where post-upgrade xid horizons weren't correctly
set. We've had bugs were indexes were corrupted during replay.

The latter can be caught by wal_consistency_checking - but that's pretty
expensive.

It seems $subject would have a chance of catching some of these bugs, as well
as exposing amcheck to a database with a bit more varied content?

Depending on the cost it might make sense to do this optionally, via
PG_TEST_EXTRA?

Greetings,

Andres Freund



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Peter Geoghegan
Date:
On Sun, Apr 3, 2022 at 11:53 AM Andres Freund <andres@anarazel.de> wrote:
> We've had bugs in pg_upgrade where post-upgrade xid horizons weren't correctly
> set. We've had bugs were indexes were corrupted during replay.
>
> The latter can be caught by wal_consistency_checking - but that's pretty
> expensive.
>
> It seems $subject would have a chance of catching some of these bugs, as well
> as exposing amcheck to a database with a bit more varied content?

I thought that Andrew Dunstan (CC'd) had a BF animal that did this
setup. But I'm not sure if that ever ended up happening.

I meant to tell the authors of verify_heapam() (also CC'd) that it
really helped with my recent VACUUM project. While the assertions that
I wrote in vacuumlazy.c might catch certain bugs like this,
verify_heapam() is much more effective in practice.

Let's say that an all-visible page (or all-frozen page) has XIDs from
before relfrozenxid. Why should the next VACUUM (or any VACUUM) be
able to observe the problem? A testing strategy that doesn't rely on
these kinds of accidental details to catch bugs is far better than one
that does.

Definitely all in favor of using verify_heapam() to its full
potential. So I'm +1 on your proposal.

--
Peter Geoghegan



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Robert Haas
Date:
On Sun, Apr 3, 2022 at 10:10 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I meant to tell the authors of verify_heapam() (also CC'd) that it
> really helped with my recent VACUUM project. While the assertions that
> I wrote in vacuumlazy.c might catch certain bugs like this,
> verify_heapam() is much more effective in practice.

Yeah, I was very excited about verify_heapam(). There is a lot more
stuff that we could check, but a lot of those things would be much
more expensive to check. It does a good job, I think, checking all the
things that a human being could potentially spot just by looking at an
individual page. I love the idea of using it in regression testing in
more places. It might find bugs in amcheck, which would be good, but I
think it's even more likely to help us find bugs in other code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Peter Geoghegan
Date:
On Mon, Apr 4, 2022 at 7:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, I was very excited about verify_heapam(). There is a lot more
> stuff that we could check, but a lot of those things would be much
> more expensive to check.

If anything I understated the value of verify_heapam() with this kind
of work before. Better to show just how valuable it is using an
example.

Let's introduce a fairly blatant bug to lazyvacuum.c. This change
makes VACUUM fail to account for the fact that skipping a skippable
range with an all-visible page makes it unsafe to advance
relfrozenxid:

--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1371,8 +1371,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer
*vmbuffer, BlockNumber next_block,
     else
     {
         *skipping_current_range = true;
-        if (skipsallvis)
-            vacrel->skippedallvis = true;
     }

     return next_unskippable_block;

If I run "make check-world", the tests all pass! But when I run pg_amcheck
against an affected "regression" database, it will complain about
relfrozenxid related corruption in several different tables.

> It does a good job, I think, checking all the
> things that a human being could potentially spot just by looking at an
> individual page. I love the idea of using it in regression testing in
> more places. It might find bugs in amcheck, which would be good, but I
> think it's even more likely to help us find bugs in other code.

I'd really like it if amcheck had HOT chain verification. That's the
other area where catching bugs passively with assertions and whatnot
is clearly not good enough.


--
Peter Geoghegan



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Mark Dilger
Date:

> On Apr 4, 2022, at 9:27 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> I'd really like it if amcheck had HOT chain verification. That's the
> other area where catching bugs passively with assertions and whatnot
> is clearly not good enough.

I agree, and was hoping to get around to this in the postgres 15 development cycle.  Alas, that did not happen.  Worse,
Ihave several other projects that will keep me busy for the next few months, at least. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Andres Freund
Date:
Hi,

On 2022-04-04 10:02:37 -0400, Robert Haas wrote:
> It does a good job, I think, checking all the things that a human being
> could potentially spot just by looking at an individual page.

I think there's a few more things that'd be good to check. For example amcheck
doesn't verify that HOT chains are reasonable, which can often be spotted
looking at an individual page. Which is a bit unfortunate, given how many bugs
we had in that area.

Stuff to check around that:
- target of redirect has HEAP_ONLY_TUPLE, HEAP_UPDATED set
- In a valid ctid chain within a page (i.e. xmax = xmin):
  - tuples have HEAP_UPDATED set
  - HEAP_ONLY_TUPLE / HEAP_HOT_UPDATED matches across chains elements

I think it'd also be good to check for things like visible tuples following
invisible ones.

Greetings,

Andres Freund



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Robert Haas
Date:
On Mon, Apr 4, 2022 at 2:16 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-04-04 10:02:37 -0400, Robert Haas wrote:
> > It does a good job, I think, checking all the things that a human being
> > could potentially spot just by looking at an individual page.
>
> I think there's a few more things that'd be good to check. For example amcheck
> doesn't verify that HOT chains are reasonable, which can often be spotted
> looking at an individual page. Which is a bit unfortunate, given how many bugs
> we had in that area.
>
> Stuff to check around that:
> - target of redirect has HEAP_ONLY_TUPLE, HEAP_UPDATED set
> - In a valid ctid chain within a page (i.e. xmax = xmin):
>   - tuples have HEAP_UPDATED set
>   - HEAP_ONLY_TUPLE / HEAP_HOT_UPDATED matches across chains elements
>
> I think it'd also be good to check for things like visible tuples following
> invisible ones.

Interesting.

*takes notes*

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Michael Paquier
Date:
On Sun, Apr 03, 2022 at 11:53:03AM -0700, Andres Freund wrote:
> It seems $subject would have a chance of catching some of these bugs, as well
> as exposing amcheck to a database with a bit more varied content?

Makes sense to me to extend that.

> Depending on the cost it might make sense to do this optionally, via
> PG_TEST_EXTRA?

Yes, it would be good to check the difference in run-time before
introducing more.  A logical dump of the regression database is no
more than 15MB if I recall correctly, so my guess is that most of the
runtime is still going to be eaten by the run of pg_regress.
--
Michael

Attachment

Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Andres Freund
Date:
Hi,

On 2022-04-05 08:46:06 +0900, Michael Paquier wrote:
> On Sun, Apr 03, 2022 at 11:53:03AM -0700, Andres Freund wrote:
> > It seems $subject would have a chance of catching some of these bugs, as well
> > as exposing amcheck to a database with a bit more varied content?
> 
> Makes sense to me to extend that.
> 
> > Depending on the cost it might make sense to do this optionally, via
> > PG_TEST_EXTRA?
> 
> Yes, it would be good to check the difference in run-time before
> introducing more.  A logical dump of the regression database is no
> more than 15MB if I recall correctly, so my guess is that most of the
> runtime is still going to be eaten by the run of pg_regress.

On my workstation it takes about 2.39s to run pg_amcheck on a regression
database with all thoroughness options enabled. With -j4 it's 0.62s.

Without more thorough checking it's 1.24s and 0.30s with -j4.

Greetings,

Andres Freund



Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Michael Paquier
Date:
On Mon, Apr 04, 2022 at 05:39:58PM -0700, Andres Freund wrote:
> On my workstation it takes about 2.39s to run pg_amcheck on a regression
> database with all thoroughness options enabled. With -j4 it's 0.62s.
>
> Without more thorough checking it's 1.24s and 0.30s with -j4.

Okay.  That sounds like an argument to enable that by default, with
parallelism.
--
Michael

Attachment

Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?

From
Andrew Dunstan
Date:
On 4/3/22 22:10, Peter Geoghegan wrote:
> On Sun, Apr 3, 2022 at 11:53 AM Andres Freund <andres@anarazel.de> wrote:
>> We've had bugs in pg_upgrade where post-upgrade xid horizons weren't correctly
>> set. We've had bugs were indexes were corrupted during replay.
>>
>> The latter can be caught by wal_consistency_checking - but that's pretty
>> expensive.
>>
>> It seems $subject would have a chance of catching some of these bugs, as well
>> as exposing amcheck to a database with a bit more varied content?
> I thought that Andrew Dunstan (CC'd) had a BF animal that did this
> setup. But I'm not sure if that ever ended up happening.


I don't think any of my BF animals do anything special in this area.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com