Thread: Some regression tests for the pg_control_*() functions

Some regression tests for the pg_control_*() functions

From
Michael Paquier
Date:
Hi all,

As mentioned in [1], there are no regression tests for the SQL control
functions: pg_control_checkpoint, pg_control_recovery,
pg_control_system and pg_control_init.

The minimum would be to check that they execute, with something like
a "SELECT FROM func()", but some validation can also be done on their
output as long as the test stays portable enough (it must not depend
on wal_level, commit timestamps, etc.).

Attached is a proposal to provide some coverage.  Some of the checks
could simply be removed, like the ones for non-NULL fields, but I have
written out everything to show how much could be done.

Thoughts?

[1]: https://www.postgresql.org/message-id/YzY0iLxNbmaxHpbs@paquier.xyz
--
Michael


Re: Some regression tests for the pg_control_*() functions

From
Bharath Rupireddy
Date:
On Tue, Oct 25, 2022 at 11:07 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
>
> As mentioned in [1], there are no regression tests for the SQL control
> functions: pg_control_checkpoint, pg_control_recovery,
> pg_control_system and pg_control_init.
>
> The minimum would be to check that they execute, with something like
> a "SELECT FROM func()", but some validation can also be done on their
> output as long as the test stays portable enough (it must not depend
> on wal_level, commit timestamps, etc.).
>
> Attached is a proposal to provide some coverage.  Some of the checks
> could simply be removed, like the ones for non-NULL fields, but I have
> written out everything to show how much could be done.
>
> Thoughts?
>
> [1]: https://www.postgresql.org/message-id/YzY0iLxNbmaxHpbs@paquier.xyz

+1 for improving the test coverage. Is there a strong reason to
validate individual output columns rather than using tests of the sort
"SELECT count(*) > 0 FROM pg_control_XXXX();"? If the intention is to
validate the pg_control file contents, we already have pg_controldata
to look at, and the pg_control_XXXX() functions perform CRC checks. If
this isn't enough, we
can have the pg_control_validate() function to do all the necessary
checks and simplify the tests, no?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Some regression tests for the pg_control_*() functions

From
Michael Paquier
Date:
On Wed, Oct 26, 2022 at 10:13:29AM +0530, Bharath Rupireddy wrote:
> +1 for improving the test coverage. Is there a strong reason to
> validate individual output columns rather than using tests of the sort
> "SELECT count(*) > 0 FROM pg_control_XXXX();"? If the intention is to
> validate the pg_control file contents, we already have pg_controldata
> to look at, and the pg_control_XXXX() functions perform CRC checks.

It is also possible that the control file ends up with incorrect data
written to it because of a bug in the backend.  Adding a count(*) or
similar to get the number of rows returned by the function is basically
the same as checking its execution; still, I'd like to think that a
minimal set of checks on top of that would be nice.  Of all the checks
I wrote in the patch upthread, the following would be on my minimal
list:
- timeline_id > 0
- timeline_id >= prev_timeline_id
- checkpoint_lsn >= redo_lsn
- data_page_checksum_version >= 0
- Perhaps the various fields of pg_control_init(), checked against their
lower-bound values.
- Perhaps pg_control_version and/or catalog_version_no > NN
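
The checks in the list above can be sketched in C as a single predicate. This is a minimal, hypothetical sketch: ControlSnapshot and control_snapshot_is_sane() are illustrative names, not PostgreSQL APIs, and LSNs are modeled as plain 64-bit integers. The pg_control_init() lower bounds and the version checks are left out because the list does not pin down their values.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of a few fields returned by
 * pg_control_checkpoint() and pg_control_init().  This is NOT
 * PostgreSQL's ControlFileData; LSNs are plain 64-bit integers.
 */
typedef struct ControlSnapshot
{
    uint32_t timeline_id;
    uint32_t prev_timeline_id;
    uint64_t checkpoint_lsn;
    uint64_t redo_lsn;
    int32_t  data_page_checksum_version;
} ControlSnapshot;

/* Returns true only if every check from the minimal list holds. */
bool
control_snapshot_is_sane(const ControlSnapshot *cs)
{
    /* timeline_id > 0: a TLI of 0 is never valid. */
    if (cs->timeline_id == 0)
        return false;

    /* timeline_id >= prev_timeline_id: timelines only move forward. */
    if (cs->timeline_id < cs->prev_timeline_id)
        return false;

    /* checkpoint_lsn >= redo_lsn: replay starts at or before the checkpoint record. */
    if (cs->checkpoint_lsn < cs->redo_lsn)
        return false;

    /* data_page_checksum_version >= 0 (0 means checksums are disabled). */
    if (cs->data_page_checksum_version < 0)
        return false;

    return true;
}
```

In the regression suite itself, the same conditions would be written as SQL predicates over each function's output; the C form only makes them executable in isolation.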

> If this isn't enough, we
> can have the pg_control_validate() function to do all the necessary
> checks and simplify the tests, no?

There is no function like that.  Perhaps you mean introducing
something like that at the C level, but that does not seem necessary
to me as long as SQL is able to do the job for the most meaningful
parts.
--
Michael


Re: Some regression tests for the pg_control_*() functions

From
Bharath Rupireddy
Date:
On Wed, Oct 26, 2022 at 12:48 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 26, 2022 at 10:13:29AM +0530, Bharath Rupireddy wrote:
> > +1 for improving the test coverage. Is there a strong reason to
> > validate individual output columns rather than using tests of the sort
> > "SELECT count(*) > 0 FROM pg_control_XXXX();"? If the intention is to
> > validate the pg_control file contents, we already have pg_controldata
> > to look at, and the pg_control_XXXX() functions perform CRC checks.
>
> It is also possible that the control file ends up with incorrect data
> written to it because of a bug in the backend.

We will have bigger problems when a backend corrupts the pg_control
file, no? The bigger problems could be that the server won't come up,
or that it behaves abnormally, or something else.

> Adding a count(*) or similar to get the number of rows returned by the
> function is basically the same as checking its execution; still, I'd
> like to think that a minimal set of checks on top of that would be
> nice. Of all the checks I wrote in the patch upthread, the following
> would be on my minimal list:
> - timeline_id > 0
> - timeline_id >= prev_timeline_id
> - checkpoint_lsn >= redo_lsn
> - data_page_checksum_version >= 0
> - Perhaps the various fields of pg_control_init(), checked against their
> lower-bound values.
> - Perhaps pg_control_version and/or catalog_version_no > NN

Can't the CRC check detect any of the above corruptions? Do we have
any evidence of the backend corrupting the pg_control file or any of
the above variables while running regression tests?

If the concern is the backend corrupting the pg_control file in a way
the CRC check can't detect, then the extra checks (as proposed in the
patch) must be placed in the core (perhaps before writing or after
reading the pg_control file), and certainly not in the regression tests.

-- 
Bharath Rupireddy



Re: Some regression tests for the pg_control_*() functions

From
Michael Paquier
Date:
On Wed, Oct 26, 2022 at 01:41:12PM +0530, Bharath Rupireddy wrote:
> We will have bigger problems when a backend corrupts the pg_control
> file, no? The bigger problems could be that the server won't come up,
> or that it behaves abnormally, or something else.

Possibly, yes.

> Can't the CRC check detect any of the above corruptions? Do we have
> any evidence of the backend corrupting the pg_control file or any of
> the above variables while running regression tests?

It is possible for the backend to write an incorrect combination of
data through its APIs, in which case the CRC is correct but the data
is not (say, a TLI of 0).
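
The underlying reason is that a CRC only shows that the bytes were not altered after the checksum was computed; it says nothing about whether the values were sensible when they were written. A minimal C sketch of that distinction follows, assuming a simplified stand-in for the control file: ControlPayload, ControlFileImage and the helper functions are hypothetical, not PostgreSQL's ControlFileData or its routines. It uses a bitwise CRC-32C (the CRC variant PostgreSQL uses for pg_control), written for clarity rather than speed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload: a stand-in, not the real ControlFileData. */
typedef struct ControlPayload
{
    uint64_t checkpoint_lsn;
    uint64_t redo_lsn;
    uint32_t timeline_id;
    uint32_t prev_timeline_id;
} ControlPayload;

typedef struct ControlFileImage
{
    ControlPayload payload;
    uint32_t       crc;         /* CRC-32C over payload only */
} ControlFileImage;

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
uint32_t
crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Writer side: whatever values it is handed get checksummed as-is. */
void
control_file_seal(ControlFileImage *img)
{
    img->crc = crc32c(&img->payload, sizeof(img->payload));
}

/* Integrity check: detects bytes changed after sealing. */
bool
control_file_crc_ok(const ControlFileImage *img)
{
    return img->crc == crc32c(&img->payload, sizeof(img->payload));
}

/* Semantic check: detects values that should never have been written. */
bool
control_file_values_ok(const ControlFileImage *img)
{
    const ControlPayload *p = &img->payload;

    return p->timeline_id > 0 &&
           p->timeline_id >= p->prev_timeline_id &&
           p->checkpoint_lsn >= p->redo_lsn;
}
```

Sealing an image with a TLI of 0 produces a CRC that verifies while the value check fails; modifying any field after sealing makes the CRC check fail until the image is sealed again. The two checks catch different classes of problems, which is why a correct CRC alone does not rule out bad data.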

> If the concern is the backend corrupting the pg_control file in a way
> the CRC check can't detect, then the extra checks (as proposed in the
> patch) must be placed in the core (perhaps before writing or after
> reading the pg_control file), and certainly not in the regression tests.

Well, that depends on the level of protection you want.  Some checks
are already in place when it comes to recovery and startup.  Anyway,
the lesson from the recent 56-bit relfilenode thread is really that we
don't check the execution of these functions at all, and that's the
actual minimal requirement, so I have applied a patch based on
count(*) > 0 for now to cover that.  I am not sure whether any of the
checks on the control file fields are valuable; perhaps some are.
--
Michael
