Thread: Some regression tests for the pg_control_*() functions
Hi all,

As mentioned in [1], there are no regression tests for the SQL control functions: pg_control_checkpoint, pg_control_recovery, pg_control_system and pg_control_init.

It would be minimal to check their execution, as of a "SELECT FROM func()", still some validation can be done on its output as long as the test is portable enough (needs transparency for wal_level, commit timestamps, etc.).

Attached is a proposal to provide some coverage. Some of the checks could be just removed, like the ones for non-NULL fields, but I have written out everything to show how much could be done.

Thoughts?

[1]: https://www.postgresql.org/message-id/YzY0iLxNbmaxHpbs@paquier.xyz
--
Michael
Attachment
On Tue, Oct 25, 2022 at 11:07 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
>
> As mentioned in [1], there are no regression tests for the SQL control
> functions: pg_control_checkpoint, pg_control_recovery,
> pg_control_system and pg_control_init.
>
> It would be minimal to check their execution, as of a "SELECT FROM
> func()", still some validation can be done on its output as long as
> the test is portable enough (needs transparency for wal_level, commit
> timestamps, etc.).
>
> Attached is a proposal to provide some coverage. Some of the checks
> could be just removed, like the ones for non-NULL fields, but I have
> written out everything to show how much could be done.
>
> Thoughts?
>
> [1]: https://www.postgresql.org/message-id/YzY0iLxNbmaxHpbs@paquier.xyz

+1 for improving the test coverage. Is there a strong reason to validate individual output columns rather than select count(*) > 0 from pg_control_XXXX(); sort of tests? If the intention is to validate the pg_controlfile contents, we have pg_controldata to look at and pg_control_XXXX() functions doing crc checks. If this isn't enough, we can have the pg_control_validate() function to do all the necessary checks and simplify the tests, no?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Oct 26, 2022 at 10:13:29AM +0530, Bharath Rupireddy wrote:
> +1 for improving the test coverage. Is there a strong reason to
> validate individual output columns rather than select count(*) > 0
> from pg_control_XXXX(); sort of tests? If the intention is to validate
> the pg_controlfile contents, we have pg_controldata to look at and
> pg_control_XXXX() functions doing crc checks.

And it could be possible that the control file finishes by writing some incorrect data due to a bug in the backend. Adding a count(*) or similar to get the number of fields of the function is basically the same as checking its execution, still I'd like to think that having a minimum set of checks would be kind of nice on top of that. Among all the ones I wrote in the patch upthread, the following ones would be in my minimalistic list:
- timeline_id > 0
- timeline_id >= prev_timeline_id
- checkpoint_lsn >= redo_lsn
- data_page_checksum_version >= 0
- Perhaps the various fields of pg_control_init() using their lower-bound values.
- Perhaps pg_control_version and/or catalog_version_no > NN

> If this isn't enough, we
> can have the pg_control_validate() function to do all the necessary
> checks and simplify the tests, no?

There is no function like that. Perhaps you mean to introduce something like that at the C level, but that does not seem necessary to me as long as SQL is able to do the job for the most meaningful parts.
--
Michael
Attachment
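The minimal checks listed in the message above could be sketched in SQL roughly as follows. This is an illustration only, not the attached patch: the column names are the ones exposed by pg_control_checkpoint() and pg_control_init(), while the output aliases are hypothetical. The version checks are left out because the thread does not give the threshold NN.

```sql
-- Sanity checks on checkpoint-related control data.
-- Each column is expected to be true on a healthy cluster.
SELECT timeline_id > 0                 AS timeline_ok,       -- hypothetical alias
       timeline_id >= prev_timeline_id AS prev_timeline_ok,  -- hypothetical alias
       checkpoint_lsn >= redo_lsn      AS lsn_ok             -- hypothetical alias
  FROM pg_control_checkpoint();

-- The checksum version is 0 when data checksums are disabled,
-- so a lower-bound check stays portable across configurations.
SELECT data_page_checksum_version >= 0 AS checksum_version_ok  -- hypothetical alias
  FROM pg_control_init();
```

Returning booleans rather than raw values keeps the expected output independent of wal_level, checksum settings and the cluster's history, which is the portability concern raised at the start of the thread.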
On Wed, Oct 26, 2022 at 12:48 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 26, 2022 at 10:13:29AM +0530, Bharath Rupireddy wrote:
> > +1 for improving the test coverage. Is there a strong reason to
> > validate individual output columns rather than select count(*) > 0
> > from pg_control_XXXX(); sort of tests? If the intention is to validate
> > the pg_controlfile contents, we have pg_controldata to look at and
> > pg_control_XXXX() functions doing crc checks.
>
> And it could be possible that the control file finishes by writing
> some incorrect data due to a bug in the backend.

We will have bigger problems when a backend corrupts the pg_control file, no? The bigger problems could be that the server won't come up or it behaves abnormally or some other.

> Adding a count(*) or
> similar to get the number of fields of the function is basically the
> same as checking its execution, still I'd like to think that having a
> minimum set of checks would be kind of nice on top of that. Among all
> the ones I wrote in the patch upthread, the following ones would be in
> my minimalistic list:
> - timeline_id > 0
> - timeline_id >= prev_timeline_id
> - checkpoint_lsn >= redo_lsn
> - data_page_checksum_version >= 0
> - Perhaps the various fields of pg_control_init() using their
> lower-bound values.
> - Perhaps pg_control_version and/or catalog_version_no > NN

Can't the CRC check detect any of the above corruptions? Do we have any evidence of backend corrupting the pg_control file or any of the above variables while running regression tests? If the concern is backend corrupting the pg_control file and CRC check can't detect it, then the extra checks (as proposed in the patch) must be placed within the core (perhaps before writing/after reading the pg_control file), not in regression tests for sure.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Oct 26, 2022 at 01:41:12PM +0530, Bharath Rupireddy wrote:
> We will have bigger problems when a backend corrupts the pg_control
> file, no? The bigger problems could be that the server won't come up
> or it behaves abnormally or some other.

Possibly, yes.

> Can't the CRC check detect any of the above corruptions? Do we have
> any evidence of backend corrupting the pg_control file or any of the
> above variables while running regression tests?

It could be possible that the backend writes an incorrect data combination through its APIs, where the CRC is correct but the data is not (say a TLI of 0, as one example).

> If the concern is backend corrupting the pg_control file and CRC check
> can't detect it, then the extra checks (as proposed in the patch) must
> be placed within the core (perhaps before writing/after reading the
> pg_control file), not in regression tests for sure.

Well, that depends on the level of protection you want. Now there are things in place already when it comes to recovery or at startup. Anyway, the recent experience with the 56-bit relfilenode thread is really that we don't check the execution of these functions at all, and that's the actual minimal requirement, so I have applied a patch based on count(*) > 0 for now to cover that. I am not sure if any of the checks for the control file fields are valuable, perhaps some are.
--
Michael
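The execution-only coverage based on count(*) > 0, as described in the last message, might look like the following. This is a sketch of the approach and not necessarily the text of the applied patch; the alias ok is an assumption.

```sql
-- Check only that each control function executes and returns a row;
-- no individual field is validated.
SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
SELECT count(*) > 0 AS ok FROM pg_control_init();
SELECT count(*) > 0 AS ok FROM pg_control_recovery();
SELECT count(*) > 0 AS ok FROM pg_control_system();
```

Because each query yields a single boolean, the expected output does not depend on the cluster's configuration. The trade-off is that a logically wrong value with a valid CRC (such as the TLI of 0 mentioned above) would go unnoticed.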