Re: Combine pg_walinspect till_end_of_wal functions with others - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Combine pg_walinspect till_end_of_wal functions with others
Date
Msg-id ZA+yfGrBKg3UOY/2@paquier.xyz
Whole thread Raw
In response to Re: Combine pg_walinspect till_end_of_wal functions with others  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Combine pg_walinspect till_end_of_wal functions with others  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: Combine pg_walinspect till_end_of_wal functions with others  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Mar 13, 2023 at 03:53:37PM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>> +-- Make sure checkpoints don't interfere with the test.
>> +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
>>
>> Adding a physical slot is better for stability of course, but the test
>> also has to drop it or installcheck would cause an existing cluster to
>> have that still around.  The third argument could be true, as well, so
>> as we'd use a temporary slot.
>
> # Disabled because these tests require "wal_level=replica", which
> # some installcheck users do not have (e.g. buildfarm clients).
> NO_INSTALLCHECK = 1
>
> pg_walinspect can't be run under installcheck. I don't think dropping
> the slot at the end is needed, it's unnecessary. I saw
> oldextversions.sql using the same replication slot name, well no
> problem, but I changed it to a unique name.

-SELECT pg_drop_replication_slot('regress_pg_walinspect_slot');
-
+-- Clean up

In my opinion, it is an incorrect practice to assume that nobody will
ever run these tests on a running instance.  FWIW, I have managed
QE/QA flows in the past that did exactly that.  I cannot say for
already-deployed clusters that could be used for production still I
don't feel comfortable with the idea to assume that nobody would do
ever that, and calls of pg_drop_replication_slot() are not a
bottleneck.  So let's be clean and drop these slots to keep the tests
self-contained.  pg_walinspect in REL_15_STABLE gets that right, IMV,
and that's no different from the role cleanup, as one example.

> As hackers we know that these functions have been removed and how to
> achieve till_end_of_wal with the other functions. I noticed that
> you've removed my changes (see below) from the docs that were saying
> how to get info/stats till_end_of_wal. That leaves end users confused
> as to how they can achieve till_end_of_wal functionality. All users
> can't look for commit history/message but they can easily read the
> docs. I prefer to have the following (did so in the attached v7) and
> get rid of the above note if you don't feel strongly about it.
>
> + If a future <replaceable>end_lsn</replaceable>
> +      (i.e. the LSN server doesn't know about) is specified, it returns
> +      informaton till end of WAL.

FWIW, I don't see a strong need for that, because this documents a
behavior where we would not fail.  And FWIW, it just feel natural to
me because the process stops the scan up to where it can.  In short,
it should be enough for the docs to mention the error patterns,
nothing else.

> I have some comments and fixed them in the attached v7 patch:
>
> 1.
> + * pg_get_wal_records_info
>   *
> + * pg_get_wal_stats
>   *
> I think you wanted to be consistent with function comments with
> function names atop, but missed adding for all functions. Actually, I
> don't have a strong opinion on these changes as they unnecessarily
> bloat the changes, so I removed them.

Either is fine if you feel strongly on this one, I am just used to
doing that.

> 2.
> +    if (start_lsn > curr_lsn)
>          ereport(ERROR,
>                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -                 errmsg("cannot accept future start LSN"),
> -                 errdetail("Last known WAL LSN on the database system
> is at %X/%X.",
> -                           LSN_FORMAT_ARGS(curr_lsn))));
> -    }
> +                 errmsg("WAL start LSN must be less than current LSN")));
>
> I don't like this inconsistency much, especially when
> pg_get_wal_record_info emits "cannot accept future input LSN" with the
> current LSN details (this current LSN will give a bit more information
> to the user). Also, let's be consistent across returning NULLs if
> input LSN/start LSN equal to the current LSN. I've done these changes
> in the attached v7 patch.

No arguments against that, consistency is good.

> 3. I wanted COUNT(*) >= 0 for successful function execution to be
> COUNT(*) >= 1 so that we will check for at least the functions
> returning 1 record. And failures to be SELECT * FROM. This was my
> intention but I don't see that in this patch or in the previous
> test-refactoring commit. I added that in the attached v7 patch again.
> Also, made test comments conssitent.

Noticed that as well, still it feels to me that these had better be
separated from the rest, and be in their own patch, perhaps *after*
the main patch discussed on this thread, or just moved into their own
threads.  If a commit finishes with a list of bullet points referring
to a list of completely different things than the subject, there may
be a problem.  In this v7, we have:
- Change the behavior of the functions for end LSNs, tweaking the
tests to do so.
- Adjust more comments and formats in the tests.
- Adjust some tests to be pickier with detection of generated WAL
records.
- Remove the drop slot calls.
But what we need to care most here is the first point.

I am not arguing that none of that should not be changed, but it
should not be inside a patch that slightly tweaks the behaviors of
some existing functions.  First, it creates a lot of noise in the
diffs, making it harder for anybody reading this change to find the
core of what's happening.  Second, it increases the odds of mistakes
and bugs (if a revert is done, the work to-be-done gets greater at the
end).  When it comes to this patch, the changes should only involve
the calls of till_end_of_wal() being moved around from
pg_walinspect.sql to oldextversions.sql.  If you look at v6, the tests
are only focusing on this part, and nothing else.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Testing autovacuum wraparound (including failsafe)
Next
From: Jeff Davis
Date:
Subject: ICU 54 and earlier are too dangerous