Re: Combine pg_walinspect till_end_of_wal functions with others - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Combine pg_walinspect till_end_of_wal functions with others
Date
Msg-id CALj2ACWLj9ViWeexS=8YBZh+ssrtPEXzF3in3j_5Hfp83O8Hsg@mail.gmail.com
Whole thread Raw
In response to Re: Combine pg_walinspect till_end_of_wal functions with others  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Combine pg_walinspect till_end_of_wal functions with others
List pgsql-hackers
On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 10, 2023 at 04:45:06PM +0530, Bharath Rupireddy wrote:
>
> After that comes the rest of the patch, and I have found a couple of
> mistakes.
>
> -      pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn)
> +      pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL)
>        returns setof record
> [...]
> -      pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, per_record boolean DEFAULT false)
> +      pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, per_record boolean DEFAULT false)
>
> This part of the documentation is now incorrect.

Oh, yeah. Thanks for fixing it.

> +-- Make sure checkpoints don't interfere with the test.
> +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
>
> Adding a physical slot is better for stability of course, but the test
> also has to drop it or installcheck would cause an existing cluster to
> have that still around.  The third argument could be true, as well, so
> as we'd use a temporary slot.

# Disabled because these tests require "wal_level=replica", which
# some installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1

pg_walinspect can't be run under installcheck. I don't think dropping
the slot at the end is needed, it's unnecessary. I saw
oldextversions.sql using the same replication slot name, well no
problem, but I changed it to a unique name.

> Hmm.  I would simplify that, and just mention that an error is raised
> when the start LSN is available, without caring about the rest (valid
> end LSN being past the current insert LSN, and error if start > end,
> the second being obvious).

Okay.

> + <note>
> +  <para>
> +   Note that <function>pg_get_wal_records_info_till_end_of_wal</function> and
> +   <function>pg_get_wal_stats_till_end_of_wal</function> functions have been
> +   removed in the <filename>pg_walinspect</filename> version
> +   <literal>1.1</literal>. The same functionality can be achieved with
> +   <function>pg_get_wal_records_info</function> and
> +   <function>pg_get_wal_stats</function> functions by specifying a future
> +   <replaceable>end_lsn</replaceable>. However, <function>till_end_of_wal</function>
> +   functions will still work if the extension is installed explicitly with
> +   version <literal>1.0</literal>.
> +  </para>
> + </note>
>
> Not convinced that this is necessary.

As hackers we know that these functions have been removed and how to
achieve till_end_of_wal with the other functions. I noticed that
you've removed my changes (see below) from the docs that were saying
how to get info/stats till_end_of_wal. That leaves end users confused
as to how they can achieve till_end_of_wal functionality. All users
can't look for commit history/message but they can easily read the
docs. I prefer to have the following (did so in the attached v7) and
get rid of the above note if you don't feel strongly about it.

+ If a future <replaceable>end_lsn</replaceable>
+      (i.e. the LSN server doesn't know about) is specified, it returns
+      informaton till end of WAL.

> +       GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
> +
> +       stats_per_record = PG_GETARG_BOOL(2);
>
> This code in GetWalStats() is incorrect.
> pg_get_wal_stats_till_end_of_wal() has a stats_per_record, but as
> *second* argument, so this would be broken.

Oh, yeah. Thanks for fixing it.

> +       GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
>
> Coming from the last point, I think that this interface is confusing,
> and actually incorrect.  From what I can see, we should be doing what
> ~15 has by grepping the argument values within the main function
> calls, and just pass them down to the internal routines GetWalStats()
> and GetWALRecordsInfo().

Hm, what you have in v6 works for me.

> At the end, I am finishing with the attached.  ValidateInputLSNs()
> ought to be called, IMO, when the caller of the SQL functions can
> directly specify an end_lsn.  This means that there is no point to do
> this check in the two till_end_* functions.  This has as cost two
> extra checks to make sure that the start_lsn is not higher than the
> current LSN, but I am fine to live with that.  It seemed rather
> natural to me to let ValidateInputLSNs() do a refresh of the end_lsn
> if it sees that it is higher than the current LSN.  And if you look
> closely, you will see that we only call *once* GetCurrentLSN() for
> each function call, so the maths are more precise.
>
> I have cleaned up the comments of the modules, while on it, as there
> was not much value in copy-pasting how a function fails while there is
> a centralized validation code path.  The tests for the till_end()
> functions have been moved to the test path where we install 1.0.

I have some comments and fixed them in the attached v7 patch:

1.
+ * pg_get_wal_records_info
  *
+ * pg_get_wal_stats
  *
I think you wanted to be consistent with function comments with
function names atop, but missed adding for all functions. Actually, I
don't have a strong opinion on these changes as they unnecessarily
bloat the changes, so I removed them.

2.
+    if (start_lsn > curr_lsn)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("cannot accept future start LSN"),
-                 errdetail("Last known WAL LSN on the database system
is at %X/%X.",
-                           LSN_FORMAT_ARGS(curr_lsn))));
-    }
+                 errmsg("WAL start LSN must be less than current LSN")));

I don't like this inconsistency much, especially when
pg_get_wal_record_info emits "cannot accept future input LSN" with the
current LSN details (this current LSN will give a bit more information
to the user). Also, let's be consistent across returning NULLs if
input LSN/start LSN equal to the current LSN. I've done these changes
in the attached v7 patch.

3. I wanted COUNT(*) >= 0 for successful function execution to be
COUNT(*) >= 1 so that we will check for at least the functions
returning 1 record. And failures to be SELECT * FROM. This was my
intention but I don't see that in this patch or in the previous
test-refactoring commit. I added that in the attached v7 patch again.
Also, made test comments conssitent.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Allow tests to pass in OpenSSL FIPS mode
Next
From: Melih Mutlu
Date:
Subject: Re: Allow logical replication to copy tables in binary format