Thread: Combine pg_walinspect till_end_of_wal functions with others

Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
Hi,

In a recent discussion [1], Michael Paquier asked if we can combine
pg_walinspect till_end_of_wal functions with other functions
pg_get_wal_records_info and pg_get_wal_stats. The code currently looks
much duplicated and the number of functions that pg_walinspect exposes
to the users is bloated. The point was that the till_end_of_wal
functions determine the end LSN and everything else that they do is
the same as their counterpart functions. Well, the idea then was to
keep things simple, not clutter the APIs, have better and consistent
user-inputted end_lsn validations at the cost of usability and code
redundancy. However, now I tend to agree with the feedback received.

I'm attaching a patch doing the $subject with the following behavior:
1. If start_lsn is NULL, error out/return NULL.
2. If end_lsn isn't specified, default to NULL, then determine the end_lsn.
3. If end_lsn is specified as NULL, then determine the end_lsn.
4. If end_lsn is specified as non-NULL, then determine if it is
greater than start_lsn if yes, go ahead do the job, otherwise error
out.

Another idea is to convert till_end_of_wal flavors to SQL-only
functions and remove the c code from pg_walinspect.c. However, I
prefer $subject and completely remove till_end_of_wal flavors for
better usability in the long term.

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACV-WBN%3DEUgUPyYOGitp%2Brn163vMnQd%3DHcWrnKt-uqFYFA%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Wed, Mar 1, 2023 at 1:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> In a recent discussion [1], Michael Paquier asked if we can combine
> pg_walinspect till_end_of_wal functions with other functions
> pg_get_wal_records_info and pg_get_wal_stats. The code currently looks
> much duplicated and the number of functions that pg_walinspect exposes
> to the users is bloated. The point was that the till_end_of_wal
> functions determine the end LSN and everything else that they do is
> the same as their counterpart functions. Well, the idea then was to
> keep things simple, not clutter the APIs, have better and consistent
> user-inputted end_lsn validations at the cost of usability and code
> redundancy. However, now I tend to agree with the feedback received.
>
> I'm attaching a patch doing the $subject with the following behavior:
> 1. If start_lsn is NULL, error out/return NULL.
> 2. If end_lsn isn't specified, default to NULL, then determine the end_lsn.
> 3. If end_lsn is specified as NULL, then determine the end_lsn.
> 4. If end_lsn is specified as non-NULL, then determine if it is
> greater than start_lsn if yes, go ahead do the job, otherwise error
> out.
>
> Another idea is to convert till_end_of_wal flavors to SQL-only
> functions and remove the c code from pg_walinspect.c. However, I
> prefer $subject and completely remove till_end_of_wal flavors for
> better usability in the long term.
>
> Thoughts?
>
> [1] https://www.postgresql.org/message-id/CALj2ACV-WBN%3DEUgUPyYOGitp%2Brn163vMnQd%3DHcWrnKt-uqFYFA%40mail.gmail.com

Needed a rebase due to 019f8624664dbf1e25e2bd721c7e99822812d109.
Attaching v2 patch. Sorry for the noise.


--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Julien Rouhaud
Date:
On Wed, Mar 01, 2023 at 08:30:00PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 1, 2023 at 1:00 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > In a recent discussion [1], Michael Paquier asked if we can combine
> > pg_walinspect till_end_of_wal functions with other functions
> > pg_get_wal_records_info and pg_get_wal_stats. The code currently looks
> > much duplicated and the number of functions that pg_walinspect exposes
> > to the users is bloated. The point was that the till_end_of_wal
> > functions determine the end LSN and everything else that they do is
> > the same as their counterpart functions. Well, the idea then was to
> > keep things simple, not clutter the APIs, have better and consistent
> > user-inputted end_lsn validations at the cost of usability and code
> > redundancy. However, now I tend to agree with the feedback received.

+1, especially since I really don't like the use of "till" in the function
names.

> > I'm attaching a patch doing the $subject with the following behavior:
> > 1. If start_lsn is NULL, error out/return NULL.

Maybe naive and unrelated question, but is that really helpful?  If for some
reason I want to see information about *all available WAL*, I have to manually
dig for a suitable LSN.  The same action with pg_waldump is easier as I just
need to use the oldest available WAL that's present on disk.

> > Another idea is to convert till_end_of_wal flavors to SQL-only
> > functions and remove the c code from pg_walinspect.c. However, I
> > prefer $subject and completely remove till_end_of_wal flavors for
> > better usability in the long term.

I agree that using default arguments is a way better API.

Nitpicking:

Maybe we could group the kept unused exported C function at the end of the
file?

Also:

/*
- * Get info and data of all WAL records from start LSN till end of WAL.
+ * NB: This function does nothing and stays here for backward compatibility.
+ * Without it, the extension fails to install.
  *
- * This function emits an error if a future start i.e. WAL LSN the database
- * system doesn't know about is specified.
+ * Try using pg_get_wal_records_info() for the same till_end_of_wal
+ * functionaility.
  */
 Datum
 pg_get_wal_records_info_till_end_of_wal(PG_FUNCTION_ARGS)
 {
-   XLogRecPtr  start_lsn;
-   XLogRecPtr  end_lsn = InvalidXLogRecPtr;
-
-   start_lsn = PG_GETARG_LSN(0);
-
-   end_lsn = ValidateInputLSNs(true, start_lsn, end_lsn);
-
-   GetWALRecordsInfo(fcinfo, start_lsn, end_lsn);
-
-   PG_RETURN_VOID();
+   PG_RETURN_NULL();
 }

I don't like much this chunk (same for the other kept function).  Apart from
the obvious typo in "functionaility", I don't think that the comment is really
accurate.

Also, are we actually helping users if we simply return NULL there?  It's quite
possible that people will start to use the new shared lib while still having
the 1.1 SQL definition of the extension installed.  In that case, they will
simply retrieve a NULL row and may spend some time wondering why until they
eventually realize that their only option is to upgrade the extension first and
then use another function.  Why not make their life easier and explicity raise
a suitable error at the SQL level if users try to use those functions?



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Mon, Mar 6, 2023 at 2:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > > I'm attaching a patch doing the $subject with the following behavior:
> > > 1. If start_lsn is NULL, error out/return NULL.
>
> Maybe naive and unrelated question, but is that really helpful?  If for some
> reason I want to see information about *all available WAL*, I have to manually
> dig for a suitable LSN.  The same action with pg_waldump is easier as I just
> need to use the oldest available WAL that's present on disk.

Are you saying that the pg_walinspect functions should figure out the
oldest available WAL file and LSN, and start from there if start_lsn
specified as NULL or invalid? Note that pg_waldump requires either
explicit startlsn and/or startseg (WAL file name), it can't search for
the oldest WAL file available and start from there automatically.

If the user wants to figure it out, they can do something like below:

ostgres=# select * from pg_ls_waldir() order by name;
           name           |   size   |      modification
--------------------------+----------+------------------------
 000000010000000000000001 | 16777216 | 2023-03-06 14:54:55+00
 000000010000000000000002 | 16777216 | 2023-03-06 14:54:55+00

If we try to make these functions figure out the oldest WAl file and
start from there, then it'll unnecessarily complicate the APIs and
functions. If we still think we need a better function for the users
to figure out the oldest WAL file, perhaps, add a SQL-only
view/function to pg_walinspect that returns "select name from
pg_ls_waldir() order by name limit 1;", but honestly, that's so
trivial.

> > > Another idea is to convert till_end_of_wal flavors to SQL-only
> > > functions and remove the c code from pg_walinspect.c. However, I
> > > prefer $subject and completely remove till_end_of_wal flavors for
> > > better usability in the long term.
>
> I agree that using default arguments is a way better API.

Thanks. Yes, that's true.

> Nitpicking:
>
> Maybe we could group the kept unused exported C function at the end of the
> file?

Will do.

> Also:
>
> /*
> - * Get info and data of all WAL records from start LSN till end of WAL.
> + * NB: This function does nothing and stays here for backward compatibility.
> + * Without it, the extension fails to install.
>   *
> - * This function emits an error if a future start i.e. WAL LSN the database
> - * system doesn't know about is specified.
> + * Try using pg_get_wal_records_info() for the same till_end_of_wal
> + * functionaility.
>
> I don't like much this chunk (same for the other kept function).  Apart from
> the obvious typo in "functionaility", I don't think that the comment is really
> accurate.

Can you be more specific what's inaccurate about the comment?

> Also, are we actually helping users if we simply return NULL there?  It's quite
> possible that people will start to use the new shared lib while still having
> the 1.1 SQL definition of the extension installed.  In that case, they will
> simply retrieve a NULL row and may spend some time wondering why until they
> eventually realize that their only option is to upgrade the extension first and
> then use another function.  Why not make their life easier and explicity raise
> a suitable error at the SQL level if users try to use those functions?

I thought about it initially, but wanted to avoid more errors. An
error would make them use the new version easily. I will change it
that way.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Matthias van de Meent
Date:
On Mon, 6 Mar 2023 at 16:06, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> If we try to make these functions figure out the oldest WAl file and
> start from there, then it'll unnecessarily complicate the APIs and
> functions. If we still think we need a better function for the users
> to figure out the oldest WAL file, perhaps, add a SQL-only
> view/function to pg_walinspect that returns "select name from
> pg_ls_waldir() order by name limit 1;", but honestly, that's so
> trivial.

That "order by name limit 1" has subtle bugs when you're working on a
system that has experienced timeline switches. It is entirely possible
that the first file (as sorted by the default collation) is not the
first record you can inspect, or even in your timeline's history.


Kind regards,

Matthias van de Meent



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Mon, Mar 6, 2023 at 8:52 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Mon, 6 Mar 2023 at 16:06, Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > If we try to make these functions figure out the oldest WAl file and
> > start from there, then it'll unnecessarily complicate the APIs and
> > functions. If we still think we need a better function for the users
> > to figure out the oldest WAL file, perhaps, add a SQL-only
> > view/function to pg_walinspect that returns "select name from
> > pg_ls_waldir() order by name limit 1;", but honestly, that's so
> > trivial.
>
> That "order by name limit 1" has subtle bugs when you're working on a
> system that has experienced timeline switches. It is entirely possible
> that the first file (as sorted by the default collation) is not the
> first record you can inspect, or even in your timeline's history.

Hm. Note that pg_walinspect currently searches WAL on insertion
timeline; it doesn't care about the older timelines. The idea of
making it look at WAL on an older timeline was discussed, but for the
sake of simplicity we kept the functions simple. If needed, I can try
adding the timeline as input parameters to all the functions (with
default -1 meaning current insertion timeline; if specified, look for
WAL on that timeline).

Are you saying that a pg_walinspect function that traverses the pg_wal
directory and figures out the old valid WAL on a given timeline is
still useful? Or make the functions look for older WAL if start_lsn is
given as NULL or invalid?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Matthias van de Meent
Date:
On Mon, 6 Mar 2023 at 16:37, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Mar 6, 2023 at 8:52 PM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> >
> > On Mon, 6 Mar 2023 at 16:06, Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > If we try to make these functions figure out the oldest WAl file and
> > > start from there, then it'll unnecessarily complicate the APIs and
> > > functions. If we still think we need a better function for the users
> > > to figure out the oldest WAL file, perhaps, add a SQL-only
> > > view/function to pg_walinspect that returns "select name from
> > > pg_ls_waldir() order by name limit 1;", but honestly, that's so
> > > trivial.
> >
> > That "order by name limit 1" has subtle bugs when you're working on a
> > system that has experienced timeline switches. It is entirely possible
> > that the first file (as sorted by the default collation) is not the
> > first record you can inspect, or even in your timeline's history.
>
> Hm. Note that pg_walinspect currently searches WAL on insertion
> timeline; it doesn't care about the older timelines. The idea of
> making it look at WAL on an older timeline was discussed, but for the
> sake of simplicity we kept the functions simple. If needed, I can try
> adding the timeline as input parameters to all the functions (with
> default -1 meaning current insertion timeline; if specified, look for
> WAL on that timeline).
>
> Are you saying that a pg_walinspect function that traverses the pg_wal
> directory and figures out the old valid WAL on a given timeline is
> still useful? Or make the functions look for older WAL if start_lsn is
> given as NULL or invalid?

The specific comment I made was only regarding the following issue: An
instance may still have WAL segments from before the latest timeline
switch. These segments may have a higher LSN and lower timeline number
than your current running timeline+LSN (because of e.g. pg_rewind).
This will then result in unwanted behaviour when you sort the segments
numerically/alphabetically and then assume that the first file's LSN
is valid (or available) in your current timeline.

That is why "order by name limit 1" isn't a good solution, and that's
what I was commenting on: you need to parse the timeline hierarchy to
determine which timelines you can use which WAL segments of.

To answer your question on whether I'd like us to traverse timeline
switches: Yes, I'd really like it if we were able to decode the
current timeline's hierarchical WAL of a PG instance in one go, from
the start at (iirc) 0x10000 all the way to the current LSN, assuming
the segments are available.


Kind regards,

Matthias van de Meent



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Julien Rouhaud
Date:
On Mon, Mar 06, 2023 at 08:36:17PM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 6, 2023 at 2:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > Also:
> >
> > /*
> > - * Get info and data of all WAL records from start LSN till end of WAL.
> > + * NB: This function does nothing and stays here for backward compatibility.
> > + * Without it, the extension fails to install.
> >   *
> > - * This function emits an error if a future start i.e. WAL LSN the database
> > - * system doesn't know about is specified.
> > + * Try using pg_get_wal_records_info() for the same till_end_of_wal
> > + * functionaility.
> >
> > I don't like much this chunk (same for the other kept function).  Apart from
> > the obvious typo in "functionaility", I don't think that the comment is really
> > accurate.
>
> Can you be more specific what's inaccurate about the comment?

It's problematic to install the extension if we rely on upgrade scripts only.
We could also provide a pg_walinspect--1.2.sql file and it would just work, and
that may have been a good idea if there wasn't also the problem of people still
having the version 1.1 locally installed, as we don't want them to see random
failures like "could not find function ... in file ...", or keeping the ability
to install the former 1.1 version (with those functions bypassed).



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Tue, Mar 07, 2023 at 09:13:46AM +0800, Julien Rouhaud wrote:
> It's problematic to install the extension if we rely on upgrade scripts only.
> We could also provide a pg_walinspect--1.2.sql file and it would just work, and
> that may have been a good idea if there wasn't also the problem of people still
> having the version 1.1 locally installed, as we don't want them to see random
> failures like "could not find function ... in file ...", or keeping the ability
> to install the former 1.1 version (with those functions bypassed).

Why would we need a 1.2?  HEAD is the only branch with pg_walinspect
1.1, and it has not been released yet.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Julien Rouhaud
Date:
On Tue, 7 Mar 2023, 12:36 Michael Paquier, <michael@paquier.xyz> wrote:
On Tue, Mar 07, 2023 at 09:13:46AM +0800, Julien Rouhaud wrote:
> It's problematic to install the extension if we rely on upgrade scripts only.
> We could also provide a pg_walinspect--1.2.sql file and it would just work, and
> that may have been a good idea if there wasn't also the problem of people still
> having the version 1.1 locally installed, as we don't want them to see random
> failures like "could not find function ... in file ...", or keeping the ability
> to install the former 1.1 version (with those functions bypassed).

Why would we need a 1.2?  HEAD is the only branch with pg_walinspect
1.1, and it has not been released yet.

ah right I should have checked. but the same ABI compatibility concern still exists for version 1.0 of the extension. 

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote:
> ah right I should have checked. but the same ABI compatibility concern
> still exists for version 1.0 of the extension.

Yes, we'd better make sure that the past code is able to run, at
least.  Now I am not really convinced that we have the need to enforce
an error with the new code even if 1.0 is still installed, so as it is
possible to remove all the traces of the code that triggers errors if
an end LSN is higher than the current insert LSN for primaries or
replayed LSN for standbys.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Julien Rouhaud
Date:
On Tue, Mar 07, 2023 at 01:56:24PM +0900, Michael Paquier wrote:
> On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote:
> > ah right I should have checked. but the same ABI compatibility concern
> > still exists for version 1.0 of the extension.
>
> Yes, we'd better make sure that the past code is able to run, at
> least.  Now I am not really convinced that we have the need to enforce
> an error with the new code even if 1.0 is still installed,

So keep this "deprecated" C function working, as it would only be a few lines
of code?

> so as it is
> possible to remove all the traces of the code that triggers errors if
> an end LSN is higher than the current insert LSN for primaries or
> replayed LSN for standbys.

+1 for that



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Tue, Mar 7, 2023 at 11:17 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 07, 2023 at 01:56:24PM +0900, Michael Paquier wrote:
> > On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote:
> > > ah right I should have checked. but the same ABI compatibility concern
> > > still exists for version 1.0 of the extension.
> >
> > Yes, we'd better make sure that the past code is able to run, at
> > least.  Now I am not really convinced that we have the need to enforce
> > an error with the new code even if 1.0 is still installed,
>
> So keep this "deprecated" C function working, as it would only be a few lines
> of code?
>
> > so as it is
> > possible to remove all the traces of the code that triggers errors if
> > an end LSN is higher than the current insert LSN for primaries or
> > replayed LSN for standbys.
>
> +1 for that

I understand that we want to keep till_end_of_wal functions defined
around in .c file so that if someone does CREATE EXTENSION
pg_walinspect WITH VERSION '1.0'; on the latest extension shared
library (with 1.1 version), the till_end_of_wal functions should work
for them.

Also, I noticed that there's some improvement needed for the input
validations, especially for the end_lsn.

Here I'm with the v3 patch addressing the above comments. Please
review it further.

1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was
a comment on the functions automatically determining start_lsn to be
the oldest WAL LSN. I'm not implementing this change now, as it
requires extra work to traverse the pg_wal directory. I'm planning to
do it in the next set of improvements where I'm planning to make the
functions timeline-aware, introduce functions for inspecting
wal_buffers and so on.
2. When end_lsn is NULL or invalid ('0/0') IOW end_lsn is not
specified, deduce end_lsn to be the current flush LSN when not in
recovery, current replayed LSN when in recovery. This is the main
change that avoids till_end_of_wal functions in version 1.1.
3. When end_lsn is specified but greater than or equal to the
start_lsn, return NULL. Given the above review comments on more errors
being reported, I chose to return NULL for better usability.
4. When end_lsn is specified but less than the start_lsn, get
info/stats up until end_lsn.
5. Retained pg_get_wal_records_info_till_end_of_wal and
pg_get_wal_stats_till_end_of_wal for backward compatibility.
6. Piggybacked these functions and behaviour under the new HEAD-only
extension version 1.1 introduced recently, instead of bumping to 1.2.
When PG16 is out, users will have 1.1 with all of these new
functionality.
7. Added tests to verify the extension update path in
oldextversions.sql similar to other extensions'. (suggested by Michael
Paquier).
8. Added a note in the pg_walinspect documentation about removal of
pg_get_wal_records_info_till_end_of_wal and
pg_get_wal_stats_till_end_of_wal in version 1.1 and how the other
functions can be used to achieve the same functionality and how these
till_end_of_wal functions can work if extension is installed
explicitly with version 1.0.
9. Refactored the tests according to the new behaviours.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Tue, Mar 07, 2023 at 01:47:01PM +0800, Julien Rouhaud wrote:
> So keep this "deprecated" C function working, as it would only be a few lines
> of code?

Yes, I guess that this would be the final picture, moving forward I'd
like to think that we should just remove the SQL declaration of the
till_end_of_wal() functions to keep a clean interface.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Wed, Mar 08, 2023 at 01:40:46PM +0530, Bharath Rupireddy wrote:
> 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was
> a comment on the functions automatically determining start_lsn to be
> the oldest WAL LSN. I'm not implementing this change now, as it
> requires extra work to traverse the pg_wal directory. I'm planning to
> do it in the next set of improvements where I'm planning to make the
> functions timeline-aware, introduce functions for inspecting
> wal_buffers and so on.
>
> [.. long description ..]
>
> 9. Refactored the tests according to the new behaviours.

Hmm.  I think this patch ought to have a result simpler than what's
proposed here.

First, do we really have to begin marking the functions as non-STRICT
to abide with the treatment of NULL as a special value?  The part that
I've found personally the most annoying with these functions is that
an incorrect bound leads to a random failure, particularly when such
queries are used for monitoring.  I would simplify the whole with two
simple rules, as of:
- Keeping all the functions strict.
- When end_lsn is a LSN in the future of the current LSN inserted or
replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or
GetFlushRecPtr().  This way, monitoring tools can use a value ahead,
at will.
- Failing if start_lsn > end_lsn.
- Failing if start_lsn refers to a position older than what exists is
still fine by me.

I would also choose to remove
pg_get_wal_records_info_till_end_of_wal() from the SQL interface in
1.1 to limit the confusion arount it, but keep a few lines of code so
as we are still able to use it when pg_walinspect 1.0 is the version
enabled with CREATE EXTENSION.

In short, pg_get_wal_records_info_till_end_of_wal() should be able to
use exactly the same code as pg_get_wal_records_info(), still you need
to keep *two* functions for their prosrc with PG_FUNCTION_ARGS as
arguments so as 1.0 would work when dropped in place.  The result, it
seems to me, mostly comes to simplify ValidateInputLSNs() and remove
its till_end_of_wal argument.

+-- Removed function
+SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_of_wal'::regproc);
+ERROR:  function "pg_get_wal_records_info_till_end_of_wal" does not exist
+LINE 1: SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_...

It seems to me that you should just replace all that and anything
depending on pg_get_functiondef() with a \dx+ pg_walinspect, that
would list all the objects part of the extension for the specific
version you want to test.  Not sure that there is a need to list the
full function definitions, either.  That just bloats the tests.

I think, however, that it is critical to test in oldextversions.out
the *executions* of the functions, so as we make sure that they don't
crash.  The patch is missing that.

+-- Invalid input LSNs
+SELECT * FROM pg_get_wal_record_info('0/0'); -- ERROR
+ERROR:  invalid input LSN
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Wed, Mar 8, 2023 at 1:40 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 7, 2023 at 11:17 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Here I'm with the v3 patch addressing the above comments. Please
> review it further.

Needed a rebase. v4 patch is attached. I'll address the latest review
comments in a bit.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Julien Rouhaud
Date:
On Fri, Mar 10, 2023 at 12:24 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Mar 08, 2023 at 01:40:46PM +0530, Bharath Rupireddy wrote:
> > 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was
> > a comment on the functions automatically determining start_lsn to be
> > the oldest WAL LSN. I'm not implementing this change now, as it
> > requires extra work to traverse the pg_wal directory. I'm planning to
> > do it in the next set of improvements where I'm planning to make the
> > functions timeline-aware, introduce functions for inspecting
> > wal_buffers and so on.
> >
> > [.. long description ..]
> >
> > 9. Refactored the tests according to the new behaviours.
>
> Hmm.  I think this patch ought to have a result simpler than what's
> proposed here.
>
> First, do we really have to begin marking the functions as non-STRICT
> to abide with the treatment of NULL as a special value?  The part that
> I've found personally the most annoying with these functions is that
> an incorrect bound leads to a random failure, particularly when such
> queries are used for monitoring.

As long as we provide a sensible default value (so I guess '0/0' to
mean "no upper bound") and that we therefore don't have to manually
specify an upper bound if we don't want one I'm fine with keeping the
functions marked as STRICT.

>  I would simplify the whole with two
> simple rules, as of:
> - Keeping all the functions strict.
> - When end_lsn is a LSN in the future of the current LSN inserted or
> replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or
> GetFlushRecPtr().  This way, monitoring tools can use a value ahead,
> at will.
> - Failing if start_lsn > end_lsn.
> - Failing if start_lsn refers to a position older than what exists is
> still fine by me.

+1

> I would also choose to remove
> pg_get_wal_records_info_till_end_of_wal() from the SQL interface in
> 1.1 to limit the confusion arount it, but keep a few lines of code so
> as we are still able to use it when pg_walinspect 1.0 is the version
> enabled with CREATE EXTENSION.

Yeah the SQL function should be removed no matter what.



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote:
> As long as we provide a sensible default value (so I guess '0/0' to
> mean "no upper bound") and that we therefore don't have to manually
> specify an upper bound if we don't want one I'm fine with keeping the
> functions marked as STRICT.

FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail,
just do the job" is fine by me.  Something like a FFF/FFFFFFFF should
just mean the same on a fresh cluster, still it gets risky the longer
the WAL is generated.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Julien Rouhaud
Date:
On Fri, 10 Mar 2023, 16:14 Michael Paquier, <michael@paquier.xyz> wrote:
On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote:
> As long as we provide a sensible default value (so I guess '0/0' to
> mean "no upper bound") and that we therefore don't have to manually
> specify an upper bound if we don't want one I'm fine with keeping the
> functions marked as STRICT.

FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail,
just do the job" is fine by me.

isn't '0/0' the same as InvalidXLogRecPtr? but my point is that we shouldn't require to spell it explicitly, just rely on the default value.

Something like a FFF/FFFFFFFF should
just mean the same on a fresh cluster, still it gets risky the longer
the WAL is generated.

yeah, it would be handy to accept 'infinity' in that context. 

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Fri, Mar 10, 2023 at 04:37:23PM +0800, Julien Rouhaud wrote:
> isn't '0/0' the same as InvalidXLogRecPtr? but my point is that we
> shouldn't require to spell it explicitly, just rely on the default value.

Perhaps.  Still the addition of a DEFAULT to the function definitions
and its value looks like a second patch to me.  The first should just
lift the bound restrictions currently in place while cleaning up the
till_* functions.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Fri, Mar 10, 2023 at 9:54 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hmm.  I think this patch ought to have a result simpler than what's
> proposed here.
>
> First, do we really have to begin marking the functions as non-STRICT
> to abide with the treatment of NULL as a special value?  The part that
> I've found personally the most annoying with these functions is that
> an incorrect bound leads to a random failure, particularly when such
> queries are used for monitoring.  I would simplify the whole with two
> simple rules, as of:
> - Keeping all the functions strict.
> - When end_lsn is a LSN in the future of the current LSN inserted or
> replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or
> GetFlushRecPtr().  This way, monitoring tools can use a value ahead,
> at will.
> - Failing if start_lsn > end_lsn.
> - Failing if start_lsn refers to a position older than what exists is
> still fine by me.

Done this way in the attached v5 patch.

> I would also choose to remove
> pg_get_wal_records_info_till_end_of_wal() from the SQL interface in
> 1.1 to limit the confusion arount it, but keep a few lines of code so
> as we are still able to use it when pg_walinspect 1.0 is the version
> enabled with CREATE EXTENSION.
>
> In short, pg_get_wal_records_info_till_end_of_wal() should be able to
> use exactly the same code as pg_get_wal_records_info(), still you need
> to keep *two* functions for their prosrc with PG_FUNCTION_ARGS as
> arguments so as 1.0 would work when dropped in place.  The result, it
> seems to me, mostly comes to simplify ValidateInputLSNs() and remove
> its till_end_of_wal argument.

This has already been taken care of in the previous patches, e.g. v3
and v4 and so in the latest v5 patch.

> +-- Removed function
> +SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_of_wal'::regproc);
> +ERROR:  function "pg_get_wal_records_info_till_end_of_wal" does not exist
> +LINE 1: SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_...
>
> It seems to me that you should just replace all that and anything
> depending on pg_get_functiondef() with a \dx+ pg_walinspect, that
> would list all the objects part of the extension for the specific
> version you want to test.  Not sure that there is a need to list the
> full function definitions, either.  That just bloats the tests.

Agreed and used \dx+. One can anyways look at the function definitions
and compare for knowing what's changed.

> I think, however, that it is critical to test in oldextversions.out
> the *executions* of the functions, so as we make sure that they don't
> crash.  The patch is missing that.

You mean, we need to test the till_end_of_wal functions that were
removed in the latest version 1.1 but they must work if the extension
is installed with 1.0? If yes, I now added them.

> +-- Invalid input LSNs
> +SELECT * FROM pg_get_wal_record_info('0/0'); -- ERROR
> +ERROR:  invalid input LSN

Removed InvalidRecPtr checks for input/start LSN because anyways the
functions will fail with ERROR:  could not read WAL at LSN 0/0.

Any comments on the attached v5 patch?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Fri, Mar 10, 2023 at 04:45:06PM +0530, Bharath Rupireddy wrote:
> Any comments on the attached v5 patch?

I have reviewed the patch, and found it pretty messy.  The tests
should have been divided into their own patch, I think.  This is
rather straight-forward once the six functions have their checks
grouped together.  The result was pretty good, so I have begun by
applying that as 1f282c2.  This also includes most of the refinements
you have proposed for the whitespaces in the tests.  Note that we were
missing a few spots with the bound checks for the six functions, so
now the coverage should be full.

After that comes the rest of the patch, and I have found a couple of
mistakes.

-      pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn)
+      pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL)
       returns setof record
[...]
-      pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, per_record boolean DEFAULT false)
+      pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, per_record boolean DEFAULT false)

This part of the documentation is now incorrect.

+-- Make sure checkpoints don't interfere with the test.
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);

Adding a physical slot is better for stability of course, but the test
also has to drop it or installcheck would cause an existing cluster to
have that still around.  The third argument could be true, as well, so
as we'd use a temporary slot.

-      If <replaceable>start_lsn</replaceable>
-      or <replaceable>end_lsn</replaceable> are not yet available, the
-      function will raise an error. For example:
+      If a future <replaceable>end_lsn</replaceable> (i.e. the LSN server
+      doesn't know about) is specified, it returns stats till end of WAL. It
+      will raise an error, if the server doesn't have WAL available at given
+      <replaceable>start_lsn</replaceable> or if the
+      <replaceable>start_lsn</replaceable> is in future or is past the
+      <replaceable>end_lsn</replaceable>. For example, usage of the function is
+      as follows:

Hmm.  I would simplify that, and just mention that an error is raised
when the start LSN is available, without caring about the rest (valid
end LSN being past the current insert LSN, and error if start > end,
the second being obvious).

+ <note>
+  <para>
+   Note that <function>pg_get_wal_records_info_till_end_of_wal</function> and
+   <function>pg_get_wal_stats_till_end_of_wal</function> functions have been
+   removed in the <filename>pg_walinspect</filename> version
+   <literal>1.1</literal>. The same functionality can be achieved with
+   <function>pg_get_wal_records_info</function> and
+   <function>pg_get_wal_stats</function> functions by specifying a future
+   <replaceable>end_lsn</replaceable>. However, <function>till_end_of_wal</function>
+   functions will still work if the extension is installed explicitly with
+   version <literal>1.0</literal>.
+  </para>
+ </note>

Not convinced that this is necessary.

+       GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
+
+       stats_per_record = PG_GETARG_BOOL(2);

This code in GetWalStats() is incorrect.
pg_get_wal_stats_till_end_of_wal() has a stats_per_record, but as
*second* argument, so this would be broken.

+       GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);

Coming from the last point, I think that this interface is confusing,
and actually incorrect.  From what I can see, we should be doing what
~15 has by grepping the argument values within the main function
calls, and just pass them down to the internal routines GetWalStats()
and GetWALRecordsInfo().

-static bool
-IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn)
+static XLogRecPtr
+GetCurrentLSN(void)

This wrapper is actually a good idea.

At the end, I am finishing with the attached.  ValidateInputLSNs()
ought to be called, IMO, when the caller of the SQL functions can
directly specify an end_lsn.  This means that there is no point to do
this check in the two till_end_* functions.  This has as cost two
extra checks to make sure that the start_lsn is not higher than the
current LSN, but I am fine to live with that.  It seemed rather
natural to me to let ValidateInputLSNs() do a refresh of the end_lsn
if it sees that it is higher than the current LSN.  And if you look
closely, you will see that we only call *once* GetCurrentLSN() for
each function call, so the maths are more precise.

I have cleaned up the comments of the modules, while on it, as there
was not much value in copy-pasting how a function fails while there is
a centralized validation code path.  The tests for the till_end()
functions have been moved to the test path where we install 1.0.

With all these cleanups done, there is less code than at the
beginning, which comes from the docs, so the current code does not
change in size:
 7 files changed, 173 insertions(+), 206 deletions(-)
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 10, 2023 at 04:45:06PM +0530, Bharath Rupireddy wrote:
>
> After that comes the rest of the patch, and I have found a couple of
> mistakes.
>
> -      pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn)
> +      pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL)
>        returns setof record
> [...]
> -      pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, per_record boolean DEFAULT false)
> +      pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, per_record boolean DEFAULT false)
>
> This part of the documentation is now incorrect.

Oh, yeah. Thanks for fixing it.

> +-- Make sure checkpoints don't interfere with the test.
> +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
>
> Adding a physical slot is better for stability of course, but the test
> also has to drop it or installcheck would cause an existing cluster to
> have that still around.  The third argument could be true, as well, so
> as we'd use a temporary slot.

# Disabled because these tests require "wal_level=replica", which
# some installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1

pg_walinspect can't be run under installcheck. I don't think dropping
the slot at the end is needed, it's unnecessary. I saw
oldextversions.sql using the same replication slot name, well no
problem, but I changed it to a unique name.

> Hmm.  I would simplify that, and just mention that an error is raised
> when the start LSN is available, without caring about the rest (valid
> end LSN being past the current insert LSN, and error if start > end,
> the second being obvious).

Okay.

> + <note>
> +  <para>
> +   Note that <function>pg_get_wal_records_info_till_end_of_wal</function> and
> +   <function>pg_get_wal_stats_till_end_of_wal</function> functions have been
> +   removed in the <filename>pg_walinspect</filename> version
> +   <literal>1.1</literal>. The same functionality can be achieved with
> +   <function>pg_get_wal_records_info</function> and
> +   <function>pg_get_wal_stats</function> functions by specifying a future
> +   <replaceable>end_lsn</replaceable>. However, <function>till_end_of_wal</function>
> +   functions will still work if the extension is installed explicitly with
> +   version <literal>1.0</literal>.
> +  </para>
> + </note>
>
> Not convinced that this is necessary.

As hackers we know that these functions have been removed and how to
achieve till_end_of_wal with the other functions. I noticed that
you've removed my changes (see below) from the docs that were saying
how to get info/stats till_end_of_wal. That leaves end users confused
as to how they can achieve till_end_of_wal functionality. All users
can't look for commit history/message but they can easily read the
docs. I prefer to have the following (did so in the attached v7) and
get rid of the above note if you don't feel strongly about it.

+ If a future <replaceable>end_lsn</replaceable>
+      (i.e. the LSN server doesn't know about) is specified, it returns
+      informaton till end of WAL.

> +       GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
> +
> +       stats_per_record = PG_GETARG_BOOL(2);
>
> This code in GetWalStats() is incorrect.
> pg_get_wal_stats_till_end_of_wal() has a stats_per_record, but as
> *second* argument, so this would be broken.

Oh, yeah. Thanks for fixing it.

> +       GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
>
> Coming from the last point, I think that this interface is confusing,
> and actually incorrect.  From what I can see, we should be doing what
> ~15 has by grepping the argument values within the main function
> calls, and just pass them down to the internal routines GetWalStats()
> and GetWALRecordsInfo().

Hm, what you have in v6 works for me.

> At the end, I am finishing with the attached.  ValidateInputLSNs()
> ought to be called, IMO, when the caller of the SQL functions can
> directly specify an end_lsn.  This means that there is no point to do
> this check in the two till_end_* functions.  This has as cost two
> extra checks to make sure that the start_lsn is not higher than the
> current LSN, but I am fine to live with that.  It seemed rather
> natural to me to let ValidateInputLSNs() do a refresh of the end_lsn
> if it sees that it is higher than the current LSN.  And if you look
> closely, you will see that we only call *once* GetCurrentLSN() for
> each function call, so the maths are more precise.
>
> I have cleaned up the comments of the modules, while on it, as there
> was not much value in copy-pasting how a function fails while there is
> a centralized validation code path.  The tests for the till_end()
> functions have been moved to the test path where we install 1.0.

I have some comments and fixed them in the attached v7 patch:

1.
+ * pg_get_wal_records_info
  *
+ * pg_get_wal_stats
  *
I think you wanted to be consistent with function comments with
function names atop, but missed adding for all functions. Actually, I
don't have a strong opinion on these changes as they unnecessarily
bloat the changes, so I removed them.

2.
+    if (start_lsn > curr_lsn)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("cannot accept future start LSN"),
-                 errdetail("Last known WAL LSN on the database system
is at %X/%X.",
-                           LSN_FORMAT_ARGS(curr_lsn))));
-    }
+                 errmsg("WAL start LSN must be less than current LSN")));

I don't like this inconsistency much, especially when
pg_get_wal_record_info emits "cannot accept future input LSN" with the
current LSN details (this current LSN will give a bit more information
to the user). Also, let's be consistent across returning NULLs if
input LSN/start LSN equal to the current LSN. I've done these changes
in the attached v7 patch.

3. I wanted COUNT(*) >= 0 for successful function execution to be
COUNT(*) >= 1 so that we will check for at least the functions
returning 1 record. And failures to be SELECT * FROM. This was my
intention but I don't see that in this patch or in the previous
test-refactoring commit. I added that in the attached v7 patch again.
Also, made test comments conssitent.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Mon, Mar 13, 2023 at 03:53:37PM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>> +-- Make sure checkpoints don't interfere with the test.
>> +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
>>
>> Adding a physical slot is better for stability of course, but the test
>> also has to drop it or installcheck would cause an existing cluster to
>> have that still around.  The third argument could be true, as well, so
>> as we'd use a temporary slot.
>
> # Disabled because these tests require "wal_level=replica", which
> # some installcheck users do not have (e.g. buildfarm clients).
> NO_INSTALLCHECK = 1
>
> pg_walinspect can't be run under installcheck. I don't think dropping
> the slot at the end is needed, it's unnecessary. I saw
> oldextversions.sql using the same replication slot name, well no
> problem, but I changed it to a unique name.

-SELECT pg_drop_replication_slot('regress_pg_walinspect_slot');
-
+-- Clean up

In my opinion, it is an incorrect practice to assume that nobody will
ever run these tests on a running instance.  FWIW, I have managed
QE/QA flows in the past that did exactly that.  I cannot say for
already-deployed clusters that could be used for production still I
don't feel comfortable with the idea to assume that nobody would do
ever that, and calls of pg_drop_replication_slot() are not a
bottleneck.  So let's be clean and drop these slots to keep the tests
self-contained.  pg_walinspect in REL_15_STABLE gets that right, IMV,
and that's no different from the role cleanup, as one example.

> As hackers we know that these functions have been removed and how to
> achieve till_end_of_wal with the other functions. I noticed that
> you've removed my changes (see below) from the docs that were saying
> how to get info/stats till_end_of_wal. That leaves end users confused
> as to how they can achieve till_end_of_wal functionality. All users
> can't look for commit history/message but they can easily read the
> docs. I prefer to have the following (did so in the attached v7) and
> get rid of the above note if you don't feel strongly about it.
>
> + If a future <replaceable>end_lsn</replaceable>
> +      (i.e. the LSN server doesn't know about) is specified, it returns
> +      informaton till end of WAL.

FWIW, I don't see a strong need for that, because this documents a
behavior where we would not fail.  And FWIW, it just feel natural to
me because the process stops the scan up to where it can.  In short,
it should be enough for the docs to mention the error patterns,
nothing else.

> I have some comments and fixed them in the attached v7 patch:
>
> 1.
> + * pg_get_wal_records_info
>   *
> + * pg_get_wal_stats
>   *
> I think you wanted to be consistent with function comments with
> function names atop, but missed adding for all functions. Actually, I
> don't have a strong opinion on these changes as they unnecessarily
> bloat the changes, so I removed them.

Either is fine if you feel strongly on this one, I am just used to
doing that.

> 2.
> +    if (start_lsn > curr_lsn)
>          ereport(ERROR,
>                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -                 errmsg("cannot accept future start LSN"),
> -                 errdetail("Last known WAL LSN on the database system
> is at %X/%X.",
> -                           LSN_FORMAT_ARGS(curr_lsn))));
> -    }
> +                 errmsg("WAL start LSN must be less than current LSN")));
>
> I don't like this inconsistency much, especially when
> pg_get_wal_record_info emits "cannot accept future input LSN" with the
> current LSN details (this current LSN will give a bit more information
> to the user). Also, let's be consistent across returning NULLs if
> input LSN/start LSN equal to the current LSN. I've done these changes
> in the attached v7 patch.

No arguments against that, consistency is good.

> 3. I wanted COUNT(*) >= 0 for successful function execution to be
> COUNT(*) >= 1 so that we will check for at least the functions
> returning 1 record. And failures to be SELECT * FROM. This was my
> intention but I don't see that in this patch or in the previous
> test-refactoring commit. I added that in the attached v7 patch again.
> Also, made test comments conssitent.

Noticed that as well, still it feels to me that these had better be
separated from the rest, and be in their own patch, perhaps *after*
the main patch discussed on this thread, or just moved into their own
threads.  If a commit finishes with a list of bullet points referring
to a list of completely different things than the subject, there may
be a problem.  In this v7, we have:
- Change the behavior of the functions for end LSNs, tweaking the
tests to do so.
- Adjust more comments and formats in the tests.
- Adjust some tests to be pickier with detection of generated WAL
records.
- Remove the drop slot calls.
But what we need to care most here is the first point.

I am not arguing that none of that should not be changed, but it
should not be inside a patch that slightly tweaks the behaviors of
some existing functions.  First, it creates a lot of noise in the
diffs, making it harder for anybody reading this change to find the
core of what's happening.  Second, it increases the odds of mistakes
and bugs (if a revert is done, the work to-be-done gets greater at the
end).  When it comes to this patch, the changes should only involve
the calls of till_end_of_wal() being moved around from
pg_walinspect.sql to oldextversions.sql.  If you look at v6, the tests
are only focusing on this part, and nothing else.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Tue, Mar 14, 2023 at 5:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> So let's be clean and drop these slots to keep the tests
> self-contained.  pg_walinspect in REL_15_STABLE gets that right, IMV,
> and that's no different from the role cleanup, as one example.

Hm, added replication slot drop back.

> > As hackers we know that these functions have been removed and how to
> > achieve till_end_of_wal with the other functions. I noticed that
> > you've removed my changes (see below) from the docs that were saying
> > how to get info/stats till_end_of_wal. That leaves end users confused
> > as to how they can achieve till_end_of_wal functionality. All users
> > can't look for commit history/message but they can easily read the
> > docs. I prefer to have the following (did so in the attached v7) and
> > get rid of the above note if you don't feel strongly about it.
> >
> > + If a future <replaceable>end_lsn</replaceable>
> > +      (i.e. the LSN server doesn't know about) is specified, it returns
> > +      informaton till end of WAL.
>
> FWIW, I don't see a strong need for that, because this documents a
> behavior where we would not fail.  And FWIW, it just feel natural to
> me because the process stops the scan up to where it can.  In short,
> it should be enough for the docs to mention the error patterns,
> nothing else.

My thoughts are simple here - how would one (an end user, not me and
not you) figure out how to get info/stats till the end of WAL? I'm
sure it would be difficult to find that out without looking at the
code or commit history. Search for till end of WAL behaviour with new
version will be more given the 1.0 version has explicit functions to
do that. IMO, there's no harm in being explicit in how to achieve till
end of WAL functionality around in the docs.

> > 3. I wanted COUNT(*) >= 0 for successful function execution to be
> > COUNT(*) >= 1 so that we will check for at least the functions
> > returning 1 record. And failures to be SELECT * FROM. This was my
> > intention but I don't see that in this patch or in the previous
> > test-refactoring commit. I added that in the attached v7 patch again.
> > Also, made test comments conssitent.
>
> Noticed that as well, still it feels to me that these had better be
> separated from the rest, and be in their own patch, perhaps *after*
> the main patch discussed on this thread, or just moved into their own
> threads.  If a commit finishes with a list of bullet points referring
> to a list of completely different things than the subject, there may
> be a problem.  In this v7, we have:
> - Change the behavior of the functions for end LSNs, tweaking the
> tests to do so.
> - Adjust more comments and formats in the tests.
> - Adjust some tests to be pickier with detection of generated WAL
> records.
> - Remove the drop slot calls.
> But what we need to care most here is the first point.

I get it. I divided the patches to 0001 and 0002 with 0001 focussing
on the change of behaviour around future end LSNs, dropping till end
of WAL functions and tests tweakings related to it. 0002 has all other
tests tidy up things.

Please find the attached v8 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Andres Freund
Date:
Hi,

I just rebased a patch over

commit 1f282c24e46
Author: Michael Paquier <michael@paquier.xyz>
Date:   2023-03-13 13:03:29 +0900
 
    Refactor and improve tests of pg_walinspect

and got a test failure:

https://cirrus-ci.com/task/5693041982308352

https://api.cirrus-ci.com/v1/artifact/task/5693041982308352/testrun/build/testrun/pg_walinspect/regress/regression.diffs

diff -w -U3 C:/cirrus/contrib/pg_walinspect/expected/oldextversions.out
C:/cirrus/build/testrun/pg_walinspect/regress/results/oldextversions.out
--- C:/cirrus/contrib/pg_walinspect/expected/oldextversions.out    2023-03-14 21:19:01.399716500 +0000
+++ C:/cirrus/build/testrun/pg_walinspect/regress/results/oldextversions.out    2023-03-14 21:26:27.504876700 +0000
@@ -8,10 +8,10 @@
                     Object description
 -----------------------------------------------------------
  function pg_get_wal_record_info(pg_lsn)
- function pg_get_wal_records_info(pg_lsn,pg_lsn)
  function pg_get_wal_records_info_till_end_of_wal(pg_lsn)
- function pg_get_wal_stats(pg_lsn,pg_lsn,boolean)
+ function pg_get_wal_records_info(pg_lsn,pg_lsn)
  function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean)
+ function pg_get_wal_stats(pg_lsn,pg_lsn,boolean)
 (5 rows)

 -- Make sure checkpoints don't interfere with the test.

Looks like it's missing an ORDER BY.

Greetings,

Andres Freund



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Tue, Mar 14, 2023 at 02:54:40PM -0700, Andres Freund wrote:
>                      Object description
>  -----------------------------------------------------------
>   function pg_get_wal_record_info(pg_lsn)
> - function pg_get_wal_records_info(pg_lsn,pg_lsn)
>   function pg_get_wal_records_info_till_end_of_wal(pg_lsn)
> - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean)
> + function pg_get_wal_records_info(pg_lsn,pg_lsn)
>   function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean)
> + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean)
>  (5 rows)
>
>  -- Make sure checkpoints don't interfere with the test.
>
> Looks like it's missing an ORDER BY.

Interesting.  This is "\dx+ pg_walinspect".
listOneExtensionContents() uses pg_describe_object() for that, and
there is already an ORDER BY based on it.  I would not have expected
this part to be that much sensitive.  Is this using a specific ICU
collation, because this is a side-effect of switching ICU as the
default in initdb?

As a solution, this could use pg_identify_object(classid, objid, 0) in
the ORDER BY clause to enforce a better ordering of the objects dealt
with as it decomposes the object name and the object type.  That
should be enough, I assume, as it looks to be parenthesis vs
underscore that switch the order.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Tue, Mar 14, 2023 at 10:35:43AM +0530, Bharath Rupireddy wrote:
> My thoughts are simple here - how would one (an end user, not me and
> not you) figure out how to get info/stats till the end of WAL? I'm
> sure it would be difficult to find that out without looking at the
> code or commit history. Search for till end of WAL behaviour with new
> version will be more given the 1.0 version has explicit functions to
> do that. IMO, there's no harm in being explicit in how to achieve till
> end of WAL functionality around in the docs.

Okay.  I have kept these notes, but tweaked the wording to be a bit
cleaner, replacing the term "till" by "until".  To my surprise while
studying this point, "till" is a term older than "until" in English
literacy, but it is rarely used.

> I get it. I divided the patches to 0001 and 0002 with 0001 focussing
> on the change of behaviour around future end LSNs, dropping till end
> of WAL functions and tests tweakings related to it. 0002 has all other
> tests tidy up things.
>
> Please find the attached v8 patch set for further review.

The tests of 0001 were still too complex IMO.  The changes can be much
simpler as it requires only to move the till_end_of_wal() calls from
pg_walinspect.sql to oldextversions.sql.  Nothing more.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Andres Freund
Date:
Hi,

On 2023-03-15 09:56:10 +0900, Michael Paquier wrote:
> On Tue, Mar 14, 2023 at 02:54:40PM -0700, Andres Freund wrote:
> >                      Object description
> >  -----------------------------------------------------------
> >   function pg_get_wal_record_info(pg_lsn)
> > - function pg_get_wal_records_info(pg_lsn,pg_lsn)
> >   function pg_get_wal_records_info_till_end_of_wal(pg_lsn)
> > - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean)
> > + function pg_get_wal_records_info(pg_lsn,pg_lsn)
> >   function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean)
> > + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean)
> >  (5 rows)
> > 
> >  -- Make sure checkpoints don't interfere with the test.
> > 
> > Looks like it's missing an ORDER BY.
> 
> Interesting.  This is "\dx+ pg_walinspect".
> listOneExtensionContents() uses pg_describe_object() for that, and
> there is already an ORDER BY based on it.  I would not have expected
> this part to be that much sensitive.  Is this using a specific ICU
> collation, because this is a side-effect of switching ICU as the
> default in initdb?

It's using ICU, but not a specific collation. The build I linked to is WIP
hackery to add ICU support to windows CI. Here's the initdb output:
https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log

The database cluster will be initialized with this locale configuration:
  provider:    icu
  ICU locale:  en_US
  LC_COLLATE:  English_United States.1252
  LC_CTYPE:    English_United States.1252
  LC_MESSAGES: English_United States.1252
  LC_MONETARY: English_United States.1252
  LC_NUMERIC:  English_United States.1252
  LC_TIME:     English_United States.1252
The default database encoding has accordingly been set to "WIN1252".
The default text search configuration will be set to "english".

For comparison, here's a recent CI run (which also failed on windows, but for
unrelated reasons), without ICU:
https://api.cirrus-ci.com/v1/artifact/task/6478925920993280/testrun/build/testrun/pg_walinspect/regress/log/initdb.log

The database cluster will be initialized with locale "English_United States.1252".
The default database encoding has accordingly been set to "WIN1252".
The default text search configuration will be set to "english".


> As a solution, this could use pg_identify_object(classid, objid, 0) in
> the ORDER BY clause to enforce a better ordering of the objects dealt
> with as it decomposes the object name and the object type.  That
> should be enough, I assume, as it looks to be parenthesis vs
> underscore that switch the order.


Greetings,

Andres Freund



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Tue, Mar 14, 2023 at 07:05:20PM -0700, Andres Freund wrote:
> It's using ICU, but not a specific collation. The build I linked to is WIP
> hackery to add ICU support to windows CI. Here's the initdb output:
>
https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log

Hmm.  Thanks.  At the end, I think that I would be tempted to just
remove this \dx query and move on.  I did not anticipate that this
ordering would be that much sensitive, and the solution of using a
COLLATE C at the end of a describe.c query does not sound much
appealing, either.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Wed, Mar 15, 2023 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 14, 2023 at 07:05:20PM -0700, Andres Freund wrote:
> > It's using ICU, but not a specific collation. The build I linked to is WIP
> > hackery to add ICU support to windows CI. Here's the initdb output:
> >
https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log
>
> Hmm.  Thanks.  At the end, I think that I would be tempted to just
> remove this \dx query and move on.  I did not anticipate that this
> ordering would be that much sensitive, and the solution of using a
> COLLATE C at the end of a describe.c query does not sound much
> appealing, either.

-1 for removing \dx+ for pg_walinspect version 1.0, because we wanted
to show the diff of functions along with testing the upgrade path in
the oldextversions.sql. Therefore, I prefer something like [1]:

[1]
diff --git a/contrib/pg_walinspect/sql/oldextversions.sql
b/contrib/pg_walinspect/sql/oldextversions.sql
index 258a009888..32a059c72d 100644
--- a/contrib/pg_walinspect/sql/oldextversions.sql
+++ b/contrib/pg_walinspect/sql/oldextversions.sql
@@ -5,8 +5,17 @@ CREATE EXTENSION pg_walinspect WITH VERSION '1.0';
 -- Mask DETAIL messages as these could refer to current LSN positions.
 \set VERBOSITY terse

+-- \dx+ will give locale-sensitive results, so we can't use it here.
+CREATE VIEW list_pg_walinspect_objects AS
+    SELECT pg_describe_object(classid, objid, 0) AS "Object description"
+          FROM pg_depend
+          WHERE refclassid = 'pg_extension'::regclass AND
+                    refobjid = (SELECT oid FROM pg_extension WHERE
extname = 'pg_walinspect') AND
+                        deptype = 'e'
+          ORDER BY pg_describe_object(classid, objid, 0) COLLATE "C";
+
 -- List what version 1.0 contains
-\dx+ pg_walinspect
+SELECT * FROM list_pg_walinspect_objects;

 -- Make sure checkpoints don't interfere with the test.
 SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_walinspect_slot',
true, false);
@@ -25,7 +34,7 @@ SELECT COUNT(*) >= 1 AS ok FROM
pg_get_wal_stats_till_end_of_wal('FFFFFFFF/FFFFF
 ALTER EXTENSION pg_walinspect UPDATE TO '1.1';

 -- List what version 1.1 contains
-\dx+ pg_walinspect
+SELECT * FROM list_pg_walinspect_objects;

 SELECT pg_drop_replication_slot('regress_pg_walinspect_slot');

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Wed, Mar 15, 2023 at 12:40:17PM +0530, Bharath Rupireddy wrote:
> -1 for removing \dx+ for pg_walinspect version 1.0, because we wanted
> to show the diff of functions along with testing the upgrade path in
> the oldextversions.sql. Therefore, I prefer something like [1]:

This is a duplicate of what describe.c uses, with a COLLATE clause.
The main goal was to have a simple check, so I'd still stand by the
simplest choice and move on.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Wed, Mar 15, 2023 at 06:50:01PM +0900, Michael Paquier wrote:
> This is a duplicate of what describe.c uses, with a COLLATE clause.
> The main goal was to have a simple check, so I'd still stand by the
> simplest choice and move on.

Please note that I have done something about that with e643a31 by
replacing the problematic \dx with a SELECT query, but left the second
one as it should not be a problem.
--
Michael

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Bharath Rupireddy
Date:
On Thu, Mar 16, 2023 at 12:48 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Mar 15, 2023 at 06:50:01PM +0900, Michael Paquier wrote:
> > This is a duplicate of what describe.c uses, with a COLLATE clause.
> > The main goal was to have a simple check, so I'd still stand by the
> > simplest choice and move on.
>
> Please note that I have done something about that with e643a31 by
> replacing the problematic \dx with a SELECT query, but left the second
> one as it should not be a problem.

Thanks.

FWIW, I rebased the tests tweaking patch and attached it here as v9.
This should keep the pg_walinspect tests consistent across comments,
spaces, new lines and using count(*) >= 1 for all successful function
executions. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Combine pg_walinspect till_end_of_wal functions with others

From
Michael Paquier
Date:
On Thu, Mar 16, 2023 at 01:17:59PM +0530, Bharath Rupireddy wrote:
> FWIW, I rebased the tests tweaking patch and attached it here as v9.
> This should keep the pg_walinspect tests consistent across comments,
> spaces, new lines and using count(*) >= 1 for all successful function
> executions. Thoughts?

Mostly OK by me, so applied after tweaking a few tiny things.  The
rewrites of the queries where we should have more than one record and
the removal of count() for the failure cases have been kept as
proposed, as are most of the comments.
--
Michael

Attachment