Thread: Combine pg_walinspect till_end_of_wal functions with others
Hi, In a recent discussion [1], Michael Paquier asked if we can combine pg_walinspect till_end_of_wal functions with other functions pg_get_wal_records_info and pg_get_wal_stats. The code currently looks much duplicated and the number of functions that pg_walinspect exposes to the users is bloated. The point was that the till_end_of_wal functions determine the end LSN and everything else that they do is the same as their counterpart functions. Well, the idea then was to keep things simple, not clutter the APIs, have better and consistent user-inputted end_lsn validations at the cost of usability and code redundancy. However, now I tend to agree with the feedback received. I'm attaching a patch doing the $subject with the following behavior: 1. If start_lsn is NULL, error out/return NULL. 2. If end_lsn isn't specified, default to NULL, then determine the end_lsn. 3. If end_lsn is specified as NULL, then determine the end_lsn. 4. If end_lsn is specified as non-NULL, then determine if it is greater than start_lsn if yes, go ahead do the job, otherwise error out. Another idea is to convert till_end_of_wal flavors to SQL-only functions and remove the c code from pg_walinspect.c. However, I prefer $subject and completely remove till_end_of_wal flavors for better usability in the long term. Thoughts? [1] https://www.postgresql.org/message-id/CALj2ACV-WBN%3DEUgUPyYOGitp%2Brn163vMnQd%3DHcWrnKt-uqFYFA%40mail.gmail.com -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Mar 1, 2023 at 1:00 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Hi, > > In a recent discussion [1], Michael Paquier asked if we can combine > pg_walinspect till_end_of_wal functions with other functions > pg_get_wal_records_info and pg_get_wal_stats. The code currently looks > much duplicated and the number of functions that pg_walinspect exposes > to the users is bloated. The point was that the till_end_of_wal > functions determine the end LSN and everything else that they do is > the same as their counterpart functions. Well, the idea then was to > keep things simple, not clutter the APIs, have better and consistent > user-inputted end_lsn validations at the cost of usability and code > redundancy. However, now I tend to agree with the feedback received. > > I'm attaching a patch doing the $subject with the following behavior: > 1. If start_lsn is NULL, error out/return NULL. > 2. If end_lsn isn't specified, default to NULL, then determine the end_lsn. > 3. If end_lsn is specified as NULL, then determine the end_lsn. > 4. If end_lsn is specified as non-NULL, then determine if it is > greater than start_lsn if yes, go ahead do the job, otherwise error > out. > > Another idea is to convert till_end_of_wal flavors to SQL-only > functions and remove the c code from pg_walinspect.c. However, I > prefer $subject and completely remove till_end_of_wal flavors for > better usability in the long term. > > Thoughts? > > [1] https://www.postgresql.org/message-id/CALj2ACV-WBN%3DEUgUPyYOGitp%2Brn163vMnQd%3DHcWrnKt-uqFYFA%40mail.gmail.com Needed a rebase due to 019f8624664dbf1e25e2bd721c7e99822812d109. Attaching v2 patch. Sorry for the noise. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Mar 01, 2023 at 08:30:00PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 1, 2023 at 1:00 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > In a recent discussion [1], Michael Paquier asked if we can combine > > pg_walinspect till_end_of_wal functions with other functions > > pg_get_wal_records_info and pg_get_wal_stats. The code currently looks > > much duplicated and the number of functions that pg_walinspect exposes > > to the users is bloated. The point was that the till_end_of_wal > > functions determine the end LSN and everything else that they do is > > the same as their counterpart functions. Well, the idea then was to > > keep things simple, not clutter the APIs, have better and consistent > > user-inputted end_lsn validations at the cost of usability and code > > redundancy. However, now I tend to agree with the feedback received. +1, especially since I really don't like the use of "till" in the function names. > > I'm attaching a patch doing the $subject with the following behavior: > > 1. If start_lsn is NULL, error out/return NULL. Maybe naive and unrelated question, but is that really helpful? If for some reason I want to see information about *all available WAL*, I have to manually dig for a suitable LSN. The same action with pg_waldump is easier as I just need to use the oldest available WAL that's present on disk. > > Another idea is to convert till_end_of_wal flavors to SQL-only > > functions and remove the c code from pg_walinspect.c. However, I > > prefer $subject and completely remove till_end_of_wal flavors for > > better usability in the long term. I agree that using default arguments is a way better API. Nitpicking: Maybe we could group the kept unused exported C function at the end of the file? Also: /* - * Get info and data of all WAL records from start LSN till end of WAL. + * NB: This function does nothing and stays here for backward compatibility. + * Without it, the extension fails to install. * - * This function emits an error if a future start i.e. WAL LSN the database - * system doesn't know about is specified. + * Try using pg_get_wal_records_info() for the same till_end_of_wal + * functionaility. */ Datum pg_get_wal_records_info_till_end_of_wal(PG_FUNCTION_ARGS) { - XLogRecPtr start_lsn; - XLogRecPtr end_lsn = InvalidXLogRecPtr; - - start_lsn = PG_GETARG_LSN(0); - - end_lsn = ValidateInputLSNs(true, start_lsn, end_lsn); - - GetWALRecordsInfo(fcinfo, start_lsn, end_lsn); - - PG_RETURN_VOID(); + PG_RETURN_NULL(); } I don't like much this chunk (same for the other kept function). Apart from the obvious typo in "functionaility", I don't think that the comment is really accurate. Also, are we actually helping users if we simply return NULL there? It's quite possible that people will start to use the new shared lib while still having the 1.1 SQL definition of the extension installed. In that case, they will simply retrieve a NULL row and may spend some time wondering why until they eventually realize that their only option is to upgrade the extension first and then use another function. Why not make their life easier and explicity raise a suitable error at the SQL level if users try to use those functions?
On Mon, Mar 6, 2023 at 2:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > I'm attaching a patch doing the $subject with the following behavior: > > > 1. If start_lsn is NULL, error out/return NULL. > > Maybe naive and unrelated question, but is that really helpful? If for some > reason I want to see information about *all available WAL*, I have to manually > dig for a suitable LSN. The same action with pg_waldump is easier as I just > need to use the oldest available WAL that's present on disk. Are you saying that the pg_walinspect functions should figure out the oldest available WAL file and LSN, and start from there if start_lsn specified as NULL or invalid? Note that pg_waldump requires either explicit startlsn and/or startseg (WAL file name), it can't search for the oldest WAL file available and start from there automatically. If the user wants to figure it out, they can do something like below: ostgres=# select * from pg_ls_waldir() order by name; name | size | modification --------------------------+----------+------------------------ 000000010000000000000001 | 16777216 | 2023-03-06 14:54:55+00 000000010000000000000002 | 16777216 | 2023-03-06 14:54:55+00 If we try to make these functions figure out the oldest WAl file and start from there, then it'll unnecessarily complicate the APIs and functions. If we still think we need a better function for the users to figure out the oldest WAL file, perhaps, add a SQL-only view/function to pg_walinspect that returns "select name from pg_ls_waldir() order by name limit 1;", but honestly, that's so trivial. > > > Another idea is to convert till_end_of_wal flavors to SQL-only > > > functions and remove the c code from pg_walinspect.c. However, I > > > prefer $subject and completely remove till_end_of_wal flavors for > > > better usability in the long term. > > I agree that using default arguments is a way better API. Thanks. Yes, that's true. > Nitpicking: > > Maybe we could group the kept unused exported C function at the end of the > file? Will do. > Also: > > /* > - * Get info and data of all WAL records from start LSN till end of WAL. > + * NB: This function does nothing and stays here for backward compatibility. > + * Without it, the extension fails to install. > * > - * This function emits an error if a future start i.e. WAL LSN the database > - * system doesn't know about is specified. > + * Try using pg_get_wal_records_info() for the same till_end_of_wal > + * functionaility. > > I don't like much this chunk (same for the other kept function). Apart from > the obvious typo in "functionaility", I don't think that the comment is really > accurate. Can you be more specific what's inaccurate about the comment? > Also, are we actually helping users if we simply return NULL there? It's quite > possible that people will start to use the new shared lib while still having > the 1.1 SQL definition of the extension installed. In that case, they will > simply retrieve a NULL row and may spend some time wondering why until they > eventually realize that their only option is to upgrade the extension first and > then use another function. Why not make their life easier and explicity raise > a suitable error at the SQL level if users try to use those functions? I thought about it initially, but wanted to avoid more errors. An error would make them use the new version easily. I will change it that way. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, 6 Mar 2023 at 16:06, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > If we try to make these functions figure out the oldest WAl file and > start from there, then it'll unnecessarily complicate the APIs and > functions. If we still think we need a better function for the users > to figure out the oldest WAL file, perhaps, add a SQL-only > view/function to pg_walinspect that returns "select name from > pg_ls_waldir() order by name limit 1;", but honestly, that's so > trivial. That "order by name limit 1" has subtle bugs when you're working on a system that has experienced timeline switches. It is entirely possible that the first file (as sorted by the default collation) is not the first record you can inspect, or even in your timeline's history. Kind regards, Matthias van de Meent
On Mon, Mar 6, 2023 at 8:52 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Mon, 6 Mar 2023 at 16:06, Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > If we try to make these functions figure out the oldest WAl file and > > start from there, then it'll unnecessarily complicate the APIs and > > functions. If we still think we need a better function for the users > > to figure out the oldest WAL file, perhaps, add a SQL-only > > view/function to pg_walinspect that returns "select name from > > pg_ls_waldir() order by name limit 1;", but honestly, that's so > > trivial. > > That "order by name limit 1" has subtle bugs when you're working on a > system that has experienced timeline switches. It is entirely possible > that the first file (as sorted by the default collation) is not the > first record you can inspect, or even in your timeline's history. Hm. Note that pg_walinspect currently searches WAL on insertion timeline; it doesn't care about the older timelines. The idea of making it look at WAL on an older timeline was discussed, but for the sake of simplicity we kept the functions simple. If needed, I can try adding the timeline as input parameters to all the functions (with default -1 meaning current insertion timeline; if specified, look for WAL on that timeline). Are you saying that a pg_walinspect function that traverses the pg_wal directory and figures out the old valid WAL on a given timeline is still useful? Or make the functions look for older WAL if start_lsn is given as NULL or invalid? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, 6 Mar 2023 at 16:37, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Mar 6, 2023 at 8:52 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Mon, 6 Mar 2023 at 16:06, Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > If we try to make these functions figure out the oldest WAl file and > > > start from there, then it'll unnecessarily complicate the APIs and > > > functions. If we still think we need a better function for the users > > > to figure out the oldest WAL file, perhaps, add a SQL-only > > > view/function to pg_walinspect that returns "select name from > > > pg_ls_waldir() order by name limit 1;", but honestly, that's so > > > trivial. > > > > That "order by name limit 1" has subtle bugs when you're working on a > > system that has experienced timeline switches. It is entirely possible > > that the first file (as sorted by the default collation) is not the > > first record you can inspect, or even in your timeline's history. > > Hm. Note that pg_walinspect currently searches WAL on insertion > timeline; it doesn't care about the older timelines. The idea of > making it look at WAL on an older timeline was discussed, but for the > sake of simplicity we kept the functions simple. If needed, I can try > adding the timeline as input parameters to all the functions (with > default -1 meaning current insertion timeline; if specified, look for > WAL on that timeline). > > Are you saying that a pg_walinspect function that traverses the pg_wal > directory and figures out the old valid WAL on a given timeline is > still useful? Or make the functions look for older WAL if start_lsn is > given as NULL or invalid? The specific comment I made was only regarding the following issue: An instance may still have WAL segments from before the latest timeline switch. These segments may have a higher LSN and lower timeline number than your current running timeline+LSN (because of e.g. pg_rewind). This will then result in unwanted behaviour when you sort the segments numerically/alphabetically and then assume that the first file's LSN is valid (or available) in your current timeline. That is why "order by name limit 1" isn't a good solution, and that's what I was commenting on: you need to parse the timeline hierarchy to determine which timelines you can use which WAL segments of. To answer your question on whether I'd like us to traverse timeline switches: Yes, I'd really like it if we were able to decode the current timeline's hierarchical WAL of a PG instance in one go, from the start at (iirc) 0x10000 all the way to the current LSN, assuming the segments are available. Kind regards, Matthias van de Meent
On Mon, Mar 06, 2023 at 08:36:17PM +0530, Bharath Rupireddy wrote: > On Mon, Mar 6, 2023 at 2:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > Also: > > > > /* > > - * Get info and data of all WAL records from start LSN till end of WAL. > > + * NB: This function does nothing and stays here for backward compatibility. > > + * Without it, the extension fails to install. > > * > > - * This function emits an error if a future start i.e. WAL LSN the database > > - * system doesn't know about is specified. > > + * Try using pg_get_wal_records_info() for the same till_end_of_wal > > + * functionaility. > > > > I don't like much this chunk (same for the other kept function). Apart from > > the obvious typo in "functionaility", I don't think that the comment is really > > accurate. > > Can you be more specific what's inaccurate about the comment? It's problematic to install the extension if we rely on upgrade scripts only. We could also provide a pg_walinspect--1.2.sql file and it would just work, and that may have been a good idea if there wasn't also the problem of people still having the version 1.1 locally installed, as we don't want them to see random failures like "could not find function ... in file ...", or keeping the ability to install the former 1.1 version (with those functions bypassed).
On Tue, Mar 07, 2023 at 09:13:46AM +0800, Julien Rouhaud wrote: > It's problematic to install the extension if we rely on upgrade scripts only. > We could also provide a pg_walinspect--1.2.sql file and it would just work, and > that may have been a good idea if there wasn't also the problem of people still > having the version 1.1 locally installed, as we don't want them to see random > failures like "could not find function ... in file ...", or keeping the ability > to install the former 1.1 version (with those functions bypassed). Why would we need a 1.2? HEAD is the only branch with pg_walinspect 1.1, and it has not been released yet. -- Michael
Attachment
On Tue, 7 Mar 2023, 12:36 Michael Paquier, <michael@paquier.xyz> wrote:
On Tue, Mar 07, 2023 at 09:13:46AM +0800, Julien Rouhaud wrote:
> It's problematic to install the extension if we rely on upgrade scripts only.
> We could also provide a pg_walinspect--1.2.sql file and it would just work, and
> that may have been a good idea if there wasn't also the problem of people still
> having the version 1.1 locally installed, as we don't want them to see random
> failures like "could not find function ... in file ...", or keeping the ability
> to install the former 1.1 version (with those functions bypassed).
Why would we need a 1.2? HEAD is the only branch with pg_walinspect
1.1, and it has not been released yet.
ah right I should have checked. but the same ABI compatibility concern still exists for version 1.0 of the extension.
On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote: > ah right I should have checked. but the same ABI compatibility concern > still exists for version 1.0 of the extension. Yes, we'd better make sure that the past code is able to run, at least. Now I am not really convinced that we have the need to enforce an error with the new code even if 1.0 is still installed, so as it is possible to remove all the traces of the code that triggers errors if an end LSN is higher than the current insert LSN for primaries or replayed LSN for standbys. -- Michael
Attachment
On Tue, Mar 07, 2023 at 01:56:24PM +0900, Michael Paquier wrote: > On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote: > > ah right I should have checked. but the same ABI compatibility concern > > still exists for version 1.0 of the extension. > > Yes, we'd better make sure that the past code is able to run, at > least. Now I am not really convinced that we have the need to enforce > an error with the new code even if 1.0 is still installed, So keep this "deprecated" C function working, as it would only be a few lines of code? > so as it is > possible to remove all the traces of the code that triggers errors if > an end LSN is higher than the current insert LSN for primaries or > replayed LSN for standbys. +1 for that
On Tue, Mar 7, 2023 at 11:17 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Mar 07, 2023 at 01:56:24PM +0900, Michael Paquier wrote: > > On Tue, Mar 07, 2023 at 12:42:20PM +0800, Julien Rouhaud wrote: > > > ah right I should have checked. but the same ABI compatibility concern > > > still exists for version 1.0 of the extension. > > > > Yes, we'd better make sure that the past code is able to run, at > > least. Now I am not really convinced that we have the need to enforce > > an error with the new code even if 1.0 is still installed, > > So keep this "deprecated" C function working, as it would only be a few lines > of code? > > > so as it is > > possible to remove all the traces of the code that triggers errors if > > an end LSN is higher than the current insert LSN for primaries or > > replayed LSN for standbys. > > +1 for that I understand that we want to keep till_end_of_wal functions defined around in .c file so that if someone does CREATE EXTENSION pg_walinspect WITH VERSION '1.0'; on the latest extension shared library (with 1.1 version), the till_end_of_wal functions should work for them. Also, I noticed that there's some improvement needed for the input validations, especially for the end_lsn. Here I'm with the v3 patch addressing the above comments. Please review it further. 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was a comment on the functions automatically determining start_lsn to be the oldest WAL LSN. I'm not implementing this change now, as it requires extra work to traverse the pg_wal directory. I'm planning to do it in the next set of improvements where I'm planning to make the functions timeline-aware, introduce functions for inspecting wal_buffers and so on. 2. When end_lsn is NULL or invalid ('0/0') IOW end_lsn is not specified, deduce end_lsn to be the current flush LSN when not in recovery, current replayed LSN when in recovery. This is the main change that avoids till_end_of_wal functions in version 1.1. 3. When end_lsn is specified but greater than or equal to the start_lsn, return NULL. Given the above review comments on more errors being reported, I chose to return NULL for better usability. 4. When end_lsn is specified but less than the start_lsn, get info/stats up until end_lsn. 5. Retained pg_get_wal_records_info_till_end_of_wal and pg_get_wal_stats_till_end_of_wal for backward compatibility. 6. Piggybacked these functions and behaviour under the new HEAD-only extension version 1.1 introduced recently, instead of bumping to 1.2. When PG16 is out, users will have 1.1 with all of these new functionality. 7. Added tests to verify the extension update path in oldextversions.sql similar to other extensions'. (suggested by Michael Paquier). 8. Added a note in the pg_walinspect documentation about removal of pg_get_wal_records_info_till_end_of_wal and pg_get_wal_stats_till_end_of_wal in version 1.1 and how the other functions can be used to achieve the same functionality and how these till_end_of_wal functions can work if extension is installed explicitly with version 1.0. 9. Refactored the tests according to the new behaviours. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Mar 07, 2023 at 01:47:01PM +0800, Julien Rouhaud wrote: > So keep this "deprecated" C function working, as it would only be a few lines > of code? Yes, I guess that this would be the final picture, moving forward I'd like to think that we should just remove the SQL declaration of the till_end_of_wal() functions to keep a clean interface. -- Michael
Attachment
On Wed, Mar 08, 2023 at 01:40:46PM +0530, Bharath Rupireddy wrote: > 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was > a comment on the functions automatically determining start_lsn to be > the oldest WAL LSN. I'm not implementing this change now, as it > requires extra work to traverse the pg_wal directory. I'm planning to > do it in the next set of improvements where I'm planning to make the > functions timeline-aware, introduce functions for inspecting > wal_buffers and so on. > > [.. long description ..] > > 9. Refactored the tests according to the new behaviours. Hmm. I think this patch ought to have a result simpler than what's proposed here. First, do we really have to begin marking the functions as non-STRICT to abide with the treatment of NULL as a special value? The part that I've found personally the most annoying with these functions is that an incorrect bound leads to a random failure, particularly when such queries are used for monitoring. I would simplify the whole with two simple rules, as of: - Keeping all the functions strict. - When end_lsn is a LSN in the future of the current LSN inserted or replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or GetFlushRecPtr(). This way, monitoring tools can use a value ahead, at will. - Failing if start_lsn > end_lsn. - Failing if start_lsn refers to a position older than what exists is still fine by me. I would also choose to remove pg_get_wal_records_info_till_end_of_wal() from the SQL interface in 1.1 to limit the confusion arount it, but keep a few lines of code so as we are still able to use it when pg_walinspect 1.0 is the version enabled with CREATE EXTENSION. In short, pg_get_wal_records_info_till_end_of_wal() should be able to use exactly the same code as pg_get_wal_records_info(), still you need to keep *two* functions for their prosrc with PG_FUNCTION_ARGS as arguments so as 1.0 would work when dropped in place. The result, it seems to me, mostly comes to simplify ValidateInputLSNs() and remove its till_end_of_wal argument. +-- Removed function +SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_of_wal'::regproc); +ERROR: function "pg_get_wal_records_info_till_end_of_wal" does not exist +LINE 1: SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_... It seems to me that you should just replace all that and anything depending on pg_get_functiondef() with a \dx+ pg_walinspect, that would list all the objects part of the extension for the specific version you want to test. Not sure that there is a need to list the full function definitions, either. That just bloats the tests. I think, however, that it is critical to test in oldextversions.out the *executions* of the functions, so as we make sure that they don't crash. The patch is missing that. +-- Invalid input LSNs +SELECT * FROM pg_get_wal_record_info('0/0'); -- ERROR +ERROR: invalid input LSN -- Michael
Attachment
On Wed, Mar 8, 2023 at 1:40 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Mar 7, 2023 at 11:17 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Here I'm with the v3 patch addressing the above comments. Please > review it further. Needed a rebase. v4 patch is attached. I'll address the latest review comments in a bit. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Mar 10, 2023 at 12:24 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Mar 08, 2023 at 01:40:46PM +0530, Bharath Rupireddy wrote: > > 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was > > a comment on the functions automatically determining start_lsn to be > > the oldest WAL LSN. I'm not implementing this change now, as it > > requires extra work to traverse the pg_wal directory. I'm planning to > > do it in the next set of improvements where I'm planning to make the > > functions timeline-aware, introduce functions for inspecting > > wal_buffers and so on. > > > > [.. long description ..] > > > > 9. Refactored the tests according to the new behaviours. > > Hmm. I think this patch ought to have a result simpler than what's > proposed here. > > First, do we really have to begin marking the functions as non-STRICT > to abide with the treatment of NULL as a special value? The part that > I've found personally the most annoying with these functions is that > an incorrect bound leads to a random failure, particularly when such > queries are used for monitoring. As long as we provide a sensible default value (so I guess '0/0' to mean "no upper bound") and that we therefore don't have to manually specify an upper bound if we don't want one I'm fine with keeping the functions marked as STRICT. > I would simplify the whole with two > simple rules, as of: > - Keeping all the functions strict. > - When end_lsn is a LSN in the future of the current LSN inserted or > replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or > GetFlushRecPtr(). This way, monitoring tools can use a value ahead, > at will. > - Failing if start_lsn > end_lsn. > - Failing if start_lsn refers to a position older than what exists is > still fine by me. +1 > I would also choose to remove > pg_get_wal_records_info_till_end_of_wal() from the SQL interface in > 1.1 to limit the confusion arount it, but keep a few lines of code so > as we are still able to use it when pg_walinspect 1.0 is the version > enabled with CREATE EXTENSION. Yeah the SQL function should be removed no matter what.
On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote: > As long as we provide a sensible default value (so I guess '0/0' to > mean "no upper bound") and that we therefore don't have to manually > specify an upper bound if we don't want one I'm fine with keeping the > functions marked as STRICT. FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail, just do the job" is fine by me. Something like a FFF/FFFFFFFF should just mean the same on a fresh cluster, still it gets risky the longer the WAL is generated. -- Michael
Attachment
On Fri, 10 Mar 2023, 16:14 Michael Paquier, <michael@paquier.xyz> wrote:
On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote:
> As long as we provide a sensible default value (so I guess '0/0' to
> mean "no upper bound") and that we therefore don't have to manually
> specify an upper bound if we don't want one I'm fine with keeping the
> functions marked as STRICT.
FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail,
just do the job" is fine by me.
isn't '0/0' the same as InvalidXLogRecPtr? but my point is that we shouldn't require to spell it explicitly, just rely on the default value.
Something like a FFF/FFFFFFFF should
just mean the same on a fresh cluster, still it gets risky the longer
the WAL is generated.
yeah, it would be handy to accept 'infinity' in that context.
On Fri, Mar 10, 2023 at 04:37:23PM +0800, Julien Rouhaud wrote: > isn't '0/0' the same as InvalidXLogRecPtr? but my point is that we > shouldn't require to spell it explicitly, just rely on the default value. Perhaps. Still the addition of a DEFAULT to the function definitions and its value looks like a second patch to me. The first should just lift the bound restrictions currently in place while cleaning up the till_* functions. -- Michael
Attachment
On Fri, Mar 10, 2023 at 9:54 AM Michael Paquier <michael@paquier.xyz> wrote: > > Hmm. I think this patch ought to have a result simpler than what's > proposed here. > > First, do we really have to begin marking the functions as non-STRICT > to abide with the treatment of NULL as a special value? The part that > I've found personally the most annoying with these functions is that > an incorrect bound leads to a random failure, particularly when such > queries are used for monitoring. I would simplify the whole with two > simple rules, as of: > - Keeping all the functions strict. > - When end_lsn is a LSN in the future of the current LSN inserted or > replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or > GetFlushRecPtr(). This way, monitoring tools can use a value ahead, > at will. > - Failing if start_lsn > end_lsn. > - Failing if start_lsn refers to a position older than what exists is > still fine by me. Done this way in the attached v5 patch. > I would also choose to remove > pg_get_wal_records_info_till_end_of_wal() from the SQL interface in > 1.1 to limit the confusion arount it, but keep a few lines of code so > as we are still able to use it when pg_walinspect 1.0 is the version > enabled with CREATE EXTENSION. > > In short, pg_get_wal_records_info_till_end_of_wal() should be able to > use exactly the same code as pg_get_wal_records_info(), still you need > to keep *two* functions for their prosrc with PG_FUNCTION_ARGS as > arguments so as 1.0 would work when dropped in place. The result, it > seems to me, mostly comes to simplify ValidateInputLSNs() and remove > its till_end_of_wal argument. This has already been taken care of in the previous patches, e.g. v3 and v4 and so in the latest v5 patch. > +-- Removed function > +SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_of_wal'::regproc); > +ERROR: function "pg_get_wal_records_info_till_end_of_wal" does not exist > +LINE 1: SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_... > > It seems to me that you should just replace all that and anything > depending on pg_get_functiondef() with a \dx+ pg_walinspect, that > would list all the objects part of the extension for the specific > version you want to test. Not sure that there is a need to list the > full function definitions, either. That just bloats the tests. Agreed and used \dx+. One can anyways look at the function definitions and compare for knowing what's changed. > I think, however, that it is critical to test in oldextversions.out > the *executions* of the functions, so as we make sure that they don't > crash. The patch is missing that. You mean, we need to test the till_end_of_wal functions that were removed in the latest version 1.1 but they must work if the extension is installed with 1.0? If yes, I now added them. > +-- Invalid input LSNs > +SELECT * FROM pg_get_wal_record_info('0/0'); -- ERROR > +ERROR: invalid input LSN Removed InvalidRecPtr checks for input/start LSN because anyways the functions will fail with ERROR: could not read WAL at LSN 0/0. Any comments on the attached v5 patch? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Mar 10, 2023 at 04:45:06PM +0530, Bharath Rupireddy wrote: > Any comments on the attached v5 patch? I have reviewed the patch, and found it pretty messy. The tests should have been divided into their own patch, I think. This is rather straight-forward once the six functions have their checks grouped together. The result was pretty good, so I have begun by applying that as 1f282c2. This also includes most of the refinements you have proposed for the whitespaces in the tests. Note that we were missing a few spots with the bound checks for the six functions, so now the coverage should be full. After that comes the rest of the patch, and I have found a couple of mistakes. - pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) + pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL) returns setof record [...] - pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, per_record boolean DEFAULT false) + pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, per_record boolean DEFAULT false) This part of the documentation is now incorrect. +-- Make sure checkpoints don't interfere with the test. +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false); Adding a physical slot is better for stability of course, but the test also has to drop it or installcheck would cause an existing cluster to have that still around. The third argument could be true, as well, so as we'd use a temporary slot. - If <replaceable>start_lsn</replaceable> - or <replaceable>end_lsn</replaceable> are not yet available, the - function will raise an error. For example: + If a future <replaceable>end_lsn</replaceable> (i.e. the LSN server + doesn't know about) is specified, it returns stats till end of WAL. It + will raise an error, if the server doesn't have WAL available at given + <replaceable>start_lsn</replaceable> or if the + <replaceable>start_lsn</replaceable> is in future or is past the + <replaceable>end_lsn</replaceable>. For example, usage of the function is + as follows: Hmm. I would simplify that, and just mention that an error is raised when the start LSN is available, without caring about the rest (valid end LSN being past the current insert LSN, and error if start > end, the second being obvious). + <note> + <para> + Note that <function>pg_get_wal_records_info_till_end_of_wal</function> and + <function>pg_get_wal_stats_till_end_of_wal</function> functions have been + removed in the <filename>pg_walinspect</filename> version + <literal>1.1</literal>. The same functionality can be achieved with + <function>pg_get_wal_records_info</function> and + <function>pg_get_wal_stats</function> functions by specifying a future + <replaceable>end_lsn</replaceable>. However, <function>till_end_of_wal</function> + functions will still work if the extension is installed explicitly with + version <literal>1.0</literal>. + </para> + </note> Not convinced that this is necessary. + GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal); + + stats_per_record = PG_GETARG_BOOL(2); This code in GetWalStats() is incorrect. pg_get_wal_stats_till_end_of_wal() has a stats_per_record, but as *second* argument, so this would be broken. + GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal); Coming from the last point, I think that this interface is confusing, and actually incorrect. From what I can see, we should be doing what ~15 has by grepping the argument values within the main function calls, and just pass them down to the internal routines GetWalStats() and GetWALRecordsInfo(). -static bool -IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn) +static XLogRecPtr +GetCurrentLSN(void) This wrapper is actually a good idea. At the end, I am finishing with the attached. ValidateInputLSNs() ought to be called, IMO, when the caller of the SQL functions can directly specify an end_lsn. This means that there is no point to do this check in the two till_end_* functions. This has as cost two extra checks to make sure that the start_lsn is not higher than the current LSN, but I am fine to live with that. It seemed rather natural to me to let ValidateInputLSNs() do a refresh of the end_lsn if it sees that it is higher than the current LSN. And if you look closely, you will see that we only call *once* GetCurrentLSN() for each function call, so the maths are more precise. I have cleaned up the comments of the modules, while on it, as there was not much value in copy-pasting how a function fails while there is a centralized validation code path. The tests for the till_end() functions have been moved to the test path where we install 1.0. With all these cleanups done, there is less code than at the beginning, which comes from the docs, so the current code does not change in size: 7 files changed, 173 insertions(+), 206 deletions(-) -- Michael
Attachment
On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Mar 10, 2023 at 04:45:06PM +0530, Bharath Rupireddy wrote: > > After that comes the rest of the patch, and I have found a couple of > mistakes. > > - pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) > + pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL) > returns setof record > [...] > - pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, per_record boolean DEFAULT false) > + pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, per_record boolean DEFAULT false) > > This part of the documentation is now incorrect. Oh, yeah. Thanks for fixing it. > +-- Make sure checkpoints don't interfere with the test. > +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false); > > Adding a physical slot is better for stability of course, but the test > also has to drop it or installcheck would cause an existing cluster to > have that still around. The third argument could be true, as well, so > as we'd use a temporary slot. # Disabled because these tests require "wal_level=replica", which # some installcheck users do not have (e.g. buildfarm clients). NO_INSTALLCHECK = 1 pg_walinspect can't be run under installcheck. I don't think dropping the slot at the end is needed, it's unnecessary. I saw oldextversions.sql using the same replication slot name, well no problem, but I changed it to a unique name. > Hmm. I would simplify that, and just mention that an error is raised > when the start LSN is available, without caring about the rest (valid > end LSN being past the current insert LSN, and error if start > end, > the second being obvious). Okay. > + <note> > + <para> > + Note that <function>pg_get_wal_records_info_till_end_of_wal</function> and > + <function>pg_get_wal_stats_till_end_of_wal</function> functions have been > + removed in the <filename>pg_walinspect</filename> version > + <literal>1.1</literal>. The same functionality can be achieved with > + <function>pg_get_wal_records_info</function> and > + <function>pg_get_wal_stats</function> functions by specifying a future > + <replaceable>end_lsn</replaceable>. However, <function>till_end_of_wal</function> > + functions will still work if the extension is installed explicitly with > + version <literal>1.0</literal>. > + </para> > + </note> > > Not convinced that this is necessary. As hackers we know that these functions have been removed and how to achieve till_end_of_wal with the other functions. I noticed that you've removed my changes (see below) from the docs that were saying how to get info/stats till_end_of_wal. That leaves end users confused as to how they can achieve till_end_of_wal functionality. All users can't look for commit history/message but they can easily read the docs. I prefer to have the following (did so in the attached v7) and get rid of the above note if you don't feel strongly about it. + If a future <replaceable>end_lsn</replaceable> + (i.e. the LSN server doesn't know about) is specified, it returns + informaton till end of WAL. > + GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal); > + > + stats_per_record = PG_GETARG_BOOL(2); > > This code in GetWalStats() is incorrect. > pg_get_wal_stats_till_end_of_wal() has a stats_per_record, but as > *second* argument, so this would be broken. Oh, yeah. Thanks for fixing it. > + GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal); > > Coming from the last point, I think that this interface is confusing, > and actually incorrect. From what I can see, we should be doing what > ~15 has by grepping the argument values within the main function > calls, and just pass them down to the internal routines GetWalStats() > and GetWALRecordsInfo(). Hm, what you have in v6 works for me. > At the end, I am finishing with the attached. ValidateInputLSNs() > ought to be called, IMO, when the caller of the SQL functions can > directly specify an end_lsn. This means that there is no point to do > this check in the two till_end_* functions. This has as cost two > extra checks to make sure that the start_lsn is not higher than the > current LSN, but I am fine to live with that. It seemed rather > natural to me to let ValidateInputLSNs() do a refresh of the end_lsn > if it sees that it is higher than the current LSN. And if you look > closely, you will see that we only call *once* GetCurrentLSN() for > each function call, so the maths are more precise. > > I have cleaned up the comments of the modules, while on it, as there > was not much value in copy-pasting how a function fails while there is > a centralized validation code path. The tests for the till_end() > functions have been moved to the test path where we install 1.0. I have some comments and fixed them in the attached v7 patch: 1. + * pg_get_wal_records_info * + * pg_get_wal_stats * I think you wanted to be consistent with function comments with function names atop, but missed adding for all functions. Actually, I don't have a strong opinion on these changes as they unnecessarily bloat the changes, so I removed them. 2. + if (start_lsn > curr_lsn) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot accept future start LSN"), - errdetail("Last known WAL LSN on the database system is at %X/%X.", - LSN_FORMAT_ARGS(curr_lsn)))); - } + errmsg("WAL start LSN must be less than current LSN"))); I don't like this inconsistency much, especially when pg_get_wal_record_info emits "cannot accept future input LSN" with the current LSN details (this current LSN will give a bit more information to the user). Also, let's be consistent across returning NULLs if input LSN/start LSN equal to the current LSN. I've done these changes in the attached v7 patch. 3. I wanted COUNT(*) >= 0 for successful function execution to be COUNT(*) >= 1 so that we will check for at least the functions returning 1 record. And failures to be SELECT * FROM. This was my intention but I don't see that in this patch or in the previous test-refactoring commit. I added that in the attached v7 patch again. Also, made test comments conssitent. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Mar 13, 2023 at 03:53:37PM +0530, Bharath Rupireddy wrote: > On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael@paquier.xyz> wrote: >> +-- Make sure checkpoints don't interfere with the test. >> +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false); >> >> Adding a physical slot is better for stability of course, but the test >> also has to drop it or installcheck would cause an existing cluster to >> have that still around. The third argument could be true, as well, so >> as we'd use a temporary slot. > > # Disabled because these tests require "wal_level=replica", which > # some installcheck users do not have (e.g. buildfarm clients). > NO_INSTALLCHECK = 1 > > pg_walinspect can't be run under installcheck. I don't think dropping > the slot at the end is needed, it's unnecessary. I saw > oldextversions.sql using the same replication slot name, well no > problem, but I changed it to a unique name. -SELECT pg_drop_replication_slot('regress_pg_walinspect_slot'); - +-- Clean up In my opinion, it is an incorrect practice to assume that nobody will ever run these tests on a running instance. FWIW, I have managed QE/QA flows in the past that did exactly that. I cannot say for already-deployed clusters that could be used for production still I don't feel comfortable with the idea to assume that nobody would do ever that, and calls of pg_drop_replication_slot() are not a bottleneck. So let's be clean and drop these slots to keep the tests self-contained. pg_walinspect in REL_15_STABLE gets that right, IMV, and that's no different from the role cleanup, as one example. > As hackers we know that these functions have been removed and how to > achieve till_end_of_wal with the other functions. I noticed that > you've removed my changes (see below) from the docs that were saying > how to get info/stats till_end_of_wal. That leaves end users confused > as to how they can achieve till_end_of_wal functionality. All users > can't look for commit history/message but they can easily read the > docs. I prefer to have the following (did so in the attached v7) and > get rid of the above note if you don't feel strongly about it. > > + If a future <replaceable>end_lsn</replaceable> > + (i.e. the LSN server doesn't know about) is specified, it returns > + informaton till end of WAL. FWIW, I don't see a strong need for that, because this documents a behavior where we would not fail. And FWIW, it just feel natural to me because the process stops the scan up to where it can. In short, it should be enough for the docs to mention the error patterns, nothing else. > I have some comments and fixed them in the attached v7 patch: > > 1. > + * pg_get_wal_records_info > * > + * pg_get_wal_stats > * > I think you wanted to be consistent with function comments with > function names atop, but missed adding for all functions. Actually, I > don't have a strong opinion on these changes as they unnecessarily > bloat the changes, so I removed them. Either is fine if you feel strongly on this one, I am just used to doing that. > 2. > + if (start_lsn > curr_lsn) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("cannot accept future start LSN"), > - errdetail("Last known WAL LSN on the database system > is at %X/%X.", > - LSN_FORMAT_ARGS(curr_lsn)))); > - } > + errmsg("WAL start LSN must be less than current LSN"))); > > I don't like this inconsistency much, especially when > pg_get_wal_record_info emits "cannot accept future input LSN" with the > current LSN details (this current LSN will give a bit more information > to the user). Also, let's be consistent across returning NULLs if > input LSN/start LSN equal to the current LSN. I've done these changes > in the attached v7 patch. No arguments against that, consistency is good. > 3. I wanted COUNT(*) >= 0 for successful function execution to be > COUNT(*) >= 1 so that we will check for at least the functions > returning 1 record. And failures to be SELECT * FROM. This was my > intention but I don't see that in this patch or in the previous > test-refactoring commit. I added that in the attached v7 patch again. > Also, made test comments conssitent. Noticed that as well, still it feels to me that these had better be separated from the rest, and be in their own patch, perhaps *after* the main patch discussed on this thread, or just moved into their own threads. If a commit finishes with a list of bullet points referring to a list of completely different things than the subject, there may be a problem. In this v7, we have: - Change the behavior of the functions for end LSNs, tweaking the tests to do so. - Adjust more comments and formats in the tests. - Adjust some tests to be pickier with detection of generated WAL records. - Remove the drop slot calls. But what we need to care most here is the first point. I am not arguing that none of that should not be changed, but it should not be inside a patch that slightly tweaks the behaviors of some existing functions. First, it creates a lot of noise in the diffs, making it harder for anybody reading this change to find the core of what's happening. Second, it increases the odds of mistakes and bugs (if a revert is done, the work to-be-done gets greater at the end). When it comes to this patch, the changes should only involve the calls of till_end_of_wal() being moved around from pg_walinspect.sql to oldextversions.sql. If you look at v6, the tests are only focusing on this part, and nothing else. -- Michael
Attachment
On Tue, Mar 14, 2023 at 5:02 AM Michael Paquier <michael@paquier.xyz> wrote: > > So let's be clean and drop these slots to keep the tests > self-contained. pg_walinspect in REL_15_STABLE gets that right, IMV, > and that's no different from the role cleanup, as one example. Hm, added replication slot drop back. > > As hackers we know that these functions have been removed and how to > > achieve till_end_of_wal with the other functions. I noticed that > > you've removed my changes (see below) from the docs that were saying > > how to get info/stats till_end_of_wal. That leaves end users confused > > as to how they can achieve till_end_of_wal functionality. All users > > can't look for commit history/message but they can easily read the > > docs. I prefer to have the following (did so in the attached v7) and > > get rid of the above note if you don't feel strongly about it. > > > > + If a future <replaceable>end_lsn</replaceable> > > + (i.e. the LSN server doesn't know about) is specified, it returns > > + informaton till end of WAL. > > FWIW, I don't see a strong need for that, because this documents a > behavior where we would not fail. And FWIW, it just feel natural to > me because the process stops the scan up to where it can. In short, > it should be enough for the docs to mention the error patterns, > nothing else. My thoughts are simple here - how would one (an end user, not me and not you) figure out how to get info/stats till the end of WAL? I'm sure it would be difficult to find that out without looking at the code or commit history. Search for till end of WAL behaviour with new version will be more given the 1.0 version has explicit functions to do that. IMO, there's no harm in being explicit in how to achieve till end of WAL functionality around in the docs. > > 3. I wanted COUNT(*) >= 0 for successful function execution to be > > COUNT(*) >= 1 so that we will check for at least the functions > > returning 1 record. And failures to be SELECT * FROM. This was my > > intention but I don't see that in this patch or in the previous > > test-refactoring commit. I added that in the attached v7 patch again. > > Also, made test comments conssitent. > > Noticed that as well, still it feels to me that these had better be > separated from the rest, and be in their own patch, perhaps *after* > the main patch discussed on this thread, or just moved into their own > threads. If a commit finishes with a list of bullet points referring > to a list of completely different things than the subject, there may > be a problem. In this v7, we have: > - Change the behavior of the functions for end LSNs, tweaking the > tests to do so. > - Adjust more comments and formats in the tests. > - Adjust some tests to be pickier with detection of generated WAL > records. > - Remove the drop slot calls. > But what we need to care most here is the first point. I get it. I divided the patches to 0001 and 0002 with 0001 focussing on the change of behaviour around future end LSNs, dropping till end of WAL functions and tests tweakings related to it. 0002 has all other tests tidy up things. Please find the attached v8 patch set for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, I just rebased a patch over commit 1f282c24e46 Author: Michael Paquier <michael@paquier.xyz> Date: 2023-03-13 13:03:29 +0900 Refactor and improve tests of pg_walinspect and got a test failure: https://cirrus-ci.com/task/5693041982308352 https://api.cirrus-ci.com/v1/artifact/task/5693041982308352/testrun/build/testrun/pg_walinspect/regress/regression.diffs diff -w -U3 C:/cirrus/contrib/pg_walinspect/expected/oldextversions.out C:/cirrus/build/testrun/pg_walinspect/regress/results/oldextversions.out --- C:/cirrus/contrib/pg_walinspect/expected/oldextversions.out 2023-03-14 21:19:01.399716500 +0000 +++ C:/cirrus/build/testrun/pg_walinspect/regress/results/oldextversions.out 2023-03-14 21:26:27.504876700 +0000 @@ -8,10 +8,10 @@ Object description ----------------------------------------------------------- function pg_get_wal_record_info(pg_lsn) - function pg_get_wal_records_info(pg_lsn,pg_lsn) function pg_get_wal_records_info_till_end_of_wal(pg_lsn) - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) + function pg_get_wal_records_info(pg_lsn,pg_lsn) function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean) + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) (5 rows) -- Make sure checkpoints don't interfere with the test. Looks like it's missing an ORDER BY. Greetings, Andres Freund
On Tue, Mar 14, 2023 at 02:54:40PM -0700, Andres Freund wrote: > Object description > ----------------------------------------------------------- > function pg_get_wal_record_info(pg_lsn) > - function pg_get_wal_records_info(pg_lsn,pg_lsn) > function pg_get_wal_records_info_till_end_of_wal(pg_lsn) > - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > + function pg_get_wal_records_info(pg_lsn,pg_lsn) > function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean) > + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > (5 rows) > > -- Make sure checkpoints don't interfere with the test. > > Looks like it's missing an ORDER BY. Interesting. This is "\dx+ pg_walinspect". listOneExtensionContents() uses pg_describe_object() for that, and there is already an ORDER BY based on it. I would not have expected this part to be that much sensitive. Is this using a specific ICU collation, because this is a side-effect of switching ICU as the default in initdb? As a solution, this could use pg_identify_object(classid, objid, 0) in the ORDER BY clause to enforce a better ordering of the objects dealt with as it decomposes the object name and the object type. That should be enough, I assume, as it looks to be parenthesis vs underscore that switch the order. -- Michael
Attachment
On Tue, Mar 14, 2023 at 10:35:43AM +0530, Bharath Rupireddy wrote: > My thoughts are simple here - how would one (an end user, not me and > not you) figure out how to get info/stats till the end of WAL? I'm > sure it would be difficult to find that out without looking at the > code or commit history. Search for till end of WAL behaviour with new > version will be more given the 1.0 version has explicit functions to > do that. IMO, there's no harm in being explicit in how to achieve till > end of WAL functionality around in the docs. Okay. I have kept these notes, but tweaked the wording to be a bit cleaner, replacing the term "till" by "until". To my surprise while studying this point, "till" is a term older than "until" in English literacy, but it is rarely used. > I get it. I divided the patches to 0001 and 0002 with 0001 focussing > on the change of behaviour around future end LSNs, dropping till end > of WAL functions and tests tweakings related to it. 0002 has all other > tests tidy up things. > > Please find the attached v8 patch set for further review. The tests of 0001 were still too complex IMO. The changes can be much simpler as it requires only to move the till_end_of_wal() calls from pg_walinspect.sql to oldextversions.sql. Nothing more. -- Michael
Attachment
Hi, On 2023-03-15 09:56:10 +0900, Michael Paquier wrote: > On Tue, Mar 14, 2023 at 02:54:40PM -0700, Andres Freund wrote: > > Object description > > ----------------------------------------------------------- > > function pg_get_wal_record_info(pg_lsn) > > - function pg_get_wal_records_info(pg_lsn,pg_lsn) > > function pg_get_wal_records_info_till_end_of_wal(pg_lsn) > > - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > > + function pg_get_wal_records_info(pg_lsn,pg_lsn) > > function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean) > > + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > > (5 rows) > > > > -- Make sure checkpoints don't interfere with the test. > > > > Looks like it's missing an ORDER BY. > > Interesting. This is "\dx+ pg_walinspect". > listOneExtensionContents() uses pg_describe_object() for that, and > there is already an ORDER BY based on it. I would not have expected > this part to be that much sensitive. Is this using a specific ICU > collation, because this is a side-effect of switching ICU as the > default in initdb? It's using ICU, but not a specific collation. The build I linked to is WIP hackery to add ICU support to windows CI. Here's the initdb output: https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log The database cluster will be initialized with this locale configuration: provider: icu ICU locale: en_US LC_COLLATE: English_United States.1252 LC_CTYPE: English_United States.1252 LC_MESSAGES: English_United States.1252 LC_MONETARY: English_United States.1252 LC_NUMERIC: English_United States.1252 LC_TIME: English_United States.1252 The default database encoding has accordingly been set to "WIN1252". The default text search configuration will be set to "english". For comparison, here's a recent CI run (which also failed on windows, but for unrelated reasons), without ICU: https://api.cirrus-ci.com/v1/artifact/task/6478925920993280/testrun/build/testrun/pg_walinspect/regress/log/initdb.log The database cluster will be initialized with locale "English_United States.1252". The default database encoding has accordingly been set to "WIN1252". The default text search configuration will be set to "english". > As a solution, this could use pg_identify_object(classid, objid, 0) in > the ORDER BY clause to enforce a better ordering of the objects dealt > with as it decomposes the object name and the object type. That > should be enough, I assume, as it looks to be parenthesis vs > underscore that switch the order. Greetings, Andres Freund
On Tue, Mar 14, 2023 at 07:05:20PM -0700, Andres Freund wrote: > It's using ICU, but not a specific collation. The build I linked to is WIP > hackery to add ICU support to windows CI. Here's the initdb output: > https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log Hmm. Thanks. At the end, I think that I would be tempted to just remove this \dx query and move on. I did not anticipate that this ordering would be that much sensitive, and the solution of using a COLLATE C at the end of a describe.c query does not sound much appealing, either. -- Michael
Attachment
On Wed, Mar 15, 2023 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Mar 14, 2023 at 07:05:20PM -0700, Andres Freund wrote: > > It's using ICU, but not a specific collation. The build I linked to is WIP > > hackery to add ICU support to windows CI. Here's the initdb output: > > https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log > > Hmm. Thanks. At the end, I think that I would be tempted to just > remove this \dx query and move on. I did not anticipate that this > ordering would be that much sensitive, and the solution of using a > COLLATE C at the end of a describe.c query does not sound much > appealing, either. -1 for removing \dx+ for pg_walinspect version 1.0, because we wanted to show the diff of functions along with testing the upgrade path in the oldextversions.sql. Therefore, I prefer something like [1]: [1] diff --git a/contrib/pg_walinspect/sql/oldextversions.sql b/contrib/pg_walinspect/sql/oldextversions.sql index 258a009888..32a059c72d 100644 --- a/contrib/pg_walinspect/sql/oldextversions.sql +++ b/contrib/pg_walinspect/sql/oldextversions.sql @@ -5,8 +5,17 @@ CREATE EXTENSION pg_walinspect WITH VERSION '1.0'; -- Mask DETAIL messages as these could refer to current LSN positions. \set VERBOSITY terse +-- \dx+ will give locale-sensitive results, so we can't use it here. +CREATE VIEW list_pg_walinspect_objects AS + SELECT pg_describe_object(classid, objid, 0) AS "Object description" + FROM pg_depend + WHERE refclassid = 'pg_extension'::regclass AND + refobjid = (SELECT oid FROM pg_extension WHERE extname = 'pg_walinspect') AND + deptype = 'e' + ORDER BY pg_describe_object(classid, objid, 0) COLLATE "C"; + -- List what version 1.0 contains -\dx+ pg_walinspect +SELECT * FROM list_pg_walinspect_objects; -- Make sure checkpoints don't interfere with the test. SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false); @@ -25,7 +34,7 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_stats_till_end_of_wal('FFFFFFFF/FFFFF ALTER EXTENSION pg_walinspect UPDATE TO '1.1'; -- List what version 1.1 contains -\dx+ pg_walinspect +SELECT * FROM list_pg_walinspect_objects; SELECT pg_drop_replication_slot('regress_pg_walinspect_slot'); -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Mar 15, 2023 at 12:40:17PM +0530, Bharath Rupireddy wrote: > -1 for removing \dx+ for pg_walinspect version 1.0, because we wanted > to show the diff of functions along with testing the upgrade path in > the oldextversions.sql. Therefore, I prefer something like [1]: This is a duplicate of what describe.c uses, with a COLLATE clause. The main goal was to have a simple check, so I'd still stand by the simplest choice and move on. -- Michael
Attachment
On Wed, Mar 15, 2023 at 06:50:01PM +0900, Michael Paquier wrote: > This is a duplicate of what describe.c uses, with a COLLATE clause. > The main goal was to have a simple check, so I'd still stand by the > simplest choice and move on. Please note that I have done something about that with e643a31 by replacing the problematic \dx with a SELECT query, but left the second one as it should not be a problem. -- Michael
Attachment
On Thu, Mar 16, 2023 at 12:48 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Mar 15, 2023 at 06:50:01PM +0900, Michael Paquier wrote: > > This is a duplicate of what describe.c uses, with a COLLATE clause. > > The main goal was to have a simple check, so I'd still stand by the > > simplest choice and move on. > > Please note that I have done something about that with e643a31 by > replacing the problematic \dx with a SELECT query, but left the second > one as it should not be a problem. Thanks. FWIW, I rebased the tests tweaking patch and attached it here as v9. This should keep the pg_walinspect tests consistent across comments, spaces, new lines and using count(*) >= 1 for all successful function executions. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Mar 16, 2023 at 01:17:59PM +0530, Bharath Rupireddy wrote: > FWIW, I rebased the tests tweaking patch and attached it here as v9. > This should keep the pg_walinspect tests consistent across comments, > spaces, new lines and using count(*) >= 1 for all successful function > executions. Thoughts? Mostly OK by me, so applied after tweaking a few tiny things. The rewrites of the queries where we should have more than one record and the removal of count() for the failure cases have been kept as proposed, as are most of the comments. -- Michael