Thread: Add a new pg_walinspect function to extract FPIs from WAL records
Hi, Here's a patch that implements the idea of extracting full page images from WAL records [1] [2] with a function in pg_walinspect. This new function accepts start and end lsn and returns full page image info such as WAL record lsn, tablespace oid, database oid, relfile number, block number, fork name and the raw full page (as bytea). I'll register this in the next commitfest. Thoughts? [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8 [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Add a new pg_walinspect function to extract FPIs from WAL records
From
"Drouvot, Bertrand"
Date:
Hi, On 12/27/22 12:48 PM, Bharath Rupireddy wrote: > Hi, > > Here's a patch that implements the idea of extracting full page images > from WAL records [1] [2] with a function in pg_walinspect. This new > function accepts start and end lsn and returns full page image info > such as WAL record lsn, tablespace oid, database oid, relfile number, > block number, fork name and the raw full page (as bytea). I'll > register this in the next commitfest. > > Thoughts? > I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features". And since [1] added FPI "extraction" then +1 for the proposed patch in this thread. > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8 > [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com I just have a few comments: + +/* + * Get full page images and their info associated with a given WAL record. + */ What about adding a few words about compression? (like "Decompression is applied if necessary"?) + /* Full page exists, so let's output it. */ + if (!RestoreBlockImage(record, block_id, page)) "Full page exists, so let's output its info and content." instead? + <para> + Gets raw full page images and their information associated with all the + valid WAL records between <replaceable>start_lsn</replaceable> and + <replaceable>end_lsn</replaceable>. Returns one row per full page image. Worth to add a few words about decompression too? I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both extracted images are the same and matches the one modified right after the checkpoint. What do you think? (could be done later in another patch though). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features". > And since [1] added FPI "extraction" then +1 for the proposed patch in this thread. > > > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8 > > [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com > > I just have a few comments: Thanks for reviewing. > + > +/* > + * Get full page images and their info associated with a given WAL record. > + */ > > > + <para> > + Gets raw full page images and their information associated with all the > + valid WAL records between <replaceable>start_lsn</replaceable> and > + <replaceable>end_lsn</replaceable>. Returns one row per full page image. > > Worth to add a few words about decompression too? Done. > What about adding a few words about compression? (like "Decompression is applied if necessary"?) > > > + /* Full page exists, so let's output it. */ > + if (!RestoreBlockImage(record, block_id, page)) > > "Full page exists, so let's output its info and content." instead? Done. > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both > extracted images are the same and matches the one modified right after the checkpoint. > > What do you think? (could be done later in another patch though). I think pageinspect can be used here. We can fetch the raw page from the table after the checkpoint and raw FPI from the WAL record logged as part of the update. I've tried to do so [1], but I see a slight difference in the raw output. The expectation is that they both be the same. It might be that the update operation logs the FPI with some more info set (prune_xid). I'll try to see why it is so. I'm attaching the v2 patch for further review. [1] SELECT * FROM page_header(:'page_from_table'); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 | 4 | 0 (1 row) SELECT * FROM page_header(:'page_from_wal'); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 | 4 | 735 (1 row) -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, 5 Jan 2023 at 18:52, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features". > > And since [1] added FPI "extraction" then +1 for the proposed patch in this thread. > > > > > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8 > > > [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com > > > > I just have a few comments: > > Thanks for reviewing. > > > + > > +/* > > + * Get full page images and their info associated with a given WAL record. > > + */ > > > > > > + <para> > > + Gets raw full page images and their information associated with all the > > + valid WAL records between <replaceable>start_lsn</replaceable> and > > + <replaceable>end_lsn</replaceable>. Returns one row per full page image. > > > > Worth to add a few words about decompression too? > > Done. > > > What about adding a few words about compression? (like "Decompression is applied if necessary"?) > > > > > > + /* Full page exists, so let's output it. */ > > + if (!RestoreBlockImage(record, block_id, page)) > > > > "Full page exists, so let's output its info and content." instead? > > Done. > > > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both > > extracted images are the same and matches the one modified right after the checkpoint. > > > > What do you think? (could be done later in another patch though). > > I think pageinspect can be used here. We can fetch the raw page from > the table after the checkpoint and raw FPI from the WAL record logged > as part of the update. I've tried to do so [1], but I see a slight > difference in the raw output. The expectation is that they both be the > same. It might be that the update operation logs the FPI with some > more info set (prune_xid). I'll try to see why it is so. > > I'm attaching the v2 patch for further review. I felt one of the files was missing in the patch: [13:39:03.534] contrib/pg_walinspect/meson.build:19:0: ERROR: File pg_walinspect--1.0--1.1.sql does not exist. Please post an updated version for the same. Regards, Vignesh
On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both > > extracted images are the same and matches the one modified right after the checkpoint. > > > > What do you think? (could be done later in another patch though). > > I think pageinspect can be used here. We can fetch the raw page from > the table after the checkpoint and raw FPI from the WAL record logged > as part of the update. I've tried to do so [1], but I see a slight > difference in the raw output. The expectation is that they both be the > same. It might be that the update operation logs the FPI with some > more info set (prune_xid). I'll try to see why it is so. > > I'm attaching the v2 patch for further review. > > [1] > SELECT * FROM page_header(:'page_from_table'); > lsn | checksum | flags | lower | upper | special | pagesize | > version | prune_xid > -----------+----------+-------+-------+-------+---------+----------+---------+----------- > 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 | > 4 | 0 > (1 row) > > SELECT * FROM page_header(:'page_from_wal'); > lsn | checksum | flags | lower | upper | special | pagesize | > version | prune_xid > -----------+----------+-------+-------+-------+---------+----------+---------+----------- > 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 | > 4 | 735 > (1 row) Ugh, v2 patch missed the new file added, I'm attaching v3 patch for further review. Sorry for the noise. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both > > > extracted images are the same and matches the one modified right after the checkpoint. > > > > > > What do you think? (could be done later in another patch though). > > > > I think pageinspect can be used here. We can fetch the raw page from > > the table after the checkpoint and raw FPI from the WAL record logged > > as part of the update. I've tried to do so [1], but I see a slight > > difference in the raw output. The expectation is that they both be the > > same. It might be that the update operation logs the FPI with some > > more info set (prune_xid). I'll try to see why it is so. > > > > I'm attaching the v2 patch for further review. > > > > [1] > > SELECT * FROM page_header(:'page_from_table'); > > lsn | checksum | flags | lower | upper | special | pagesize | > > version | prune_xid > > -----------+----------+-------+-------+-------+---------+----------+---------+----------- > > 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 | > > 4 | 0 > > (1 row) > > > > SELECT * FROM page_header(:'page_from_wal'); > > lsn | checksum | flags | lower | upper | special | pagesize | > > version | prune_xid > > -----------+----------+-------+-------+-------+---------+----------+---------+----------- > > 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 | > > 4 | 735 > > (1 row) > > Ugh, v2 patch missed the new file added, I'm attaching v3 patch for > further review. Sorry for the noise. I took a stab at how and what gets logged as FPI in WAL records: Option 1: WAL record with FPI contains both the unmodified table page from the disk after checkpoint and new tuple (not applied to the unmodified page) and the recovery (redo) applies the new tuple to the unmodified page as part of recovery. A bit more WAL is needed to store both unmodified page and new tuple data in the WAL record and recovery can get slower a bit too as it needs to stitch the modified page. Option 2: WAL record with FPI contains only the modified page (new tuple applied to the unmodified page from the disk after checkpoint) and the recovery (redo) just returns the applied block as BLK_RESTORED. Recovery can get faster with this approach and less WAL is needed to store just the modified page. My earlier understanding was that postgres does option (1), however, I was wrong, option (2) is what actually postgres has implemented for the obvious advantages specified. I now made the tests a bit stricter in checking the FPI contents (tuple values) pulled from the WAL record with raw page contents pulled from the table using the pageinspect extension. Please see the attached v4 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Add a new pg_walinspect function to extract FPIs from WAL records
From
"Drouvot, Bertrand"
Date:
Hi, On 1/6/23 6:41 PM, Bharath Rupireddy wrote: > On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: >> >> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy >> <bharath.rupireddyforpostgres@gmail.com> wrote: >>> >>>> I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both >>>> extracted images are the same and matches the one modified right after the checkpoint. >>>> >>>> What do you think? (could be done later in another patch though). >>> >>> I think pageinspect can be used here. We can fetch the raw page from >>> the table after the checkpoint and raw FPI from the WAL record logged >>> as part of the update. I've tried to do so [1], but I see a slight >>> difference in the raw output. The expectation is that they both be the >>> same. It might be that the update operation logs the FPI with some >>> more info set (prune_xid). I'll try to see why it is so. >>> >>> I'm attaching the v2 patch for further review. >>> >>> [1] >>> SELECT * FROM page_header(:'page_from_table'); >>> lsn | checksum | flags | lower | upper | special | pagesize | >>> version | prune_xid >>> -----------+----------+-------+-------+-------+---------+----------+---------+----------- >>> 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 | >>> 4 | 0 >>> (1 row) >>> >>> SELECT * FROM page_header(:'page_from_wal'); >>> lsn | checksum | flags | lower | upper | special | pagesize | >>> version | prune_xid >>> -----------+----------+-------+-------+-------+---------+----------+---------+----------- >>> 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 | >>> 4 | 735 >>> (1 row) >> >> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for >> further review. Sorry for the noise. > > I took a stab at how and what gets logged as FPI in WAL records: > > Option 1: > WAL record with FPI contains both the unmodified table page from the > disk after checkpoint and new tuple (not applied to the unmodified > page) and the recovery (redo) applies the new tuple to the unmodified > page as part of recovery. A bit more WAL is needed to store both > unmodified page and new tuple data in the WAL record and recovery can > get slower a bit too as it needs to stitch the modified page. > > Option 2: > WAL record with FPI contains only the modified page (new tuple applied > to the unmodified page from the disk after checkpoint) and the > recovery (redo) just returns the applied block as BLK_RESTORED. > Recovery can get faster with this approach and less WAL is needed to > store just the modified page. > > My earlier understanding was that postgres does option (1), however, I > was wrong, option (2) is what actually postgres has implemented for > the obvious advantages specified. > > I now made the tests a bit stricter in checking the FPI contents > (tuple values) pulled from the WAL record with raw page contents > pulled from the table using the pageinspect extension. Please see the > attached v4 patch. > Thanks for updating the patch! +-- Compare FPI from WAL record and page from table, they must be same I think "must be the same" or "must be identical" sounds better (but not 100% sure). Except this nit, V4 looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Jan 10, 2023 at 09:29:03AM +0100, Drouvot, Bertrand wrote: > Thanks for updating the patch! > > +-- Compare FPI from WAL record and page from table, they must be same > > I think "must be the same" or "must be identical" sounds better (but not 100% sure). > > Except this nit, V4 looks good to me. +postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number, block_number, fork_name, length(fpi) > 0 as fpi_ok FROM pg_get_wal_fpi_info('0/7418E60', '0/7518218'); This query in the docs is too long IMO. Could you split that across multiple lines for readability? + pg_get_wal_fpi_info(start_lsn pg_lsn, + end_lsn pg_lsn, + lsn OUT pg_lsn, + tablespace_oid OUT oid, + database_oid OUT oid, + relfile_number OUT oid, + block_number OUT int8, + fork_name OUT text, + fpi OUT bytea) I am a bit surprised by this format, used to define the functions part of the module in the docs, while we have examples that actually show what's printed out. I understand that this comes from the original commit of the module, but the rendered docs are really hard to parse as well, no? FWIW, I think that this had better be fixed as well in the docs of v15.. Showing a full set of attributes for the returned record is fine by me, still if these are too long we could just use \x. For this one, I think that there is little point in showing 14 records, so I would stick with a style similar to pageinspect. +CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn, + IN end_lsn pg_lsn, + OUT lsn pg_lsn, + OUT tablespace_oid oid, Slight indentation issue here. Using "relfile_number" would be a first, for what is defined in the code and the docs as a filenode. +SELECT pg_current_wal_lsn() AS wal_lsn4 \gset +-- Get FPI from WAL record +SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4') + WHERE relfile_number = :'sample_tbl_oid' \gset I would be tempted to keep the checks run here minimal with only a basic set of checks on the LSN, without the dependencies to pageinspect (tuple_data_split and get_raw_page), which would be fine enough to check the execution of the function. FWIW, I am surprised by the design choice behind ValidateInputLSNs() to allow data to be gathered until the end of WAL in some cases, but to not allow it in others. It is likely too late to come back to this choice for the existing functions in v15 (quoique?), but couldn't it be useful to make this new FPI function work at least with an insanely high LSN value to make sure that we fetch all the FPIs from a given start position, up to the end of WAL? That looks like a pretty good default behavior to me, rather than issuing an error when a LSN is defined as in the future.. I am really wondering why we have ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could just allow any LSN value in the future automatically, as we can know the current insert or replay LSNs (depending on the recovery state). -- Michael
Attachment
On Wed, Jan 11, 2023 at 10:07 AM Michael Paquier <michael@paquier.xyz> wrote: > > +postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number, > block_number, fork_name, length(fpi) > 0 as fpi_ok FROM > pg_get_wal_fpi_info('0/7418E60', '0/7518218'); > > This query in the docs is too long IMO. Could you split that across > multiple lines for readability? Done. > + pg_get_wal_fpi_info(start_lsn pg_lsn, > + end_lsn pg_lsn, > + lsn OUT pg_lsn, > + tablespace_oid OUT oid, > + database_oid OUT oid, > + relfile_number OUT oid, > + block_number OUT int8, > + fork_name OUT text, > + fpi OUT bytea) > I am a bit surprised by this format, used to define the functions part > of the module in the docs, while we have examples that actually show > what's printed out. I understand that this comes from the original > commit of the module, but the rendered docs are really hard to parse > as well, no? FWIW, I think that this had better be fixed as well in > the docs of v15.. Showing a full set of attributes for the returned > record is fine by me, still if these are too long we could just use > \x. Thanks. I'll work on that separately. > For this one, I think that there is little point in showing 14 > records, so I would stick with a style similar to pageinspect. I've done it that way for pg_get_wal_fpi_info. If this format looks okay, I can propose to do the same for other functions (for backpatching too) in a separate thread though. > +CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn, > + IN end_lsn pg_lsn, > + OUT lsn pg_lsn, > + OUT tablespace_oid oid, > Slight indentation issue here. Done. > Using "relfile_number" would be a first, for what is defined in the > code and the docs as a filenode. Yes, I've changed the column names to be consistent (like in pg_buffercache). > +SELECT pg_current_wal_lsn() AS wal_lsn4 \gset > +-- Get FPI from WAL record > +SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4') > + WHERE relfile_number = :'sample_tbl_oid' \gset > I would be tempted to keep the checks run here minimal with only a > basic set of checks on the LSN, without the dependencies to > pageinspect (tuple_data_split and get_raw_page), which would be fine > enough to check the execution of the function. I understand the concern here that creating dependency between extensions just for testing isn't good. I'm okay to just read the LSN (lsn1) from raw FPI (bytea stream) and the WAL record's LSN (lsn2) and compare them to be lsn2 > lsn1. I'm looking for a way to convert the first 8 bytes from bytea stream to pg_lsn type, on a quick look I couldn't find direct conversion functions, however, I'll try to figure out a way. > FWIW, I am surprised by the design choice behind ValidateInputLSNs() > to allow data to be gathered until the end of WAL in some cases, but > to not allow it in others. It is likely too late to come back to this > choice for the existing functions in v15 (quoique?), but couldn't it Separate functions for users passing end_lsn by themselves and users letting functions decide the end_lsn (current flush LSN or replay LSN) were chosen for better and easier usability and easier validation of user-entered input lsns. We deliberated to have something like below: pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false); pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false); We wanted to have better validation of the start_lsn and end_lsn, that is, start_lsn < end_lsn and end_lsn mustn't fall into the future when users specify it by themselves (otherwise, one can easily trick the server by passing in the extreme end of the LSN - 0xFFFFFFFFFFFFFFFF). And, we couldn't find a better way to deal with when till_end_of_wal is passed as true (in the above version of the functions). Another idea was to have something like below: pg_get_wal_stats(start_lsn, end_lsn default '0/0'); pg_get_wal_records_info(start_lsn, end_lsn default '0/0'); When end_lsn is not entered or entered as invalid lsn, then return the stats/info till end of the WAL. Again, we wanted to have some validation of the user-entered end_lsn. Instead of cooking multiple behaviours into a single function we opted for till_end_of_wal versions. I still feel this is better unless there's a strong reason against till_end_of_wal versions. > be useful to make this new FPI function work at least with an insanely > high LSN value to make sure that we fetch all the FPIs from a given > start position, up to the end of WAL? That looks like a pretty good > default behavior to me, rather than issuing an error when a LSN is > defined as in the future.. I am really wondering why we have > ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could > just allow any LSN value in the future automatically, as we can know > the current insert or replay LSNs (depending on the recovery state). Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then? With some of the review comments addressed, I'm attaching v5 patch herewith. I would like to hear thoughts on the above open points before writing the v6 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote: > I've done it that way for pg_get_wal_fpi_info. If this format looks > okay, I can propose to do the same for other functions (for > backpatching too) in a separate thread though. My vote would be to make that happen first, to have in place cleaner basics for the docs. I could just do it and move on.. > We deliberated to have something like below: > pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false); > pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false); > > We wanted to have better validation of the start_lsn and end_lsn, that > is, start_lsn < end_lsn and end_lsn mustn't fall into the future when > users specify it by themselves (otherwise, one can easily trick the > server by passing in the extreme end of the LSN - 0xFFFFFFFFFFFFFFFF). > And, we couldn't find a better way to deal with when till_end_of_wal > is passed as true (in the above version of the functions). > > Another idea was to have something like below: > pg_get_wal_stats(start_lsn, end_lsn default '0/0'); > pg_get_wal_records_info(start_lsn, end_lsn default '0/0'); > > When end_lsn is not entered or entered as invalid lsn, then return the > stats/info till end of the WAL. Again, we wanted to have some > validation of the user-entered end_lsn. This reminds me of the slot advancing, where we discarded this case just because it is useful to enforce a LSN far in the future. Honestly, I cannot think of any case where I would use this level of validation, especially having *two* functions to decide one behavior or the other: this stuff could just use one function and use for example NULL as a setup to enforce the end of WAL, on top of a LSN far ahead.. But well.. >> be useful to make this new FPI function work at least with an insanely >> high LSN value to make sure that we fetch all the FPIs from a given >> start position, up to the end of WAL? That looks like a pretty good >> default behavior to me, rather than issuing an error when a LSN is >> defined as in the future.. I am really wondering why we have >> ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could >> just allow any LSN value in the future automatically, as we can know >> the current insert or replay LSNs (depending on the recovery state). > > Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then? I don't really want to make the interface more bloated with more functions than necessary, TBH, though I get your point to push for consistency in these functions. This makes me really wonder whether we'd better make relax all the existing functions, but I may get outvoted. -- Michael
Attachment
On Thu, Jan 12, 2023 at 11:23 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote: > > I've done it that way for pg_get_wal_fpi_info. If this format looks > > okay, I can propose to do the same for other functions (for > > backpatching too) in a separate thread though. > > My vote would be to make that happen first, to have in place cleaner > basics for the docs. I could just do it and move on.. Sure. Posted a separate patch here - https://www.postgresql.org/message-id/CALj2ACVGcUpziGgQrcT-1G3dHWQQfWjYBu1YQ2ypv9y86dgogg%40mail.gmail.com > This reminds me of the slot advancing, where we discarded this case > just because it is useful to enforce a LSN far in the future. > Honestly, I cannot think of any case where I would use this level of > validation, especially having *two* functions to decide one behavior > or the other: this stuff could just use one function and use for > example NULL as a setup to enforce the end of WAL, on top of a LSN far > ahead.. But well.. I understand. I don't mind discussing something like [1] with the following behaviour and discarding till_end_of_wal functions altogether: If start_lsn is NULL, error out/return NULL. If end_lsn isn't specified, default to NULL, then determine the end_lsn. If end_lsn is specified as NULL, then determine the end_lsn. If end_lsn is specified as non-NULL, then determine if it is greater than start_lsn if yes, go ahead do the job, otherwise error out. I'll think a bit more on this and perhaps discuss it separately. > > Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then? > > I don't really want to make the interface more bloated with more > functions than necessary, TBH, though I get your point to push for > consistency in these functions. This makes me really wonder whether > we'd better make relax all the existing functions, but I may get > outvoted. I'll keep the FPI extract function simple as proposed in the patch and I'll not go write till_end_of_wal version. If needed to get all the FPIs till the end of WAL, one can always determine the end_lsn with pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery, and use the proposed function. Note that I kept the FPI extract function test simple - ensure FPI gets generated and check if the new function can fetch it, discarding lsn or other FPI sanity checks. Whole FPI sanity check needs pageinspect and we don't want to create the dependency just for tests. And checking for FPI lsn with WAL record lsn requires us to fetch lsn from raw bytea stream. As there's no straightforward way to convert raw bytes from bytea to pg_lsn, doing that in a platform-agnostic manner (little-endian and big-endian comes into play here) actually is a no-go IMO. FWIW, for a little-endian system I had to do [2]. Therefore I stick to the simple test unless there's a better way. I'm attaching the v6 patch for further review. [1] CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, IN end_lsn pg_lsn DEFAULT NULL, OUT start_lsn pg_lsn, OUT end_lsn pg_lsn, OUT prev_lsn pg_lsn, OUT xid xid, OUT resource_manager text, OUT record_type text, OUT record_length int4, OUT main_data_length int4, OUT fpi_length int4, OUT description text, OUT block_ref text ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_get_wal_records_info' LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE; CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn, IN end_lsn pg_lsn DEFAULT NULL, IN per_record boolean DEFAULT false, OUT "resource_manager/record_type" text, OUT count int8, OUT count_percentage float8, OUT record_size int8, OUT record_size_percentage float8, OUT fpi_size int8, OUT fpi_size_percentage float8, OUT combined_size int8, OUT combined_size_percentage float8 ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_get_wal_stats' LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE; [2] select '\x00000000b8078901000000002c00601f00200420df020000e09f3800c09f3800a09f380080'::bytea AS fpi \gset select substring(:'fpi' from 3 for 16) AS rlsn \gset select substring(:'rlsn' from 7 for 2) || substring(:'rlsn' from 5 for 2) || substring(:'rlsn' from 3 for 2) || substring(:'rlsn' from 1 for 2) AS hi \gset select substring(:'rlsn' from 15 for 2) || substring(:'rlsn' from 13 for 2) || substring(:'rlsn' from 11 for 2) || substring(:'rlsn' from 9 for 2) AS lo \gset select (:'hi' || '/' || :'lo')::pg_lsn; postgres=# select (:'hi' || '/' || :'lo')::pg_lsn; pg_lsn ----------- 0/18907B8 (1 row) -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Jan 12, 2023 at 05:37:40PM +0530, Bharath Rupireddy wrote: > I understand. I don't mind discussing something like [1] with the > following behaviour and discarding till_end_of_wal functions > altogether: > If start_lsn is NULL, error out/return NULL. > If end_lsn isn't specified, default to NULL, then determine the end_lsn. > If end_lsn is specified as NULL, then determine the end_lsn. > If end_lsn is specified as non-NULL, then determine if it is greater > than start_lsn if yes, go ahead do the job, otherwise error out. > > I'll think a bit more on this and perhaps discuss it separately. FWIW, I still find the current interface of the module bloated. So, while it is possible to stick some pg_current_wal_lsn() calls to bypass the error in most cases, enforcing the end of WAL with a NULL or larger value would still be something I would push for based on my own experience as there would be no need to worry about the latest LSN as being two different values in two function contexts. You could keep the functions as STRICT for consistency, and just allow larger values as a synonym for the end of WAL. Saying that, the version of pg_get_wal_fpi_info() committed respects the current behavior of the module, with an error on an incorrect end LSN. > I'll keep the FPI extract function simple as proposed in the patch and > I'll not go write till_end_of_wal version. If needed to get all the > FPIs till the end of WAL, one can always determine the end_lsn with > pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery, > and use the proposed function. I was reading the patch this morning, and that's pretty much what I would have done in terms of simplicity with a test checking that at least one FPI has been generated. I have shortened a bit the documentation, tweaked a few comments and applied the whole after seeing the result. One thing that I have been wondering about is whether it is worth adding the block_id from the record in the output, but discarded this idea as it could be confused with the physical block number, even if this function is for advanced users. -- Michael