Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date
Msg-id CALj2ACXBH6n40sgYZvtkYzkatYteL+bHftc-rEcOmWcmrfnF=A@mail.gmail.com
Whole thread Raw
In response to Re: pg_walinspect - a new extension to get raw WAL data and WAL stats  (RKN Sai Krishna <rknsaiforpostgres@gmail.com>)
Responses Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
List pgsql-hackers
On Fri, Mar 25, 2022 at 8:37 PM RKN Sai Krishna
<rknsaiforpostgres@gmail.com> wrote:
>
> Hi Bharath,
>
> First look at the patch, bear with me if any of the following comments are repeated.

Thanks RKN, for playing around with the patches.

> 1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more
meaningfulto show the corresponding WAL record. 

In general, all the functions will first look for a first valid WAL
record from the given input lsn/start lsn(XLogFindNextRecord) and then
give info of all the valid records including the first valid WAL
record until either the given end lsn or till end of WAL depending on
the function used.

> For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output
andnot the one with start and end lsn as ('0/17335E0', '0/1733617'). 

If  '0/17335E7' is an LSN containing a valid WAL record,
pg_get_wal_record gives the info of that, otherwise if there's any
next valid WAL record, it finds and gives that info. '0/17335E0' is
before '0/17335E7' the input lsn, so it doesn't show that record, but
the next valid record.

All the pg_walinspect functions don't look for the nearest valid WAL
record (could be previous to input lsn or next to input lsn), but they
look for the next valid WAL record. This is because the xlogreader
infra now has no API for backward iteration from a given LSN ( it has
XLogFindNextRecord and XLogReadRecord which scans the WAL in forward
direction). But, it's a good idea to have XLogFindPreviousRecord and
XLogReadPreviousRecord versions (as we have links for previous WAL
record in each WAL record) but that's a separate discussion.

> With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN.

Yes.

> 2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose.

I believe this is something psql does for larger column outputs for
pretty-display. When used in a non-psql client, the column values are
returned properly. Nothing to do with the pg_walinspect patches here.

> 3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during
recoveryright? 

There are functions that can be executable during recovery
pg_last_wal_receive_lsn, pg_last_wal_replay_lsn. The pg_walinspect
functions are useful even in recovery and I don't see a strong reason
to not support them. Hence, I'm right now supporting them.

> 4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the
specifiedinput lsn range, would it be better to output the existing WAL records displaying a message that it is a
partiallist of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out
saying"requested WAL segment has already been removed"? 

"requested WAL segment %s has already been removed" is a common error
across the xlogreader infra (see wal_segment_open) and I don't want to
invent a new behaviour. And all the segment_open callbacks report an
error when they are not finding the WAL file that they are looking
for.

> 5. Following are very minor comments in the code
>
> Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN

That's fine, because it actually returns curr_lsn via the function
param curr_lsn. However, I modified the comment a bit.

> In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data

No, we are just returning pointer to the string, not deep copying, see
CStringGetTextDatum. All the functions get executed within a
function's memory context and after handing off the results to the
client that gets deleted, deallocating all the memory.

> In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable

XLogRecGetInfo is not a function, it's a macro, so that's fine.
#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info)

> In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements
verifyingif the start lsn is a future lsn. 

The existing if conditions are:

    if (IsFutureLSN(start_lsn, &curr_lsn))
    if (!till_end_of_wal && end_lsn >= curr_lsn)
    if (till_end_of_wal)
    if (start_lsn >= end_lsn)

I clubbed them like this:
    if (!till_end_of_wal)
    if (IsFutureLSN(start_lsn, &curr_lsn))
    if (!till_end_of_wal && end_lsn >= curr_lsn)
    else if (till_end_of_wal)

Other if conditions are serving different purposes, so I'm leaving them as-is.

Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch,
others remain the same.

Regards,
Bharath Rupireddy.

Attachment

pgsql-hackers by date:

Previous
From: "Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
Subject: RE: Column Filtering in Logical Replication
Next
From: Bharath Rupireddy
Date:
Subject: Remove an unused function GetWalRcvWriteRecPtr