Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date
Msg-id CALj2ACXsu42jnjyV7zpNtGMmDCsvXr59A9x-C94uyHov2nmp6Q@mail.gmail.com
Whole thread Raw
In response to Re: pg_walinspect - a new extension to get raw WAL data and WAL stats  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
List pgsql-hackers
On Mon, Feb 14, 2022 at 8:32 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Here are few comments:

Thanks for reviewing the patches.

> +/*
> + * Verify the authenticity of the given raw WAL record.
> + */
> +Datum
> +pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
> +{
>
>
> Do we really need this function? I see that whenever the record is
> read, we verify it. So could there be a scenario where any of these
> functions would return an invalid WAL record?

Yes, this function can be useful. Imagine a case where raw WAL records
are fetched from one server using pg_get_wal_record_info and sent over
the network to another server (for fixing some of the corrupted data
pages or for whatever reasons), using pg_verify_raw_wal_record one can
verify authenticity.

> Should we add a function that returns the pointer to the first and
> probably the last WAL record in the WAL segment? This would help users
> to inspect the wal records in the entire wal segment if they wish to
> do so.

Good point. One can do this already with pg_get_wal_records_info and
pg_walfile_name_offset. Usually, the LSN format itself can give an
idea about the WAL file it is in.

postgres=# select lsn, pg_walfile_name_offset(lsn) from
pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
limit 1;
    lsn    |    pg_walfile_name_offset
-----------+-------------------------------
 0/5000038 | (000000010000000000000005,56)
(1 row)

postgres=# select lsn, pg_walfile_name_offset(lsn) from
pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
limit 1;
    lsn    |       pg_walfile_name_offset
-----------+-------------------------------------
 0/5FFFFC0 | (000000010000000000000005,16777152)
(1 row)

Having said that, we can always add a function or a view (with the
above sort of queries) to pg_walinspect  - given an LSN can give the
valid start record in that wal file (by following previous lsn links)
and valid end record lsn. IMO, that's not required now, maybe later
once the initial version of pg_walinspect gets committed, as we
already have a way to achieve what we wanted here.

> +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
>
> I think we should allow all these functions to be executed in wait and
> *nowait* mode. If a user specifies nowait mode, the function should
> return if no WAL data is present, rather than waiting for new WAL data
> to become available, default behaviour could be anything you like.

Currently, pg_walinspect uses read_local_xlog_page which waits in the
while(1) loop if a future LSN is specified. As read_local_xlog_page is
an implementation of XLogPageReadCB, which doesn't have a wait/nowait
parameter, if we really need a wait/nowait mode behaviour, we need to
do extra things(either add a backend-level global wait variable, set
before XLogReadRecord, if set, read_local_xlog_page can just exit
without waiting and reset after the XLogReadRecord or add an extra
bool wait variable to XLogReaderState and use it in
read_local_xlog_page).

Another problem with the wait mode is - wait until when? Because we
don't want to wait forever by specifying a really really future LSN,
maybe you could think of adding a timeout (if the future LSN hasn't
generated the given timeout, then just return). As I said upthread, I
think all of these functions can be parked to future pg_walinspect
versions once it gets committed with most-useful functions as proposed
in the v4 patch set.

> +Datum
> +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> +{
> +#define PG_GET_WAL_RECORDS_INFO_COLS 10
>
>
> We could probably have another variant of this function that would
> work even if the end pointer is not specified, in which case the
> default end pointer would be the last WAL record in the WAL segment.
> Currently it mandates the use of an end pointer which slightly reduces
> flexibility.

Last WAL record in the WAL segment may not be of much use(one can
figure out the last valid WAL record in a wal file as mentioned
above), but the WAL records info till the latest current flush LSN of
the server would be a useful functionality. But that too, can be found
using something like "select lsn, prev_lsn, resource_manager from
pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"

> +
> +/*
> + * Get the first valid raw WAL record lsn.
> + */
> +Datum
> +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
>
>
> I think this function should return a pointer to the nearest valid WAL
> record which can be the previous WAL record to the LSN entered by the
> user or the next WAL record. If a user unknowingly enters an lsn that
> does not exist then in such cases we should probably return the lsn of
> the previous WAL record instead of hanging or waiting for the new WAL
> record to arrive.

Is it useful? If there's a strong reason, how about naming
pg_get_next_valid_wal_record_lsn returning the next valid wal record
LSN and pg_get_previous_valid_wal_record_lsn returning the previous
valid wal record LSN ? If you think having two functions is too much,
then, how about pg_get_first_valid_wal_record_lsn returning both the
next valid wal record LSN and its previous wal record LSN?

> Another important point I would like to mention here is - have we made
> an attempt to ensure that we try to share as much of code with
> pg_waldump as possible so that if any changes happens in the
> pg_waldump in future it gets applied here as well and additionally it
> will also reduce the code duplication.

I tried, please have a look at the patch. Also, I added a note at the
beginning of pg_walinspect and pg_waldump to consider fixing
issues/changing the code in both the places also.

> I haven't yet looked into the code in detail. I will have a look at it
> asap. thanks.

That will be great.

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgsql: Move scanint8() to numutils.c
Next
From: Peter Geoghegan
Date:
Subject: Re: adding 'zstd' as a compression algorithm