Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date
Msg-id CAE9k0PnTjffOg9GcS2paHbV0snkTqrYJ8Ym4sQ7hfaAhy_SiFA@mail.gmail.com
Whole thread Raw
In response to Re: pg_walinspect - a new extension to get raw WAL data and WAL stats  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Wed, Mar 2, 2022 at 10:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Some review comments on v5 patch (v5-0001-pg_walinspect.patch)
>
> Thanks for reviewing.
>
> > +--
> > +-- pg_get_wal_records_info()
> > +--
> > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> > +    IN end_lsn pg_lsn,
> > +    IN wait_for_wal boolean DEFAULT false,
> > +    OUT lsn pg_lsn,
> >
> > What does the wait_for_wal flag mean here when one has already
> > specified the start and end lsn? AFAIU, If a user has specified a
> > start and stop LSN, it means that the user knows the extent to which
> > he/she wants to display the WAL records in which case we need to stop
> > once the end lsn has reached . So what is the meaning of wait_for_wal
> > flag? Does it look sensible to have the wait_for_wal flag here? To me
> > it doesn't.
>
> Users can always specify a future end_lsn and set wait_for_wal to
> true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
> wait for the WAL. IMO, this is useful. If you remember you were okay
> with wait/nowait versions for some of the functions upthread [1]. I'm
> not going to retain this behaviour for both
> pg_get_wal_records_info/pg_get_wal_stats as it is similar to
> pg_waldump's --follow option.
>

It is not at all similar to pg_waldumps behaviour. Please check the
behaviour of pg_waldump properly. Does it wait for any wal records
when a user has specified a stop pointer? It doesn't and it shouldn't.
I mean does it even make sense to wait for the WAL when a stop pointer
is specified? And it's quite understandable that if a user has asked
pg_walinspect to stop at a certain point, it must. Also, What if there
are already WAL records after the stop pointer, in that case does it
even make sense to have a wait flag. WHat would be the meaning of the
wait flag in that case?

Further, have you checked wait_for_wal flag behaviour, is it even working?

> >
> > +--
> > +-- pg_get_wal_records_info_till_end_of_wal()
> > +--
> > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> > +    OUT lsn pg_lsn,
> > +    OUT prev_lsn pg_lsn,
> > +    OUT xid xid,
> >
> > Why is this function required? Is pg_get_wal_records_info() alone not
> > enough? I think it is. See if we can make end_lsn optional in
> > pg_get_wal_records_info() and lets just have it alone. I think it can
> > do the job of pg_get_wal_records_info_till_end_of_wal function.
> >
> > ==
> >
> > +--
> > +-- pg_get_wal_stats_till_end_of_wal()
> > +--
> > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> > +    OUT resource_manager text,
> > +    OUT count int8,
> >
> > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?
>
> I'm doing the following input validations for these functions to not
> cause any issues with invalid LSN. If I were to have the default value
> for end_lsn as 0/0, I can't perform input validations right? That is
> the reason I'm having separate functions {pg_get_wal_records_info,
> pg_get_wal_stats}_till_end_of_wal() versions.
>

You can do it. Please check pg_waldump to understand how it is done
there. You cannot have multiple functions doing different things when
one single function can do all the job.

> > ==
> >
> >
> > +                       if (loc <= read_upto)
> > +                               break;
> > +
> > +                       /* Let's not wait for WAL to be available if
> > indicated */
> > +                       if (loc > read_upto &&
> > +                               state->private_data != NULL)
> > +                       {
> >
> > Why loc > read_upto? The first if condition is (loc <= read_upto)
> > followed by the second if condition - (loc > read_upto). Is the first
> > if condition (loc <= read_upto) not enough to indicate that loc >
> > read_upto?
>
> Yeah, that's unnecessary, I improved the comment there and removed loc
> > read_upto.
>
> > ==
> >
> > +#define IsEndOfWALReached(state) \
> > +               (state->private_data != NULL && \
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->no_wait == true) && \
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->reached_end_of_wal == true))
> >
> > I think we should either use state or xlogreader. First line says
> > state->private_data and second line xlogreader->private_data.
>
> I've changed it to use state instead of xlogreader.
>
> > ==
> >
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->reached_end_of_wal == true))
> > +
> >
> > There is a new patch coming to make the end of WAL messages less
> > scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> > that instead of introducing new flags in private area to represent the
> > end of WAL.
>
> Yeah that would be great. But we never know which one gets committed
> first. Until then it's not good to have dependencies on two "on-going"
> patches. Later, we can change.
>
> > ==
> >
> > +/*
> > + * XLogReaderRoutine->page_read callback for reading local xlog files
> > + *
> > + * This function is same as read_local_xlog_page except that it works in both
> > + * wait and no wait mode. The callers can specify about waiting in private_data
> > + * of XLogReaderState.
> > + */
> > +int
> > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> > +                                          int reqLen, XLogRecPtr
> > targetRecPtr, char *cur_page)
> > +{
> > +       XLogRecPtr      read_upto,
> >
> > Do we really need this function? Can't we make use of an existing WAL
> > reader function - read_local_xlog_page()?
>
> I clearly explained the reasons upthread [2]. Please let me know if
> you have more thoughts/doubts here, we can connect offlist.
>
> Attaching v6 patch set with above review comments addressed. Please
> review it further.
>
> [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
> +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
>
> I think we should allow all these functions to be executed in wait and
> *nowait* mode. If a user specifies nowait mode, the function should
> return if no WAL data is present, rather than waiting for new WAL data
> to become available, default behaviour could be anything you like.
>
> [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
> I've added a new function read_local_xlog_page_2 (similar to
> read_local_xlog_page but works in wait and no wait mode) and the
> callers can specify whether to wait or not wait using private_data.
> Actually, I wanted to use the private_data structure of
> read_local_xlog_page but the logical decoding already has context as
> private_data, that is why I had to have a new function. I know it
> creates a bit of duplicate code, but its cleaner than using
> backend-local variables or additional flags in XLogReaderState or
> adding wait/no-wait boolean to page_read callback. Any other
> suggestions are welcome here.
>
> With this, I'm able to have wait/no wait versions for any functions.
> But for now, I'm having wait/no wait for two functions
> (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> sense.
>
> Regards,
> Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Next
From: Amit Kapila
Date:
Subject: Re: row filtering for logical replication