Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date
Msg-id CALj2ACVX5wCtr7Q4kq0VgrnAS28jJGufZOQt074J8=EWyyDCUw@mail.gmail.com
Whole thread Raw
In response to Re: pg_walinspect - a new extension to get raw WAL data and WAL stats  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Fri, Mar 11, 2022 at 8:08 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> > Attaching the v8 patch-set resolving above comments and some tests for
> > checking function permissions. Please review it further.
>
> I played with this a bit, and would like to share some thoughts on it.

Thanks a lot Kyotaro-san for reviewing.

> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.

Throwing error on future LSNs is the same behaviour for all of the
pg_walinspect function input LSNs. IMO it is a cleaner thing to do
rather than confuse the users with different behaviours for each
function. The principle is this - pg_walinspect functions can't show
future WAL info. Having said that, I agree to make it a WARNING
instead of ERROR, for the simple reason that ERROR aborts the txn and
the applications can retry without aborting the txn. For instance,
pg_terminate_backend emits a WARNING if the PID isn't a postgres
process id.

PS: WARNING may not be a better idea than ERROR if we turn
pg_get_wal_stats a SQL function, see my response below.

> The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> functions, they tell me the beginning of the current insertion point
> in error message.  On the other hand they don't accept the same
> value as end-LSN.  I think it is right that they tell the current
> insertion point and they should take the end-LSN as the LSN to stop
> reading.

The future LSN is determined by this:

if (!RecoveryInProgress())
available_lsn = GetFlushRecPtr(NULL);
else
available_lsn = GetXLogReplayRecPtr(NULL);

GetFlushRecPtr returns last byte + 1 flushed meaning this is the end
LSN currently known in the server, but it is not the start LSN of the
last WAL record in the server. Same goes with GetXLogReplayRecPtr
which gives lastReplayedEndRecPtr end+1 position. I picked
GetFlushRecPtr and GetXLogReplayRecPtr to determine the future WAL LSN
because this is how read_local_xlog_page determines to read WAL upto
and goes for wait mode, but I wanted to avoid the wait mode completely
for all the pg_walinspect functions (to keep things simple for now),
hence doing the similar checks within the input validation code and
emitting warning.

And you are right when we emit something like below, users tend to use
0/15B6D68 (from the DETAIL message) as the end LSN. I don't want to
ignore this DETAIL message altogether as it gives an idea where the
server is. How about rephrasing the DETAIL message a bit, something
like "Database system flushed the WAL up to WAL LSN %X/% X.'' or some
other better phrasing?

WARNING:  WAL start LSN cannot be a future WAL LSN
DETAIL:  Last known WAL LSN on the database system is 0/15B6D68.

If the users aren't sure about what's the end record LSN, they can
just use pg_get_wal_records_info and pg_get_wal_stats without end LSN:
select * from pg_get_wal_records_info('0/15B6D68');
select * from pg_get_wal_stats('0/15B6D68');

> I think pg_get_wal_stats() is worth to have but I think it should be
> implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> about FPI since pg_waldump doesn't but it is internally collected (of
> course!) and easily revealed. If we do that, the
> pg_get_wal_records_stats() would be reduced to the following SQL
> statement
>
> SELECT resource_manager resmgr,
>        count(*) AS N,
>            (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
>            sum(total_length) AS "combined size",
>            (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
>            sum(fpi_len) AS fpilen,
>            (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
>            FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
>            GROUP by resource_manager
>            WINDOW tot AS ()
>            ORDER BY "combined size" desc;
>
> The only difference with pg_waldump is the statement above doesn't
> show lines for the resource managers that don't contained in the
> result of pg_get_wal_records_info(). But I don't think that matters.

Yeah, this is better. One problem with the above is when
pg_get_wal_records_info emits a warning for future LSN. But this
shouldn't stop us doing it via SQL. Instead I would let all the
pg_walinspect functions emit errors as opposed to WARNING. Thoughts?

postgres=# SELECT resource_manager, count(*) AS count,
(count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS count_percentage,
sum(total_length) AS combined_size,
(sum(total_length) * 100 / sum(sum(total_length)) OVER
tot)::numeric(5,2) AS combined_size_percentage
FROM pg_get_wal_records_info('0/10A3E50', '0/25B6F00')
GROUP BY resource_manager
WINDOW tot AS ()
ORDER BY combined_size desc;
WARNING:  WAL end LSN cannot be a future WAL LSN
DETAIL:  Last known WAL LSN on the database system is 0/15CAA70.
 resource_manager | count | count_percentage | combined_size |
combined_size_percentage
------------------+-------+------------------+---------------+--------------------------
                  |     1 |           100.00 |               |
(1 row)

> Sometimes the field description has very long (28kb long) content. It
> makes the result output almost unreadable and I had a bit hard time
> struggling with the output full of '-'s.  I would like have a default
> limit on the length of such fields that can be long but I'm not sure
> we want that.

Yeah, it's a text column, let's leave it as-is, if required users can
always ignore the description columns.

> And about pg_get_raw_wal_record(). I don't see any use-case of the
> function alone on SQL interface.  Even if we need to inspect broken
> WAL files, it needs profound knowledge of WAL format and tools that
> doesn't work on SQL interface.
>However like pageinspect, if we separate the WAL-record fetching and
> parsing it could be thought as useful.
> SELECT * FROM pg_walinspect_parse(raw)
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));
>
> And pg_get_wal_stats woule be like:
>
> SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));

Imagine pg_get_raw_wal_record function feeding raw WAL record to an
external tool/extension that understands the WAL. Apart from this, I
don't have a concrete reason either. I'm open to removing this
function as well and adding it along with the raw WAL parsing function
in future.

I haven't thought about the raw WAL parsing functions for now. In
fact, there are many functions we can add to pg_walinspect - functions
with wait mode for future WAL, WAL parsing, function to return all the
WAL record info/stats given a WAL file name, functions to return WAL
info/stats from historic timelines as well, function to see if the
given WAL file is valid and so on. We can park these functions for
future versions of pg_walinspect once the extension itself with basic
yet-useful-and-effective functions gets in. I will make a note of
these functions and will work in future based on how pg_walinspect
gets received by the users and community out there.

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Next
From: Tomas Vondra
Date:
Subject: Re: Column Filtering in Logical Replication