Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date
Msg-id CALj2ACUvU2fGLw7keEpxZhGAoMQ9vrCPX-13hexQPoR+QRbuOw@mail.gmail.com
Whole thread Raw
In response to Re: pg_walinspect - a new extension to get raw WAL data and WAL stats  (Andres Freund <andres@anarazel.de>)
Responses Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
List pgsql-hackers
On Sat, Mar 19, 2022 at 5:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> First look at this patch, so I might be repeating stuff already commented on /
> discussed.

Thanks for taking a look at the patch.

> On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote:
> > +--
> > +-- pg_get_raw_wal_record()
>
> What is raw about the function?

It right now gives data starting from the output of XLogReadRecord
upto XLogRecGetTotalLen(xlogreader); length. Given that XLogReadRecord
returns a pointer to the decoded record's header, I'm not sure it's
the right choice. Actually, this function's intention(not an immediate
use-case though), is to feed the WAL record to another function and
then, say, repair a corrupted page given a base data page.

As I said upthread, I'm open to removing this function for now, when a
realistic need comes we can add it back. It also raised some concerns
around the security and permissions. Thoughts?

> Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
> NULL lsn, does it?  Also, that's the default, why is it restated here?

pg_get_wal_records_info needed that option (if end_lsn being the
default, providing WAL info upto the end of WAL). Also, we can emit
better error message ("invalid WAL start LSN") instead of generic one.
I wanted to keep error message and code same across all the functions
hence CALLED ON NULL INPUT option for pg_get_raw_wal_record.

> > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
> > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;
>
> I don't think it's appropriate for pg_monitor to see all the data in the WAL.

How about pg_read_server_files or some other?

> > +-- pg_get_wal_stats()
>
> This seems like an exceedingly expensive way to compute this. Not just because
> of doing the grouping, window etc, but also because it's serializing the
> "data" field from pg_get_wal_records_info() just to never use it. With any
> appreciatable amount of data the return value pg_get_wal_records_info() will
> be serialized into a on-disk tuplestore.
>
> This is probably close to an order of magnitude slower than pg_waldump
> --stats. Which imo renders this largely useless.

Yeah that's true. Do you suggest having pg_get_wal_stats() a
c-function like in v8 patch [1]?

SEe some numbers at [2] with pg_get_wal_stats using
pg_get_wal_records_info and pg_get_wal_records_info as a direct
c-function like in v8 patch [1]. A direct c-function always fares
better (84 msec vs 1400msec).

> The column names don't seem great either. "tfpil"?

"total fpi length" - tfpil wanted to keep it short - it's just an
internal column name isn't it? The actual column name the user sees is
fpi_length.

> > +void
> > +_PG_init(void)
>
> > +void
> > +_PG_fini(void)
>
> Why have this stuff if it's not used?

I kept it as a placeholder for future code additions, for instance
test_decoding.c and ssl_passphrase_func.c has empty _PG_init(),
_PG_fini(). If okay, I can mention there like "placeholder for now",
otherwise I can remove it.

> > +static XLogRecPtr
> > +ValidateInputLSN(XLogRecPtr lsn)
>
> > +static XLogRecPtr
> > +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn)
> > +{
>
> These two functions are largely redundant, that doesn't seem great.

I will modify it in the next version.

> > +Datum
> > +pg_get_raw_wal_record(PG_FUNCTION_ARGS)
>
> Most of this has another copy in pg_get_wal_record_info(). Can more of this be
> deduplicated?

I will do, if we decide on whether or not to have the
pg_get_raw_wal_record function at all? Please see my comments above.

> > +     initStringInfo(&temp);
> > +     desc->rm_desc(&temp, record);
> > +     appendStringInfo(&rec_desc, "%s", temp.data);
> > +     pfree(temp.data);
> > +     initStringInfo(&rec_blk_ref);
>
> This seems unnecessarily wasteful. You serialize into one stringinfo, just to
> then copy that stringinfo into another stringinfo. Just to then allocate yet
> another stringinfo.

Yeah, I will remove it. Looks like all the rm_desc callbacks append to
the passed-in buffer and not reset it, so we should be good.

> > +     /* Block references (detailed format). */
>
> This comment seems copied from pg_waldump, but doesn't make sense here,
> because there's no short format.

Yes, I will remove it.

> > +     for (block_id = 0; block_id <= record->max_block_id; block_id++)
> > +     {
>
> To me duplicating this much code from waldump seems like a bad idea from a
> maintainability POV.

Even if we were to put the above code from pg_walinspect and
pg_waldump into, say, walutils.c or some other existing file, there we
had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
printf() sort of thing, isn't it clumsy? Also, unnecessary if
conditions need to be executed for every record. For maintainability,
I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
things in both places (of course this might sound dumbest way of doing
it, IMHO, it's sensible, given the if(pg_walinspect)-else
if(pg_waldump) sorts of code that we need to do in the common
functions). Thoughts?

> > +/*
> > + * Get info and data of all WAL records between start LSN and end LSN.
> > + */
> > +static void
> > +GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
>
> This should likely use the infrastructure introduced in 5b81703787bfc1e6072c8e37125eba0c5598b807.

Yes, I will change it.

> > +     for (;;)
> > +     {
> > +             (void) ReadNextXLogRecord(xlogreader, first_record);
> > +
> > +             /*
> > +              * Let's not show the record info if it is spanning more than the
> > +              * end_lsn. EndRecPtr is "end+1" of the last read record, hence
> > +              * use "end" here.
> > +              */
> > +             if ((xlogreader->EndRecPtr - 1) <= end_lsn)
> > +             {
> > +                     GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls,
> > +                                                       PG_GET_WAL_RECORDS_INFO_COLS);
> > +
> > +                     tuplestore_putvalues(tupstore, tupdesc, values, nulls);
> > +             }
> > +
> > +             /* Exit loop if read up to end_lsn. */
> > +             if (xlogreader->EndRecPtr >= end_lsn)
> > +                     break;
>
> Seems weird to have both of these conditions separately.

Yeah. It is to handle some edge cases to print the WAL  upto end_lsn
and avoid waiting in read_local_xlog_page. I will change it.

Actually, there's an open point as specified in [3]. Any thoughts on it?

[1] https://www.postgresql.org/message-id/CALj2ACWhcbW_s4BXLyCpLWcCppZN9ncTXHbn4dv1F0Vpe0kxqA%40mail.gmail.com
[2] with pg_get_wal_stats using pg_get_wal_stats:
Time: 1394.919 ms (00:01.395)
Time: 1403.199 ms (00:01.403)
Time: 1408.138 ms (00:01.408)
Time: 1397.670 ms (00:01.398)

with pg_get_wal_stats as a c-function like in v8 patch [1]:
Time: 84.319 ms
Time: 84.303 ms
Time: 84.208 ms
Time: 84.452 ms

use case:
create extension pg_walinspect;

create table foo(col int);
insert into foo select * from generate_series(1, 100000);
update foo set col = col*2+1;
delete from foo;

\timing on
select * from pg_get_wal_stats('0/01000028');
\timing off

output:
postgres=# select * from pg_get_wal_stats('0/01000028');
 resource_manager | count  | count_percentage | record_length |
record_length_percentage | fpi_length | fpi_length_percentage

------------------+--------+------------------+---------------+--------------------------+------------+-----------------------
 Storage          |     13 |                0 |           546 |
                0 |          0 |                     0
 CLOG             |      1 |                0 |            30 |
                0 |          0 |                     0
 Database         |      2 |                0 |            84 |
                0 |          0 |                     0
 Btree            |  13078 |              3.1 |       1486990 |
             4.97 |     461512 |                 23.13
 Heap             | 404835 |            95.84 |      26354653 |
            88.17 |     456576 |                 22.88
 Transaction      |    721 |             0.17 |        178933 |
              0.6 |          0 |                     0
 Heap2            |   3056 |             0.72 |       1131836 |
             3.79 |     376932 |                 18.89
 Standby          |    397 |             0.09 |         23226 |
             0.08 |          0 |                     0
 XLOG             |    316 |             0.07 |        716027 |
              2.4 |     700164 |                 35.09
(9 rows)

[3] https://www.postgresql.org/message-id/CALj2ACVBST5Us6-eDz4q_Gem3rUHSC7AYNOB7tmp9Yqq6PHsXw%40mail.gmail.com

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: New Object Access Type hooks
Next
From: Tom Lane
Date:
Subject: Re: New Object Access Type hooks