Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: pg_walinspect - a new extension to get raw WAL data and WAL stats |
Date | |
Msg-id | CALj2ACUvU2fGLw7keEpxZhGAoMQ9vrCPX-13hexQPoR+QRbuOw@mail.gmail.com Whole thread Raw |
In response to | Re: pg_walinspect - a new extension to get raw WAL data and WAL stats (Andres Freund <andres@anarazel.de>) |
Responses |
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
|
List | pgsql-hackers |
On Sat, Mar 19, 2022 at 5:18 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > First look at this patch, so I might be repeating stuff already commented on / > discussed. Thanks for taking a look at the patch. > On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote: > > +-- > > +-- pg_get_raw_wal_record() > > What is raw about the function? It right now gives data starting from the output of XLogReadRecord upto XLogRecGetTotalLen(xlogreader); length. Given that XLogReadRecord returns a pointer to the decoded record's header, I'm not sure it's the right choice. Actually, this function's intention(not an immediate use-case though), is to feed the WAL record to another function and then, say, repair a corrupted page given a base data page. As I said upthread, I'm open to removing this function for now, when a realistic need comes we can add it back. It also raised some concerns around the security and permissions. Thoughts? > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a > NULL lsn, does it? Also, that's the default, why is it restated here? pg_get_wal_records_info needed that option (if end_lsn being the default, providing WAL info upto the end of WAL). Also, we can emit better error message ("invalid WAL start LSN") instead of generic one. I wanted to keep error message and code same across all the functions hence CALLED ON NULL INPUT option for pg_get_raw_wal_record. > > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC; > > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor; > > I don't think it's appropriate for pg_monitor to see all the data in the WAL. How about pg_read_server_files or some other? > > +-- pg_get_wal_stats() > > This seems like an exceedingly expensive way to compute this. Not just because > of doing the grouping, window etc, but also because it's serializing the > "data" field from pg_get_wal_records_info() just to never use it. With any > appreciatable amount of data the return value pg_get_wal_records_info() will > be serialized into a on-disk tuplestore. > > This is probably close to an order of magnitude slower than pg_waldump > --stats. Which imo renders this largely useless. Yeah that's true. Do you suggest having pg_get_wal_stats() a c-function like in v8 patch [1]? SEe some numbers at [2] with pg_get_wal_stats using pg_get_wal_records_info and pg_get_wal_records_info as a direct c-function like in v8 patch [1]. A direct c-function always fares better (84 msec vs 1400msec). > The column names don't seem great either. "tfpil"? "total fpi length" - tfpil wanted to keep it short - it's just an internal column name isn't it? The actual column name the user sees is fpi_length. > > +void > > +_PG_init(void) > > > +void > > +_PG_fini(void) > > Why have this stuff if it's not used? I kept it as a placeholder for future code additions, for instance test_decoding.c and ssl_passphrase_func.c has empty _PG_init(), _PG_fini(). If okay, I can mention there like "placeholder for now", otherwise I can remove it. > > +static XLogRecPtr > > +ValidateInputLSN(XLogRecPtr lsn) > > > +static XLogRecPtr > > +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn) > > +{ > > These two functions are largely redundant, that doesn't seem great. I will modify it in the next version. > > +Datum > > +pg_get_raw_wal_record(PG_FUNCTION_ARGS) > > Most of this has another copy in pg_get_wal_record_info(). Can more of this be > deduplicated? I will do, if we decide on whether or not to have the pg_get_raw_wal_record function at all? Please see my comments above. > > + initStringInfo(&temp); > > + desc->rm_desc(&temp, record); > > + appendStringInfo(&rec_desc, "%s", temp.data); > > + pfree(temp.data); > > + initStringInfo(&rec_blk_ref); > > This seems unnecessarily wasteful. You serialize into one stringinfo, just to > then copy that stringinfo into another stringinfo. Just to then allocate yet > another stringinfo. Yeah, I will remove it. Looks like all the rm_desc callbacks append to the passed-in buffer and not reset it, so we should be good. > > + /* Block references (detailed format). */ > > This comment seems copied from pg_waldump, but doesn't make sense here, > because there's no short format. Yes, I will remove it. > > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > > + { > > To me duplicating this much code from waldump seems like a bad idea from a > maintainability POV. Even if we were to put the above code from pg_walinspect and pg_waldump into, say, walutils.c or some other existing file, there we had to make if (pg_walinspect) appendStringInfo else if (pg_waldump) printf() sort of thing, isn't it clumsy? Also, unnecessary if conditions need to be executed for every record. For maintainability, I added a note in pg_walinspect.c and pg_waldump.c to consider fixing things in both places (of course this might sound dumbest way of doing it, IMHO, it's sensible, given the if(pg_walinspect)-else if(pg_waldump) sorts of code that we need to do in the common functions). Thoughts? > > +/* > > + * Get info and data of all WAL records between start LSN and end LSN. > > + */ > > +static void > > +GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, > > This should likely use the infrastructure introduced in 5b81703787bfc1e6072c8e37125eba0c5598b807. Yes, I will change it. > > + for (;;) > > + { > > + (void) ReadNextXLogRecord(xlogreader, first_record); > > + > > + /* > > + * Let's not show the record info if it is spanning more than the > > + * end_lsn. EndRecPtr is "end+1" of the last read record, hence > > + * use "end" here. > > + */ > > + if ((xlogreader->EndRecPtr - 1) <= end_lsn) > > + { > > + GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls, > > + PG_GET_WAL_RECORDS_INFO_COLS); > > + > > + tuplestore_putvalues(tupstore, tupdesc, values, nulls); > > + } > > + > > + /* Exit loop if read up to end_lsn. */ > > + if (xlogreader->EndRecPtr >= end_lsn) > > + break; > > Seems weird to have both of these conditions separately. Yeah. It is to handle some edge cases to print the WAL upto end_lsn and avoid waiting in read_local_xlog_page. I will change it. Actually, there's an open point as specified in [3]. Any thoughts on it? [1] https://www.postgresql.org/message-id/CALj2ACWhcbW_s4BXLyCpLWcCppZN9ncTXHbn4dv1F0Vpe0kxqA%40mail.gmail.com [2] with pg_get_wal_stats using pg_get_wal_stats: Time: 1394.919 ms (00:01.395) Time: 1403.199 ms (00:01.403) Time: 1408.138 ms (00:01.408) Time: 1397.670 ms (00:01.398) with pg_get_wal_stats as a c-function like in v8 patch [1]: Time: 84.319 ms Time: 84.303 ms Time: 84.208 ms Time: 84.452 ms use case: create extension pg_walinspect; create table foo(col int); insert into foo select * from generate_series(1, 100000); update foo set col = col*2+1; delete from foo; \timing on select * from pg_get_wal_stats('0/01000028'); \timing off output: postgres=# select * from pg_get_wal_stats('0/01000028'); resource_manager | count | count_percentage | record_length | record_length_percentage | fpi_length | fpi_length_percentage ------------------+--------+------------------+---------------+--------------------------+------------+----------------------- Storage | 13 | 0 | 546 | 0 | 0 | 0 CLOG | 1 | 0 | 30 | 0 | 0 | 0 Database | 2 | 0 | 84 | 0 | 0 | 0 Btree | 13078 | 3.1 | 1486990 | 4.97 | 461512 | 23.13 Heap | 404835 | 95.84 | 26354653 | 88.17 | 456576 | 22.88 Transaction | 721 | 0.17 | 178933 | 0.6 | 0 | 0 Heap2 | 3056 | 0.72 | 1131836 | 3.79 | 376932 | 18.89 Standby | 397 | 0.09 | 23226 | 0.08 | 0 | 0 XLOG | 316 | 0.07 | 716027 | 2.4 | 700164 | 35.09 (9 rows) [3] https://www.postgresql.org/message-id/CALj2ACVBST5Us6-eDz4q_Gem3rUHSC7AYNOB7tmp9Yqq6PHsXw%40mail.gmail.com Regards, Bharath Rupireddy.
pgsql-hackers by date: