Re: WAL usage calculation patch - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: WAL usage calculation patch |
Date | |
Msg-id | 20200329121944.GA79261@nol Whole thread Raw |
In response to | Re: WAL usage calculation patch (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: WAL usage calculation patch
|
List | pgsql-hackers |
On Mon, Mar 23, 2020 at 11:24:50PM +0900, Fujii Masao wrote: > > > Here are the comments for 0001 patch. > > > > + /* > > + * Report a full page image constructed for the WAL record > > + */ > > + pgWalUsage.wal_fp_records++; > > > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > > "fp" here? In other places, "fpw" and "fpi" are used for full page > > writes/image. Agreed, I went with fpw. > > ISTM that this counter could be incorrect if XLogInsertRecord() determines to > > calculate again whether FPI is necessary or not. No? IOW, this issue could > > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > > its do-while loop. Isn't this problematic? Yes probably. I also see while adding support for EXPLAIN/auto_explain that the previous approach was incrementing both records and fpw_records, while it should be only one of those for each record. I fixed this using the approach I previously mentionned in [1] which seems to work just fine. > > + long wal_bytes; /* size of wal records produced */ > > > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > > rather than long? Yes indeed. I switched to uint64, and modified everything accordingly (and changed pgss to output numeric as there's no other way to handle unsigned int8) > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > > > bufusage_space should be walusage_space here? Good catch, fixed. > > /* > > * Finish parallel execution. We wait for parallel workers to finish, and > > * accumulate their buffer usage. > > */ > > > > There are some comments mentioning buffer usage, in execParallel.c. > > For example, the top comment for ExecParallelFinish(), as the above. > > These should be updated. I went through all the file and quickly checked in other places, and I think I fixed all required comments. > Here are the comments for 0002 patch. > > + OUT wal_write_bytes int8, > + OUT wal_write_records int8, > + OUT wal_write_fp_records int8 > > Isn't "write" part in the column names confusing because it's WAL > *generated* (not written) by the statement? Agreed, I simply dropped the "_write" part everywhere. > +RETURNS SETOF record > +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' > +LANGUAGE C STRICT VOLATILE; > > PARALLEL SAFE should be specified? Indeed, fixed. > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > the definition of pg_stat_statements() is changed. Thought? As mentionned in other pgss thread, I think the general agreement is to never provide full script anymore, so I didn't changed that. > +-- CHECKPOINT before WAL tests to ensure test stability > +CHECKPOINT; > > Is this true? I thought you added this because the number of FPI > should be larger than zero in the subsequent test. No? But there > seems no such test. I'm not excited about adding the test checking > the number of FPI because it looks fragile, though... It should ensure a FPW for each new block touch, but yes that's quite fragile. Since I fixed the record / FPW record counters, I saw that this was actually already broken as there was a mix of FPW and non-FPW, so I dropped the checkpoint and just tested (wal_record + wal_fpw_record) instead. > +UPDATE pgss_test SET b = '333' WHERE a = 3 \; > +UPDATE pgss_test SET b = '444' WHERE a = 4 ; > > Could you tell me why several queries need to be run to test > the WAL usage? Isn't running a few query enough for the test purpase? As far as I can see it's used to test multiple scenario (single command / multiple commands in or outside explicit transaction). It shouldn't add a lot of overhead and since some commands are issues with "\;" it's also testing proper query string isolation when multi-command query string is provided, which doesn't seem like a bad idea. I didn't changed that but I'm not opposed to remove some of the updates if needed. Also, to answer Amit Kapila's comments about WAL records and parallel query, I added support for both EXPLAIN and auto_explain (tab completion and documentation are also updated), and using a simple table with an index, with forced parallelism and no leader participation and concurrent update on the same table, I could test that WAL usage is working as expected: rjuju=# explain (analyze, wal, verbose) select * from t1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=0.00..8805.05 rows=100010 width=14) (actual time=8.695..47.592 rows=100010 loops=1) Output: id, val Workers Planned: 2 Workers Launched: 2 WAL: records=204 bytes=86198 -> Parallel Seq Scan on public.t1 (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.056..29.112 rows=50005 loops Output: id, val WAL: records=204 bytes=86198 Worker 0: actual time=0.060..28.995 rows=49593 loops=1 WAL: records=105 bytes=44222 Worker 1: actual time=0.052..29.230 rows=50417 loops=1 WAL: records=99 bytes=41976 Planning Time: 0.038 ms Execution Time: 53.957 ms (14 rows) and the same query when nothing end up being modified: rjuju=# explain (analyze, wal, verbose) select * from t1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=0.00..8805.05 rows=100010 width=14) (actual time=9.413..48.187 rows=100010 loops=1) Output: id, val Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on public.t1 (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.033..24.697 rows=50005 loops Output: id, val Worker 0: actual time=0.028..24.786 rows=50447 loops=1 Worker 1: actual time=0.038..24.609 rows=49563 loops=1 Planning Time: 0.282 ms Execution Time: 55.643 ms (10 rows) So it seems to me that WAL usage infrastructure for parallel query is working just fine. I added the EXPLAIN/auto_explain in a separate commit just in case. [1] https://www.postgresql.org/message-id/CAOBaU_aECK1Z7Nn+x=MhvEwrJzK8wyPsPtWAafjqtZN1fYjEmg@mail.gmail.com
Attachment
pgsql-hackers by date: