Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers
From: Melanie Plageman
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Msg-id: CAAKRu_bCU3rqyjvWYszSHMvQRRYtJjmzV49MJqsX6wY9ftoazA@mail.gmail.com
In response to: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
v48 attached.

On Fri, Jan 13, 2023 at 6:36 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-13 13:38:15 -0500, Melanie Plageman wrote:
> > From f8c9077631169a778c893fd16b7a973ad5725f2a Mon Sep 17 00:00:00 2001
> > From: Andres Freund <andres@anarazel.de>
> > Date: Fri, 9 Dec 2022 18:23:19 -0800
> > Subject: [PATCH v47 2/5] pgstat: Infrastructure to track IO operations
> >
> > diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
> > index 0fa5370bcd..608c3b59da 100644
> > --- a/src/backend/utils/activity/pgstat.c
> > +++ b/src/backend/utils/activity/pgstat.c
>
> Reminder to self: Need to bump PGSTAT_FILE_FORMAT_ID before commit.
> Perhaps you could add a note about that to the commit message?

done

> > @@ -359,6 +360,15 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
> >  .snapshot_cb = pgstat_checkpointer_snapshot_cb,
> >  },
> >
> > + [PGSTAT_KIND_IO] = {
> > + .name = "io_ops",
>
> That should be "io" now I think?

Oh no! I didn't notice this was broken. I've added pg_stat_have_stats() to
the IO stats tests now.

It would be nice if pgstat_get_kind_from_str() could be used in
pg_stat_reset_shared() to avoid having to remember to change both. It
doesn't really work, because we want to be able to throw the error message
in pg_stat_reset_shared() when the user input is wrong -- not the one in
pgstat_get_kind_from_str(). Also:

- Since recovery_prefetch doesn't have a statistic kind, it doesn't fit
  well into this paradigm
- Only a subset of the statistics kinds are reset through this function
- bgwriter and checkpointer share a reset target

I added a comment -- perhaps that's all I can do?

On a separate note, should we be setting have_[io/slru/etc]stats to false
in the reset all functions?

> > +/*
> > + * Check that stats have not been counted for any combination of IOContext,
> > + * IOObject, and IOOp which are not tracked for the passed-in BackendType. The
> > + * passed-in PgStat_BackendIO must contain stats from the BackendType specified
> > + * by the second parameter. Caller is responsible for locking the passed-in
> > + * PgStat_BackendIO, if needed.
> > + */
>
> Other PgStat_Backend* structs are just for pending data. Perhaps we could
> rename it slightly to make that clearer? PgStat_BktypeIO?
> PgStat_IOForBackendType? or a similar variation?

I've done this.

> > +bool
> > +pgstat_bktype_io_stats_valid(PgStat_BackendIO *backend_io,
> > +                             BackendType bktype)
> > +{
> > +    bool bktype_tracked = pgstat_tracks_io_bktype(bktype);
> > +
> > +    for (IOContext io_context = IOCONTEXT_FIRST;
> > +         io_context < IOCONTEXT_NUM_TYPES; io_context++)
> > +    {
> > +        for (IOObject io_object = IOOBJECT_FIRST;
> > +             io_object < IOOBJECT_NUM_TYPES; io_object++)
> > +        {
> > +            /*
> > +             * Don't bother trying to skip to the next loop iteration if
> > +             * pgstat_tracks_io_object() would return false here. We still
> > +             * need to validate that each counter is zero anyway.
> > +             */
> > +            for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
> > +            {
> > +                if ((!bktype_tracked || !pgstat_tracks_io_op(bktype, io_context, io_object, io_op)) &&
> > +                    backend_io->data[io_context][io_object][io_op] != 0)
> > +                    return false;
>
> Hm, perhaps this could be broken up into multiple lines? Something like
>
> /* no stats, so nothing to validate */
> if (backend_io->data[io_context][io_object][io_op] == 0)
>     continue;
>
> /* something went wrong if have stats for something not tracked */
> if (!bktype_tracked ||
>     !pgstat_tracks_io_op(bktype, io_context, io_object, io_op))
>     return false;

I've done this.

> > +typedef struct PgStat_BackendIO
> > +{
> > +    PgStat_Counter data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
> > +} PgStat_BackendIO;
>
> Would it bother you if we swapped the order of iocontext and iobject here and
> related places? It makes more sense to me semantically, and should now be
> pretty easy, code wise.
So, thinking about this, I started noticing inconsistencies in other areas
around this order. For example: the ordering of objects mentioned in commit
messages and comments, and the ordering of parameters (like in
pgstat_count_io_op() [currently in reverse order]). I think we should make a
final decision about this ordering and then make everywhere consistent
(including the ordering in the view).

Currently the order is:
  BackendType, IOContext, IOObject, IOOp

You are suggesting this order:
  BackendType, IOObject, IOContext, IOOp

Could you explain what you find more natural about this ordering (as I find
the other more natural)?

This is one possible natural sentence with these objects:

  During COPY, a client backend may read in data from a permanent relation.

This order is:
  IOContext, BackendType, IOOp, IOObject

I think English sentences are often structured subject, verb, object -- but
in our case, we have an extra thing that doesn't fit neatly (IOContext).
Also, IOOp in a sentence would be in the middle (as the verb). I made it last
because a) it feels like the smallest unit b) it would make the code a lot
more annoying if it wasn't last.

WRT IOObject and IOContext, is there a future case for which having IOObject
first will be better or lead to fewer mistakes? I actually see loads of
places where this needs to be made consistent.

> > +/* shared version of PgStat_IO */
> > +typedef struct PgStatShared_IO
> > +{
>
> Maybe /* PgStat_IO in shared memory */?

updated.

> > Subject: [PATCH v47 3/5] pgstat: Count IO for relations
>
> Nearly happy with this now. See one minor nit below.
>
> I don't love the counting in register_dirty_segment() and mdsyncfiletag(), but
> I don't have a better idea, and it doesn't seem too horrible.

You don't like it because such things shouldn't be in md.c -- since we went
to the trouble of having function pointers and making it general?
> > @@ -1441,6 +1474,28 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> >
> >  UnlockBufHdr(buf, buf_state);
> >
> > + if (oldFlags & BM_VALID)
> > + {
> > +     /*
> > +      * When a BufferAccessStrategy is in use, blocks evicted from shared
> > +      * buffers are counted as IOOP_EVICT in the corresponding context
> > +      * (e.g. IOCONTEXT_BULKWRITE). Shared buffers are evicted by a
> > +      * strategy in two cases: 1) while initially claiming buffers for the
> > +      * strategy ring 2) to replace an existing strategy ring buffer
> > +      * because it is pinned or in use and cannot be reused.
> > +      *
> > +      * Blocks evicted from buffers already in the strategy ring are
> > +      * counted as IOOP_REUSE in the corresponding strategy context.
> > +      *
> > +      * At this point, we can accurately count evictions and reuses,
> > +      * because we have successfully claimed the valid buffer. Previously,
> > +      * we may have been forced to release the buffer due to concurrent
> > +      * pinners or erroring out.
> > +      */
> > +     pgstat_count_io_op(from_ring ? IOOP_REUSE : IOOP_EVICT,
> > +                        IOOBJECT_RELATION, *io_context);
> > + }
> > +
> >  if (oldPartitionLock != NULL)
> >  {
> >      BufTableDelete(&oldTag, oldHash);
>
> There's no reason to do this while we still hold the buffer partition lock,
> right? That's a highly contended lock, and we can just move the counting a few
> lines down.

Thanks, I've done this.

> > @@ -1410,6 +1432,9 @@ mdsyncfiletag(const FileTag *ftag, char *path)
> >  if (need_to_close)
> >      FileClose(file);
> >
> > + if (result >= 0)
> > +     pgstat_count_io_op(IOOP_FSYNC, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> > +
>
> I'd lean towards doing this unconditionally, it's still an fsync if it
> failed... Not that it matters.

Good point. We still incurred the costs even if we didn't benefit from the
effects. I've updated this.

> > Subject: [PATCH v47 4/5] Add system view tracking IO ops per backend type
>
> Note to self + commit message: Remember the need to do a catversion bump.
Noted.

> > +-- pg_stat_io test:
> > +-- verify_heapam always uses a BAS_BULKREAD BufferAccessStrategy.
>
> Maybe add that "whereas a sequential scan does not, see ..."?

Updated.

> > This allows
> > +-- us to reliably test that pg_stat_io BULKREAD reads are being captured
> > +-- without relying on the size of shared buffers or on an expensive operation
> > +-- like CREATE DATABASE.
>
> CREATE / DROP TABLESPACE is also pretty expensive, but I don't have a better
> idea.

I've added a comment.

> > +-- Create an alternative tablespace and move the heaptest table to it, causing
> > +-- it to be rewritten.
>
> IIRC the point of that is that it reliably evicts all the buffers from s_b,
> correct? If so, mention that?

Done.

> > +Datum
> > +pg_stat_get_io(PG_FUNCTION_ARGS)
> > +{
> > +    ReturnSetInfo *rsinfo;
> > +    PgStat_IO  *backends_io_stats;
> > +    Datum       reset_time;
> > +
> > +    InitMaterializedSRF(fcinfo, 0);
> > +    rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > +
> > +    backends_io_stats = pgstat_fetch_stat_io();
> > +
> > +    reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
> > +
> > +    for (BackendType bktype = B_INVALID; bktype < BACKEND_NUM_TYPES; bktype++)
> > +    {
> > +        bool        bktype_tracked;
> > +        Datum       bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
> > +        PgStat_BackendIO *bktype_stats = &backends_io_stats->stats[bktype];
> > +
> > +        /*
> > +         * For those BackendTypes without IO Operation stats, skip
> > +         * representing them in the view altogether. We still loop through
> > +         * their counters so that we can assert that all values are zero.
> > +         */
> > +        bktype_tracked = pgstat_tracks_io_bktype(bktype);
>
> How about instead just doing Assert(pgstat_bktype_io_stats_valid(...))? That
> deduplicates the logic for the asserts, and avoids doing the full loop when
> assertions aren't enabled anyway?

I've done this and added a comment.
> > +-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
> > +-- and fsyncs.
> > +-- The second checkpoint ensures that stats from the first checkpoint have been
> > +-- reported and protects against any potential races amongst the table
> > +-- creation, a possible timing-triggered checkpoint, and the explicit
> > +-- checkpoint in the test.
>
> There's a comment about the subsequent checkpoints earlier in the file, and I
> think the comment is slightly more precise.

Maybe just reference the earlier comment?

> > +-- Change the tablespace so that the table is rewritten directly, then SELECT
> > +-- from it to cause it to be read back into shared buffers.
> > +SET allow_in_place_tablespaces = true;
> > +CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';
>
> Perhaps worth doing this in tablespace.sql, to avoid the additional
> checkpoints done as part of CREATE/DROP TABLESPACE?
>
> Or, at least combine this with the CHECKPOINTs above?

I see a checkpoint is requested when dropping the tablespace if not all the
files in it are deleted. It seems like if the DROP TABLE for the permanent
table is before the explicit checkpoints in the test, then the DROP
TABLESPACE will not cause an additional checkpoint. Is this what you are
suggesting? Dropping the temporary table should not have an effect on this.

> > +-- Drop the table so we can drop the tablespace later.
> > +DROP TABLE test_io_shared;
> > +-- Test that the following IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
> > +-- - eviction of local buffers in order to reuse them
> > +-- - reads of temporary table blocks into local buffers
> > +-- - writes of local buffers to permanent storage
> > +-- - extends of temporary tables
> > +-- Set temp_buffers to a low value so that we can trigger writes with fewer
> > +-- inserted tuples. Do so in a new session in case temporary tables have been
> > +-- accessed by previous tests in this session.
> > +\c
> > +SET temp_buffers TO '1MB';
>
> I'd set it to the actual minimum '100' (in pages). Perhaps that'd allow to
> make test_io_local a bit smaller?

I've done this.

> > +CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
> > +SELECT sum(extends) AS io_sum_local_extends_before
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> > +SELECT sum(evictions) AS io_sum_local_evictions_before
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> > +SELECT sum(writes) AS io_sum_local_writes_before
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> > +-- Insert tuples into the temporary table, generating extends in the stats.
> > +-- Insert enough values that we need to reuse and write out dirty local
> > +-- buffers, generating evictions and writes.
> > +INSERT INTO test_io_local SELECT generate_series(1, 8000) as id, repeat('a', 100);
> > +SELECT sum(reads) AS io_sum_local_reads_before
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
>
> Maybe add something like
>
> SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;
>
> Better toast compression or such could easily make test_io_local smaller than
> it's today. Seeing that it's too small would make it easier to understand the
> failure.

Good idea. So, I used pg_table_size() because it seems like
pg_relation_size() does not include the toast relations. However, I'm not
sure this is a good idea, because pg_table_size() includes the FSM and
visibility map. Should I write a query to get the toast relation name and
add pg_relation_size() of that relation and the main relation?
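One possible shape for that query, sketched here for discussion -- the
relation name comes from the test above, and the COALESCE guards the case
where the table has no toast relation:

```sql
-- Size of the main fork plus its toast relation, in blocks, avoiding
-- pg_table_size()'s inclusion of the FSM and visibility map.
SELECT (pg_relation_size(oid)
        + COALESCE(pg_relation_size(reltoastrelid), 0))
         / current_setting('block_size')::int8 > 100
  FROM pg_class
 WHERE relname = 'test_io_local';
```

This is only a sketch; whether the extra lookup is worth it over
pg_table_size()'s small overcount is the open question above.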
> > +SELECT sum(evictions) AS io_sum_local_evictions_after
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> > +SELECT sum(reads) AS io_sum_local_reads_after
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> > +SELECT sum(writes) AS io_sum_local_writes_after
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> > +SELECT sum(extends) AS io_sum_local_extends_after
> > +  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
>
> This could just be one select with multiple columns?
>
> I think if you use something like \gset io_sum_local_after_ you can also avoid
> the need to repeat "io_sum_local_" so many times.

Thanks. I didn't realize. I've fixed this throughout the test file.

On Mon, Jan 16, 2023 at 4:42 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
> I missed a couple of versions, but I think the docs are clearer now.
> I'm torn on losing some of the detail, but overall I do think it's a
> good trade-off. Moving some details out to after the table does keep
> the bulk of the view documentation more readable, and the "inform
> database tuning" part is great. I really like the idea of a separate
> Interpreting Statistics section, but for now this works.
>
> > + <literal>vacuum</literal>: I/O operations performed outside of shared
> > + buffers while vacuuming and analyzing permanent relations.
>
> Why only permanent relations? Are temporary relations treated
> differently? I imagine if someone has a temp-table-heavy workload that
> requires regularly vacuuming and analyzing those relations, this point
> may be confusing without some additional explanation.

Ah, yes. This is a bit confusing. We don't use buffer access strategies when
operating on temp relations, so vacuuming them is counted in IO Context
normal. I've added this information to the docs, but now that definition is
a bit long. Perhaps it should be a note?
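For reference, the consolidated query adopted for the test file could look
something like this (a sketch; the column list matches the four counters
sampled above):

```sql
-- One SELECT with multiple columns; the \gset prefix produces psql
-- variables io_sum_local_after_evictions, io_sum_local_after_reads,
-- io_sum_local_after_writes, and io_sum_local_after_extends.
SELECT sum(evictions) AS evictions,
       sum(reads) AS reads,
       sum(writes) AS writes,
       sum(extends) AS extends
  FROM pg_stat_io
 WHERE io_context = 'normal' AND io_object = 'temp relation' \gset io_sum_local_after_
```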
That seems like it would draw too much attention to this detail, though...

- Melanie