Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
Date | |
Msg-id | CALDaNm2iFT5Ubx_EYUq2x7P1PgQ8jJ=SiML3h4C9rn8wfhURbw@mail.gmail.com Whole thread Raw |
In response to | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
|
List | pgsql-hackers |
On Wed, 18 Jan 2023 at 03:30, Melanie Plageman <melanieplageman@gmail.com> wrote: > > v49 attached > > On Tue, Jan 17, 2023 at 2:12 PM Andres Freund <andres@anarazel.de> wrote: > > On 2023-01-17 12:22:14 -0500, Melanie Plageman wrote: > > > > > > > +typedef struct PgStat_BackendIO > > > > > +{ > > > > > + PgStat_Counter data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES]; > > > > > +} PgStat_BackendIO; > > > > > > > > Would it bother you if we swapped the order of iocontext and iobject here and > > > > related places? It makes more sense to me semantically, and should now be > > > > pretty easy, code wise. > > > > > > So, thinking about this I started noticing inconsistencies in other > > > areas around this order: > > > For example: ordering of objects mentioned in commit messages and comments, > > > ordering of parameters (like in pgstat_count_io_op() [currently in > > > reverse order]). > > > > > > I think we should make a final decision about this ordering and then > > > make everywhere consistent (including ordering in the view). > > > > > > Currently the order is: > > > BackendType > > > IOContext > > > IOObject > > > IOOp > > > > > > You are suggesting this order: > > > BackendType > > > IOObject > > > IOContext > > > IOOp > > > > > > Could you explain what you find more natural about this ordering (as I > > > find the other more natural)? > > > > The object we're performing IO on determines more things than the context. So > > it just seems like the natural hierarchical fit. The context is a sub-category > > of the object. Consider how it'll look like if we also have objects for 'wal', > > 'temp files'. It'll make sense to group by just the object, but it won't make > > sense to group by just the context. > > > > If it were trivial to do I'd use a different IOContext for each IOObject. But > > it'd make it much harder. So there'll just be a bunch of values of IOContext > > that'll only be used for one or a subset of the IOObjects. > > > > > > The reason to put BackendType at the top is pragmatic - one backend is of a > > single type, but can do IO for all kinds of objects/contexts. So any other > > hierarchy would make the locking etc much harder. > > > > > > > This is one possible natural sentence with these objects: > > > > > > During COPY, a client backend may read in data from a permanent > > > relation. > > > This order is: > > > IOContext > > > BackendType > > > IOOp > > > IOObject > > > > > > I think English sentences are often structured subject, verb, object -- > > > but in our case, we have an extra thing that doesn't fit neatly > > > (IOContext). > > > > "..., to avoid polluting the buffer cache it uses the bulk (read|write) > > strategy". > > > > > > > Also, IOOp in a sentence would be in the middle (as the > > > verb). I made it last because a) it feels like the smallest unit b) it > > > would make the code a lot more annoying if it wasn't last. > > > > Yea, I think pragmatically that is the right choice. > > I have changed the order and updated all the places using > PgStat_BktypeIO as well as in all locations in which it should be > ordered for consistency (that I could find in the pass I did) -- e.g. > the view definition, function signatures, comments, commit messages, > etc. > > > > > > +-- Change the tablespace so that the table is rewritten directly, then SELECT > > > > > +-- from it to cause it to be read back into shared buffers. > > > > > +SET allow_in_place_tablespaces = true; > > > > > +CREATE TABLESPACE regress_io_stats_tblspc LOCATION ''; > > > > > > > > Perhaps worth doing this in tablespace.sql, to avoid the additional > > > > checkpoints done as part of CREATE/DROP TABLESPACE? > > > > > > > > Or, at least combine this with the CHECKPOINTs above? > > > > > > I see a checkpoint is requested when dropping the tablespace if not all > > > the files in it are deleted. It seems like if the DROP TABLE for the > > > permanent table is before the explicit checkpoints in the test, then the > > > DROP TABLESPACE will not cause an additional checkpoint. > > > > Unfortunately, that's not how it works :(. See the comment above mdunlink(): > > > > > * For regular relations, we don't unlink the first segment file of the rel, > > > * but just truncate it to zero length, and record a request to unlink it after > > > * the next checkpoint. Additional segments can be unlinked immediately, > > > * however. Leaving the empty file in place prevents that relfilenumber > > > * from being reused. The scenario this protects us from is: > > > ... > > > > > > > Is this what you are suggesting? Dropping the temporary table should not > > > have an effect on this. > > > > I was wondering about simply moving that portion of the test to > > tablespace.sql, where we already created a tablespace. > > > > > > An alternative would be to propose splitting tablespace.sql into one portion > > running at the start of parallel_schedule, and one at the end. Historically, > > we needed tablespace.sql to be optional due to causing problems when > > replicating to another instance on the same machine, but now we have > > allow_in_place_tablespaces. > > It seems like the best way would be to split up the tablespace test file > as you suggested and drop the tablespace at the end of the regression > test suite. There could be other tests that could use a tablespace. > Though what I wrote is kind of tablespace test coverage, if this > rewriting behavior no longer happened when doing alter table set > tablespace, we would want to come up with a new test which exercised > that code to count those IO stats, not simply delete it from the > tablespace tests. > > > > > SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100; > > > > > > > > Better toast compression or such could easily make test_io_local smaller than > > > > it's today. Seeing that it's too small would make it easier to understand the > > > > failure. > > > > > > Good idea. So, I used pg_table_size() because it seems like > > > pg_relation_size() does not include the toast relations. However, I'm > > > not sure this is a good idea, because pg_table_size() includes FSM and > > > visibility map. Should I write a query to get the toast relation name > > > and add pg_relation_size() of that relation and the main relation? > > > > I think it's the right thing to just include the relation size. Your queries > > IIRC won't use the toast table or other forks. So I'd leave it at just > > pg_relation_size(). > > I did notice that this test wasn't using the toast table for the > toastable column -- but you mentioned better toast compression affecting > the future test stability, so I'm confused. The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID 4f74f5641d53559ec44e74d5bf552e167fdd5d20 === === applying patch ./v49-0003-Add-system-view-tracking-IO-ops-per-backend-type.patch .... patching file src/test/regress/expected/rules.out Hunk #1 FAILED at 1876. 1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/expected/rules.out.rej [1] - http://cfbot.cputube.org/patch_41_3272.log Regards, Vignesh
pgsql-hackers by date: