Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From vignesh C
Subject Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date
Msg-id CALDaNm2iFT5Ubx_EYUq2x7P1PgQ8jJ=SiML3h4C9rn8wfhURbw@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
List pgsql-hackers
On Wed, 18 Jan 2023 at 03:30, Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> v49 attached
>
> On Tue, Jan 17, 2023 at 2:12 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-01-17 12:22:14 -0500, Melanie Plageman wrote:
> >
> > > > > +typedef struct PgStat_BackendIO
> > > > > +{
> > > > > +     PgStat_Counter data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
> > > > > +} PgStat_BackendIO;
> > > >
> > > > Would it bother you if we swapped the order of iocontext and iobject here and
> > > > related places? It makes more sense to me semantically, and should now be
> > > > pretty easy, code wise.
> > >
> > > So, thinking about this I started noticing inconsistencies in other
> > > areas around this order:
> > > For example: ordering of objects mentioned in commit messages and comments,
> > > ordering of parameters (like in pgstat_count_io_op() [currently in
> > > reverse order]).
> > >
> > > I think we should make a final decision about this ordering and then
> > > make everywhere consistent (including ordering in the view).
> > >
> > > Currently the order is:
> > > BackendType
> > >   IOContext
> > >     IOObject
> > >       IOOp
> > >
> > > You are suggesting this order:
> > > BackendType
> > >   IOObject
> > >     IOContext
> > >       IOOp
> > >
> > > Could you explain what you find more natural about this ordering (as I
> > > find the other more natural)?
> >
> > The object we're performing IO on determines more things than the context. So
> > it just seems like the natural hierarchical fit. The context is a sub-category
> > of the object. Consider how it'll look like if we also have objects for 'wal',
> > 'temp files'. It'll make sense to group by just the object, but it won't make
> > sense to group by just the context.
> >
> > If it were trivial to do I'd use a different IOContext for each IOObject. But
> > it'd make it much harder. So there'll just be a bunch of values of IOContext
> > that'll only be used for one or a subset of the IOObjects.
> >
> >
> > The reason to put BackendType at the top is pragmatic - one backend is of a
> > single type, but can do IO for all kinds of objects/contexts. So any other
> > hierarchy would make the locking etc much harder.
> >
> >
> > > This is one possible natural sentence with these objects:
> > >
> > > During COPY, a client backend may read in data from a permanent
> > > relation.
> > > This order is:
> > > IOContext
> > >   BackendType
> > >     IOOp
> > >       IOObject
> > >
> > > I think English sentences are often structured subject, verb, object --
> > > but in our case, we have an extra thing that doesn't fit neatly
> > > (IOContext).
> >
> > "..., to avoid polluting the buffer cache it uses the bulk (read|write)
> > strategy".
> >
> >
> > > Also, IOOp in a sentence would be in the middle (as the
> > > verb). I made it last because a) it feels like the smallest unit b) it
> > > would make the code a lot more annoying if it wasn't last.
> >
> > Yea, I think pragmatically that is the right choice.
>
> I have changed the order and updated all the places using
> PgStat_BktypeIO as well as in all locations in which it should be
> ordered for consistency (that I could find in the pass I did) -- e.g.
> the view definition, function signatures, comments, commit messages,
> etc.
>
> > > > > +-- Change the tablespace so that the table is rewritten directly, then SELECT
> > > > > +-- from it to cause it to be read back into shared buffers.
> > > > > +SET allow_in_place_tablespaces = true;
> > > > > +CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';
> > > >
> > > > Perhaps worth doing this in tablespace.sql, to avoid the additional
> > > > checkpoints done as part of CREATE/DROP TABLESPACE?
> > > >
> > > > Or, at least combine this with the CHECKPOINTs above?
> > >
> > > I see a checkpoint is requested when dropping the tablespace if not all
> > > the files in it are deleted. It seems like if the DROP TABLE for the
> > > permanent table is before the explicit checkpoints in the test, then the
> > > DROP TABLESPACE will not cause an additional checkpoint.
> >
> > Unfortunately, that's not how it works :(. See the comment above mdunlink():
> >
> > > * For regular relations, we don't unlink the first segment file of the rel,
> > > * but just truncate it to zero length, and record a request to unlink it after
> > > * the next checkpoint.  Additional segments can be unlinked immediately,
> > > * however.  Leaving the empty file in place prevents that relfilenumber
> > > * from being reused.  The scenario this protects us from is:
> > > ...
> >
> >
> > > Is this what you are suggesting? Dropping the temporary table should not
> > > have an effect on this.
> >
> > I was wondering about simply moving that portion of the test to
> > tablespace.sql, where we already created a tablespace.
> >
> >
> > An alternative would be to propose splitting tablespace.sql into one portion
> > running at the start of parallel_schedule, and one at the end. Historically,
> > we needed tablespace.sql to be optional due to causing problems when
> > replicating to another instance on the same machine, but now we have
> > allow_in_place_tablespaces.
>
> It seems like the best way would be to split up the tablespace test file
> as you suggested and drop the tablespace at the end of the regression
> test suite. There could be other tests that could use a tablespace.
> Though what I wrote is kind of tablespace test coverage, if this
> rewriting behavior no longer happened when doing alter table set
> tablespace, we would want to come up with a new test which exercised
> that code to count those IO stats, not simply delete it from the
> tablespace tests.
>
> > > > SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;
> > > >
> > > > Better toast compression or such could easily make test_io_local smaller than
> > > > it's today. Seeing that it's too small would make it easier to understand the
> > > > failure.
> > >
> > > Good idea. So, I used pg_table_size() because it seems like
> > > pg_relation_size() does not include the toast relations. However, I'm
> > > not sure this is a good idea, because pg_table_size() includes FSM and
> > > visibility map. Should I write a query to get the toast relation name
> > > and add pg_relation_size() of that relation and the main relation?
> >
> > I think it's the right thing to just include the relation size. Your queries
> > IIRC won't use the toast table or other forks. So I'd leave it at just
> > pg_relation_size().
>
> I did notice that this test wasn't using the toast table for the
> toastable column -- but you mentioned better toast compression affecting
> the future test stability, so I'm confused.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
4f74f5641d53559ec44e74d5bf552e167fdd5d20 ===
=== applying patch
./v49-0003-Add-system-view-tracking-IO-ops-per-backend-type.patch
....
patching file src/test/regress/expected/rules.out
Hunk #1 FAILED at 1876.
1 out of 1 hunk FAILED -- saving rejects to file
src/test/regress/expected/rules.out.rej

[1] - http://cfbot.cputube.org/patch_41_3272.log

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Adjust the description of OutputPluginCallbacks in pg-doc
Next
From: Amit Kapila
Date:
Subject: Re: Adjust the description of OutputPluginCallbacks in pg-doc