Re: per backend I/O statistics - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: per backend I/O statistics
Date
Msg-id ZzX7Y5i56UrOtlRC@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: per backend I/O statistics  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:
> On Fri, Nov 08, 2024 at 02:09:30PM +0000, Bertrand Drouvot wrote:
> > 1. one assertion in pgstat_drop_entry_internal() is not necessary true anymore 
> > with this new stat kind. So, adding an extra bool as parameter to take care of it.
> 
> Why?  I have to admit that the addition of this argument at this level
> specific to a new stats kind feels really weird and it looks like a
> layer violation, while this happens only when resetting the whole
> pgstats state because we have loaded a portion of the stats but the
> file loaded was in such a state that we don't have a consistent
> picture.

Yes. I agree that looks weird and I don't like it that much. But the assertion
is not true anymore. If you:

- change the arguments to false in the pgstat_drop_entry_internal() call in 
pgstat_drop_all_entries()
- start the engine
- kill -9 postgres
- restart the engine

You'll see the assert failing due to the startup process. I don't think it is
doing something wrong though, just populating its backend stats. Do you have
another view on it?

> > 2. a new struct "PgStat_Backend" is created and does contain the backend type.
> > The backend type is used for filtering purpose when the stats are displayed.
> 
> Is that necessary?  We can guess that from the PID with a join in
> pg_stat_activity.  pg_stat_get_backend_io() from 0004 returns a set of
> tuples for a specific PID, as well..

How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"?

Do you mean copy/paste part of the code from pg_stat_get_activity() into
pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
I'll have a look at it.

> +    /* save the bktype */
> +    if (kind == PGSTAT_KIND_PER_BACKEND)
> +        bktype = ((PgStatShared_Backend *) header)->stats.bktype;
> [...]
> +    /* restore the bktype */
> +    if (kind == PGSTAT_KIND_PER_BACKEND)
> +        ((PgStatShared_Backend *) header)->stats.bktype = bktype;
> 
> Including the backend type results in these blips in
> shared_stat_reset_contents() which should not have anything related 
> to stats kinds and should remain neutral, as well.

Yeah, we can simply get rid of it if we remove the backend type in PgStat_Backend.

> pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I
> guess that it should be static in this file rather than declared in
> pgstat.h?

Good catch, thanks!

> +typedef struct PgStat_PendingIO
> 
> Perhaps this part should use a separate structure named
> "BackendPendingIO"?  The definition of the structure has to be in
> pgstat.h as this is the pending_size of the new stats kind.  It looks
> like it would be cleaner to keep PgStat_PendingIO local to
> pgstat_io.c, and define PgStat_PendingIO based on
> PgStat_BackendPendingIO?

I see what you meean, what about simply "PgStat_BackendPending" in pgstat.h?

> +    /*
> +     * Do serialize or not this kind of stats.
> +     */
> +    bool        to_serialize:1;
> 
> Not sure that "serialize" is the best term that applies here.  For
> pgstats entries, serialization refers to the matter of writing their
> entries with a "serialized" name because they have an undefined number
> when stored locally after a reload.  I'd suggest to split this concept
> into its own patch, rename the flag as "write_to_file" (or what you
> think is suited), and also apply the flag in the fixed-numbered loop
> done in pgstat_write_statsfile() before going through the dshash.

Makes sense to create its own patch, will have a look.

> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>pg_stat_reset_single_backend_io_counters</primary>
> +        </indexterm>
> +        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> )
> +        <returnvalue>void</returnvalue>
> 
> This should document that the input argument is a PID.

Yeap, will add.

> 
> Is pg_my_stat_io() the best name ever? 

Not 100% sure, but there is already "pg_my_temp_schema" so I thought that
pg_my_stat_io would not be that bad. Open to suggestions though.

> I'd suggest to just merge 0004 with 0001.

Sure.

> Structurally, it may be cleaner to keep all the callbacks and the
> backend-I/O specific logic into a separate file, perhaps
> pgstat_io_backend.c or pgstat_backend_io?

Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
only one "section" dedicated to IO currently)?

> > ==== 0002
> > 
> > Merge both IO stats flush callback. There is no need to keep both callbacks.
> > 
> > Merging both allows to save O(N^3) while looping on IOOBJECT_NUM_TYPES,
> > IOCONTEXT_NUM_TYPES and IOCONTEXT_NUM_TYPES.
> 
> Not sure to be a fan of that, TBH, still I get the performance
> argument of the matter.  Each stats kind has its own flush path, and
> this assumes that we'll never going to diverge in terms of the
> information maintained for each backend and the existing pg_stat_io.
> Perhaps there could be a divergence at some point?

Yeah perhaps, so in case of divergence let's split "again"? I mean for the moment
I don't see any reason to keep both and we have pros related to efficiency during
flush.

> > === Remarks
> > 
> > R1. The key field is still "objid" even if it stores the proc number. I think
> > that's fine and there is no need to rename it as it's related comment states:
> 
> Let's not change the object ID and the hash key.  The approach you are
> using here with a variable-numbered stats kinds and a lookup with the
> procnumber is sensible here.

Agree, thanks for confirming.

> > R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged (
> > duplicated code). That's not mandatory to provide the new per backend I/O stats
> > feature. So let's keep it as future work to ease the review.
> 
> Not sure about that.

I don't have a strong opinion about this one, so I'm ok to not merge those.

> > R3. There is no use case of retrieving other backend's IO stats with
> > stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to
> > save memory (that could be non negligeable as pgstat_build_snapshot() would add
> > _all_ the other backend's stats in the snapshot). I choose to document it and to
> > not return any rows: I think that's better than returning some data (that would
> > not "ensure" the snapshot consistency) or an error.
> 
> I'm pretty sure that there is a sensible argument about being able to
> get a snapshot of the I/O stat of another backend and consider that a
> feature.  For example, keeping a look at the activity of the
> checkpointer while doing a scan at a certain point in time?

hm, not sure how the stats_fetch_consistency set to 'snapshot' could help here:

- the checkpointer could write stuff for reason completly unrelated to "your"
backend

- if the idea is to 1. record checkpointer stats, 2. do stuff in the backend and
3. check again the checkpointer stats then I don't think setting stats_fetch_consistency
to snapshot is needed at all. In fact it's quite the opposite as 1. and 3. would
report the same values with stats_fetch_consistency set to 'snapshot' (if done 
in the same transaction).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: On non-Windows, hard depend on uselocale(3)
Next
From: Jan Wieck
Date:
Subject: Re: Commit Timestamp and LSN Inversion issue