Re: per backend I/O statistics - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: per backend I/O statistics
Msg-id Z0QjeIkwC0HNI16K@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: per backend I/O statistics  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On Mon, Nov 25, 2024 at 10:06:44AM +0900, Michael Paquier wrote:
> On Fri, Nov 22, 2024 at 07:49:58AM +0000, Bertrand Drouvot wrote:
> > On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote:
> >> Hmm.  created_entry only matters for pgstat_init_function_usage().
> >> All the other callers of pgstat_prep_pending_entry() pass a NULL
> >> value. 
> > 
> > I meant to say all the calls that pass "create" as true in pgstat_get_entry_ref().
> 
> Ah, OK, I think that I see your point here.
> 
> I am wondering how much this would matter as well for custom stats,
> but we're not there yet without at least one release out and folks try
> new things with these APIs and variable-numbered kinds.

Not sure here, could custom stats start incrementing before the database system
is ready to accept connections?

> pgstat_prep_pending_entry() to return NULL even if "create" is true
> may be a good thing, at the end, because that's the only way I can see
> based on the current APIs where we could say "Sorry, but the stats
> have not been loaded yet, so you cannot try to do anything related to
> the dshash".

Yeah, same here.
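
To make that idea concrete, here is a minimal toy sketch. It is not PostgreSQL
code: ToyStats, toy_prep_pending_entry() and toy_count_io() are made-up names,
and the "loaded" flag is only an assumption about how such a check could look.
It shows a lookup that returns NULL while the stats are not loaded, even when
"create" is true, so callers have to cope with a NULL result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
#define TOY_MAX_ENTRIES 8

typedef struct ToyEntry
{
    int  key;       /* identifies the stats entry */
    long counter;   /* pending count for that entry */
    bool in_use;
} ToyEntry;

typedef struct ToyStats
{
    bool     loaded;    /* assumed flag: set once stats are restored or reset */
    ToyEntry entries[TOY_MAX_ENTRIES];
} ToyStats;

/*
 * Return the entry for "key", creating it if "create" is true.
 * Returns NULL if the stats are not loaded yet, regardless of "create".
 */
static ToyEntry *
toy_prep_pending_entry(ToyStats *stats, int key, bool create)
{
    ToyEntry *free_slot = NULL;

    assert(stats != NULL);

    /* "Sorry, but the stats have not been loaded yet." */
    if (!stats->loaded)
        return NULL;

    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
    {
        ToyEntry *e = &stats->entries[i];

        if (e->in_use && e->key == key)
            return e;
        if (!e->in_use && free_slot == NULL)
            free_slot = e;
    }

    /* Not found: give up unless asked to create and a slot is free. */
    if (!create || free_slot == NULL)
        return NULL;

    free_slot->in_use = true;
    free_slot->key = key;
    free_slot->counter = 0;
    return free_slot;
}

/* Caller side: skip the increment when no entry is available. */
static bool
toy_count_io(ToyStats *stats, int key)
{
    ToyEntry *e = toy_prep_pending_entry(stats, key, true);

    if (e == NULL)
        return false;
    e->counter++;
    return true;
}
```

With this shape, an increment attempted before "loaded" is set is simply
dropped instead of being overwritten later.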

> From my view having a kind of barrier would be cleaner in the long
> run, but it's true that it may not be mandatory, as well.  pg_stat_io
> is currently OK to be called because the stats are loaded for
> auxiliary processes because it uses fixed-numbered stats in shmem.
> And it means we already have early calls that add stats getting
> overwritten once the stats are loaded from the on-disk file (Am I
> getting this part right?).

Yeah, we can already see that, for example, the background writer could enter 
pgstat_io_flush_cb() before the stats are reset or restored.

> Anyway, do we really require that for the sake of this thread?  We
> know that there's only one of each auxiliary process at a time, and
> they keep a footprint in pg_stat_io already.  So we could just limit
> ourselves to live database backends, WAL senders and autovacuum
> workers, everything that's not auxiliary and spawned on request?

I think that's a fair starting point, and we will not lose any information by
doing so (as you said, there is only one of each auxiliary process at a time,
so their stats are already visible in pg_stat_io).
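
As a rough illustration of that restriction, the filter could look like the
sketch below. The enum and the function name are hypothetical (they are not
the real backend type definitions), and the list of auxiliary processes is
only a sample.

```c
#include <stdbool.h>

/* Hypothetical process kinds, for illustration only. */
typedef enum ToyBackendType
{
    TOY_CLIENT_BACKEND,     /* live database backend */
    TOY_WAL_SENDER,
    TOY_AUTOVACUUM_WORKER,
    TOY_BG_WRITER,          /* auxiliary, one at a time */
    TOY_CHECKPOINTER,       /* auxiliary, one at a time */
    TOY_WAL_WRITER          /* auxiliary, one at a time */
} ToyBackendType;

/*
 * Track per-backend I/O stats only for processes that are spawned on
 * request.  Auxiliary processes are skipped: there is only one of each,
 * so their I/O is already visible in the aggregated view.
 */
static bool
toy_tracks_per_backend_io(ToyBackendType type)
{
    switch (type)
    {
        case TOY_CLIENT_BACKEND:
        case TOY_WAL_SENDER:
        case TOY_AUTOVACUUM_WORKER:
            return true;
        case TOY_BG_WRITER:
        case TOY_CHECKPOINTER:
        case TOY_WAL_WRITER:
            return false;
    }
    return false;   /* unknown kinds are not tracked */
}
```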

The only downside I can see is that we will not be able to merge the flush
callbacks, but I don't think that's a blocker (the flushes are done in shared
memory, so the performance impact should not be much of an issue).

I'll come back with a new version implementing the above.

[1]: https://www.postgresql.org/message-id/Zz9sno%2BJJbWqdXhQ%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


