Re: per backend I/O statistics - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: per backend I/O statistics
Date
Msg-id Zz9sno+JJbWqdXhQ@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: per backend I/O statistics  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On Thu, Nov 21, 2024 at 07:18:24AM +0900, Michael Paquier wrote:
> On Wed, Nov 20, 2024 at 02:20:18PM +0000, Bertrand Drouvot wrote:
> > Right. I did not had in mind to go that far here (for the per backend stats
> > needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
> > is related to per backend stats only) call at the right place in StartupXLOG()
> > for the startup process only. As that's the startup process that will reset
> > or restore the stats I think that makes sense.
> > 
> > It looks like that what you have in mind is much more generic, what about:
> > 
> > - Focus on this thread first and then move the call as proposed above
> > - Think about a more generic idea later on (on the per-backend I/O stats is
> > in).
> 
> Moving pgstat_create_backend_stat() may be OK in the long-term, at
> least that would document why we need to care about the startup
> process.

Yeap.

> Still, moving only the call feels incomplete, so how about adding a
> boolean field in PgStat_ShmemControl that defaults to false when the
> shmem area is initialized in StatsShmemInit(), then switched to true
> once we know that the stats have been restored or reset by the startup
> process.  Something like that should work to control the dshash
> inserts, I guess?

I did some tests (on a primary and on a standby) and currently there is no call
to pgstat_get_entry_ref() at all before we reach the stats reset or restore
in StartupXLOG(). The first ones appear after "we really open for business"
(quoting process_pm_child_exit()).

Now, with the per-backend I/O stats patch in place, there is 3 processes that
could call pgstat_get_entry_ref() before we reach the stats reset or restore in
StartupXLOG() (observed by setting a breakpoint on the startup process before
stats are reset or restored):

- checkpointer
- background writer
- startup process

All calls are for the new backend stat kind. It also makes sense to see non
"regular" backends as the "regulars" are not allowed to connect yet.

Results: they could collect some stats for nothing since a reset or restore will
happen after.

Now, if we want to prevent new entries to be created before the stats are
reset or restored:

- then the end results will be the same (means starting the instance with reset
or restored stats). The only difference would be that they would not collect stats
for "nothing" during this small time window.

- it looks like that would lead to some non negligible refactoring: I started the
coding and observed the potential impact. The reason is that all the code (at least
the one I looked at) does not expect to receive a NULL value from pgstat_get_entry_ref()
(when asking for creation) and we would need to take care of all the cases
(all the stats, all backend type) for safety reason.

So, given that:

- the end result would be the same
- the code changes would be non negligible (unless we have a better idea than
pgstat_get_entry_ref() returning a NULL value).

I think that might be valid to try to implement it (to avoid those processes to
collect the stats for nothing during this small time window) but I am not sure
that's worth it (as the end result would be the same and the code change non
negligible).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_rewind WAL segments deletion pitfall
Next
From: Yogesh Sharma
Date:
Subject: Re: Add a write_to_file member to PgStat_KindInfo