Thread: Re: per backend I/O statistics

Re: per backend I/O statistics

From
Kyotaro Horiguchi
Date:
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in 
> Hi hackers,
> 
> Please find attached a patch to implement $SUBJECT.
> 
> While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
> pg_my_stat_io view to display "my" backend I/O statistics and a new
> pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
> backend pid.
> 
> By having the per backend level of granularity, one could for example identify
> which running backend is responsible for most of the reads, most of the extends
> and so on... The pg_my_stat_io view could also be useful to check the
> impact on the I/O made by some operations, queries,... in the current session.
> 
> Some remarks:
> 
> - it is split in 2 sub patches: 0001 introducing the necessary changes to provide
> the pg_my_stat_io view and 0002 to add the pg_stat_get_backend_io() function.
> - the idea of having per backend I/O statistics has already been mentioned in
> [1] by Andres.
> 
> Some implementation choices:
> 
> - The KIND_IO stats are still "fixed amount" ones as the maximum number of
> backend is fixed.
> - The statistics snapshot is made for the global stats (the aggregated ones) and
> for my backend stats. The snapshot is not build for all the backend stats (that
> could be memory expensive depending on the number of max connections and given
> the fact that PgStat_IO is 16KB long).
> - The above point means that pg_stat_get_backend_io() behaves as if
> stats_fetch_consistency is set to none (each execution re-fetches counters
> from shared memory).
> - The above 2 points are also the reasons why the pg_my_stat_io view has been
> added (as its results takes care of the stats_fetch_consistency setting). I think
> that makes sense to rely on it in that case, while I'm not sure that would make
> a lot of sense to retrieve other's backend I/O stats and taking care of
> stats_fetch_consistency.
> 
> 
> [1]: https://www.postgresql.org/message-id/20230309003438.rectf7xo7pw5t5cj%40awork3.anarazel.de

I'm not sure about the usefulness of having the stats only available
from the current session. Since they are stored in shared memory,
shouldn't we make them accessible to all backends? However, this would
introduce permission considerations and could become complex.

When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

Just to be clear, the above comments are not meant to oppose the
current implementation approach. They are purely for the sake of
discussing comparisons with other possible approaches.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: per backend I/O statistics

From
Kyotaro Horiguchi
Date:
At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> When I first looked at this patch, my initial thought was whether we
> should let these stats stay "fixed." The reason why the current
> PGSTAT_KIND_IO is fixed is that there is only one global statistics
> storage for the entire database. If we have stats for a flexible
> number of backends, it would need to be non-fixed, perhaps with the
> entry for INVALID_PROC_NUMBER storing the global I/O stats, I
> suppose. However, one concern with that approach would be the impact
> on performance due to the frequent creation and deletion of stats
> entries caused by high turnover of backends.

As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Sep 03, 2024 at 03:37:49PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in 
> > Hi hackers,
> > 
> > Please find attached a patch to implement $SUBJECT.
> > 
> > While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
> > pg_my_stat_io view to display "my" backend I/O statistics and a new
> > pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
> > backend pid.
> > 
> 
> I'm not sure about the usefulness of having the stats only available
> from the current session. Since they are stored in shared memory,
> shouldn't we make them accessible to all backends?

Thanks for the feedback!

The stats are accessible to all backends thanks to 0002 and the introduction
of the pg_stat_get_backend_io() function.

> However, this would
> introduce permission considerations and could become complex.

Not sure that the data exposed here is sensible enough to consider permission
restriction.

> When I first looked at this patch, my initial thought was whether we
> should let these stats stay "fixed." The reason why the current
> PGSTAT_KIND_IO is fixed is that there is only one global statistics
> storage for the entire database. If we have stats for a flexible
> number of backends, it would need to be non-fixed, perhaps with the
> entry for INVALID_PROC_NUMBER storing the global I/O stats, I
> suppose. However, one concern with that approach would be the impact
> on performance due to the frequent creation and deletion of stats
> entries caused by high turnover of backends.
>

The pros of using the fixed amount are:

- less code change (I think as I did not write the non fixed counterpart)
- probably better performance and less scalabilty impact (in case of high rate
of backends creation/ deletion)

Cons is probably allocating shared memory space that might not be used (
sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of
unused connection). OTOH, if a high number of connections is not used that might
be worth to reduce the max_connections setting.

"Conceptually" speaking, we know what the maximum number of backend is, so I
think that using the fixed amount approach makes sense (somehow I think it can
be compared to PGSTAT_KIND_SLRU which relies on SLRU_NUM_ELEMENTS).

> Just to be clear, the above comments are not meant to oppose the
> current implementation approach. They are purely for the sake of
> discussing comparisons with other possible approaches.

No problem at all, thanks for your feedback and sharing your thoughts!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > When I first looked at this patch, my initial thought was whether we
> > should let these stats stay "fixed." The reason why the current
> > PGSTAT_KIND_IO is fixed is that there is only one global statistics
> > storage for the entire database. If we have stats for a flexible
> > number of backends, it would need to be non-fixed, perhaps with the
> > entry for INVALID_PROC_NUMBER storing the global I/O stats, I
> > suppose. However, one concern with that approach would be the impact
> > on performance due to the frequent creation and deletion of stats
> > entries caused by high turnover of backends.
> 
> As an additional benefit of this approach, the client can set a
> connection variable, for example, no_backend_iostats to true, or set
> its inverse variable to false, to restrict memory usage to only the
> required backends.

Thanks for the feedback!

If we were to add an on/off switch button, I think I'd vote for a global one
instead. Indeed, I see this feature more like an "Administrator" one, where
the administrator wants to be able to find out which session is reponsible of
what (from an I/O point of view): like being able to anwser "which session is
generating this massive amount of reads"?

If we allow each session to disable the feature then the administrator
would lost this ability.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Alvaro Herrera
Date:
On 2024-Sep-03, Bertrand Drouvot wrote:

> Cons is probably allocating shared memory space that might not be used (
> sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of
> unused connection). OTOH, if a high number of connections is not used that might
> be worth to reduce the max_connections setting.

I was surprised by the size you mention so I went to look for the
definition of that struct:

typedef struct PgStat_IO
{
    TimestampTz stat_reset_timestamp;
    PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;

(It would be good to have more than zero comments here BTW)

So what's happening is that struct PgStat_IO stores the data for every
single process type, be it regular backends, backend-like auxiliary
processes, and all other potential postmaster children.  So it seems to
me that storing one of these struct for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time).  That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.

Another way to think about this might be to say that the B_BACKEND
element of the PgStat_Snapshot.io array should really be spread out to
MaxBackends separate elements.  This would make it more difficult to
obtain a total value accumulating ops done by all backends (since it's
require to sum the values of each backend), but it allows you to obtain
backend-specific values, including those of remote backends rather than
only your own, as Kyotaro suggests.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote:
> On 2024-Sep-03, Bertrand Drouvot wrote:
> 
> > Cons is probably allocating shared memory space that might not be used (
> > sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of
> > unused connection). OTOH, if a high number of connections is not used that might
> > be worth to reduce the max_connections setting.
> 
> I was surprised by the size you mention so I went to look for the
> definition of that struct:
>

Thanks for looking at it!

 
> typedef struct PgStat_IO
> {
>     TimestampTz stat_reset_timestamp;
>     PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
> } PgStat_IO;
> 
> (It would be good to have more than zero comments here BTW)
> 
> So what's happening is that struct PgStat_IO stores the data for every
> single process type, be it regular backends, backend-like auxiliary
> processes, and all other potential postmaster children.

Yeap.

> So it seems to
> me that storing one of these struct for "my backend stats" is wrong: I
> think you should be using PgStat_BktypeIO instead (or maybe another
> struct which has a reset timestamp plus BktypeIO, if you care about the
> reset time).  That struct is only 1024 bytes long, so it seems much more
> reasonable to have one of those per backend.

Yeah, that could be an area of improvement. Thanks, I'll look at it.
Currently the filtering is done when retrieving the per backend stats but better
to do it when storing the stats.

> Another way to think about this might be to say that the B_BACKEND
> element of the PgStat_Snapshot.io array should really be spread out to
> MaxBackends separate elements.  This would make it more difficult to
> obtain a total value accumulating ops done by all backends (since it's
> require to sum the values of each backend), but it allows you to obtain
> backend-specific values, including those of remote backends rather than
> only your own, as Kyotaro suggests.
>

One backend can already see the stats of the other backends thanks to the
pg_stat_get_backend_io() function (that takes a backend pid as parameter)
that is introduced in the 0002 sub-patch.

I'll ensure that's still the case with the next version of the patch.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote:
> The struct size is 1040 Bytes and that's much more reasonable than the size
> needed for per backend I/O stats in v1 (about 16KB).

One could think that's a high increase of shared memory usage with a high
number of connections (due to the fact that it's implemented as "fixed amount"
stats based on the max number of backends).

So out of curiosity, I did some measurements with:

dynamic_shared_memory_type=sysv
shared_memory_type=sysv
max_connections=20000

On my lab, ipcs -m gives me:

- on master

key        shmid      owner      perms      bytes      nattch     status
0x00113a04 51347487   postgres   600        1149394944 6
0x4bc9f2fa 51347488   postgres   600        4006976    6
0x46790800 51347489   postgres   600        1048576    2

- with v3 applied

key        shmid      owner      perms      bytes      nattch     status
0x00113a04 51347477   postgres   600        1170227200 6
0x08e04b0a 51347478   postgres   600        4006976    6
0x74688c9c 51347479   postgres   600        1048576    2

So, with 20000 max_connections (not advocating that's a reasonable number in 
all the cases), that's a 1170227200 - 1149394944 = 20 MB increase of shared
memory (expected with 20K max_connections and the new struct size of 1040 Bytes)
as compare to master which is 1096 MB.  It means that v3 produces about 2% shared
memory increase with 20000 max_connections.

Out of curiosity with max_connections=100000, then:

- on master:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00113a04 52953134   postgres   600        5053915136 6
0x37abf5ce 52953135   postgres   600        20006976   6
0x71c07d5c 52953136   postgres   600        1048576    2

- with v3:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00113a04 52953144   postgres   600        5157945344 6
0x7afb24de 52953145   postgres   600        20006976   6
0x2695af58 52953146   postgres   600        1048576    2

So, with 100000 max_connections (not advocating that's a reasonable number in 
all the cases), that's a 5157945344 - 5053915136 = 100 MB increase of shared
memory (expected with 100K max_connections and the new struct size of 1040 Bytes)
as compare to master which is about 4800 MB. It means that v3 produces about
2% shared memory increase with 100000 max_connections.


Then, based on those numbers, I think that the "fixed amount" approach is a good
one as 1/ the amount of shared memory increase is "relatively" small and 2/ most
probably provides performance benefits as compare to the "non fixed" approach.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Nazir Bilal Yavuz
Date:
Hi,

Thanks for working on this!

Your patch applies and builds cleanly.

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> - As stated up-thread, the pg_stat_get_backend_io() behaves as if
> stats_fetch_consistency is set to none (each execution re-fetches counters
> from shared memory). Indeed, the snapshot is not build in each backend to copy
> all the others backends stats, as 1/ there is no use case (there is no need to
> get others backends I/O statistics while taking care of the stats_fetch_consistency)
> and 2/ that could be memory expensive depending of the number of max connections.

I believe this is the correct approach.

I manually tested your patches, and they work as expected. Here is
some feedback:

- The stats_reset column is NULL in both pg_my_stat_io and
pg_stat_get_backend_io() until the first call to reset io statistics.
While I'm not sure if it's necessary, it might be useful to display
the more recent of the two times in the stats_reset column: the
statistics reset time or the backend creation time.

- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column showing the
termination time of the old backend when its proc num is reassigned to
a new backend.

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: per backend I/O statistics

From
Nazir Bilal Yavuz
Date:
Hi,

On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
> > - The pgstat_reset_io_counter_internal() is called in the
> > pgstat_shutdown_hook(). This causes the stats_reset column showing the
> > termination time of the old backend when its proc num is reassigned to
> > a new backend.
>
> doh! Nice catch, thanks!
>
> And also new backends that are not re-using a previous "existing" process slot
> are getting the last reset time (if any). So the right place to fix this is in
> pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
> 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
> right "conceptually" speaking).

Thanks, I confirm that it is fixed.

You mentioned some refactoring upthread:

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> - If we agree on the current proposal then I'll do  some refactoring in
> pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
> duplicated code (it's not done yet to ease the review).

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats? If you meant the same thing, please don't
mind it.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
> > > - The pgstat_reset_io_counter_internal() is called in the
> > > pgstat_shutdown_hook(). This causes the stats_reset column showing the
> > > termination time of the old backend when its proc num is reassigned to
> > > a new backend.
> >
> > doh! Nice catch, thanks!
> >
> > And also new backends that are not re-using a previous "existing" process slot
> > are getting the last reset time (if any). So the right place to fix this is in
> > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
> > 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
> > right "conceptually" speaking).
> 
> Thanks, I confirm that it is fixed.

Thanks!

> You mentioned some refactoring upthread:
> 
> On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > - If we agree on the current proposal then I'll do  some refactoring in
> > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
> > duplicated code (it's not done yet to ease the review).
> 
> Could we remove pg_stat_get_my_io() completely and use
> pg_stat_get_backend_io() with the current backend's pid to get the
> current backend's stats?

The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the
statistics snapshot is build for "my backend stats" (means it depends of the
stats_fetch_consistency value). I can see use cases for that.

OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not build in each backend to copy all the others backends stats,
as 1/ I think that there is no use case (there is no need to get others backends
I/O statistics while taking care of the stats_fetch_consistency) and 2/ that
could be memory expensive depending of the number of max connections.

So I think it's better to keep both functions as they behave differently.

Thoughts?

> If you meant the same thing, please don't
> mind it.

What I meant to say is to try to remove the duplicate code that we can find in
those 3 functions (say creating a new function that would contain the duplicate
code and make use of this new function in the 3 others). Did not look at it in
details yet.

[1]: https://www.postgresql.org/message-id/ZtXR%2BCtkEVVE/LHF%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/ZtsZtaRza9bFFeF8%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Nazir Bilal Yavuz
Date:
Hi,

On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
> > Could we remove pg_stat_get_my_io() completely and use
> > pg_stat_get_backend_io() with the current backend's pid to get the
> > current backend's stats?
>
> The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the
> statistics snapshot is build for "my backend stats" (means it depends of the
> stats_fetch_consistency value). I can see use cases for that.
>
> OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
> none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
> the snapshot is not build in each backend to copy all the others backends stats,
> as 1/ I think that there is no use case (there is no need to get others backends
> I/O statistics while taking care of the stats_fetch_consistency) and 2/ that
> could be memory expensive depending of the number of max connections.
>
> So I think it's better to keep both functions as they behave differently.
>
> Thoughts?

Yes, that is correct. Sorry, you already had explained it and I had
agreed with it but I forgot.

> > If you meant the same thing, please don't
> > mind it.
>
> What I meant to say is to try to remove the duplicate code that we can find in
> those 3 functions (say creating a new function that would contain the duplicate
> code and make use of this new function in the 3 others). Did not look at it in
> details yet.

I got it, thanks for the explanation.

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > So I think it's better to keep both functions as they behave differently.
> >
> > Thoughts?
> 
> Yes, that is correct. Sorry, you already had explained it and I had
> agreed with it but I forgot.

No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote:
> On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:
> > On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
> >> As an additional benefit of this approach, the client can set a
> >> connection variable, for example, no_backend_iostats to true, or set
> >> its inverse variable to false, to restrict memory usage to only the
> >> required backends.
> > 
> > Thanks for the feedback!
> > 
> > If we were to add an on/off switch button, I think I'd vote for a global one
> > instead. Indeed, I see this feature more like an "Administrator" one, where
> > the administrator wants to be able to find out which session is reponsible of
> > what (from an I/O point of view): like being able to anwser "which session is
> > generating this massive amount of reads"?
> > 
> > If we allow each session to disable the feature then the administrator
> > would lost this ability.
> 
> Hmm, I've been studying this patch,

Thanks for looking at it!

> and I am not completely sure to
> agree with this feeling of using fixed-numbered stats, actually, after
> reading the whole and seeing the structure of the patch
> (FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
> know exactly the number of slots we need to know for the
> fixed-numbered stats as MaxBackends may change).

Right, that's a new way of dealing with "unknown" number of slots (and it has
cons as you mentioned in [1]).

> If we make these
> kind of stats variable-numbered, does it have to actually involve many
> creations or removals of the stats entries, though?  One point is that
> the number of entries to know about is capped by max_connections,
> which is a PGC_POSTMASTER.  That's the same kind of control as
> replication slots.  So one approach would be to reuse entries in the
> dshash and use in the hashing key the number in the procarrays.  If a
> new connection spawns and reuses a slot that was used in the past,
> then reset all the existing fields and assign its PID.

Yeah, like it's done currently with the "fixed-numbered" stats proposal. That
sounds reasonable to me, I'll look at this proposed approach and come back with
a new patch version, thanks!

> Another thing is the consistency of the data that we'd like to keep at
> shutdown.  If the connections have a balanced amount of stats shared
> among them, doing decision-making based on them is kind of easy.  But
> that may cause confusion if the activity is unbalanced across the
> sessions.  We could also not flush them to disk as an option, but it
> still seems more useful to me to save this data across restarts if one
> takes frequent snapshots of the new system view reporting everything,
> so as it is possible to get an idea of the deltas across the snapshots
> for each connection slot.

The idea that has been implemented so far in this patch is that we still maintain
an aggregated version of the stats (visible through pg_stat_io) and that only the
aggregated stats are flushed/read to/from disk (means we don't flush the
per-backend stats).

I think that it makes sense that way. The way I see it is that the per-backend
I/O stats is more for current activity instrumentation. So it's not clear to me
what would be the benefits of restoring the per-backend stats at startup knowing
that: 1) we restored the aggregated stats and 2) the sessions that were responsible
for the the restored stats are gone.

[1]: https://www.postgresql.org/message-id/Zuz5iQ4AjcuOMx_w%40paquier.xyz

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
> On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:
> > No problem at all! (I re-explained because I'm not always 100% sure that my
> > explanations are crystal clear ;-) )
> 
> We've discussed a bit this patch offline, but after studying the patch
> I doubt that this part is a good idea:
> 
> +    /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
> +    PgStatShared_IO io;
>  } PgStat_ShmemControl;
> 
> We are going to be in trouble if we introduce a second member in this
> routine that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
> relies on the fact that all its members after deterministic offset
> positions in this structure.

Agree that it would be an issue should we have to add a new FLEXIBLE_ARRAY_MEMBER.

> So this lacks flexibility.  This choice
> is caused by the fact that we don't exactly know the number of
> backends because that's controlled by the PGC_POSTMASTER GUC
> max_connections so the size of the structure would be undefined.

Right.

> There is a parallel with replication slot statistics here, where we
> save the replication slot data in the dshash based on their index
> number in shmem.  Hence, wouldn't it be better to do like replication
> slot stats, where we use the dshash and a key based on the procnum of
> each backend or auxiliary process (ProcNumber in procnumber.h)?  If at
> restart max_connections is lower than what was previously used, we
> could discard entries that would not fit anymore into the charts.
> This is probably not something that happens often, so I'm not really
> worried about forcing the removal of these stats depending on how the
> upper-bound of ProcNumber evolves.

Yeah, I'll look at implementing the dshash based on their procnum and see
where it goes.

> So, using a new kind of ID and making this kind variable-numbered may
> ease the implementation quite a bit, while avoiding any extensibility
> issues with the shmem portion of the patch if these are
> fixed-numbered.

Agree.

> This would rely on the fact that we would use the ProcNumber for the
> dshash key, and this information is not provided in pg_stat_activity.
> Perhaps we should add this information in pg_stat_activity so as it
> would be easily possible to do joins with a SQL function that returns
> a SRF with all the stats associated with a given connection slot
> (auxiliary or backend process)?

I'm not sure that's needed. What has been done in the previous versions is
to get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)).

Do you see an issue with that approach?

> The active PIDs of the live sessions are not stored in the active
> stats, why not? 

That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Oct 08, 2024 at 01:46:23PM +0900, Michael Paquier wrote:
> On Mon, Oct 07, 2024 at 09:54:21AM +0000, Bertrand Drouvot wrote:
> > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
> >> This would rely on the fact that we would use the ProcNumber for the
> >> dshash key, and this information is not provided in pg_stat_activity.
> >> Perhaps we should add this information in pg_stat_activity so as it
> >> would be easily possible to do joins with a SQL function that returns
> >> a SRF with all the stats associated with a given connection slot
> >> (auxiliary or backend process)?
> > 
> > I'm not sure that's needed. What has been done in the previous versions is
> > to get the stats based on the pid (see pg_stat_get_backend_io()) where the
> > procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)).
> 
> Ah, I see.  So you could just have the proc number in the key to
> control the upper-bound on the number of possible stats entries in the
> dshash.

Yes.

> Assuming that none of this data is persisted to the stats file at
> shutdown and that the stats of a single entry are reset each time a
> new backend reuses a previous proc slot, that would be OK by me, I
> guess.
> 
> >> The active PIDs of the live sessions are not stored in the active
> >> stats, why not? 
> > 
> > That was not needed. We can still retrieve the stats based on the pid thanks
> > to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
> > to actually store the pid in the stats. I think that's fine because the pid
> > only matters at "display" time (pg_stat_get_backend_io()).
> 
> Okay, per the above and the persistency of the stats.

Great, I'll work on an updated patch version then.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Oct 08, 2024 at 04:28:39PM +0000, Bertrand Drouvot wrote:
> > > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
> > 
> > Okay, per the above and the persistency of the stats.
> 
> Great, I'll work on an updated patch version then.
> 

I spend some time on this during the last 2 days and I think we have 3 design
options.

=== GOALS ===

But first let's sump up the goals that I think we agreed on:

- Keep pg_stat_io as it is today: give the whole server picture and serialize
the stats to disk.

- Introduce per-backend IO stats and 2 new APIs to:

   1. Provide the IO stats for "my backend" (through say pg_my_stat_io), this
      would take care of the stats_fetch_consistency.

   2. Retrieve the IO stats for another backend (through say pg_stat_get_backend_io(pid))
      that would _not_ take care of stats_fetch_consistency, as:

      2.1/ I think that there is no use case (there is no need to get others
           backends I/O statistics while taking care of the stats_fetch_consistency)

      2.2/ That could be memory expensive to store a snapshot for all the backends
           (depending of the number of backend created)

- There is no need to serialize the per-backend IO stats to disk (no point to
see stats for backends that do not exist anymore after a re-start).

- The per-backend IO stats should be variable-numbered (not fixed), as per 
up-thread discussion.

=== OPTIONS ===

So, based on this, I think that we could:

Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this
KIND take care of the aggregated view (pg_stat_io) and the per-backend stats.

Option 2: let PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO
that would be variable-numbered.

Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that
would be variable-numbered and store the "aggregated stats aka pg_stat_io" in
shared memory (not part of the variable-numbered hash). Per-backend stats
could be aggregated into "pg_stat_io" during the flush_pending_cb call for example.

=== BEST OPTION? ===

I would opt for Option 2 as:

- The stats system is currently not designed for Option 1 and our goals (for
example the shared_data_len is used to serialize but also to fetch the entries,
see pgstat_fetch_entry()) so that would need some hack to serialize only a part
of them and still be able to fetch them all).

- Mixing "fixed" and "variable" in the same KIND does not sound like a good idea
(though that might be possible with some hacks, I don't think that would be 
easy to maintain).

- Having the per-backend as "variable" in its dedicated kind looks more reasonable
and less error-prone.

- I don't think there is a stats design similar to option 3 currently, so I'm
not sure there is a need to develop something new while Option 2 could be done.

- Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization
part.

- Option 2 seems to offer more flexibility (as compare to Option 1 and 3).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Thu, Oct 31, 2024 at 05:09:56AM +0000, Bertrand Drouvot wrote:
> === OPTIONS ===
> 
> So, based on this, I think that we could:
> 
> Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this
> KIND take care of the aggregated view (pg_stat_io) and the per-backend stats.
> 
> Option 2: let PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO
> that would be variable-numbered.
> 
> Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that
> would be variable-numbered and store the "aggregated stats aka pg_stat_io" in
> shared memory (not part of the variable-numbered hash). Per-backend stats
> could be aggregated into "pg_stat_io" during the flush_pending_cb call for example.
> 
> === BEST OPTION? ===
> 
> I would opt for Option 2 as:
> 
> - The stats system is currently not designed for Option 1 and our goals (for
> example the shared_data_len is used to serialize but also to fetch the entries,
> see pgstat_fetch_entry()) so that would need some hack to serialize only a part
> of them and still be able to fetch them all).
> 
> - Mixing "fixed" and "variable" in the same KIND does not sound like a good idea
> (though that might be possible with some hacks, I don't think that would be 
> easy to maintain).
> 
> - Having the per-backend as "variable" in its dedicated kind looks more reasonable
> and less error-prone.
> 
> - I don't think there is a stats design similar to option 3 currently, so I'm
> not sure there is a need to develop something new while Option 2 could be done.
> 
> - Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization
> part.
> 
> - Option 2 seems to offer more flexibility (as compare to Option 1 and 3).
> 
> Thoughts?

And why not add more per-backend stats in the future? (once the I/O part is done).

I think that's one more reason to go with option 2 (and implementing a brand new
PGSTAT_KIND_BACKEND kind).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Mon, Nov 04, 2024 at 10:01:50AM +0000, Bertrand Drouvot wrote:
> And why not add more per-backend stats in the future? (once the I/O part is done).
> 
> I think that's one more reason to go with option 2 (and implementing a brand new
> PGSTAT_KIND_BACKEND kind).

I'm starting working on option 2, I think it will be easier to discuss with
a patch proposal to look at.

If in the meantime, one strongly disagree with option 2 (means implement a brand
new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Wed, Nov 06, 2024 at 08:39:07AM +0900, Michael Paquier wrote:
> On Tue, Nov 05, 2024 at 05:37:15PM +0000, Bertrand Drouvot wrote:
> > I'm starting working on option 2, I think it will be easier to discuss with
> > a patch proposal to look at.
> > 
> > If in the meantime, one strongly disagree with option 2 (means implement a brand
> > new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know.
> 
> Sorry for the late reply, catching up a bit.

No problem at all, thanks for looking at it!

> As you are quoting in [1], you do not expect the backend-io stats and
> the more global pg_stat_io to achieve the same level of consistency as
> the backend stats would be gone at restart, and wiped out when a
> backend shuts down.

Yes.

> So, splitting them with a different stats kind
> feels more natural because it would be possible to control how each
> stat kind behaves depending on the code shutdown and reset paths
> within their own callbacks rather than making the callbacks of
> PGSTAT_KIND_IO more complex than they already are.

Yeah, thanks for sharing your thoughts.

> And pg_stat_io is
> a fixed-numbered stats kind because of the way it aggregates its stats
> with a number states defined at compile-time.
> 
> Is the structure you have in mind different than PgStat_BktypeIO?

Very close.

> Perhaps a split is better anyway with that in mind.

The in-progress patch (not shared yet) is using the following:

"
typedef struct PgStat_Backend
{
       TimestampTz stat_reset_timestamp;
       BackendType bktype;
       PgStat_BktypeIO stats;
} PgStat_Backend;
"

The bktype is used to be able to filter the stats correctly when we display them.

> The amount of memory required to store the snapshots of backend-IO
> does not worry me much, TBH, but you are worried about a high turnover
> of connections that could cause a lot of bloat in the backend-IO
> snapshots because of the persistency that these stats would have,
> right?

Not only a high turnover but also a high number of entries created in the hash.

Furthermore I don't see any use case of relying on stats_fetch_consistency
while querying other backend's stats.

> If possible, supporting snapshots would be
> more consistent with the other stats.

I have I mind to support the snapshots _only_ when querying our own stats. I can
measure the memory impact if we use them also when querying other backends stats
too (though I don't see a use case).

> Just to be clear, I am not in favor of making PgStat_HashKey larger
> than it already is.

That's not needed, the patch I'm working on stores the proc number in the
objid field of the key.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Thu, Nov 07, 2024 at 09:50:59AM +0900, Michael Paquier wrote:
> On Wed, Nov 06, 2024 at 01:51:02PM +0000, Bertrand Drouvot wrote:
> > That's not needed, the patch I'm working on stores the proc number in the
> > objid field of the key.
> 
> Relying on the procnumber for the object ID gets a +1 here.

Thanks!

> That
> provides an automatic cap on the maximum number of entries that can
> exist at once for this new stats kind.

Yeah.

POC patch is now working, will wear a reviewer hat now and will share in one 
or 2 days.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:
> On Fri, Nov 08, 2024 at 02:09:30PM +0000, Bertrand Drouvot wrote:
> > 1. one assertion in pgstat_drop_entry_internal() is not necessary true anymore 
> > with this new stat kind. So, adding an extra bool as parameter to take care of it.
> 
> Why?  I have to admit that the addition of this argument at this level
> specific to a new stats kind feels really weird and it looks like a
> layer violation, while this happens only when resetting the whole
> pgstats state because we have loaded a portion of the stats but the
> file loaded was in such a state that we don't have a consistent
> picture.

Yes. I agree that looks weird and I don't like it that much. But the assertion
is not true anymore. If you:

- change the arguments to false in the pgstat_drop_entry_internal() call in 
pgstat_drop_all_entries()
- start the engine
- kill -9 postgres
- restart the engine

You'll see the assert failing due to the startup process. I don't think it is
doing something wrong though, just populating its backend stats. Do you have
another view on it?

> > 2. a new struct "PgStat_Backend" is created and does contain the backend type.
> > The backend type is used for filtering purpose when the stats are displayed.
> 
> Is that necessary?  We can guess that from the PID with a join in
> pg_stat_activity.  pg_stat_get_backend_io() from 0004 returns a set of
> tuples for a specific PID, as well..

How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"?

Do you mean copy/paste part of the code from pg_stat_get_activity() into
pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
I'll have a look at it.

> +    /* save the bktype */
> +    if (kind == PGSTAT_KIND_PER_BACKEND)
> +        bktype = ((PgStatShared_Backend *) header)->stats.bktype;
> [...]
> +    /* restore the bktype */
> +    if (kind == PGSTAT_KIND_PER_BACKEND)
> +        ((PgStatShared_Backend *) header)->stats.bktype = bktype;
> 
> Including the backend type results in these blips in
> shared_stat_reset_contents() which should not have anything related 
> to stats kinds and should remain neutral, as well.

Yeah, we can simply get rid of it if we remove the backend type in PgStat_Backend.

> pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I
> guess that it should be static in this file rather than declared in
> pgstat.h?

Good catch, thanks!

> +typedef struct PgStat_PendingIO
> 
> Perhaps this part should use a separate structure named
> "BackendPendingIO"?  The definition of the structure has to be in
> pgstat.h as this is the pending_size of the new stats kind.  It looks
> like it would be cleaner to keep PgStat_PendingIO local to
> pgstat_io.c, and define PgStat_PendingIO based on
> PgStat_BackendPendingIO?

I see what you meean, what about simply "PgStat_BackendPending" in pgstat.h?

> +    /*
> +     * Do serialize or not this kind of stats.
> +     */
> +    bool        to_serialize:1;
> 
> Not sure that "serialize" is the best term that applies here.  For
> pgstats entries, serialization refers to the matter of writing their
> entries with a "serialized" name because they have an undefined number
> when stored locally after a reload.  I'd suggest to split this concept
> into its own patch, rename the flag as "write_to_file" (or what you
> think is suited), and also apply the flag in the fixed-numbered loop
> done in pgstat_write_statsfile() before going through the dshash.

Makes sense to create its own patch, will have a look.

> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>pg_stat_reset_single_backend_io_counters</primary>
> +        </indexterm>
> +        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> )
> +        <returnvalue>void</returnvalue>
> 
> This should document that the input argument is a PID.

Yeap, will add.

> 
> Is pg_my_stat_io() the best name ever? 

Not 100% sure, but there is already "pg_my_temp_schema" so I thought that
pg_my_stat_io would not be that bad. Open to suggestions though.

> I'd suggest to just merge 0004 with 0001.

Sure.

> Structurally, it may be cleaner to keep all the callbacks and the
> backend-I/O specific logic into a separate file, perhaps
> pgstat_io_backend.c or pgstat_backend_io?

Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
only one "section" dedicated to IO currently)?

> > ==== 0002
> > 
> > Merge both IO stats flush callback. There is no need to keep both callbacks.
> > 
> > Merging both allows to save O(N^3) while looping on IOOBJECT_NUM_TYPES,
> > IOCONTEXT_NUM_TYPES and IOCONTEXT_NUM_TYPES.
> 
> Not sure to be a fan of that, TBH, still I get the performance
> argument of the matter.  Each stats kind has its own flush path, and
> this assumes that we'll never going to diverge in terms of the
> information maintained for each backend and the existing pg_stat_io.
> Perhaps there could be a divergence at some point?

Yeah perhaps, so in case of divergence let's split "again"? I mean for the moment
I don't see any reason to keep both and we have pros related to efficiency during
flush.

> > === Remarks
> > 
> > R1. The key field is still "objid" even if it stores the proc number. I think
> > that's fine and there is no need to rename it as it's related comment states:
> 
> Let's not change the object ID and the hash key.  The approach you are
> using here with a variable-numbered stats kinds and a lookup with the
> procnumber is sensible here.

Agree, thanks for confirming.

> > R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged (
> > duplicated code). That's not mandatory to provide the new per backend I/O stats
> > feature. So let's keep it as future work to ease the review.
> 
> Not sure about that.

I don't have a strong opinion about this one, so I'm ok to not merge those.

> > R3. There is no use case of retrieving other backend's IO stats with
> > stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to
> > save memory (that could be non negligeable as pgstat_build_snapshot() would add
> > _all_ the other backend's stats in the snapshot). I choose to document it and to
> > not return any rows: I think that's better than returning some data (that would
> > not "ensure" the snapshot consistency) or an error.
> 
> I'm pretty sure that there is a sensible argument about being able to
> get a snapshot of the I/O stat of another backend and consider that a
> feature.  For example, keeping a look at the activity of the
> checkpointer while doing a scan at a certain point in time?

hm, not sure how the stats_fetch_consistency set to 'snapshot' could help here:

- the checkpointer could write stuff for reason completly unrelated to "your"
backend

- if the idea is to 1. record checkpointer stats, 2. do stuff in the backend and
3. check again the checkpointer stats then I don't think setting stats_fetch_consistency
to snapshot is needed at all. In fact it's quite the opposite as 1. and 3. would
report the same values with stats_fetch_consistency set to 'snapshot' (if done 
in the same transaction).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Tue, Nov 19, 2024 at 10:47:53AM +0900, Michael Paquier wrote:
> On Thu, Nov 14, 2024 at 01:30:11PM +0000, Bertrand Drouvot wrote:
> > - change the arguments to false in the pgstat_drop_entry_internal() call in 
> > pgstat_drop_all_entries()
> > - start the engine
> > - kill -9 postgres
> > - restart the engine
> > 
> > You'll see the assert failing due to the startup process. I don't think it is
> > doing something wrong though, just populating its backend stats. Do you have
> > another view on it?
> 
> It feels sad that we have to plug in the internals at this level for
> this particular case.  Perhaps there is something to do with more
> callbacks.  Or perhaps there is just no point in tracking the stats of
> auxiliary processes because we already have this data in the existing
> pg_stat_io already?

We can not discard the "per backend" stats collection for auxiliary processes
because those are the source for pg_stat_io too (once the 2 flush callbacks are
merged).

I have another idea: after a bit more of investigation it turns out that
only the startup process is generating the failed assertion.

So, for the startup process only, what about?

- don't call pgstat_create_backend_stat() in pgstat_beinit()...
- but call it in StartupXLOG() instead (after the stats are discarded or restored).

That way we could get rid of the changes linked to the assertion and still handle
the startup process particular case. Thoughts?

> > How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"?
> > 
> > Do you mean copy/paste part of the code from pg_stat_get_activity() into
> > pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
> > I'll have a look at it.
> 
> I was under the impression that we should keep PgStat_Backend for the
> reset_timestamp part, but drop BackendType as it can be guessed from
> pg_stat_activity itself.

Removing BackendType sounds doable, I'll look at it.

> >> Structurally, it may be cleaner to keep all the callbacks and the
> >> backend-I/O specific logic into a separate file, perhaps
> >> pgstat_io_backend.c or pgstat_backend_io?
> > 
> > Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
> > only one "section" dedicated to IO currently)?
> 
> pgstat_backend.c looks good to me.  Could there be other stats than
> just IO, actually?  Perhaps not, just asking..

Yeah that's the reason why I suggested "pgstat_backend.c". I would not be
surprised if we add more "per backend" stats (that are not I/O related) in the
future as the main machinery would be there.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:
> +    /*
> +     * Do serialize or not this kind of stats.
> +     */
> +    bool        to_serialize:1;
> 
> Not sure that "serialize" is the best term that applies here.  For
> pgstats entries, serialization refers to the matter of writing their
> entries with a "serialized" name because they have an undefined number
> when stored locally after a reload.  I'd suggest to split this concept
> into its own patch, rename the flag as "write_to_file" 

Yeah, also this could useful for custom statistics. So I created a dedicated
thread and a patch proposal (see [1]).

[1]: https://www.postgresql.org/message-id/Zz3skBqzBncSFIuY%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Wed, Nov 20, 2024 at 09:01:26AM +0900, Michael Paquier wrote:
> On Tue, Nov 19, 2024 at 04:28:55PM +0000, Bertrand Drouvot wrote:
> > So, for the startup process only, what about?
> > 
> > - don't call pgstat_create_backend_stat() in pgstat_beinit()...
> > - but call it in StartupXLOG() instead (after the stats are discarded or restored).
> > 
> > That way we could get rid of the changes linked to the assertion and still handle
> > the startup process particular case. Thoughts?
> 
> Hmm.  That may prove to be a good idea in the long-term.  The startup
> process is a specific path kicked in at a very early stage, so it is
> also a bit weird that we'd try to insert statistics while we are going
> to reset them anyway a bit later.

Exactly.

> That may also be relevant for
> custom statistics, actually, especially if some calls happen in some
> hook paths taken before the reset is done.  This could happen for
> injection point stats when loading it with shared_preload_libraries, 
> actually, if you load, attach or run a callback at this early stage.

Right. I did not had in mind to go that far here (for the per backend stats
needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
is related to per backend stats only) call at the right place in StartupXLOG()
for the startup process only. As that's the startup process that will reset
or restore the stats I think that makes sense.

It looks like that what you have in mind is much more generic, what about:

- Focus on this thread first and then move the call as proposed above
- Think about a more generic idea later on (on the per-backend I/O stats is
in).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Thu, Nov 21, 2024 at 07:18:24AM +0900, Michael Paquier wrote:
> On Wed, Nov 20, 2024 at 02:20:18PM +0000, Bertrand Drouvot wrote:
> > Right. I did not had in mind to go that far here (for the per backend stats
> > needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
> > is related to per backend stats only) call at the right place in StartupXLOG()
> > for the startup process only. As that's the startup process that will reset
> > or restore the stats I think that makes sense.
> > 
> > It looks like that what you have in mind is much more generic, what about:
> > 
> > - Focus on this thread first and then move the call as proposed above
> > - Think about a more generic idea later on (on the per-backend I/O stats is
> > in).
> 
> Moving pgstat_create_backend_stat() may be OK in the long-term, at
> least that would document why we need to care about the startup
> process.

Yeap.

> Still, moving only the call feels incomplete, so how about adding a
> boolean field in PgStat_ShmemControl that defaults to false when the
> shmem area is initialized in StatsShmemInit(), then switched to true
> once we know that the stats have been restored or reset by the startup
> process.  Something like that should work to control the dshash
> inserts, I guess?

I did some tests (on a primary and on a standby) and currently there is no call
to pgstat_get_entry_ref() at all before we reach the stats reset or restore
in StartupXLOG(). The first ones appear after "we really open for business"
(quoting process_pm_child_exit()).

Now, with the per-backend I/O stats patch in place, there is 3 processes that
could call pgstat_get_entry_ref() before we reach the stats reset or restore in
StartupXLOG() (observed by setting a breakpoint on the startup process before
stats are reset or restored):

- checkpointer
- background writer
- startup process

All calls are for the new backend stat kind. It also makes sense to see non
"regular" backends as the "regulars" are not allowed to connect yet.

Results: they could collect some stats for nothing since a reset or restore will
happen after.

Now, if we want to prevent new entries to be created before the stats are
reset or restored:

- then the end results will be the same (means starting the instance with reset
or restored stats). The only difference would be that they would not collect stats
for "nothing" during this small time window.

- it looks like that would lead to some non negligible refactoring: I started the
coding and observed the potential impact. The reason is that all the code (at least
the one I looked at) does not expect to receive a NULL value from pgstat_get_entry_ref()
(when asking for creation) and we would need to take care of all the cases
(all the stats, all backend type) for safety reason.

So, given that:

- the end result would be the same
- the code changes would be non negligible (unless we have a better idea than
pgstat_get_entry_ref() returning a NULL value).

I think that might be valid to try to implement it (to avoid those processes to
collect the stats for nothing during this small time window) but I am not sure
that's worth it (as the end result would be the same and the code change non
negligible).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From
Bertrand Drouvot
Date:
Hi,

On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote:
> On Thu, Nov 21, 2024 at 05:23:42PM +0000, Bertrand Drouvot wrote:
> > So, given that:
> > 
> > - the end result would be the same
> > - the code changes would be non negligible (unless we have a better idea than
> > pgstat_get_entry_ref() returning a NULL value).
> 
> Hmm.  created_entry only matters for pgstat_init_function_usage().
> All the other callers of pgstat_prep_pending_entry() pass a NULL
> value. 

I meant to say all the calls that passe "create" as true in pgstat_get_entry_ref().

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com