Thread: Re: per backend I/O statistics

Re: per backend I/O statistics

From: Kyotaro Horiguchi
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in 
> Hi hackers,
> 
> Please find attached a patch to implement $SUBJECT.
> 
> While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
> pg_my_stat_io view to display "my" backend I/O statistics and a new
> pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
> backend pid.
> 
> By having this per-backend level of granularity, one could for example identify
> which running backend is responsible for most of the reads, most of the extends,
> and so on. The pg_my_stat_io view could also be useful to check the I/O impact
> of some operations, queries, etc. in the current session.
> 
> Some remarks:
> 
> - it is split into 2 sub-patches: 0001 introduces the necessary changes to provide
> the pg_my_stat_io view and 0002 adds the pg_stat_get_backend_io() function.
> - the idea of having per backend I/O statistics has already been mentioned in
> [1] by Andres.
> 
> Some implementation choices:
> 
> - The KIND_IO stats are still "fixed amount" ones as the maximum number of
> backends is fixed.
> - The statistics snapshot is made for the global stats (the aggregated ones) and
> for my backend stats. The snapshot is not built for all the backend stats (that
> could be memory expensive depending on the number of max connections and given
> the fact that PgStat_IO is 16KB long).
> - The above point means that pg_stat_get_backend_io() behaves as if
> stats_fetch_consistency is set to none (each execution re-fetches counters
> from shared memory).
> - The above 2 points are also the reasons why the pg_my_stat_io view has been
> added (as its results honor the stats_fetch_consistency setting). I think it
> makes sense to rely on it in that case, while I'm not sure it would make
> much sense to retrieve another backend's I/O stats while honoring
> stats_fetch_consistency.
> 
> 
> [1]: https://www.postgresql.org/message-id/20230309003438.rectf7xo7pw5t5cj%40awork3.anarazel.de

I'm not sure about the usefulness of having the stats only available
from the current session. Since they are stored in shared memory,
shouldn't we make them accessible to all backends? However, this would
introduce permission considerations and could become complex.

When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

Just to be clear, the above comments are not meant to oppose the
current implementation approach. They are purely for the sake of
discussing comparisons with other possible approaches.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: per backend I/O statistics

From: Kyotaro Horiguchi
At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> When I first looked at this patch, my initial thought was whether we
> should let these stats stay "fixed." The reason why the current
> PGSTAT_KIND_IO is fixed is that there is only one global statistics
> storage for the entire database. If we have stats for a flexible
> number of backends, it would need to be non-fixed, perhaps with the
> entry for INVALID_PROC_NUMBER storing the global I/O stats, I
> suppose. However, one concern with that approach would be the impact
> on performance due to the frequent creation and deletion of stats
> entries caused by high turnover of backends.

As an additional benefit of this approach, a client could set a connection
variable, for example no_backend_iostats to true (or its inverse to false),
to restrict memory usage to only the backends that need it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: per backend I/O statistics

From: Bertrand Drouvot
Hi,

On Tue, Sep 03, 2024 at 03:37:49PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in 
> > Hi hackers,
> > 
> > Please find attached a patch to implement $SUBJECT.
> > 
> > While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
> > pg_my_stat_io view to display "my" backend I/O statistics and a new
> > pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
> > backend pid.
> > 
> 
> I'm not sure about the usefulness of having the stats only available
> from the current session. Since they are stored in shared memory,
> shouldn't we make them accessible to all backends?

Thanks for the feedback!

The stats are accessible to all backends thanks to 0002 and the introduction
of the pg_stat_get_backend_io() function.
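For instance, looking at another session's I/O could be as simple as the sketch
below (the column list is an assumption on my side, mirroring pg_stat_io, and
not necessarily the final shape):

-- 12345 stands for the pid of the backend of interest
SELECT object, context, reads, writes, extends
FROM pg_stat_get_backend_io(12345);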

> However, this would
> introduce permission considerations and could become complex.

I'm not sure the data exposed here is sensitive enough to warrant permission
restrictions.

> When I first looked at this patch, my initial thought was whether we
> should let these stats stay "fixed." The reason why the current
> PGSTAT_KIND_IO is fixed is that there is only one global statistics
> storage for the entire database. If we have stats for a flexible
> number of backends, it would need to be non-fixed, perhaps with the
> entry for INVALID_PROC_NUMBER storing the global I/O stats, I
> suppose. However, one concern with that approach would be the impact
> on performance due to the frequent creation and deletion of stats
> entries caused by high turnover of backends.
>

The pros of using the fixed amount are:

- less code change (I think, as I did not write the non-fixed counterpart)
- probably better performance and less scalability impact (in case of a high rate
of backend creation/deletion)

The main con is probably allocating shared memory space that might not be used
(sizeof(PgStat_IO) is 16392 bytes, so that could be a concern for a high number
of unused connections). OTOH, if a high number of connections is not used, it
might be worth reducing the max_connections setting instead.

"Conceptually" speaking, we know what the maximum number of backends is, so I
think that using the fixed amount approach makes sense (somehow I think it can
be compared to PGSTAT_KIND_SLRU, which relies on SLRU_NUM_ELEMENTS).
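For reference, that existing fixed-amount kind is already exposed as one row per
SLRU element:

SELECT name, blks_read, blks_written, stats_reset
FROM pg_stat_slru;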

> Just to be clear, the above comments are not meant to oppose the
> current implementation approach. They are purely for the sake of
> discussing comparisons with other possible approaches.

No problem at all, thanks for your feedback and sharing your thoughts!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From: Bertrand Drouvot
Hi,

On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > When I first looked at this patch, my initial thought was whether we
> > should let these stats stay "fixed." The reason why the current
> > PGSTAT_KIND_IO is fixed is that there is only one global statistics
> > storage for the entire database. If we have stats for a flexible
> > number of backends, it would need to be non-fixed, perhaps with the
> > entry for INVALID_PROC_NUMBER storing the global I/O stats, I
> > suppose. However, one concern with that approach would be the impact
> > on performance due to the frequent creation and deletion of stats
> > entries caused by high turnover of backends.
> 
> As an additional benefit of this approach, a client could set a connection
> variable, for example no_backend_iostats to true (or its inverse to false),
> to restrict memory usage to only the backends that need it.

Thanks for the feedback!

If we were to add an on/off switch, I think I'd vote for a global one
instead. Indeed, I see this feature more as an "administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads?".

If we allow each session to disable the feature, then the administrator
would lose this ability.
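To illustrate, the kind of query I have in mind would be something like this
sketch (it assumes that pg_stat_get_backend_io() returns pg_stat_io-like rows
with a "reads" column, which is an assumption on my side and not the final
shape):

SELECT a.pid, a.usename, sum(b.reads) AS total_reads
FROM pg_stat_activity AS a,
     LATERAL pg_stat_get_backend_io(a.pid) AS b
WHERE a.backend_type = 'client backend'
GROUP BY a.pid, a.usename
ORDER BY total_reads DESC
LIMIT 5;

A per-session switch would make the result of such a query incomplete.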

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From: Alvaro Herrera
On 2024-Sep-03, Bertrand Drouvot wrote:

> The main con is probably allocating shared memory space that might not be used
> (sizeof(PgStat_IO) is 16392 bytes, so that could be a concern for a high number
> of unused connections). OTOH, if a high number of connections is not used, it
> might be worth reducing the max_connections setting instead.

I was surprised by the size you mention so I went to look for the
definition of that struct:

typedef struct PgStat_IO
{
    TimestampTz stat_reset_timestamp;
    PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;

(It would be good to have more than zero comments here BTW)

So what's happening is that struct PgStat_IO stores the data for every
single process type, be it regular backends, backend-like auxiliary
processes, and all other potential postmaster children.  So it seems to
me that storing one of these struct for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time).  That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.

Another way to think about this might be to say that the B_BACKEND
element of the PgStat_Snapshot.io array should really be spread out to
MaxBackends separate elements.  This would make it more difficult to
obtain a total value accumulating ops done by all backends (since it would
require summing the values of each backend), but it allows you to obtain
backend-specific values, including those of remote backends rather than
only your own, as Kyotaro suggests.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: per backend I/O statistics

From: Bertrand Drouvot
Hi,

On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote:
> On 2024-Sep-03, Bertrand Drouvot wrote:
> 
> > The main con is probably allocating shared memory space that might not be used
> > (sizeof(PgStat_IO) is 16392 bytes, so that could be a concern for a high number
> > of unused connections). OTOH, if a high number of connections is not used, it
> > might be worth reducing the max_connections setting instead.
> 
> I was surprised by the size you mention so I went to look for the
> definition of that struct:
>

Thanks for looking at it!

 
> typedef struct PgStat_IO
> {
>     TimestampTz stat_reset_timestamp;
>     PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
> } PgStat_IO;
> 
> (It would be good to have more than zero comments here BTW)
> 
> So what's happening is that struct PgStat_IO stores the data for every
> single process type, be it regular backends, backend-like auxiliary
> processes, and all other potential postmaster children.

Yeap.

> So it seems to
> me that storing one of these struct for "my backend stats" is wrong: I
> think you should be using PgStat_BktypeIO instead (or maybe another
> struct which has a reset timestamp plus BktypeIO, if you care about the
> reset time).  That struct is only 1024 bytes long, so it seems much more
> reasonable to have one of those per backend.

Yeah, that could be an area of improvement. Thanks, I'll look at it.
Currently the filtering is done when retrieving the per backend stats, but it
would be better to do it when storing the stats.

> Another way to think about this might be to say that the B_BACKEND
> element of the PgStat_Snapshot.io array should really be spread out to
> MaxBackends separate elements.  This would make it more difficult to
> obtain a total value accumulating ops done by all backends (since it would
> require summing the values of each backend), but it allows you to obtain
> backend-specific values, including those of remote backends rather than
> only your own, as Kyotaro suggests.
>

One backend can already see the stats of the other backends thanks to the
pg_stat_get_backend_io() function (which takes a backend pid as a parameter),
introduced in the 0002 sub-patch.

I'll ensure that's still the case with the next version of the patch.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From: Bertrand Drouvot
Hi,

On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote:
> The struct size is 1040 Bytes and that's much more reasonable than the size
> needed for per backend I/O stats in v1 (about 16KB).

One could think that this is a high increase in shared memory usage with a high
number of connections (due to the fact that it's implemented as "fixed amount"
stats based on the max number of backends).

So out of curiosity, I did some measurements with:

dynamic_shared_memory_type=sysv
shared_memory_type=sysv
max_connections=20000

On my lab machine, ipcs -m gives me:

- on master

key        shmid      owner      perms      bytes      nattch     status
0x00113a04 51347487   postgres   600        1149394944 6
0x4bc9f2fa 51347488   postgres   600        4006976    6
0x46790800 51347489   postgres   600        1048576    2

- with v3 applied

key        shmid      owner      perms      bytes      nattch     status
0x00113a04 51347477   postgres   600        1170227200 6
0x08e04b0a 51347478   postgres   600        4006976    6
0x74688c9c 51347479   postgres   600        1048576    2

So, with 20000 max_connections (not advocating that this is a reasonable number
in all cases), that's a 1170227200 - 1149394944 = 20 MB increase of shared
memory (expected with 20K max_connections and the new struct size of 1040 bytes)
as compared to master, which is 1096 MB. It means that v3 produces about a 2%
shared memory increase with 20000 max_connections.

Out of curiosity with max_connections=100000, then:

- on master:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00113a04 52953134   postgres   600        5053915136 6
0x37abf5ce 52953135   postgres   600        20006976   6
0x71c07d5c 52953136   postgres   600        1048576    2

- with v3:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00113a04 52953144   postgres   600        5157945344 6
0x7afb24de 52953145   postgres   600        20006976   6
0x2695af58 52953146   postgres   600        1048576    2

So, with 100000 max_connections (not advocating that this is a reasonable number
in all cases), that's a 5157945344 - 5053915136 = 100 MB increase of shared
memory (expected with 100K max_connections and the new struct size of 1040 bytes)
as compared to master, which is about 4800 MB. It means that v3 produces about
a 2% shared memory increase with 100000 max_connections.


Then, based on those numbers, I think that the "fixed amount" approach is a good
one as 1/ the shared memory increase is "relatively" small and 2/ it most
probably provides performance benefits as compared to the "non fixed" approach.
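As a possible cross-check on a patched build, pg_shmem_allocations can also show
how the individual allocations grow. The "Shared Memory Stats" name below is the
one used by StatsShmemInit() on master; whether the per backend entries end up
in that allocation is an assumption on my side:

SELECT name, allocated_size
FROM pg_shmem_allocations
WHERE name = 'Shared Memory Stats';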

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From: Nazir Bilal Yavuz
Hi,

Thanks for working on this!

Your patch applies and builds cleanly.

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> - As stated up-thread, pg_stat_get_backend_io() behaves as if
> stats_fetch_consistency is set to none (each execution re-fetches counters
> from shared memory). Indeed, the snapshot is not built in each backend to copy
> all the other backends' stats, as 1/ there is no use case (there is no need to
> get other backends' I/O statistics while honoring stats_fetch_consistency)
> and 2/ that could be memory expensive depending on the number of max connections.

I believe this is the correct approach.

I manually tested your patches, and they work as expected. Here is
some feedback:

- The stats_reset column is NULL in both pg_my_stat_io and
pg_stat_get_backend_io() until the first call to reset I/O statistics (a quick
way to observe this is sketched after this list). While I'm not sure if it's
necessary, it might be useful to display the more recent of the two times in
the stats_reset column: the statistics reset time or the backend creation time.

- pgstat_reset_io_counter_internal() is called in
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc number is reassigned to
a new backend.
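A quick way to observe the first point, right after connecting (column names
assumed to mirror pg_stat_io):

SELECT DISTINCT stats_reset FROM pg_my_stat_io;   -- NULL until the first reset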

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: per backend I/O statistics

From: Nazir Bilal Yavuz
Hi,

On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
> > - pgstat_reset_io_counter_internal() is called in
> > pgstat_shutdown_hook(). This causes the stats_reset column to show the
> > termination time of the old backend when its proc number is reassigned to
> > a new backend.
>
> doh! Nice catch, thanks!
>
> And also, new backends that are not re-using a previous "existing" process slot
> are getting the last reset time (if any). So the right place to fix this is in
> pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
> 0 in pgstat_shutdown_hook() (not strictly necessary, but just to be right
> "conceptually" speaking).

Thanks, I confirm that it is fixed.

You mentioned some refactoring upthread:

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> - If we agree on the current proposal then I'll do  some refactoring in
> pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
> duplicated code (it's not done yet to ease the review).

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats? If you meant the same thing, please don't
mind it.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: per backend I/O statistics

From: Bertrand Drouvot
Hi,

On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
> > > - pgstat_reset_io_counter_internal() is called in
> > > pgstat_shutdown_hook(). This causes the stats_reset column to show the
> > > termination time of the old backend when its proc number is reassigned to
> > > a new backend.
> >
> > doh! Nice catch, thanks!
> >
> > And also, new backends that are not re-using a previous "existing" process slot
> > are getting the last reset time (if any). So the right place to fix this is in
> > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
> > 0 in pgstat_shutdown_hook() (not strictly necessary, but just to be right
> > "conceptually" speaking).
> 
> Thanks, I confirm that it is fixed.

Thanks!

> You mentioned some refactoring upthread:
> 
> On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > - If we agree on the current proposal then I'll do  some refactoring in
> > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
> > duplicated code (it's not done yet to ease the review).
> 
> Could we remove pg_stat_get_my_io() completely and use
> pg_stat_get_backend_io() with the current backend's pid to get the
> current backend's stats?

The reason why I keep pg_stat_get_my_io() is that (as mentioned in [1]) the
statistics snapshot is built for "my backend stats" (meaning it depends on the
stats_fetch_consistency value). I can see use cases for that.

OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not built in each backend to copy all the other backends' stats,
as 1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while honoring stats_fetch_consistency) and 2/ that
could be memory expensive depending on the number of max connections.

So I think it's better to keep both functions as they behave differently.
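For example (view/function names as in this patch series; the "reads" column is
assumed to mirror pg_stat_io):

SET stats_fetch_consistency = snapshot;
BEGIN;
SELECT sum(reads) FROM pg_my_stat_io;  -- builds and then uses my stats snapshot
SELECT sum(reads) FROM pg_my_stat_io;  -- same values: the snapshot is reused
SELECT sum(reads) FROM pg_stat_get_backend_io(pg_backend_pid());
                                       -- re-fetched from shared memory each time
COMMIT;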

Thoughts?

> If you meant the same thing, please don't
> mind it.

What I meant was to try to remove the duplicate code that we can find in
those 3 functions (say, creating a new function that would contain the duplicated
code and making use of this new function in the 3 others). I have not looked at
it in detail yet.

[1]: https://www.postgresql.org/message-id/ZtXR%2BCtkEVVE/LHF%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/ZtsZtaRza9bFFeF8%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: per backend I/O statistics

From: Nazir Bilal Yavuz
Hi,

On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
> > Could we remove pg_stat_get_my_io() completely and use
> > pg_stat_get_backend_io() with the current backend's pid to get the
> > current backend's stats?
>
> The reason why I keep pg_stat_get_my_io() is that (as mentioned in [1]) the
> statistics snapshot is built for "my backend stats" (meaning it depends on the
> stats_fetch_consistency value). I can see use cases for that.
>
> OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
> none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
> the snapshot is not built in each backend to copy all the other backends' stats,
> as 1/ I think that there is no use case (there is no need to get other backends'
> I/O statistics while honoring stats_fetch_consistency) and 2/ that
> could be memory expensive depending on the number of max connections.
>
> So I think it's better to keep both functions as they behave differently.
>
> Thoughts?

Yes, that is correct. Sorry, you already had explained it and I had
agreed with it but I forgot.

> > If you meant the same thing, please don't
> > mind it.
>
> What I meant was to try to remove the duplicate code that we can find in
> those 3 functions (say, creating a new function that would contain the duplicated
> code and making use of this new function in the 3 others). I have not looked at
> it in detail yet.

I got it, thanks for the explanation.

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: per backend I/O statistics

From: Bertrand Drouvot
Hi,

On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > So I think it's better to keep both functions as they behave differently.
> >
> > Thoughts?
> 
> Yes, that is correct. Sorry, you already had explained it and I had
> agreed with it but I forgot.

No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com