Re: Track IO times in pg_stat_io - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Track IO times in pg_stat_io |
Date | |
Msg-id | 20230309003438.rectf7xo7pw5t5cj@awork3.anarazel.de |
In response to | Re: Track IO times in pg_stat_io ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Track IO times in pg_stat_io
Re: Track IO times in pg_stat_io |
List | pgsql-hackers |
Hi,

On 2023-03-08 12:55:34 +0100, Drouvot, Bertrand wrote:
> On 3/7/23 7:47 PM, Andres Freund wrote:
> > On 2023-03-07 13:43:28 -0500, Melanie Plageman wrote:
> > > > Now I've a second thought: what do you think about resetting the related number
> > > > of operations and *_time fields when enabling/disabling track_io_timing? (And mention it in the doc).
> > > >
> > > > That way it'd prevent bad interpretation (at least as far the time per operation metrics are concerned).
> > > >
> > > > Thinking that way as we'd loose some (most?) benefits of the new *_time columns
> > > > if one can't "trust" their related operations and/or one is not sampling pg_stat_io frequently enough (to discard the samples
> > > > where the track_io_timing changes occur).
> > > >
> > > > But well, resetting the operations could also lead to bad interpretation about the operations...
> > > >
> > > > Not sure about which approach I like the most yet, what do you think?
> > >
> > > Oh, this is an interesting idea. I think you are right about the
> > > synchronization issues making the statistics untrustworthy and, thus,
> > > unuseable.
> >
> > No, I don't think we can do that. It can be enabled on a per-session basis.
>
> Oh right. So it's even less clear to me to get how one would make use of those new *_time fields, given that:
>
> - pg_stat_io is "global" across all sessions. So, even if one session is doing some "testing" and needs to turn track_io_timing on, then it
> is even not sure it's only reflecting its own testing (as other sessions may have turned it on too).

I think for 17 we should provide access to per-existing-connection pg_stat_io
stats, and also provide a database aggregated version. Neither should be
particularly hard.

> - There is the risk mentioned above of bad interpretations for the "time per operation" metrics.
>
> - Even if there is frequent enough sampling of it pg_stat_io, one does not know which samples contain track_io_timing changes (at the cluster or session level).

You'd just make the same use of them you do with pg_stat_database.blks_read
etc today.

I don't think it's particularly useful to use the time to calculate "per IO"
costs - they can vary *drastically* due to kernel level buffering. The point
of having the time available is that it provides information that the number
of operations doesn't provide.

> > I think we simply shouldn't do anything here. This is a pre-existing issue.
>
> Oh, never thought about it. You mean like for pg_stat_database.blks_read and pg_stat_database.blk_read_time for example?

Yes.

Greetings,

Andres Freund
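
A minimal sketch of the usage described in the message above, assuming the statistics are cumulative counters that a monitoring tool samples periodically and compares. It is an illustration only and not part of the original email: the keys `reads` and `read_time`, the helper name `counter_deltas`, and all sample values are assumptions made for the example.

```python
from typing import Dict

# A snapshot maps a counter name to its cumulative value at sampling time.
Snapshot = Dict[str, float]


def counter_deltas(earlier: Snapshot, later: Snapshot) -> Snapshot:
    """Return later - earlier for every counter present in both snapshots."""
    return {name: later[name] - earlier[name]
            for name in earlier.keys() & later.keys()}


# Hypothetical samples taken some interval apart (values are made up).
sample_1 = {"reads": 1000, "read_time": 250.0}
sample_2 = {"reads": 1600, "read_time": 400.0}

deltas = counter_deltas(sample_1, sample_2)
print(deltas["reads"], deltas["read_time"])
```

The `read_time` delta is reported next to the `reads` delta rather than divided by it. Following the point made in the message, a per-IO cost derived from the two is not treated as reliable, since kernel-level buffering can make individual IO times vary widely.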