Re: Track IO times in pg_stat_io - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Track IO times in pg_stat_io |
Date | |
Msg-id | CAAKRu_Z1x5FX6NA2Qa2S4w_=uzWTP-QcyomTOVUkgBYOVkxduw@mail.gmail.com Whole thread Raw |
In response to | Re: Track IO times in pg_stat_io (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Track IO times in pg_stat_io
|
List | pgsql-hackers |
v5 attached mostly addresses instr_time persistence issues. On Tue, Mar 14, 2023 at 6:56 PM Andres Freund <andres@anarazel.de> wrote: > On 2023-03-09 11:50:38 -0500, Melanie Plageman wrote: > > On Tue, Mar 7, 2023 at 1:39 PM Andres Freund <andres@anarazel.de> wrote: > > > On 2023-03-06 11:30:13 -0500, Melanie Plageman wrote: > > > > > As pgstat_bktype_io_stats_valid() is called only in Assert(), I think that would be a good idea > > > > > to also check that if counts are not Zero then times are not Zero. > > > > > > > > Yes, I think adding some validation around the relationship between > > > > counts and timing should help prevent developers from forgetting to call > > > > pg_stat_count_io_op() when calling pgstat_count_io_time() (as relevant). > > > > > > > > However, I think that we cannot check that if IO counts are non-zero > > > > that IO times are non-zero, because the user may not have > > > > track_io_timing enabled. We can check that if IO times are not zero, IO > > > > counts are not zero. I've done this in the attached v3. > > > > > > And even if track_io_timing is enabled, the timer granularity might be so low > > > that we *still* get zeroes. > > > > > > I wonder if we should get rid of pgStatBlockReadTime, pgStatBlockWriteTime, > > > > And then have pg_stat_reset_shared('io') reset pg_stat_database IO > > stats? > > Yes. I think this makes sense but I am hesitant to do it in this patchset, because it feels a bit hidden...maybe? But, if you feel strongly, I will make the change. > > > > typedef struct PgStat_BktypeIO > > > > { > > > > - PgStat_Counter data[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; > > > > + PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; > > > > + instr_time times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; > > > > } PgStat_BktypeIO; > > > > > > Ah, you're going to hate me. We can't store instr_time on disk. There's > > > another patch that gets substantial peformance gains by varying the frequency > > > at which instr_time keeps track of time based on the CPU frequency... > > > > What does that have to do with what we can store on disk? > > The frequency can change. Ah, I see. > > If so, would it not be enough to do this when reading/writing the stats > > file? > > Theoretically yes. But to me it seems cleaner to do it when flushing to shared > stats. See also the overflow issue below. > > > > It also just doesn't have enough range to keep track of system wide > > > time on a larger system. A single backend won't run for 293 years, but > > > with a few thousand backends that's a whole different story. > > > > > > I think we need to accumulate in instr_time, but convert to floating point > > > when flushing stats. > > > > Hmmm. So, are you saying that we need to read from disk when we query > > the view and add that to what is in shared memory? That we only store > > the delta since the last restart in the instr_time array? > > No, I don't think I am suggesting that. What I am trying to suggest is that > PendingIOStats should contain instr_time, but that PgStat_IO should contain > PgStat_Counter as microseconds, as before. So, I've modified the code to make a union of instr_time and PgStat_Counter in PgStat_BktypeIO. I am not quite sure if this is okay. I store in microsec and then in pg_stat_io, I multiply to get milliseconds for display. I considered refactoring pgstat_io_end() to use INSTR_TIME_ACCUM_DIFF() like [1], but, in the end I actually think I would end up with more operations because of the various different counters needing to be updated. As it is now, I do a single subtract and a few adds (one for each of the different statistics objects tracking IO times (pgBufferUsage, pgStatBlockWrite/ReadTime). Whereas, I would need to do an accum diff for every one of those. - Melanie [1] https://www.postgresql.org/message-id/1feedb83-7aa9-cb4b-5086-598349d3f555%40gmail.com
Attachment
pgsql-hackers by date: