On Mon, Apr 24, 2023 at 02:14:32PM -0700, Andres Freund wrote:
> Hi,
>
> On 2023-04-24 16:39:36 -0400, Melanie Plageman wrote:
> > On Wed, Apr 19, 2023 at 10:23:26AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > I noticed that the numbers in pg_stat_io don't quite add up to what I
> > > expected in write heavy workloads. Particularly for checkpointer, the numbers
> > > for "write" in log_checkpoints output are larger than what is visible in
> > > pg_stat_io.
> > >
> > > That partially is because log_checkpoints' "write" covers way too many things,
> > > but there's an issue with pg_stat_io as well:
> > >
> > > Checkpoints, and some other sources of writes, will often end up doing a lot
> > > of smgrwriteback() calls - which pg_stat_io doesn't track. Nor do any
> > > pre-existing forms of IO statistics.
> > >
> > > It seems pretty clear that we should track writeback as well. I wonder if it's
> > > worth doing so for 16? It'd give a more complete picture that way. The
> > > counter-argument I see is that we didn't track the time for it in existing
> > > stats either, and that nobody complained - but I suspect that's mostly because
> > > nobody knew to look.
> >
> > Not complaining about making pg_stat_io more accurate, but what exactly
> > would we be tracking for smgrwriteback()? I assume you are talking about
> > IO timing. AFAICT, on Linux, it does sync_file_range() with
> > SYNC_FILE_RANGE_WRITE, which is asynchronous. Wouldn't we just be
> > tracking the system call overhead time?
>
> It starts blocking once "enough" IO is in flight. For things like an immediate
> checkpoint, that can happen fairly quickly, unless you have a very fast IO
> subsystem. So often it'll not matter whether we track smgrwriteback(), but
> when it matters, it can matter a lot.

I see. So, it sounds like this is most likely to happen for checkpointer
and not likely to happen for other backends that call
ScheduleBufferTagForWriteback(). Also, it seems like this (given the
current code) is only reachable for permanent relations (i.e. not for IO
object temp relation). If backend types other than checkpointer may call
smgrwriteback(), we likely have to consider the IO context. I would
imagine that we want to add smgrwriteback() timing to writes/write time
for the relevant IO context and backend type.
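
To make the accounting question concrete, here is a rough sketch of one
possible reading: writeback time is folded into the write time of the
(backend type, IO context) cell that issued it, without bumping the write
count. Every name below (DemoBackendType, DemoIOContext, demo_stats,
demo_count_write, demo_add_writeback_time) is hypothetical and is not the
pg_stat_io code; the caller is assumed to have measured the elapsed time
around smgrwriteback() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT PostgreSQL's real backend type / IO context enums. */
typedef enum
{
	DEMO_BACKEND_CLIENT,
	DEMO_BACKEND_BGWRITER,
	DEMO_BACKEND_CHECKPOINTER,
	DEMO_NUM_BACKENDS
} DemoBackendType;

typedef enum
{
	DEMO_CTX_NORMAL,
	DEMO_CTX_VACUUM,
	DEMO_CTX_BULKWRITE,
	DEMO_NUM_CONTEXTS
} DemoIOContext;

/* One cell per (backend type, IO context) pair. */
typedef struct
{
	uint64_t	writes;			/* number of write operations */
	uint64_t	write_time_ns;	/* time spent in writes, plus writeback */
} DemoIOStats;

static DemoIOStats demo_stats[DEMO_NUM_BACKENDS][DEMO_NUM_CONTEXTS];

/* Count one ordinary write and the time it took. */
static void
demo_count_write(DemoBackendType backend, DemoIOContext ctx, uint64_t elapsed_ns)
{
	assert(backend < DEMO_NUM_BACKENDS && ctx < DEMO_NUM_CONTEXTS);
	demo_stats[backend][ctx].writes++;
	demo_stats[backend][ctx].write_time_ns += elapsed_ns;
}

/*
 * Assumption: writeback is charged to write time only, so a blocking
 * writeback shows up as extra write time in the issuing cell while the
 * write count stays unchanged.
 */
static void
demo_add_writeback_time(DemoBackendType backend, DemoIOContext ctx,
						uint64_t elapsed_ns)
{
	assert(backend < DEMO_NUM_BACKENDS && ctx < DEMO_NUM_CONTEXTS);
	demo_stats[backend][ctx].write_time_ns += elapsed_ns;
}
```

With this layout, a checkpointer writeback that blocks only inflates the
checkpointer/normal cell, and a writeback issued by a client backend in a
different IO context stays separate. Whether writeback should instead get
its own count and time columns is a separate design choice that this
sketch does not settle.
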
- Melanie