Re: pg_stat_io not tracking smgrwriteback() is confusing - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: pg_stat_io not tracking smgrwriteback() is confusing
Msg-id 20230424213748.k6rpddvtjfsn5bfk@liskov
In response to Re: pg_stat_io not tracking smgrwriteback() is confusing  (Andres Freund <andres@anarazel.de>)
Responses Re: pg_stat_io not tracking smgrwriteback() is confusing
List pgsql-hackers
On Mon, Apr 24, 2023 at 02:14:32PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2023-04-24 16:39:36 -0400, Melanie Plageman wrote:
> > On Wed, Apr 19, 2023 at 10:23:26AM -0700, Andres Freund wrote:
> > > Hi,
> > > 
> > > I noticed that the numbers in pg_stat_io don't quite add up to what I
> > > expected in write heavy workloads. Particularly for checkpointer, the numbers
> > > for "write" in log_checkpoints output are larger than what is visible in
> > > pg_stat_io.
> > > 
> > > That partially is because log_checkpoints' "write" covers way too many things,
> > > but there's an issue with pg_stat_io as well:
> > > 
> > > Checkpoints, and some other sources of writes, will often end up doing a lot
> > > of smgrwriteback() calls - which pg_stat_io doesn't track. Nor do any
> > > pre-existing forms of IO statistics.
> > > 
> > > It seems pretty clear that we should track writeback as well. I wonder if it's
> > > worth doing so for 16? It'd give a more complete picture that way. The
> > > counter-argument I see is that we didn't track the time for it in existing
> > > stats either, and that nobody complained - but I suspect that's mostly because
> > > nobody knew to look.
> > 
> > Not complaining about making pg_stat_io more accurate, but what exactly
> > would we be tracking for smgrwriteback()? I assume you are talking about
> > IO timing. AFAICT, on Linux, it does sync_file_range() with
> > SYNC_FILE_RANGE_WRITE, which is asynchronous. Wouldn't we just be
> > tracking the system call overhead time?
> 
> It starts blocking once "enough" IO is in flight. For things like an immediate
> checkpoint, that can happen fairly quickly, unless you have a very fast IO
> subsystem. So often it'll not matter whether we track smgrwriteback(), but
> when it matters, it can matter a lot.

I see. So, it sounds like this is most likely to happen for checkpointer
and not likely to happen for other backends that call
ScheduleBufferTagForWriteback(). Also, it seems like this (given the
current code) is only reachable for permanent relations (i.e. not for IO
object temp relation). If backend types other than checkpointer may call
smgrwriteback(), we likely have to consider the IO context. I would
imagine that we want to add smgrwriteback() timing to writes/write time
for the relevant IO context and backend type.
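
To make the accounting idea concrete, here is a rough standalone sketch of
timing a writeback request and attributing it to a (backend type, IO
context) pair. It is not PostgreSQL code: the enums, the writeback_stats
array, count_writeback() and fake_writeback() are all hypothetical names
made up for illustration, and the sketch keeps writeback in its own
counters, which is only one of the possible designs (the other being to
fold the time into the existing write counters).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins; the real pg_stat_io dimensions are different. */
typedef enum { BACKEND_CHECKPOINTER, BACKEND_BGWRITER, BACKEND_CLIENT,
               NUM_BACKEND_TYPES } BackendKind;
typedef enum { CONTEXT_NORMAL, CONTEXT_VACUUM, CONTEXT_BULKWRITE,
               NUM_CONTEXTS } IoContextKind;

typedef struct
{
    uint64_t writebacks;    /* number of writeback requests issued */
    uint64_t writeback_ns;  /* total time spent issuing them */
} WritebackCounters;

/* One counter slot per (backend type, IO context) pair, zero-initialized. */
WritebackCounters writeback_stats[NUM_BACKEND_TYPES][NUM_CONTEXTS];

/* Monotonic clock reading in nanoseconds. */
uint64_t
now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
}

/*
 * Run the writeback request and charge its count and elapsed time to the
 * given backend type and IO context. "issue" stands in for whatever
 * actually requests writeback (assumed to possibly block).
 */
void
count_writeback(BackendKind backend, IoContextKind context,
                void (*issue) (void *), void *arg)
{
    uint64_t start;

    assert(backend < NUM_BACKEND_TYPES && context < NUM_CONTEXTS);

    start = now_ns();
    issue(arg);
    writeback_stats[backend][context].writebacks++;
    writeback_stats[backend][context].writeback_ns += now_ns() - start;
}

/* Dummy writeback used only to exercise the counters. */
void
fake_writeback(void *arg)
{
    (*(int *) arg)++;
}
```

Calling count_writeback(BACKEND_CHECKPOINTER, CONTEXT_NORMAL,
fake_writeback, &n) twice would leave a count of 2 in the checkpointer /
normal slot and leave every other slot untouched.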

- Melanie
