Re: pg_stat_io for the startup process - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_stat_io for the startup process
Msg-id 20230425183914.i66b7q7uinogkyoz@awork3.anarazel.de
In response to Re: pg_stat_io for the startup process  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: pg_stat_io for the startup process
List pgsql-hackers
Hi,

On 2023-04-25 13:54:43 -0400, Melanie Plageman wrote:
> On Tue, Apr 25, 2023 at 10:51:14PM +0900, Fujii Masao wrote:
> > Regarding pg_stat_io for the startup process, I noticed that the counters
> > are only incremented after the startup process exits, not during WAL replay
> > in standby mode. This is because pgstat_flush_io() is only called when
> > the startup process exits. Shouldn't it be called during WAL replay as well
> > to report IO statistics by the startup process even in standby mode?
> 
> Yes, we definitely want stats from the startup process on the standby.
> Elsewhere on the internet where you originally raised this, I mentioned
> that I hacked a pgstat_flush_io() into the redo apply loop in
> PerformWalRecovery() but that I wasn't sure that this was affordable.
> Andres Freund replied saying that it would be too expensive and
> suggested that we set up a regular timeout which sets a flag that's
> checked by HandleStartupProcInterrupts().

It's tempting to try to reuse the STARTUP_PROGRESS_TIMEOUT timer. But it's
controlled by a GUC, so it's not really suitable.
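
To make the flag-based idea concrete, here is a minimal sketch in plain C. It
is not PostgreSQL code: every name in it is hypothetical, the counters only
stand in for backend-local pending stats and the shared stats behind
pg_stat_io, and the timer is simulated by calling the handler directly rather
than by registering a real timeout. It assumes the handler does nothing except
set a flag, with the flush itself deferred to the per-record interrupt check.

```c
#include <signal.h>

/* All names below are hypothetical and exist only for this illustration. */
static volatile sig_atomic_t io_stats_flush_pending = 0;
static long pending_io_ops = 0;  /* stand-in for backend-local pending counts */
static long shared_io_ops = 0;   /* stand-in for the shared, visible counters */
static int  flush_calls = 0;     /* how often the expensive flush actually ran */

/* Timeout handler: cheap and async-signal-safe, it only sets a flag. */
void
io_stats_timeout_handler(void)
{
    io_stats_flush_pending = 1;
}

/* Stand-in for the comparatively expensive flush to shared memory. */
void
flush_pending_io_stats(void)
{
    shared_io_ops += pending_io_ops;
    pending_io_ops = 0;
    flush_calls++;
}

/* Called once per replayed record; flushes only if the timer has fired. */
void
handle_replay_interrupts(void)
{
    if (io_stats_flush_pending)
    {
        io_stats_flush_pending = 0;
        flush_pending_io_stats();
    }
}

/*
 * Simulated redo loop: each record performs one I/O, and the "timer" fires
 * after every timeout_every records (assumption: a real timer would fire
 * on wall-clock time instead).
 */
void
replay_records(int nrecords, int timeout_every)
{
    for (int i = 1; i <= nrecords; i++)
    {
        pending_io_ops++;
        if (timeout_every > 0 && i % timeout_every == 0)
            io_stats_timeout_handler();
        handle_replay_interrupts();
    }
}
```

The point of the split is that the redo loop pays only for a flag check per
record, while the flush cost is bounded by how often the timer fires, not by
the replay rate.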


> I'm wondering if this is something we consider a bug and thus would be
> under consideration for 16.

I'm mildly inclined to not consider it a bug, given that this looks to have
been true for other stats for quite a while? But it does still seem worth
improving upon - I'd make the decision on when to apply the relevant patches
depend on their complexity. I'm worried we'd need to introduce enough new
infrastructure that 16 doesn't seem like a good idea. Let's come up with a
patch and judge it after?


> > Also, the pg_stat_io view includes a row with backend_type=startup and
> > context=vacuum, but it seems that the startup process doesn't perform
> > any I/O operations with BAS_VACUUM. If this understanding is right,
> > shouldn't we omit this row from the view? Additionally, I noticed that
> > the view also includes a row with backend_type=startup and
> > context=bulkread / bulkwrite. Do these operations actually occur
> > during startup process?
> 
> Hmm. Yes, I remember posing this question on the thread and not getting
> an answer. I read some code and did some testing and can't see a way we
> would end up with the startup process doing IO in a non-normal context.
> 
> Certainly I can't see how startup process would ever use a BAS_VACUUM
> context given that it executes heap_xlog_vacuum().
> 
> I thought at some point I had encountered an assertion failure when I
> banned the startup process from tracking io operations in bulkread and
> bulkwrite contexts. But, I'm not seeing how that could happen.

It's possible that we decided to not apply such restrictions because the
startup process can be made to execute more code via the extensible
rmgrs.

Greetings,

Andres Freund


