Re: RFC: Allow EXPLAIN to Output Page Fault Information - Mailing list pgsql-hackers

From Jeremy Schneider
Subject Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date
Msg-id 20241227103243.39922304@jeremy-ThinkPad-T430s
Whole thread Raw
In response to Re: RFC: Allow EXPLAIN to Output Page Fault Information  ("Jelte Fennema-Nio" <postgres@jeltef.nl>)
List pgsql-hackers
On Fri, 27 Dec 2024 15:15:40 +0100
"Jelte Fennema-Nio" <postgres@jeltef.nl> wrote:

> On Tue Dec 24, 2024 at 4:52 PM CET, Tom Lane wrote:
> > torikoshia <torikoshia@oss.nttdata.com> writes:  
> >> I have attached a PoC patch that modifies EXPLAIN to include page
> >> fault information during both the planning and execution phases of
> >> a query.   
> >
> > Surely these numbers would be too unstable to be worth anything.  
> 
> What makes you think that? I'd expect them to be similarly stable to
> the numbers we get for BUFFERS. i.e. Sure they won't be completely
> stable, but I expect them to be quite helpful when debugging perf
> issues, because large numbers indicate that the query is disk-bound
> and small numbers indicate that it is not.
> 
> These numbers seem especially useful for setups where shared_buffers
> is significantly smaller than the total memory available to the
> system. In those cases the output from BUFFERS might give the
> impression that that you're disk-bound, but if your working set still
> fits into OS cache then the number of page faults is likely still
> low. Thus telling you that the numbers that you get back from BUFFERS
> are not as big of a problem as they might seem.

We'd probably need to combine both pg_buffercache_evict() and
/proc/sys/vm/drop_caches to get stable numbers - which is something I
have done in the past for testing.

Another thought would be splitting out the IO timing information into
two values - IO timing for reads that triggered major faults, versus
IO timing for reads that did not.

And system views like pg_stat_database seem worth considering too.

-Jeremy



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Connection limits/permissions, slotsync workers, etc
Next
From: James Hunter
Date:
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.