Re: RFC: Allow EXPLAIN to Output Page Fault Information - Mailing list pgsql-hackers

From Atsushi Torikoshi
Subject Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date
Msg-id CAM6-o=BE=oewSsdNKrbbNjR2muzQR49STXsuta+Wpq6CSyxTag@mail.gmail.com
Whole thread Raw
In response to Re: RFC: Allow EXPLAIN to Output Page Fault Information  ("Jelte Fennema-Nio" <postgres@jeltef.nl>)
List pgsql-hackers

On Mon, Jan 6, 2025 at 6:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Mon, Jan 06, 2025 at 06:49:06PM +0900, torikoshia wrote:
>
> > **However, I think the general direction has merit**: Changing this
> > patch to
> > use `ru_inblock`/`ru_oublock` gives very useful insights. `ru_inblock`
> > is 0 when everything is in page cache, and it is very high when stuff is
> > not. I was only hacking around and basically did this:
> >
> > s/ru_minflt/ru_inblock/g
> > s/ru_majflt/ru_oublock/g
> > [...]
> >  Also, maybe there's some better way
> > of getting read/write numbers for the current process than
> > ru_inblock/ru_oublock (but this one seems to work at least reasonably
> > well).
>
> Maybe, but as far as using getrusage(), ru_inblock and ru_outblock seem the
> best.

FWIW that's the counters we've been using in pg_stat_kcache / powa to compute
disk IO and "postgres shared buffers hit VS OS cache hit VS disk read" hit
ratio for many years and we didn't get any problem report for that.

Thanks for sharing the information!

On Tue, Jan 7, 2025 at 5:09 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Mon Jan 6, 2025 at 10:49 AM CET, torikoshia wrote:
> On Tue, Dec 31, 2024 at 7:57 AM Jelte Fennema-Nio <postgres@jeltef.nl>
> Updated the PoC patch to calculate them by KB:
>
>    =# EXPLAIN (ANALYZE, STORAGEIO) SELECT * FROM pgbench_accounts;
>                                                               QUERY PLAN
>   
> ---------------------------------------------------------------------------------------------------------------------------------
>     Seq Scan on pgbench_accounts  (cost=0.00..263935.35 rows=10000035
> width=97) (actual time=1.447..3900.279 rows=10000000 loops=1)
>       Buffers: shared hit=2587 read=161348
>     Planning Time: 0.367 ms
>     Execution:
>       Storage I/O: read=1291856 KB write=0 KB
>     Execution Time: 4353.253 ms
>    (6 rows)
>
>

The core functionality works well in my opinion. I think it makes sense
to spend the effort to move this from PoC quality to something
committable.

Thanks for the comment and review!
If there are no other opinions, I will make a patch based on the direction of the current PoC patch.
  
4. I think this setting should be enabled by default for ANALYZE, just
   like BUFFERS is now since c2a4078e[1].

5. I'm wondering whether this option deserves its own EXPLAIN option, or
   if it should simply be made part of BUFFERS.
 
Although this is not information about PostgreSQL buffers, I now feel that making this addition part of BUFFERS is attractive, since people using BUFFERS would also sometimes want to know about the storage I/O. 

Since BUFFERS is now ON by default with EXPLAIN ANALYZE, I am concerned about the performance impact.
However, if it is limited to just twice—once at the start and once at the end—for the planning phase, execution phase, and each parallel worker, I believe the impact is relatively small.
 
9. I think this division by 2 could use some explanation in a comment. I
   understand that you're doing this because linux divides its original
   bytes using 512 bytes[2] and your additional factor of 2 gets that to
   1024 bytes. But that's not clear immediately from the code.

   I'm also not convinced that 512 is the blocksize if this logic is
   even correct on every platform. I'm wondering if maybe we should
   simply show the blocks after all.

Maybe so. I'll look into this and then decide the unit. 

For the other comments, I plan to address them as you suggested.

 
Regards,
Atsushi Torikoshi

pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: More reliable nbtree detection of unsatisfiable RowCompare quals involving a leading NULL key/element
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: Sample rate added to pg_stat_statements