Re: RFC: Allow EXPLAIN to Output Page Fault Information - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date
Msg-id D6V9GJ2LOLC5.1ZF3BWKKEBZWC@jeltef.nl
Whole thread Raw
In response to Re: RFC: Allow EXPLAIN to Output Page Fault Information  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: RFC: Allow EXPLAIN to Output Page Fault Information
List pgsql-hackers
On Mon Jan 6, 2025 at 10:49 AM CET, torikoshia wrote:
> On Tue, Dec 31, 2024 at 7:57 AM Jelte Fennema-Nio <postgres@jeltef.nl>
> Updated the PoC patch to calculate them by KB:
>
>    =# EXPLAIN (ANALYZE, STORAGEIO) SELECT * FROM pgbench_accounts;
>                                                               QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------
>     Seq Scan on pgbench_accounts  (cost=0.00..263935.35 rows=10000035
> width=97) (actual time=1.447..3900.279 rows=10000000 loops=1)
>       Buffers: shared hit=2587 read=161348
>     Planning Time: 0.367 ms
>     Execution:
>       Storage I/O: read=1291856 KB write=0 KB
>     Execution Time: 4353.253 ms
>    (6 rows)
>
>

The core functionality works well in my opinion. I think it makes sense
to spend the effort to move this from PoC quality to something
committable. Below some of the things that are necessary to do that
after an initial pass over the code (and trying it out):

>> One other thing that I noticed when playing around with this, which
>> would need to be addressed: Parallel workers need to pass these values
>> to the main process somehow, otherwise the IO from those processes gets
>> lost.
>
> Yes.
> I haven't developed it yet but I believe we can pass them like
> buffer/WAL usage.

1. Yeah, makes sense to do this the same way as we do for buffers. Let's do
   this now.

> +        if (es->storageio)
> +        {
> +                getrusage(RUSAGE_SELF, &rusage);
> +
> +                storageio.inblock = rusage.ru_inblock - storageio_start.inblock;
> +                storageio.outblock = rusage.ru_oublock - storageio_start.outblock;
> +
> +                if (es->format == EXPLAIN_FORMAT_TEXT)
> +                {
> +                        ExplainIndentText(es);
> +                        appendStringInfoString(es->str, "Execution:\n");
> +                        es->indent++;
> +                }
> +                show_storageio(es, &storageio);
> +
> +                if (es->format == EXPLAIN_FORMAT_TEXT)
> +                        es->indent--;
> +                ExplainCloseGroup("Execution", "Execution", true, es);
> +        }

2. The current code always shows "Execution: " in the explain analyze
   output, even if no storageio is done. I think this should use
   peek_storageio() to check if any storageio was done and only show the
   "Execution: " line if that is the case.


3. FORMAT JSON seems to be broken with this patch. With the following
   simple query:

explain (ANALYZE, STORAGEIO, FORMAT JSON) select max(a), max(b) from t_big;

   I get this this assert failure:

TRAP: failed Assert("es->indent == 0"), File: "../src/backend/commands/explain.c", Line: 375, PID: 199034
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(ExceptionalCondition+0x74)[0x5ad72872b464]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(ExplainQuery+0x75b)[0x5ad7283c87bb]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(standard_ProcessUtility+0x595)[0x5ad7285e97f5]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x4daadf)[0x5ad7285e7adf]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x4dafc4)[0x5ad7285e7fc4]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(PortalRun+0x32d)[0x5ad7285e834d]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x4d70a2)[0x5ad7285e40a2]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(PostgresMain+0x16e9)[0x5ad7285e5b39]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(BackendMain+0x5f)[0x5ad7285e02df]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(postmaster_child_launch+0xe1)[0x5ad72853cde1]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x433758)[0x5ad728540758]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(PostmasterMain+0xddd)[0x5ad72854223d]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(main+0x1d0)[0x5ad72828b600]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x714aa222a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x714aa222a28b]
postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(_start+0x25)[0x5ad72828bc05]

   FORMAT JSON should obviously not crash the server, but apart from
   that it should also actually show this new data in the json output.

4. I think this setting should be enabled by default for ANALYZE, just
   like BUFFERS is now since c2a4078e[1].

5. I'm wondering whether this option deserves its own EXPLAIN option, or
   if it should simply be made part of BUFFERS.

6. Windows compilation is currently failing on the CFbot. Looking at the
   output, that's because rusage does not contain these fields there. I
   think you'll need some #ifdefs

7. The result of getrusage() should be checked for errors and we should
   report the error. (eventhough it's very unexpected to ever fail).

8. This needs docs


> +                appendStringInfo(es->str, " read=%ld KB", (long) usage->inblock / 2);
> +                appendStringInfo(es->str, " write=%ld KB", (long) usage->outblock / 2);

9. I think this division by 2 could use some explanation in a comment. I
   understand that you're doing this because linux divides its original
   bytes using 512 bytes[2] and your additional factor of 2 gets that to
   1024 bytes. But that's not clear immediately from the code.

   I'm also not convinced that 512 is the blocksize if this logic is
   even correct on every platform. I'm wondering if maybe we should
   simply show the blocks after all.


[1]: https://github.com/postgres/postgres/commit/c2a4078ebad71999dd451ae7d4358be3c9290b07
[2]:
https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/task_io_accounting_ops.h#L16-L23



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Proposal: add new API to stringinfo
Next
From: Jeff Davis
Date:
Subject: Re: Incorrect CHUNKHDRSZ in nodeAgg.c