Re: Stack-based tracking of per-node WAL/buffer usage - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Stack-based tracking of per-node WAL/buffer usage
Date
Msg-id rhah5kx5ftfztu532zck4mhc4hxtfis4z552laisd53sf6ycrn@wnbste4hleyg
Whole thread
In response to Re: Stack-based tracking of per-node WAL/buffer usage  (Lukas Fittl <lukas@fittl.com>)
Responses Re: Stack-based tracking of per-node WAL/buffer usage
List pgsql-hackers
Hi,

On 2026-04-07 13:30:11 -0700, Lukas Fittl wrote:
> 0001 is the change to make queryDesc->totaltime be allocated by
> ExecutorStart instead of plugins themselves, and adds a
> queryDesc->totaltime_options to have plugins request which level of
> summary instrumentation they need. This change is pretty simple, and
> could still make sense to get into 19. Because of the earlier
> Instrumentation refactoring that was pushed (thanks!) we're already
> asking extensions allocating queryDesc->totaltime to modify their use
> of InstrAlloc, so I think we might as well clean this up now.

Hm.  That's a fair argument. They indeed would have to again change next
release

 It's not a complicated change and removes more lines than it adds.

I guess one thing I'm not sure is whether the fields shouldn't be renamed at
the same time:

a) To prevent extensions from continuing to set it, most of them do not test
   against assert enabled builds. With a different name they would get a
   compiler error.

b) "totaltime" and "totaltime_options" are pretty poor descriptors of tracking
   query level statistics.  If everyone has to change anyway, this is a good
   occasion.

'query_instr[_options]'?


Any opinions?



> 0002 is just ExecProcNodeInstr moved to instrument.c, as Andres had
> suggested previously. We still get some quick performance wins from
> doing that (see end of email), and again, its a simple change, so
> could be considered if someone has bandwidth remaining. I've added a
> later patch that then does the more complex inlining and gets us the
> full speed up.

Here it needs a few more inlines to get the full performance, otherwise it
doesn't inline all the helpers.  I think on balance I didn't like the
prototype in instrument.h, that's too widely included, and it might even cause
some circularity issues.  It seems better to do the somewhat ugly thing and
have the prototype be in executor.h.


> 0002 measurements (with current master and TSC clock source used for
> timing, best of three):
> 
> CREATE TABLE lotsarows(key int not null);
> INSERT INTO lotsarows SELECT generate_series(1, 50000000);
> VACUUM FREEZE lotsarows;

With the somewhat more extreme benchmark I used in the rdtsc thread and the
added inline mentioned above I see a bit bigger wins. See the attached
explainbench.sql - it doesn't quite cover all the combinations, but I think it
gives a good enough overview.

c=1 pgbench -f ~/tmp/explainbench.sql -P5 -r -t 10

master:
statement latencies in milliseconds and failures:
       200.800           0 SELECT pg_prewarm('pgbench_accounts');
         0.098           0 PREPARE query AS SELECT * FROM pgbench_accounts OFFSET 5000000 LIMIT 1;
       212.010           0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING OFF)
       268.648           0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING ON)
       232.421           0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING OFF)
       283.531           0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING ON)
         0.030           0 DEALLOCATE query;


0002:

statement latencies in milliseconds and failures:
       201.558           0 SELECT pg_prewarm('pgbench_accounts');
         0.103           0 PREPARE query AS SELECT * FROM pgbench_accounts OFFSET 5000000 LIMIT 1;
       188.696           0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING OFF)
       244.479           0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING ON)
       223.773           0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING OFF)
       266.947           0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING ON)
         0.034           0 DEALLOCATE query;

That's something like 4-12%.

Pretty nice for a patch that just adds a few lines around and adds a few
inlines.


> At this point I'd say its safe to say that we should push out later
> changes to PG20, because it needs another good look over, and I don't
> think Andres or Heikki have the capacity for that today (but I really
> appreciate all the effort put in by both of you!).

Indeed.


> @@ -334,6 +334,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
>  
>      if (auto_explain_enabled())
>      {
> +        /* We're always interested in runtime */
> +        queryDesc->totaltime_options |= INSTRUMENT_TIMER;

> -            queryDesc->totaltime = InstrAlloc(INSTRUMENT_ALL);

Not that it's going to make a significant difference, but it is nice that this
now would need to track less.


Kinda wonder about having
  EXPLAIN (ANALYZE BUFFERS totals_only, WAL totals_only) ...;

in plenty cases that'd be all one needs, at substantially lower cost.


Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_plan_advice
Next
From: Andreas Karlsson
Date:
Subject: Re: updates for handling optional argument in system functions