Hi,
Please find attached a patch series that introduces a new paradigm for how per-node WAL/buffer usage is tracked, with two primary goals: (1) reduce overhead of EXPLAIN ANALYZE, (2) enable future work like tracking estimated distinct buffer hits [0].
Currently we utilize pgWalUsage/pgBufferUsage as global counters, and in InstrStopNode we call the rather expensive BufferUsageAccumDiff/WalUsageAccumDiff to know how much activity happened within a given node cycle.
This proposal instead uses a stack, where each time we enter a node (InstrStartNode) we point a new global (pgInstrStack) to the current stack entry. Whilst we're in that node we increment buffer/WAL usage statistics to the stack entry. On exit (InstrStopNode) we restore the previous entry.
This change provides about a 10% performance benefit for EXPLAIN ANALYZE on paths that repeatedly enter InstrStopNode, e.g. SELECT COUNT(*):
CREATE TABLE test(id int);
INSERT INTO test SELECT * FROM generate_series(0, 1000000);
master (124ms, best out of 3):
postgres=# EXPLAIN (ANALYZE) SELECT COUNT(*) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=16925.01..16925.02 rows=1 width=8) (actual time=124.910..124.910 rows=1.00 loops=1)
Buffers: shared hit=752 read=3673
-> Seq Scan on test (cost=0.00..14425.01 rows=1000001 width=0) (actual time=0.201..62.228 rows=1000001.00 loops=1)
Buffers: shared hit=752 read=3673
Planning Time: 0.116 ms
Execution Time: 124.961 ms
patched (109ms, best out of 3):
postgres=# EXPLAIN (ANALYZE) SELECT COUNT(*) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=16925.01..16925.02 rows=1 width=8) (actual time=109.788..109.788 rows=1.00 loops=1)
Buffers: shared hit=940 read=3485
-> Seq Scan on test (cost=0.00..14425.01 rows=1000001 width=0) (actual time=0.153..69.368 rows=1000001.00 loops=1)
Buffers: shared hit=940 read=3485
Planning Time: 0.134 ms
Execution Time: 109.837 ms
(6 rows)
I have also prototyped a more ambitious approach that completely removes pgWalUsage/pgBufferUsage (utilizing the stack-collected data for e.g. pg_stat_statements), but for now this patch set does not include that change, but instead keeps adding to these legacy globals as well.
Patches attached:
0001: Separate node instrumentation from other use of Instrumentation struct
Previously different places (e.g. query "total time") were repurposing the per-node Instrumentation struct. Instead, simplify the Instrumentation struct to only track time, WAL/buffer usage, and tuple counts. Similarly, drop the use of InstrEndLoop outside of per-node instrumentation. Introduce the NodeInstrumentation struct to carry forward the per-node instrumentation information.
0002: Replace direct changes of pgBufferUsage/pgWalUsage with INSTR_* macros
0003: Introduce stack for tracking per-node WAL/buffer usage
Feedback/thoughts welcome!
CCing Andres since he had expressed interest in this off-list.
Thanks,
Lukas
--