Stack-based tracking of per-node WAL/buffer usage - Mailing list pgsql-hackers

From Lukas Fittl
Subject Stack-based tracking of per-node WAL/buffer usage
Date
Msg-id CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
Whole thread Raw
Responses Re: Stack-based tracking of per-node WAL/buffer usage
List pgsql-hackers
Hi,

Please find attached a patch series that introduces a new paradigm for how per-node WAL/buffer usage is tracked, with two primary goals: (1) reduce overhead of EXPLAIN ANALYZE, (2) enable future work like tracking estimated distinct buffer hits [0].

Currently we utilize pgWalUsage/pgBufferUsage as global counters, and in InstrStopNode we call the rather expensive BufferUsageAccumDiff/WalUsageAccumDiff to know how much activity happened within a given node cycle.

This proposal instead uses a stack, where each time we enter a node (InstrStartNode) we point a new global (pgInstrStack) to the current stack entry. Whilst we're in that node we increment buffer/WAL usage statistics to the stack entry. On exit (InstrStopNode) we restore the previous entry.

This change provides about a 10% performance benefit for EXPLAIN ANALYZE on paths that repeatedly enter InstrStopNode, e.g. SELECT COUNT(*):

CREATE TABLE test(id int);
INSERT INTO test SELECT * FROM generate_series(0, 1000000);

master (124ms, best out of 3):

postgres=# EXPLAIN (ANALYZE) SELECT COUNT(*) FROM test;
                                                       QUERY PLAN                                                      
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16925.01..16925.02 rows=1 width=8) (actual time=124.910..124.910 rows=1.00 loops=1)
   Buffers: shared hit=752 read=3673
   ->  Seq Scan on test  (cost=0.00..14425.01 rows=1000001 width=0) (actual time=0.201..62.228 rows=1000001.00 loops=1)
         Buffers: shared hit=752 read=3673
 Planning Time: 0.116 ms
 Execution Time: 124.961 ms

patched (109ms, best out of 3):

postgres=# EXPLAIN (ANALYZE) SELECT COUNT(*) FROM test;
                                                       QUERY PLAN                                                      
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16925.01..16925.02 rows=1 width=8) (actual time=109.788..109.788 rows=1.00 loops=1)
   Buffers: shared hit=940 read=3485
   ->  Seq Scan on test  (cost=0.00..14425.01 rows=1000001 width=0) (actual time=0.153..69.368 rows=1000001.00 loops=1)
         Buffers: shared hit=940 read=3485
 Planning Time: 0.134 ms
 Execution Time: 109.837 ms
(6 rows)

I have also prototyped a more ambitious approach that completely removes pgWalUsage/pgBufferUsage (utilizing the stack-collected data for e.g. pg_stat_statements), but for now this patch set does not include that change, but instead keeps adding to these legacy globals as well.

Patches attached:

0001: Separate node instrumentation from other use of Instrumentation struct
   
    Previously different places (e.g. query "total time") were repurposing the per-node Instrumentation struct. Instead, simplify the Instrumentation struct to only track time, WAL/buffer usage, and tuple counts. Similarly, drop the use of InstrEndLoop outside of per-node instrumentation. Introduce the NodeInstrumentation struct to carry forward the per-node instrumentation information.

0002: Replace direct changes of pgBufferUsage/pgWalUsage with INSTR_* macros

0003: Introduce stack for tracking per-node WAL/buffer usage

Feedback/thoughts welcome!

CCing Andres since he had expressed interest in this off-list.

[0]: See lightning talk slides from PGConf.Dev discussing an HLL-based EXPLAIN (BUFFERS DISTINCT): https://resources.pganalyze.com/pganalyze_PGConf.dev_2025_shared_blks_hit_distinct.pdf

Thanks,
Lukas

--
Lukas Fittl
Attachment

pgsql-hackers by date:

Previous
From: Atsushi Torikoshi
Date:
Subject: Re: COPY TO: provide hint when WHERE clause is used
Next
From: Michael Paquier
Date:
Subject: Re: Invalid remote sampling test in postgres_fdw.