Re: Stack-based tracking of per-node WAL/buffer usage - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Stack-based tracking of per-node WAL/buffer usage
Msg-id a1edb578-8a54-4f7a-ad74-11ce9cef291a@iki.fi
In response to Re: Stack-based tracking of per-node WAL/buffer usage  (Lukas Fittl <lukas@fittl.com>)
Responses Re: Stack-based tracking of per-node WAL/buffer usage
List pgsql-hackers
On 24/03/2026 08:03, Lukas Fittl wrote:
> Instead I've tried introducing a memory context for instrumentation
> managed as a resource owner, and I am now (for now) convinced that
> this is the right trade-off for the problem at hand.

Yes, that seems better.

This patch could use an overview README file; I'm struggling to 
understand how this all works. Here's my understanding so far, 
please correct me if I'm wrong:

There are *two* data structures tracking the Instrumentation nodes. The 
patch only talks about a stack, but I think there's also implicitly a 
tree in there.

Tree
----

All Instrumentation nodes are part of a tree. For example, if you have 
two portals open, the tree might look like this:

Session - Query A - NestLoop - Seq Scan A
                              - Seq Scan B

         - Query B - Seq Scan C

When a node is "finalized", its counters are added to its parent.

This tree is somewhat implicit in the patch. Each QueryInstrumentation 
has a list of child nodes, but only unfinalized ones. Don't we need that 
at the session level too? When a Query is released on abort, its 
counters need to be added to the parent too. If I understand correctly, 
the patch tries to use the stack for that, but it's confusing.

I think it would make the patch more clear to talk explicitly about the 
tree, and represent it explicitly in the Instrumentation nodes. I.e. add 
a "parent" pointer, or a "children" list, or both to the Instrumentation 
struct.
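
To make the suggestion concrete, here is a minimal sketch of a tree with an explicit "parent" pointer, where finalizing a node adds its counters to its parent. The struct InstrNode, its fields, and the functions instr_init and instr_finalize are hypothetical names for illustration only. They are not the patch's actual Instrumentation definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's Instrumentation struct. */
typedef struct InstrNode
{
	struct InstrNode *parent;	/* NULL for the session-level root */
	int64_t		shared_blks_hit;	/* example buffer usage counter */
	int64_t		wal_bytes;		/* example WAL usage counter */
	int			finalized;		/* set once counters were rolled up */
} InstrNode;

/* Initialize a node with zeroed counters and link it to its parent. */
static void
instr_init(InstrNode *node, InstrNode *parent)
{
	node->parent = parent;
	node->shared_blks_hit = 0;
	node->wal_bytes = 0;
	node->finalized = 0;
}

/* Add this node's counters to its parent; must happen at most once. */
static void
instr_finalize(InstrNode *node)
{
	assert(!node->finalized);
	if (node->parent != NULL)
	{
		node->parent->shared_blks_hit += node->shared_blks_hit;
		node->parent->wal_bytes += node->wal_bytes;
	}
	node->finalized = 1;
}
```

With this shape, finalizing Seq Scan A and then Query A would propagate the scan's counters up to the Session node, and the same code path would work for a Query released on abort.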


Stack
-----

At all times, there's a stack that tracks which Instrumentation node in 
the tree is *currently* executing. For example, while executing the 
Seq Scan B, the stack would look like this:

0: Session
1: Query A
2: NestLoop
3: Seq Scan B

And when the code is sending a result row back to the client, while the 
query is being executed, the stack would be just:

0: Session


In the patch, the stack is represented by an array. It could also be 
implemented with a CurrentInstrumentation global variable, similar to 
CurrentMemoryContext and CurrentResourceOwner.
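
As a rough illustration of the global-variable alternative, the sketch below follows the MemoryContextSwitchTo pattern: switching returns the previous value, and the caller restores it when done, so the "stack" lives implicitly in the callers' local variables. CurrentInstrumentation is the name suggested above. InstrNode, instr_switch_to and instr_count_wal_record are made-up names, and the patch may do this quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instrumentation node. */
typedef struct InstrNode
{
	const char *name;
	long		wal_records;	/* example counter */
} InstrNode;

/* Hypothetical global, analogous to CurrentMemoryContext. */
static InstrNode *CurrentInstrumentation = NULL;

/*
 * Make 'node' the current instrumentation target and return the previous
 * one, so that the caller can restore it afterwards.
 */
static InstrNode *
instr_switch_to(InstrNode *node)
{
	InstrNode  *old = CurrentInstrumentation;

	assert(node != NULL);
	CurrentInstrumentation = node;
	return old;
}

/* Charge one WAL record to whatever is currently executing. */
static void
instr_count_wal_record(void)
{
	if (CurrentInstrumentation != NULL)
		CurrentInstrumentation->wal_records++;
}
```

A plan node would call instr_switch_to(&its_node) on entry and instr_switch_to(saved) on exit. Work done while a row is being sent to the client would then be charged to the Session node, matching the second stack example above.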


Abort handling
--------------

On abort, two things need to happen:

1. Reset the stack to the appropriate level. This ensures that we 
don't later try to update the counters on an Instrumentation node that 
is going away with the abort. In the above example, the stack would be 
reset to the 0: Session level.

2. Finalize all the Instrumentation nodes as part of the ResourceOwner 
cleanup. All Instrumentation nodes that are released roll up their 
counters to their parents.
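
The two abort steps could be sketched roughly as follows, using an array-based stack. All names here (InstrNode, instr_stack, instr_push, instr_reset_stack, instr_release) are hypothetical, and the sketch assumes that children are released before their parents, which is exactly the ordering the ResourceOwner machinery does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define INSTR_STACK_MAX 64

/* Hypothetical, simplified instrumentation node. */
typedef struct InstrNode
{
	struct InstrNode *parent;	/* NULL for the session-level root */
	long		counter;		/* stand-in for WAL/buffer counters */
} InstrNode;

/* Array-based stack of the nodes that are currently executing. */
static InstrNode *instr_stack[INSTR_STACK_MAX];
static int	instr_stack_depth = 0;

static void
instr_push(InstrNode *node)
{
	assert(instr_stack_depth < INSTR_STACK_MAX);
	instr_stack[instr_stack_depth++] = node;
}

/* Step 1: truncate the stack so no soon-to-be-released node stays on it. */
static void
instr_reset_stack(int depth)
{
	assert(depth >= 0 && depth <= instr_stack_depth);
	instr_stack_depth = depth;
}

/*
 * Step 2: called once per node during resource cleanup; rolls the node's
 * counters up into its parent.  Assumes the parent has not been released
 * yet (children first).
 */
static void
instr_release(InstrNode *node)
{
	if (node->parent != NULL)
		node->parent->counter += node->counter;
	node->counter = 0;
	node->parent = NULL;
}
```

If the NestLoop were released before Seq Scan B in this sketch, the scan would add its counters to a node that is already gone, so a real implementation would need either an ordering guarantee or some form of re-parenting.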


Questions:

Is the stack always a path from the root of the tree, down to some node? 
Or could you have e.g. recursion like A -> B -> C -> A? (I don't know if 
it makes a difference, just wondering)

What happens if you release e.g. the NestLoop before its children? All 
the Instrumentation nodes belonging to a query would usually be part of 
the same ResourceOwner and there's no guarantee on what order the 
resources are released.

- Heikki



