Re: Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS
Date
Msg-id faa92755-e7b8-4b0b-8f75-b419e04b091a@vondra.me
In response to Re: Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS
List pgsql-hackers

On 2/5/26 01:15, David Rowley wrote:
> On Thu, 5 Feb 2026 at 04:59, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> On 2/3/26 22:53, David Rowley wrote:
>>> I think that, generally, offloading more work into the planner for
>>> looking up things that are always the same for every invocation of a
>>> plan is good. For anything that moves things more in that direction,
>>> you'd need to ensure that the plan is correctly invalidated when
>>> something changes about the extra thing you looked up in planning.
>>> I've not looked into the specifics of this instance, other than the
>>> function being called which causes the buffer usage is
>>> get_opfamily_proc(). I don't see that called during planning for this
>>> query, so your statement about "already resolved opfamily from the
>>> planner" doesn't seem true.
>>>
>>
>> Yes, that's essentially what I explained yesterday.
>>
>> I'm not sure about moving all these lookups to the planner (from the
>> executor). Could this easily hurt some cases? We'd need to do this for
>> all fields the final plan might need, i.e. we'd need to initialize all
>> fields for all paths we construct - not just the metadata needed by the
>> final cheapest one. That could be a lot of additional lookups, right?
>> That would make the first execution more expensive. Later executions
>> of the query would use the cached plan, but we'd still have paid the
>> (cheap) lookup cost up front.
> 
> We do look some things up at createplan time. e.g calling
> get_opfamily_member_for_cmptype() from prepare_sort_from_pathkeys().
> However, that could still be an additional lookup as the existing
> lookup for the case in question is in ExecSort rather than in
> ExecInitSort(), so there could be a surplus lookup if the sort node is
> never executed.  I doubt that's worth worrying about too much. It's
> normally large memory allocations we want to defer until fetching the
> first row from the node.
> 

Perhaps. You may be right about this. I'm still skeptical about having
to figure out, during planning, all the fields we'll need during
execution. I doubt we'll go around and adjust all those places, so the
discussion about the cost of additional lookups is purely theoretical.

>> Of course, there's a lot of such fields - this amproc is just an
>> example. I don't think it's practical to rework all of that to
>> initialize everything early.
>>
>>> However, I imagine this is just one of many things and if you're
>>> aiming to stabilise your tests doing this, then you'll likely be
>>> fixing things for a long time to come. I imagine a better way if
>>> you're just doing this for running test suites would be to invent some
>>> way to prepopulate the sys and cat caches. Maybe that could be a
>>> contrib module. I don't think there's a way to get a relcache miss
>>> during execution, but I guess it could change with prepared statements
>>> if we ever decided to start evicting long unused relcache entries one
>>> day.
>>>
>>
>> I'm not sure there's a good way to warmup the sys/cat caches, short of
>> loading everything. Because how would you know what will be needed by
>> the execution plan? It seems very complex.
> 
> I imagined if it's just for machines running tests then you could just
> load everything. If it was coded in such a way that a tuple fetched by
> doing a Seq Scan on the catalogue table was what went into the cache,
> rather than the Seq Scan drives the normal cache lookup code,
> resulting in a subsequent Index Scan on the catalogue's index, then it
> could be done with fairly low overhead. I imagine in the order of
> <10ms from fresh initdb. That doesn't seem excessively long for
> machines running tests in the background.
> 

So we'd just go through all the caches (relcaches/catcaches/...) and
load all the stuff that's in pg_catalog? I guess that could work,
although I'm not sure how convenient it would be for objects created in
the tests themselves (I'm not sure those end up in the cache right away).
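FWIW a crude version of the "load everything" warmup could be done from
SQL today, just by forcing a sequential read of every catalog. To be
clear, this is only a sketch: it warms shared buffers, not the
catcaches themselves - filling the catcache from seqscan tuples would
need the C-level support David describes above.

```sql
-- Sketch of a warmup pass: seqscan every pg_catalog table so later
-- lookups at least hit shared buffers. Does NOT populate catcaches.
DO $$
DECLARE
  rel regclass;
BEGIN
  FOR rel IN SELECT oid::regclass FROM pg_class
             WHERE relnamespace = 'pg_catalog'::regnamespace
               AND relkind = 'r'
  LOOP
    EXECUTE format('SELECT count(*) FROM %s', rel);
  END LOOP;
END $$;
```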

>> One option would be to run the queries twice - the first one would
>> warmup caches, the second execution would be the measured one. But that
>> makes the tool 2x as expensive.
> 
> Maybe. I don't know the tool or how people use it.
> 
>> But what if we had a way to *invalidate* all the caches? That'd also
>> make the behavior deterministic - there would be no hits initially. And
>> it seems much simpler to implement.
> 
> There is debug_discard_caches, but that requires an assert-enabled build.
> 

Right. I suppose we could have a SQL function calling
InvalidateSystemCachesExtended to do that. In fact, an extension could
probably add such a function already.
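Something like this, I mean (names are made up, and I haven't checked
whether calling InvalidateSystemCachesExtended directly from a
SQL-callable function has any gotchas):

```c
/* Sketch of an extension function wrapping the existing backend
 * function InvalidateSystemCachesExtended() (utils/cache/inval.c).
 * The function name pg_invalidate_all_caches is made up. */
#include "postgres.h"
#include "fmgr.h"
#include "utils/inval.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pg_invalidate_all_caches);

Datum
pg_invalidate_all_caches(PG_FUNCTION_ARGS)
{
    /* discard relcache/catcache/... contents, like debug_discard_caches */
    InvalidateSystemCachesExtended(true);
    PG_RETURN_VOID();
}
```

plus the usual "CREATE FUNCTION pg_invalidate_all_caches() RETURNS void
AS 'MODULE_PATHNAME' LANGUAGE C" in the extension's SQL script.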


regards

-- 
Tomas Vondra
