Hi,
On 2023-01-13 17:28:31 -0800, Amin wrote:
> Before scanning a relation, in the planner stage, I want to make a call to
> retrieve information about how many pages will be a hit for a specific
> relation. The module pg_buffercache seems to be doing a similar thing.
> Also, pg_statio_all_tables seems to be having that information, but it is
> updated after execution. However, I want the information before execution.
> Also not sure how pg_statio_all_tables is created and how I can access it
> in the code.
There's no cheap way to do that. Currently the only ways are to:
a) Do one probe of the buffer mapping table for each block of the
relation. Thus O(#relation blocks).
b) Scan all of buffer headers, check which are for the relation. Thus
O(#NBuffers)
Neither of which are a good idea during planning.
It might be a bit more realistic to get very rough estimates:
You could compute the table's historic cache hit ratio from pgstats (i.e. use
the data backing pg_statio_all_tables). Of course that's not going to be
specific to your query (for index scans etc), and might have changed more
recently. It'd also be completely wrong after a restart.
If we had information about *recent* cache hit patterns for the relation, it'd
be a lot better, but we don't have the infrastructure for that, and
introducing it would increase the size of the stats entries noticably.
Another way could be to probe the buffer mapping table for a small subset of
the locks and infer the likelihood of other blocks being in shared buffers
that way.
A third way could be to track the cache hit for relations in backend local
memory, likely in the relache entry. The big disadvantage would be that query
plans would differ between connections and that connections would need to
"warm up" to have good plans. But it'd handle restarts nicely.
Greetings,
Andres Freund