Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date
Msg-id CAApHDvq+uqf3MRFb+Ai3W1k3fMnrwWBLhzUAYj3_+MM8jK32fA@mail.gmail.com
Whole thread Raw
In response to Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment  (Lukas Fittl <lukas@fittl.com>)
List pgsql-hackers
On Fri, 4 Jul 2025 at 20:30, Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
> I attached rebased v10 patch on 5a6c39b.

I've gone over this and made some cosmetic adjustments. A few
adjustments to the comments and used Cardinality rather than double
for some data types. I also moved the MemoizePath.calls field down
below est_entries so that est_entries would take up some padding
space. This saves 8 bytes of struct, and IMO, improves the logical
order of fields. I renamed "calls" to "est_calls" so it's more aligned
with the new fields being added by this patch.

The final thing which I'm not so sure about is the EXPLAIN format.
Currently it looks like:

   ->  Memoize  (cost=0.43..0.46 rows=1 width=4) (actual
time=0.000..0.000 rows=1.00 loops=1000000)
         Cache Key: t.a
         Cache Mode: logical
         Estimates: capacity=10010 distinct keys=10010 lookups=1000000
hit ratio=99.00%
         Hits: 989999  Misses: 10001  Evictions: 0  Overflows: 0
Memory Usage: 1016kB

This format for the Estimates copies the "Buffers:" format for putting
multiple sub-values under a single heading.  However, we don't seem
very consistent with this as JIT has a similar need but formats things
another way, i.e.:

 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.473 ms (Deform 0.112 ms), Inlining 0.000 ms,
Optimization 0.701 ms, Emission 5.778 ms, Total 6.952 ms

We could get rid of the "Estimates" heading and just prefix each value
with "Estimated", as in:

   ->  Memoize  (cost=0.43..0.46 rows=1 width=4) (actual
time=0.000..0.000 rows=1.00 loops=1000000)
         Cache Key: t.a
         Cache Mode: logical
         Estimated capacity: 10010  Estimated distinct keys: 10010
Estimated lookups: 1000000  Estimated hit ratio: 99.00%
         Hits: 989999  Misses: 10001  Evictions: 0  Overflows: 0
Memory Usage: 1016kB
         Buffers: shared hit=30004

That removes the dilemma about which example to follow, but it's more verbose.

Does anyone have any opinions on this?

v11 patch attached.

David

Attachment

pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX
Next
From: Jacob Champion
Date:
Subject: Re: [PATCH] Check for TupleTableSlot nullness before dereferencing