Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment - Mailing list pgsql-hackers

From Ilia Evdokimov
Subject Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date
Msg-id fbf77241-e4c5-4f70-ac71-8bf8cff89f39@tantorlabs.com
Whole thread Raw
In response to Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


On 20.03.2025 13:37, David Rowley wrote:
est_entries is a uint32, so %u is the correct format character for that type.

I don't think you need to prefix all these properties with "Cache"
just because the other two properties have that prefix. I also don't
think the names you've chosen really reflect the meaning. How about
something like: "Estimated Distinct Lookup Keys: 721  Estimated
Capacity: 655", in that order. I think maybe having that as one line
for format=text is better than 2 lines. The EXPLAIN output is already
often taking up more lines in v18 than in v17, would be good to not
make that even worse unnecessarily.


LGTM

I see the existing code there could use ExplainPropertyText rather
than have a special case for text and non-text formats. That's likely
my fault. If we're touching this code, then we should probably tidy
that up. Do you want to create a precursor fixup patch for that?


I fully agree with this suggestion. Then I'll begin with this on another new thread.

+ double ndistinct; /* Estimated number of distinct memoization keys,
+ * used for cache size evaluation. Kept for EXPLAIN */

Maybe this field in MemoizePath needs a better name. How about
"est_unique_keys"? and also do the rename in struct Memoize.


LGTM

I'm also slightly concerned about making struct Memoize bigger. I had
issues with a performance regression [1] for 908a96861 when increasing
the WindowAgg struct size last year and the only way I found to make
it go away was to shuffle the fields around so that the struct size
didn't increase. I think we'll need to see a benchmark of a query that
hits Memoize quite hard with a small cache size to see if the
performance decreases as a result of adding the ndistinct field. It's
unfortunate that we'll not have the luxury of squeezing this double
into padding if we do see a slowdown.


I tried rearranging the fields in the structure with 'ndistinct' field, but unfortunately, sizeof(Memoize) did not remain the same. Therefore, benchmarking Memoize is definitely necessary. Thanks for the information!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Index AM API cleanup
Next
From: Ashutosh Bapat
Date:
Subject: Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints