Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date
Msg-id CAApHDvpq2gOt3gnzkd4Ud=wrT7YRGrF3zb29jgWzDsYEhETrOA@mail.gmail.com
Whole thread Raw
In response to Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment  (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
Responses Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
List pgsql-hackers
On Thu, 20 Mar 2025 at 21:48, Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
>          ->  Memoize  (cost=0.30..0.41 rows=1 width=4)
>                Cache Key: t2.thousand
>                Cache Mode: logical
>                Cache Estimated Entries: 655
>                Cache Estimated NDistinct: 721
>                ->  Index Only Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..0.40 rows=1 width=4)
>                      Index Cond: (unique1 = t2.thousand)
> (11 rows)
>
> Additionally, since this information would only be shown in EXPLAIN when costs are enabled, it should not cause any
performanceregression in normal execution. However, reviewers should be especially careful when verifying test outputs,
asthis change could affect plan details in regression tests. 
>
> Any thoughts?

> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated Entries: %d\n", ((Memoize *) plan)->est_entries);
> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated NDistinct: %0.0f\n", ((Memoize *) plan)->ndistinct);

est_entries is a uint32, so %u is the correct format character for that type.

I don't think you need to prefix all these properties with "Cache"
just because the other two properties have that prefix. I also don't
think the names you've chosen really reflect the meaning. How about
something like: "Estimated Distinct Lookup Keys: 721  Estimated
Capacity: 655", in that order. I think maybe having that as one line
for format=text is better than 2 lines. The EXPLAIN output is already
often taking up more lines in v18 than in v17, would be good to not
make that even worse unnecessarily.

I see the existing code there could use ExplainPropertyText rather
than have a special case for text and non-text formats. That's likely
my fault. If we're touching this code, then we should probably tidy
that up. Do you want to create a precursor fixup patch for that?

+ double ndistinct; /* Estimated number of distinct memoization keys,
+ * used for cache size evaluation. Kept for EXPLAIN */

Maybe this field in MemoizePath needs a better name. How about
"est_unique_keys"? and also do the rename in struct Memoize.

I'm also slightly concerned about making struct Memoize bigger. I had
issues with a performance regression [1] for 908a96861 when increasing
the WindowAgg struct size last year and the only way I found to make
it go away was to shuffle the fields around so that the struct size
didn't increase. I think we'll need to see a benchmark of a query that
hits Memoize quite hard with a small cache size to see if the
performance decreases as a result of adding the ndistinct field. It's
unfortunate that we'll not have the luxury of squeezing this double
into padding if we do see a slowdown.

David

[1] https://postgr.es/m/flat/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
Next
From: Vladlen Popolitov
Date:
Subject: Re: PoC. The saving of the compiled jit-code in the plan cache