Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date
Msg-id CAApHDvqYL3OVvOsyvEPsDJ7h4zQh=LEygQgcQSN-TemRJLMb6Q@mail.gmail.com
Whole thread Raw
In response to Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
List pgsql-hackers
On Tue, 15 Apr 2025 at 04:31, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't think we should use ANALYZE for this because, IME, that should
> just be about whether the query gets executed. Since this looks like
> information that is available at plan time, I think it should be
> displayed all the time or made contingent on VERBOSE. It would be sad
> if you had to run the query to get information that was available
> without needing to run the query. Personally, I think we can probably
> just display it all the time. I mean, the typical plan is not going to
> contain enough Memoize nodes that a little extra chatter for each one
> impedes readability significantly.

You might be right there. I just offered that as a possible solution
to the concern of bloating the EXPLAIN output.

> Having looked at v6, I think it would help, at least if the reader is
> sufficiently knowledgeable. From the values displayed, it looks like
> someone could reconstruct the evict_ratio value in
> cost_memoize_rescan(), and if they know the loop count, also the
> hit_ratio. But that seems hard: if you weren't reading the code, how
> would you know how to do it? Even if you are reading the code, are you
> sure you'd reconstruct it correctly? I wonder why we think it's better
> to display this than a more "cooked" number like the estimated hit
> ratio that was proposed in v1?

I think the problem with only showing the estimated hit ratio is that
it's difficult to then know what you might change to improve that in
cases where you might have other join types disabled and are
experimenting to figure out why Nested Loop / Memoize isn't being
chosen.  If you see the estimated capacity isn't big enough to store
all the rows for all distinct keys then you know you can increase
work_mem to assist. If the Estimated Unique Keys is too high, then you
might be able to adjust some ndistinct estimates or increase stats
targets somewhere to improve that.  If we only showed the estimated
hit ratio, then you wouldn't know where to start to get the planner to
pick the Memoize plan.

As I mentioned in [1], I wouldn't object to having the estimated hit
ratio shown in addition to the components that are used to calculate
it. Per [2], Andrei didn't seem to like the idea due to the
duplication. FWIW, I do agree with you that the hit ratio isn't
exactly easy to calculate on the fly when looking at the input numbers
that are used to calculate it. I just wasn't 100% certain that it
would be needed in its calculated form. Maybe it depends on the reason
you're looking at EXPLAIN. Maybe the reason I mentioned above for
looking at EXPLAIN is just one of many and I'm not thinking hard
enough to consider the other ones.

If we can't get consensus for everything people want to add at once
then maybe the patch could be broken into two, with 0001 being pretty
much the v4 patch and then have 0002 add the Estimated Hit Ratio.
Having the physical patches and being able to try it out or view the
regression test changes might lower the bar on people chipping in with
their views.

David

[1] https://postgr.es/m/CAApHDvoE5S5FkvEq+N3-J9LfaVUpWLOnczOYAOEvBMCY20=pdg@mail.gmail.com
[2] https://postgr.es/m/e76f1635-a933-47bf-a826-045e20d5cfff@gmail.com



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Call for Posters: PGConf.dev 2025
Next
From: Tom Lane
Date:
Subject: Re: Fix a resource leak (src/backend/utils/adt/rowtypes.c)