Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Date | |
Msg-id | CAApHDvqYL3OVvOsyvEPsDJ7h4zQh=LEygQgcQSN-TemRJLMb6Q@mail.gmail.com Whole thread Raw |
In response to | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
|
List | pgsql-hackers |
On Tue, 15 Apr 2025 at 04:31, Robert Haas <robertmhaas@gmail.com> wrote: > I don't think we should use ANALYZE for this because, IME, that should > just be about whether the query gets executed. Since this looks like > information that is available at plan time, I think it should be > displayed all the time or made contingent on VERBOSE. It would be sad > if you had to run the query to get information that was available > without needing to run the query. Personally, I think we can probably > just display it all the time. I mean, the typical plan is not going to > contain enough Memoize nodes that a little extra chatter for each one > impedes readability significantly. You might be right there. I just offered that as a possible solution to the concern of bloating the EXPLAIN output. > Having looked at v6, I think it would help, at least if the reader is > sufficiently knowledgeable. From the values displayed, it looks like > someone could reconstruct the evict_ratio value in > cost_memoize_rescan(), and if they know the loop count, also the > hit_ratio. But that seems hard: if you weren't reading the code, how > would you know how to do it? Even if you are reading the code, are you > sure you'd reconstruct it correctly? I wonder why we think it's better > to display this than a more "cooked" number like the estimated hit > ratio that was proposed in v1? I think the problem with only showing the estimated hit ratio is that it's difficult to then know what you might change to improve that in cases where you might have other join types disabled and are experimenting to figure out why Nested Loop / Memoize isn't being chosen. If you see the estimated capacity isn't big enough to store all the rows for all distinct keys then you know you can increase work_mem to assist. If the Estimated Unique Keys is too high, then you might be able to adjust some ndistinct estimates or increase stats targets somewhere to improve that. If we only showed the estimated hit ratio, then you wouldn't know where to start to get the planner to pick the Memoize plan. As I mentioned in [1], I wouldn't object to having the estimated hit ratio shown in addition to the components that are used to calculate it. Per [2], Andrei didn't seem to like the idea due to the duplication. FWIW, I do agree with you that the hit ratio isn't exactly easy to calculate on the fly when looking at the input numbers that are used to calculate it. I just wasn't 100% certain that it would be needed in its calculated form. Maybe it depends on the reason you're looking at EXPLAIN. Maybe the reason I mentioned above for looking at EXPLAIN is just one of many and I'm not thinking hard enough to consider the other ones. If we can't get consensus for everything people want to add at once then maybe the patch could be broken into two, with 0001 being pretty much the v4 patch and then have 0002 add the Estimated Hit Ratio. Having the physical patches and being able to try it out or view the regression test changes might lower the bar on people chipping in with their views. David [1] https://postgr.es/m/CAApHDvoE5S5FkvEq+N3-J9LfaVUpWLOnczOYAOEvBMCY20=pdg@mail.gmail.com [2] https://postgr.es/m/e76f1635-a933-47bf-a826-045e20d5cfff@gmail.com
pgsql-hackers by date: