On Thu, 30 Oct 2025 at 07:29, Jacob Jackson <jej.jackson.08@gmail.com> wrote: > Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why?
I agree that when the outer side of the join has unique values that Memoize does not make any sense. The planner currenrly puts quite a bit of faith in the row estimates for this and if you're getting this plan, then the estimates came back indicating there'd be fewer unique values of "QuestionUserStatus".question in the input than there are input rows to the Memoize node. If you delve into cost_memoize_rescan(), you'll see the code for this (look for where "hit_ratio" is calculated).
There are also a few prechecks in get_memoize_path() to try to avoid this sort of thing, but unfortunately, the information to avoid Memoize when the outer side of the join is unique isn't available. We do have an "inner_unique" in JoinPathExtraData, but what we'd need for this and don't have is "outer_unique". If we had that, we could just exit early in get_memoize_path() if that's set to true. Whether or not going to the trouble of calculating "outer_unique" is worth the trouble, I'm not sure. There was some work on UniqueKeys a few years ago, which could have helped in this scenario as we could have more easily identified uniqueness at different join levels. That's no longer being worked on, as I understand it.
On the other hand, it may be better to somehow enhance estimate_num_groups() so it can be given more details about the context of the request, i.e the set of Relids that are joined already for the input_rows. That way the code could do more analysis into the RelOptInfo base quals for the relevant relations. Extended statistics for n_distinct could also be applied in some cases too by looking for baserestrictinfo with equality quals or EquivalenceClasses with ec_has_const = true and a member for other Vars/Exprs in the extended statistics.