Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique? - Mailing list pgsql-general

From Jacob Jackson
Subject Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
Date
Msg-id CAAiQw3w-uEZkz13Y2iU76CucRXBoR=kw6wOfq6gocT63dZLfYg@mail.gmail.com
Whole thread Raw
In response to Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-general
Thanks! This is very helpful.

On Wed, Oct 29, 2025 at 8:01 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 30 Oct 2025 at 07:29, Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why?

I agree that when the outer side of the join has unique values that
Memoize does not make any sense. The planner currenrly puts quite a
bit of faith in the row estimates for this and if you're getting this
plan, then the estimates came back indicating there'd be fewer unique
values of "QuestionUserStatus".question in the input than there are
input rows to the Memoize node. If you delve into
cost_memoize_rescan(), you'll see the code for this (look for where
"hit_ratio" is calculated).

There are also a few prechecks in get_memoize_path() to try to avoid
this sort of thing, but unfortunately, the information to avoid
Memoize when the outer side of the join is unique isn't available.  We
do have an "inner_unique" in JoinPathExtraData, but what we'd need for
this and don't have is "outer_unique". If we had that, we could just
exit early in get_memoize_path() if that's set to true. Whether or not
going to the trouble of calculating "outer_unique" is worth the
trouble, I'm not sure. There was some work on UniqueKeys a few years
ago, which could have helped in this scenario as we could have more
easily identified uniqueness at different join levels. That's no
longer being worked on, as I understand it.

On the other hand, it may be better to somehow enhance
estimate_num_groups() so it can be given more details about the
context of the request, i.e the set of Relids that are joined already
for the input_rows. That way the code could do more analysis into the
RelOptInfo base quals for the relevant relations. Extended statistics
for n_distinct could also be applied in some cases too by looking for
baserestrictinfo with equality quals or EquivalenceClasses with
ec_has_const = true and a member for other Vars/Exprs in the extended
statistics.

Unfortunately, neither of these is a trivial fix.

David

pgsql-general by date:

Previous
From: Jacob Jackson
Date:
Subject: Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
Next
From: "Colin 't Hart"
Date:
Subject: Re: Two sequences associated with one identity column