Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize - Mailing list pgsql-bugs

From David Rowley
Subject Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
Date
Msg-id CAApHDvotKW3wt36gqw2M0yjhN0jm9yQsEzLu6BCpUXwurH8Gpg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-bugs
On Tue, 5 Oct 2021 at 20:05, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm currently thinking over what the best fix for this should be. My
> current thinking is that all parameters that are from above the
> Memoize that are required below the Memoize must be part of the Cache
> Key. Currently, it seems, that's not the case and in the plan from the
> attached plan, i1.id is being used below the Memoize but originates
> from above it.
>
> I'll need to look into how exactly to determine around when
> get_memoize_path() is called how we determine if there's any subplan
> parameters that will be needed in the Memoize path's sub path that
> originates above the memoize node.

I was looking at fixes for this bug. I started out with seeing what
would be required to make paraminfo_get_equal_hashops() find all the
SubPlan parameters that are on the inner side of the join that come
from a location other than the inner side of the join.   The only way
I could think on how to do that would be to invent some way to
traverse Paths to look for vars from a given set of Relids. Right now
we just have ways to walk expression trees (expression_tree_walker).
That's not going to work here. Let's call this way "option 1".

An alternative way to fix the problem would be just to flush the cache
whenever a Param changes that is not part of the cache key. The
Materalize node works a bit like this, except it'll just flush the
cache whenever a param changes, since the Material cache is not
parameterized.   One fairly hefty drawback with doing it this way is
that the planner does not get a chance to take into account the fact
that the cache could be flushed quite often.  This could result in a
Nested Loop / Memoize join being chosen instead of a Hash or Merge
Join when the hash or merge join would be much more favourable. Let's
call this "option 2".

For option 1, I don't really like the sound of adding ways to traverse
paths in back branches. I'm also not too sure what the implication
would be for Custom scans.  With option 2, I'm a bit worried that this
would cause Memoize to be costed much cheaper than it actually is due
to the planner thinking the cache hit ratio will be much higher than
it actually is.

Can anyone think of a 3rd option?

I've attached a patch for option 2.

David

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: conchuela timeouts since 2021-10-09 system upgrade
Next
From: Juan José Santamaría Flecha
Date:
Subject: Re: BUG #17247: How to avoid crating multiple Foreign keys on same column on same table.