Thread: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17213 Logged by: Elvis Pranskevichus Email address: elprans@gmail.com PostgreSQL version: 14.0 Operating system: Linux Description: It appears a combination of Merge Semi Join and Memoize in PostgreSQL 14 produces incorrect results on a particular query. The bug might be present in earlier versions, but I was only able to reproduce a particular plan under version 14. ====== Schema ====== CREATE TABLE issues ( id int PRIMARY KEY ); CREATE TABLE users ( id int PRIMARY KEY ); CREATE TABLE watchers ( issue_id int, user_id int, CONSTRAINT unique_pair UNIQUE (issue_id, user_id) ); CREATE INDEX watchers_user_idx ON watchers(user_id); INSERT INTO issues (id) VALUES (1); INSERT INTO issues (id) VALUES (2); INSERT INTO issues (id) VALUES (3); INSERT INTO issues (id) VALUES (4); INSERT INTO users (id) VALUES (1); INSERT INTO users (id) VALUES (2); INSERT INTO watchers (issue_id, user_id) VALUES (1, 2); INSERT INTO watchers (issue_id, user_id) VALUES (2, 1); INSERT INTO watchers (issue_id, user_id) VALUES (3, 1); ===== Query ===== SELECT i1.id FROM issues AS i1 WHERE EXISTS ( SELECT FROM (SELECT i3.id FROM issues AS i3 WHERE i3.id IN (SELECT w3.issue_id FROM (SELECT u1.id FROM users AS u1 WHERE EXISTS ( SELECT FROM watchers AS w1 INNER JOIN users AS u2 ON (w1.user_id = u2.id) INNER JOIN watchers AS w2 ON (u1.id = w2.user_id) WHERE (i1.id = w1.issue_id) AND (w2.issue_id != i1.id) AND u1.id = u2.id ) ) AS q INNER JOIN users AS superfluous ON (q.id = superfluous.id) INNER JOIN watchers AS w3 ON (q.id = w3.user_id) ) ) AS q ); =============== Expected Output =============== id ---- 2 3 (2 rows) =============================== Actual Output (with below plan) =============================== id ---- (0 rows) ================= Plan with Memoize ================= Seq Scan on issues i1 (cost=0.00..11577.47 rows=145 width=248) (actual time=0.091..0.093 rows=0 loops=1) Filter: (SubPlan 2) Rows Removed by Filter: 4 SubPlan 2 -> Merge Semi Join (cost=0.61..5694.44 rows=145 width=0) (actual time=0.020..0.020 rows=0 loops=4) Merge Cond: (i3.id = w3.issue_id) -> Index Only Scan using issues_pkey on issues i3 (cost=0.15..52.50 rows=290 width=16) (actual time=0.002..0.003 rows=1 loops=4) Heap Fetches: 4 -> Nested Loop (cost=0.46..5632.72 rows=680 width=16) (actual time=0.017..0.017 rows=0 loops=4) -> Nested Loop (cost=0.30..344.25 rows=1360 width=48) (actual time=0.004..0.010 rows=3 loops=4) -> Index Only Scan using unique_pair on watchers w3 (cost=0.15..68.55 rows=1360 width=32) (actual time=0.001..0.002 rows=3 loops=4) Heap Fetches: 12 -> Index Only Scan using users_pkey on users superfluous (cost=0.15..0.20 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=12) Index Cond: (id = w3.user_id) Heap Fetches: 12 -> Memoize (cost=0.16..7.02 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=12) Cache Key: superfluous.id Hits: 10 Misses: 2 Evictions: 0 Overflows: 0 Memory Usage: 1kB -> Index Only Scan using users_pkey on users u1 (cost=0.15..7.01 rows=1 width=16) (actual time=0.008..0.009 rows=0 loops=2) Index Cond: (id = superfluous.id) Filter: (SubPlan 1) Rows Removed by Filter: 1 Heap Fetches: 2 SubPlan 1 -> Nested Loop (cost=0.45..44.73 rows=7 width=0) (actual time=0.006..0.006 rows=0 loops=2) -> Index Scan using watchers_user_idx on watchers w2 (cost=0.15..28.29 rows=7 width=0) (actual time=0.003..0.003 rows=1 loops=2) Index Cond: (user_id = u1.id) Filter: (issue_id <> i1.id) Rows Removed by Filter: 0 -> Materialize (cost=0.30..16.36 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=2) -> Nested Loop (cost=0.30..16.35 rows=1 width=0) (actual time=0.002..0.003 rows=0 loops=1) -> Index Only Scan using unique_pair on watchers w1 (cost=0.15..8.17 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1) Index Cond: ((issue_id = i1.id) AND (user_id = u1.id)) Heap Fetches: 0 -> Index Only Scan using users_pkey on users u2 (cost=0.15..8.17 rows=1 width=16) (never executed) Index Cond: (id = u1.id) Heap Fetches: 0 Planning Time: 0.755 ms Execution Time: 0.146 ms ============================= With SET enable_memoize = off ============================= Seq Scan on issues i1 (cost=0.00..19705.90 rows=145 width=32) (actual time=0.076..0.116 rows=2 loops=1) Filter: (SubPlan 2) Rows Removed by Filter: 2 SubPlan 2 -> Merge Semi Join (cost=0.60..9760.10 rows=145 width=0) (actual time=0.025..0.025 rows=0 loops=4) Merge Cond: (i3.id = w3.issue_id) -> Index Only Scan using issues_pkey on issues i3 (cost=0.15..52.50 rows=290 width=16) (actual time=0.003..0.003 rows=2 loops=4) Heap Fetches: 6 -> Nested Loop (cost=0.45..9698.38 rows=680 width=16) (actual time=0.021..0.022 rows=0 loops=4) Join Filter: (u1.id = superfluous.id) -> Nested Loop (cost=0.30..9552.04 rows=680 width=48) (actual time=0.020..0.020 rows=0 loops=4) -> Index Only Scan using unique_pair on watchers w3 (cost=0.15..68.55 rows=1360 width=32) (actual time=0.001..0.002 rows=2 loops=4) Heap Fetches: 10 -> Index Only Scan using users_pkey on users u1 (cost=0.15..6.98 rows=1 width=16) (actual time=0.007..0.007 rows=0 loops=10) Index Cond: (id = w3.user_id) Filter: (SubPlan 1) Rows Removed by Filter: 1 Heap Fetches: 10 SubPlan 1 -> Nested Loop (cost=0.45..44.73 rows=7 width=0) (actual time=0.005..0.005 rows=0 loops=10) -> Index Scan using watchers_user_idx on watchers w2 (cost=0.15..28.29 rows=7 width=0) (actual time=0.001..0.002 rows=1 loops=10) Index Cond: (user_id = u1.id) Filter: (issue_id <> i1.id) Rows Removed by Filter: 0 -> Materialize (cost=0.30..16.36 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=13) -> Nested Loop (cost=0.30..16.35 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=9) -> Index Only Scan using unique_pair on watchers w1 (cost=0.15..8.17 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=9) Index Cond: ((issue_id = i1.id) AND (user_id = u1.id)) Heap Fetches: 2 -> Index Only Scan using users_pkey on users u2 (cost=0.15..8.17 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=2) Index Cond: (id = u1.id) Heap Fetches: 2 -> Index Only Scan using users_pkey on users superfluous (cost=0.15..0.20 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=2) Index Cond: (id = w3.user_id) Heap Fetches: 2 Planning Time: 0.706 ms Execution Time: 0.164 ms
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
David Rowley
Date:
On Tue, 5 Oct 2021 at 12:22, PG Bug reporting form <noreply@postgresql.org> wrote: > It appears a combination of Merge Semi Join and Memoize in PostgreSQL 14 > produces incorrect results on a particular query. The bug might be > present > in earlier versions, but I was only able to reproduce a particular plan > under version 14. Are you able to send the output of EXPLAIN (ANALYZE, SETTINGS) from the problem query with SET enable_memoize = ON; ? My local setup here does not produce the same plan as you're getting and that might be because you have some non-standard cost settings. It does look a bit like memoize is not properly taking into account the fact that there's a subplan below the memoize node with parameters from above the memoize node. Namely the i1.id in issue_id != i1.id. David
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
Elvis Pranskevichus
Date:
On Monday, October 4, 2021 5:14:47 PM PDT David Rowley wrote: > On Tue, 5 Oct 2021 at 12:22, PG Bug reporting form > > <noreply@postgresql.org> wrote: > > It appears a combination of Merge Semi Join and Memoize in > > PostgreSQL 14 produces incorrect results on a particular query. > > The bug might be present > > in earlier versions, but I was only able to reproduce a particular > > plan under version 14. > > Are you able to send the output of EXPLAIN (ANALYZE, SETTINGS) from > the problem query with SET enable_memoize = ON; ? My local setup here > does not produce the same plan as you're getting and that might be > because you have some non-standard cost settings. > > It does look a bit like memoize is not properly taking into account > the fact that there's a subplan below the memoize node with parameters > from above the memoize node. Namely the i1.id in issue_id != i1.id. All cost and planner settings are set to defaults on the cluster I'm getting this plan on (only `search_path` shows up in SETTINGS). It seems like a very particular set of stats is causing the plan there, as running ANALYZE turns the query into a Nested Loop Semi Join. Elvis
Elvis Pranskevichus <elprans@gmail.com> writes: > On Monday, October 4, 2021 5:14:47 PM PDT David Rowley wrote: >>> It appears a combination of Merge Semi Join and Memoize in >>> PostgreSQL 14 produces incorrect results on a particular query. > It seems like a very particular set of stats is causing the plan there, > as running ANALYZE turns the query into a Nested Loop Semi Join. Surely this plan tree is broken on its face? The inner side of a mergejoin has to be able to support mark/restore, and nestloop doesn't (cf ExecSupportsMarkRestore, as well as assertions in nodeNestloop). It looks to me like somebody removed an essential plan-time check. regards, tom lane
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
David Rowley
Date:
On Tue, 5 Oct 2021 at 14:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Elvis Pranskevichus <elprans@gmail.com> writes: > > On Monday, October 4, 2021 5:14:47 PM PDT David Rowley wrote: > >>> It appears a combination of Merge Semi Join and Memoize in > >>> PostgreSQL 14 produces incorrect results on a particular query. > > > It seems like a very particular set of stats is causing the plan there, > > as running ANALYZE turns the query into a Nested Loop Semi Join. > > Surely this plan tree is broken on its face? The inner side of > a mergejoin has to be able to support mark/restore, and nestloop > doesn't (cf ExecSupportsMarkRestore, as well as assertions in > nodeNestloop). It looks to me like somebody removed an essential > plan-time check. Couldn't that just be because it's a Merge Semi Join and there's not really any need to rewind the inner side if all join clauses are merge join clauses, aka: if ((path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI || extra->inner_unique) && (list_length(path->jpath.joinrestrictinfo) == list_length(path->path_mergeclauses))) path->skip_mark_restore = true; ? David
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
David Rowley
Date:
On Tue, 5 Oct 2021 at 13:14, David Rowley <dgrowleyml@gmail.com> wrote: > It does look a bit like memoize is not properly taking into account > the fact that there's a subplan below the memoize node with parameters > from above the memoize node. Namely the i1.id in issue_id != i1.id. It looks like that's exactly what the problem is here. I've made a small adjustment to your script to call analyze on the watchers table. This does not result in the exact plan that you reported, but it does result in a Memoize plan that shows the same bug. I'm currently thinking over what the best fix for this should be. My current thinking is that all parameters that are from above the Memoize that are required below the Memoize must be part of the Cache Key. Currently, it seems, that's not the case and in the plan from the attached plan, i1.id is being used below the Memoize but originates from above it. I'll need to look into how exactly to determine around when get_memoize_path() is called how we determine if there's any subplan parameters that will be needed in the Memoize path's sub path that originates above the memoize node. David
Attachment
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
David Rowley
Date:
On Tue, 5 Oct 2021 at 20:05, David Rowley <dgrowleyml@gmail.com> wrote: > I'm currently thinking over what the best fix for this should be. My > current thinking is that all parameters that are from above the > Memoize that are required below the Memoize must be part of the Cache > Key. Currently, it seems, that's not the case and in the plan from the > attached plan, i1.id is being used below the Memoize but originates > from above it. > > I'll need to look into how exactly to determine around when > get_memoize_path() is called how we determine if there's any subplan > parameters that will be needed in the Memoize path's sub path that > originates above the memoize node. I was looking at fixes for this bug. I started out with seeing what would be required to make paraminfo_get_equal_hashops() find all the SubPlan parameters that are on the inner side of the join that come from a location other than the inner side of the join. The only way I could think on how to do that would be to invent some way to traverse Paths to look for vars from a given set of Relids. Right now we just have ways to walk expression trees (expression_tree_walker). That's not going to work here. Let's call this way "option 1". An alternative way to fix the problem would be just to flush the cache whenever a Param changes that is not part of the cache key. The Materalize node works a bit like this, except it'll just flush the cache whenever a param changes, since the Material cache is not parameterized. One fairly hefty drawback with doing it this way is that the planner does not get a chance to take into account the fact that the cache could be flushed quite often. This could result in a Nested Loop / Memoize join being chosen instead of a Hash or Merge Join when the hash or merge join would be much more favourable. Let's call this "option 2". For option 1, I don't really like the sound of adding ways to traverse paths in back branches. I'm also not too sure what the implication would be for Custom scans. With option 2, I'm a bit worried that this would cause Memoize to be costed much cheaper than it actually is due to the planner thinking the cache hit ratio will be much higher than it actually is. Can anyone think of a 3rd option? I've attached a patch for option 2. David
Attachment
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
David Rowley
Date:
On Tue, 26 Oct 2021 at 20:50, David Rowley <dgrowleyml@gmail.com> wrote: > An alternative way to fix the problem would be just to flush the cache > whenever a Param changes that is not part of the cache key. The > Materalize node works a bit like this, except it'll just flush the > cache whenever a param changes, since the Material cache is not > parameterized. One fairly hefty drawback with doing it this way is > that the planner does not get a chance to take into account the fact > that the cache could be flushed quite often. This could result in a > Nested Loop / Memoize join being chosen instead of a Hash or Merge > Join when the hash or merge join would be much more favourable. Let's > call this "option 2". I'm planning on giving the patch I submitted another look with the intention of pushing it tomorrow. David
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
David Rowley
Date:
On Mon, 22 Nov 2021 at 23:36, David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 26 Oct 2021 at 20:50, David Rowley <dgrowleyml@gmail.com> wrote: > > An alternative way to fix the problem would be just to flush the cache > > whenever a Param changes that is not part of the cache key. The > > Materalize node works a bit like this, except it'll just flush the > > cache whenever a param changes, since the Material cache is not > > parameterized. One fairly hefty drawback with doing it this way is > > that the planner does not get a chance to take into account the fact > > that the cache could be flushed quite often. This could result in a > > Nested Loop / Memoize join being chosen instead of a Hash or Merge > > Join when the hash or merge join would be much more favourable. Let's > > call this "option 2". > > I'm planning on giving the patch I submitted another look with the > intention of pushing it tomorrow. Pushed with the addition of a test to exercise the cache flushing code. Thanks for reporting this. David
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
From
Elvis Pranskevichus
Date:
On Tuesday, November 23, 2021 5:58:30 PM PST David Rowley wrote: > On Mon, 22 Nov 2021 at 23:36, David Rowley <dgrowleyml@gmail.com> > > I'm planning on giving the patch I submitted another look with the > > intention of pushing it tomorrow. > > Pushed with the addition of a test to exercise the cache flushing > code. > > Thanks for reporting this. Excellent! Thank you for tackling the bug, David! Elvis