Re: bug in update tuple routing with foreign partitions - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: bug in update tuple routing with foreign partitions
Date
Msg-id 5CADAB76.9060506@lab.ntt.co.jp
Whole thread Raw
In response to bug in update tuple routing with foreign partitions  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: bug in update tuple routing with foreign partitions
List pgsql-hackers
(2019/03/06 18:33), Amit Langote wrote:
> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
> to use for partition routing targets.  Specifically, the bug occurs when
> UPDATE targets include a foreign partition that is locally modified (as
> opposed to being modified directly on the remove server) and its
> ResultRelInfo (called subplan result rel in the source code) is reused for
> tuple routing:
>
> -- setup
> create extension postgres_fdw ;
> create server loopback foreign data wrapper postgres_fdw;
> create user mapping for current_user server loopback;
> create table p (a int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2base (a int check (a = 2));
> create foreign table p2 partition of p for values in (2) server loopback
> options (table_name 'p2base');
> insert into p values (1), (2);
>
> -- see in the plan that foreign partition p2 is locally modified
> explain verbose update p set a = 2 from (values (1), (2)) s(x) where a =
> s.x returning *;
>                                     QUERY PLAN
>
> ─────────────────────────────────────────────────────────────────────────────────
>   Update on public.p  (cost=0.05..236.97 rows=50 width=42)
>     Output: p1.a, "*VALUES*".column1
>     Update on public.p1
>     Foreign Update on public.p2
>       Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a
>     ->   Hash Join  (cost=0.05..45.37 rows=26 width=42)
>           Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1
>           Hash Cond: (p1.a = "*VALUES*".column1)
>           ->   Seq Scan on public.p1  (cost=0.00..35.50 rows=2550 width=10)
>                 Output: p1.ctid, p1.a
>           ->   Hash  (cost=0.03..0.03 rows=2 width=32)
>                 Output: "*VALUES*".*, "*VALUES*".column1
>                 ->   Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2
> width=32)
>                       Output: "*VALUES*".*, "*VALUES*".column1
>     ->   Hash Join  (cost=100.05..191.59 rows=24 width=42)
>           Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1
>           Hash Cond: (p2.a = "*VALUES*".column1)
>           ->   Foreign Scan on public.p2  (cost=100.00..182.27 rows=2409
> width=10)
>                 Output: p2.ctid, p2.a
>                 Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE
>           ->   Hash  (cost=0.03..0.03 rows=2 width=32)
>                 Output: "*VALUES*".*, "*VALUES*".column1
>                 ->   Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2
> width=32)
>                       Output: "*VALUES*".*, "*VALUES*".column1
>
>
> -- as opposed to directly on remote side (because there's no local join)
> explain verbose update p set a = 2 returning *;
>                                   QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────
>   Update on public.p  (cost=0.00..227.40 rows=5280 width=10)
>     Output: p1.a
>     Update on public.p1
>     Foreign Update on public.p2
>     ->   Seq Scan on public.p1  (cost=0.00..35.50 rows=2550 width=10)
>           Output: 2, p1.ctid
>     ->   Foreign Update on public.p2  (cost=100.00..191.90 rows=2730 width=10)
>           Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a
> (8 rows)
>
> Running the first update query crashes:
>
> update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning
> tableoid::regclass, *;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> The problem seems to occur because ExecSetupPartitionTupleRouting thinks
> it can reuse p2's ResultRelInfo for tuple routing.  In this case, it can't
> be used, because its ri_FdwState contains information that will be needed
> for postgresExecForeignUpdate to work, but it's still used today.  Because
> it's assigned to be used for tuple routing, its ri_FdwState will be
> overwritten by postgresBeginForeignInsert that's invoked by the tuple
> routing code using the aforementioned ResultRelInfo.  Crash occurs when
> postgresExecForeignUpdate () can't find the information it's expecting in
> the ri_FdwState.

Agreed, as I said before in another thread.

> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
> subplan result rel if its ri_FdwState is already set.

I'm not sure that is a good idea, because that requires to create a new 
ResultRelInfo, which is not free; I think it requires a lot of work. 
Another solution to avoid that would be to store the fmstate created by 
postgresBeginForeignInsert() into the ri_FdwState, not overwrite the 
ri_FdwState, like the attached.  This would not need any changes to the 
core, and this would not cause any overheads either, IIUC.  What do you 
think about that?

> I have attached 2 patches, one for PG 11 where the problem first appeared
> and another for HEAD.  The patch for PG 11 is significantly bigger due to
> having to handle the complexities of mapping of subplan result rel indexes
> to leaf partition indexes.  Most of that complexity is gone in HEAD due to
> 3f2393edefa5, so the patch for HEAD is much simpler.

Thanks for the patches!

> I've also added a
> test in postgres_fdw.sql to exercise this test case.

Thanks again, but the test case you added works well without any fix:

+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x 
returning *;
+                                  QUERY PLAN 

+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 
RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)

In this test case, the foreign partition is updated before ri_FdwState 
is overwritten by postgresBeginForeignInsert() that's invoked by the 
tuple routing code, so it would work without any fix.  So I modified 
this test case as such.

Sorry for the long delay, again.

Best regards,
Etsuro Fujita

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Problem with default partition pruning
Next
From: Haribabu Kommi
Date:
Subject: Re: Transaction commits VS Transaction commits (with parallel) VSquery mean time