Re: bug in update tuple routing with foreign partitions - Mailing list pgsql-hackers

From Amit Langote
Subject Re: bug in update tuple routing with foreign partitions
Date
Msg-id 390bfb5d-4b82-18e8-8e42-dca94cd55341@lab.ntt.co.jp
Whole thread Raw
In response to Re: bug in update tuple routing with foreign partitions  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: bug in update tuple routing with foreign partitions
List pgsql-hackers
Fujita-san,

Thanks for the review.

On 2019/04/10 17:38, Etsuro Fujita wrote:
> (2019/03/06 18:33), Amit Langote wrote:
>> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
>> to use for partition routing targets.  Specifically, the bug occurs when
>> UPDATE targets include a foreign partition that is locally modified (as
>> opposed to being modified directly on the remove server) and its
>> ResultRelInfo (called subplan result rel in the source code) is reused for
>> tuple routing:
> 
>> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
>> subplan result rel if its ri_FdwState is already set.
> 
> I'm not sure that is a good idea, because that requires to create a new
> ResultRelInfo, which is not free; I think it requires a lot of work.

After considering this a bit more and studying your proposed solution, I
tend to agree.  Beside the performance considerations you point out that I
too think are valid, I realize that going with my approach would mean
embedding some assumptions in the core code about ri_FdwState; in this
case, assuming that a subplan result rel's ri_FdwState is not to be
re-purposed for tuple routing insert, which is only based on looking at
postgres_fdw's implementation.  Not to mention the ensuing complexity in
the core tuple routing code -- it now needs to account for the fact that
not all subplan result rels may have been re-purposed for tuple-routing,
something that's too complicated to handle in PG 11.

IOW, let's call this a postgres_fdw bug and fix it there as you propose.

> Another solution to avoid that would be to store the fmstate created by
> postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
> ri_FdwState, like the attached.  This would not need any changes to the
> core, and this would not cause any overheads either, IIUC.  What do you
> think about that?

+1.  Just one comment:

+    /*
+     * If the given resultRelInfo already has PgFdwModifyState set, it means
+     * the foreign table is an UPDATE subplan resultrel; in which case, store
+     * the resulting state into the PgFdwModifyState.
+     */

The last "PgFdwModifyState" should be something else?  Maybe,
sub-PgModifyState or sub_fmstate?


>> I've also added a
>> test in postgres_fdw.sql to exercise this test case.
> 
> Thanks again, but the test case you added works well without any fix:

Oops, I should really adopt a habit of adding a test case before code.

> +-- Check case where the foreign partition is a subplan target rel and
> +-- foreign parttion is locally modified (target table being joined
> +-- locally prevents a direct/remote modification plan).
> +explain (verbose, costs off)
> +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
> returning *;
> +                                  QUERY PLAN
> +------------------------------------------------------------------------------
> 
> + Update on public.utrtest
> +   Output: remp.a, remp.b, "*VALUES*".column1
> +   Foreign Update on public.remp
> +     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
> a, b
> +   Update on public.locp
> +   ->  Hash Join
> +         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
> +         Hash Cond: (remp.a = "*VALUES*".column1)
> +         ->  Foreign Scan on public.remp
> +               Output: remp.b, remp.ctid, remp.a
> +               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
> +         ->  Hash
> +               Output: "*VALUES*".*, "*VALUES*".column1
> +               ->  Values Scan on "*VALUES*"
> +                     Output: "*VALUES*".*, "*VALUES*".column1
> +   ->  Hash Join
> +         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
> +         Hash Cond: (locp.a = "*VALUES*".column1)
> +         ->  Seq Scan on public.locp
> +               Output: locp.b, locp.ctid, locp.a
> +         ->  Hash
> +               Output: "*VALUES*".*, "*VALUES*".column1
> +               ->  Values Scan on "*VALUES*"
> +                     Output: "*VALUES*".*, "*VALUES*".column1
> +(24 rows)
> 
> In this test case, the foreign partition is updated before ri_FdwState is
> overwritten by postgresBeginForeignInsert() that's invoked by the tuple
> routing code, so it would work without any fix.  So I modified this test
> case as such.

Thanks for fixing that.   I hadn't noticed that foreign partition remp is
updated before local partition locp; should've added a locp0 for values
(0) so that remp appears second in the ModifyTable's subplans.  Or, well,
make the foreign table remp appear later by making it a partition for
values in (3) as your patch does.

BTW, have you noticed that the RETURNING clause returns the same row twice?

+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a |         b         | x
+---+-------------------+---
+ 3 | qux triggered !   | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered !   | 3
+(3 rows)

You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
I checked the behavior with moving into a local partition just to be sure
and the changed row is returned only once.

create table p (a int, b text) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
insert into p values (1, 'p1'), (2, 'p2');

select tableoid::regclass, * from p;
 tableoid │ a │ b
──────────┼───┼────
 p1       │ 1 │ p1
 p2       │ 2 │ p2
(2 rows)

update p set a = 2 returning tableoid::regclass, *;
 tableoid │ a │ b
──────────┼───┼────
 p2       │ 2 │ p1
 p2       │ 2 │ p2
(2 rows)

Add a foreign table partition in the mix and that's no longer true.

create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p3base (a int check (a = 3), b text);
create foreign table p3 partition of p for values in (3) server loopback
options (table_name 'p3base');
delete from p;
insert into p values (1, 'p1'), (2, 'p2'), (3, 'p3');


select tableoid::regclass, * from p;
 tableoid │ a │ b
──────────┼───┼────
 p1       │ 1 │ p1
 p2       │ 2 │ p2
 p3       │ 3 │ p3
(3 rows)

update p set a = 3 returning tableoid::regclass, *;
 tableoid │ a │ b
──────────┼───┼────
 p3       │ 3 │ p1
 p3       │ 3 │ p2
 p3       │ 3 │ p3
 p3       │ 3 │ p1
 p3       │ 3 │ p2
(5 rows)

select tableoid::regclass, * from p;
 tableoid │ a │ b
──────────┼───┼────
 p3       │ 3 │ p3
 p3       │ 3 │ p1
 p3       │ 3 │ p2
(3 rows)

I'm not sure if it's a bug that ought to be fixed, but thought I should
highlight it just in case.

Thanks,
Amit




pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: speeding up planning with partitions
Next
From: Justin Pryzby
Date:
Subject: Re: Should we add GUCs to allow partition pruning to be disabled?