Re: bug in update tuple routing with foreign partitions - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: bug in update tuple routing with foreign partitions |
Date | |
Msg-id | 390bfb5d-4b82-18e8-8e42-dca94cd55341@lab.ntt.co.jp Whole thread Raw |
In response to | Re: bug in update tuple routing with foreign partitions (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: bug in update tuple routing with foreign partitions
|
List | pgsql-hackers |
Fujita-san, Thanks for the review. On 2019/04/10 17:38, Etsuro Fujita wrote: > (2019/03/06 18:33), Amit Langote wrote: >> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos >> to use for partition routing targets. Specifically, the bug occurs when >> UPDATE targets include a foreign partition that is locally modified (as >> opposed to being modified directly on the remove server) and its >> ResultRelInfo (called subplan result rel in the source code) is reused for >> tuple routing: > >> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a >> subplan result rel if its ri_FdwState is already set. > > I'm not sure that is a good idea, because that requires to create a new > ResultRelInfo, which is not free; I think it requires a lot of work. After considering this a bit more and studying your proposed solution, I tend to agree. Beside the performance considerations you point out that I too think are valid, I realize that going with my approach would mean embedding some assumptions in the core code about ri_FdwState; in this case, assuming that a subplan result rel's ri_FdwState is not to be re-purposed for tuple routing insert, which is only based on looking at postgres_fdw's implementation. Not to mention the ensuing complexity in the core tuple routing code -- it now needs to account for the fact that not all subplan result rels may have been re-purposed for tuple-routing, something that's too complicated to handle in PG 11. IOW, let's call this a postgres_fdw bug and fix it there as you propose. > Another solution to avoid that would be to store the fmstate created by > postgresBeginForeignInsert() into the ri_FdwState, not overwrite the > ri_FdwState, like the attached. This would not need any changes to the > core, and this would not cause any overheads either, IIUC. What do you > think about that? +1. Just one comment: + /* + * If the given resultRelInfo already has PgFdwModifyState set, it means + * the foreign table is an UPDATE subplan resultrel; in which case, store + * the resulting state into the PgFdwModifyState. + */ The last "PgFdwModifyState" should be something else? Maybe, sub-PgModifyState or sub_fmstate? >> I've also added a >> test in postgres_fdw.sql to exercise this test case. > > Thanks again, but the test case you added works well without any fix: Oops, I should really adopt a habit of adding a test case before code. > +-- Check case where the foreign partition is a subplan target rel and > +-- foreign parttion is locally modified (target table being joined > +-- locally prevents a direct/remote modification plan). > +explain (verbose, costs off) > +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x > returning *; > + QUERY PLAN > +------------------------------------------------------------------------------ > > + Update on public.utrtest > + Output: remp.a, remp.b, "*VALUES*".column1 > + Foreign Update on public.remp > + Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING > a, b > + Update on public.locp > + -> Hash Join > + Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1 > + Hash Cond: (remp.a = "*VALUES*".column1) > + -> Foreign Scan on public.remp > + Output: remp.b, remp.ctid, remp.a > + Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE > + -> Hash > + Output: "*VALUES*".*, "*VALUES*".column1 > + -> Values Scan on "*VALUES*" > + Output: "*VALUES*".*, "*VALUES*".column1 > + -> Hash Join > + Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1 > + Hash Cond: (locp.a = "*VALUES*".column1) > + -> Seq Scan on public.locp > + Output: locp.b, locp.ctid, locp.a > + -> Hash > + Output: "*VALUES*".*, "*VALUES*".column1 > + -> Values Scan on "*VALUES*" > + Output: "*VALUES*".*, "*VALUES*".column1 > +(24 rows) > > In this test case, the foreign partition is updated before ri_FdwState is > overwritten by postgresBeginForeignInsert() that's invoked by the tuple > routing code, so it would work without any fix. So I modified this test > case as such. Thanks for fixing that. I hadn't noticed that foreign partition remp is updated before local partition locp; should've added a locp0 for values (0) so that remp appears second in the ModifyTable's subplans. Or, well, make the foreign table remp appear later by making it a partition for values in (3) as your patch does. BTW, have you noticed that the RETURNING clause returns the same row twice? +update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; + a | b | x +---+-------------------+--- + 3 | qux triggered ! | 2 + 3 | xyzzy triggered ! | 3 + 3 | qux triggered ! | 3 +(3 rows) You can see that the row that's moved into remp is returned twice (one with "qux triggered!" in b column), whereas it should've been only once? I checked the behavior with moving into a local partition just to be sure and the changed row is returned only once. create table p (a int, b text) partition by list (a); create table p1 partition of p for values in (1); create table p2 partition of p for values in (2); insert into p values (1, 'p1'), (2, 'p2'); select tableoid::regclass, * from p; tableoid │ a │ b ──────────┼───┼──── p1 │ 1 │ p1 p2 │ 2 │ p2 (2 rows) update p set a = 2 returning tableoid::regclass, *; tableoid │ a │ b ──────────┼───┼──── p2 │ 2 │ p1 p2 │ 2 │ p2 (2 rows) Add a foreign table partition in the mix and that's no longer true. create extension postgres_fdw ; create server loopback foreign data wrapper postgres_fdw; create user mapping for current_user server loopback; create table p3base (a int check (a = 3), b text); create foreign table p3 partition of p for values in (3) server loopback options (table_name 'p3base'); delete from p; insert into p values (1, 'p1'), (2, 'p2'), (3, 'p3'); select tableoid::regclass, * from p; tableoid │ a │ b ──────────┼───┼──── p1 │ 1 │ p1 p2 │ 2 │ p2 p3 │ 3 │ p3 (3 rows) update p set a = 3 returning tableoid::regclass, *; tableoid │ a │ b ──────────┼───┼──── p3 │ 3 │ p1 p3 │ 3 │ p2 p3 │ 3 │ p3 p3 │ 3 │ p1 p3 │ 3 │ p2 (5 rows) select tableoid::regclass, * from p; tableoid │ a │ b ──────────┼───┼──── p3 │ 3 │ p3 p3 │ 3 │ p1 p3 │ 3 │ p2 (3 rows) I'm not sure if it's a bug that ought to be fixed, but thought I should highlight it just in case. Thanks, Amit
pgsql-hackers by date: