Re: bug in update tuple routing with foreign partitions - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: bug in update tuple routing with foreign partitions
Date
Msg-id 5CAF257A.9050604@lab.ntt.co.jp
Whole thread Raw
In response to Re: bug in update tuple routing with foreign partitions  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: bug in update tuple routing with foreign partitions
List pgsql-hackers
(2019/04/11 14:33), Amit Langote wrote:
> On 2019/04/10 17:38, Etsuro Fujita wrote:
>> (2019/03/06 18:33), Amit Langote wrote:
>>> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
>>> to use for partition routing targets.  Specifically, the bug occurs when
>>> UPDATE targets include a foreign partition that is locally modified (as
>>> opposed to being modified directly on the remove server) and its
>>> ResultRelInfo (called subplan result rel in the source code) is reused for
>>> tuple routing:
>>
>>> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
>>> subplan result rel if its ri_FdwState is already set.
>>
>> I'm not sure that is a good idea, because that requires to create a new
>> ResultRelInfo, which is not free; I think it requires a lot of work.
>
> After considering this a bit more and studying your proposed solution, I
> tend to agree.  Beside the performance considerations you point out that I
> too think are valid, I realize that going with my approach would mean
> embedding some assumptions in the core code about ri_FdwState; in this
> case, assuming that a subplan result rel's ri_FdwState is not to be
> re-purposed for tuple routing insert, which is only based on looking at
> postgres_fdw's implementation.

Yeah, that assumption might not apply to some other FDWs.

> Not to mention the ensuing complexity in
> the core tuple routing code -- it now needs to account for the fact that
> not all subplan result rels may have been re-purposed for tuple-routing,
> something that's too complicated to handle in PG 11.

I think so too.

> IOW, let's call this a postgres_fdw bug and fix it there as you propose.

Agreed.

>> Another solution to avoid that would be to store the fmstate created by
>> postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
>> ri_FdwState, like the attached.  This would not need any changes to the
>> core, and this would not cause any overheads either, IIUC.  What do you
>> think about that?
>
> +1.  Just one comment:
>
> +    /*
> +     * If the given resultRelInfo already has PgFdwModifyState set, it means
> +     * the foreign table is an UPDATE subplan resultrel; in which case, store
> +     * the resulting state into the PgFdwModifyState.
> +     */
>
> The last "PgFdwModifyState" should be something else?  Maybe,
> sub-PgModifyState or sub_fmstate?

OK, will revise.  My favorite would be the latter.

>>> I've also added a
>>> test in postgres_fdw.sql to exercise this test case.
>>
>> Thanks again, but the test case you added works well without any fix:
>
> Oops, I should really adopt a habit of adding a test case before code.
>
>> +-- Check case where the foreign partition is a subplan target rel and
>> +-- foreign parttion is locally modified (target table being joined
>> +-- locally prevents a direct/remote modification plan).
>> +explain (verbose, costs off)
>> +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
>> returning *;
>> +                                  QUERY PLAN
>> +------------------------------------------------------------------------------
>>
>> + Update on public.utrtest
>> +   Output: remp.a, remp.b, "*VALUES*".column1
>> +   Foreign Update on public.remp
>> +     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
>> a, b
>> +   Update on public.locp
>> +   ->   Hash Join
>> +         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
>> +         Hash Cond: (remp.a = "*VALUES*".column1)
>> +         ->   Foreign Scan on public.remp
>> +               Output: remp.b, remp.ctid, remp.a
>> +               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
>> +         ->   Hash
>> +               Output: "*VALUES*".*, "*VALUES*".column1
>> +               ->   Values Scan on "*VALUES*"
>> +                     Output: "*VALUES*".*, "*VALUES*".column1
>> +   ->   Hash Join
>> +         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
>> +         Hash Cond: (locp.a = "*VALUES*".column1)
>> +         ->   Seq Scan on public.locp
>> +               Output: locp.b, locp.ctid, locp.a
>> +         ->   Hash
>> +               Output: "*VALUES*".*, "*VALUES*".column1
>> +               ->   Values Scan on "*VALUES*"
>> +                     Output: "*VALUES*".*, "*VALUES*".column1
>> +(24 rows)
>>
>> In this test case, the foreign partition is updated before ri_FdwState is
>> overwritten by postgresBeginForeignInsert() that's invoked by the tuple
>> routing code, so it would work without any fix.  So I modified this test
>> case as such.
>
> Thanks for fixing that.   I hadn't noticed that foreign partition remp is
> updated before local partition locp; should've added a locp0 for values
> (0) so that remp appears second in the ModifyTable's subplans.  Or, well,
> make the foreign table remp appear later by making it a partition for
> values in (3) as your patch does.
>
> BTW, have you noticed that the RETURNING clause returns the same row twice?

I noticed this, but I didn't think it hard.  :(

> +update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
> returning *;
> + a |         b         | x
> +---+-------------------+---
> + 3 | qux triggered !   | 2
> + 3 | xyzzy triggered ! | 3
> + 3 | qux triggered !   | 3
> +(3 rows)
>
> You can see that the row that's moved into remp is returned twice (one
> with "qux triggered!" in b column), whereas it should've been only once?

Yeah, this is unexpected behavior, so will look into this.  Thanks for 
pointing that out!

Best regards,
Etsuro Fujita




pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: Failure in contrib test _int on loach
Next
From: Heikki Linnakangas
Date:
Subject: Re: Pluggable Storage - Andres's take