Thread: Oddity in tuple routing for foreign partitions

Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
Hi,

While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
the patch for tuple routing for foreign partitions doesn't work well
with remote triggers.  Here is an example:

postgres=# create table loct1 (a int check (a in (1)), b text);
postgres=# create function br_insert_trigfunc() returns trigger as
$$begin new.b := new.b || ' triggered !'; return new; end$$ language
plpgsql;
postgres=# create trigger loct1_br_insert_trigger before insert on loct1
for each row execute procedure br_insert_trigfunc();
postgres=# create table itrtest (a int, b text) partition by list (a);
postgres=# create foreign table remp1 (a int check (a in (1)), b text)
server loopback options (table_name 'loct1');
postgres=# alter table itrtest attach partition remp1 for values in (1);

This behaves oddly:

postgres=# insert into itrtest values (1, 'foo') returning *;
 a |  b
---+-----
 1 | foo
(1 row)

INSERT 0 1

The new value of b in the RETURNING result should be concatenated with '
triggered !', as shown below:

postgres=# select tableoid::regclass, * from itrtest ;
 tableoid | a |        b
----------+---+-----------------
 remp1    | 1 | foo triggered !
(1 row)

The reason for that is: since that in ExecInitPartitionInfo, the core
calls BeginForeignInsert via ExecInitRoutingInfo before initializing
ri_returningList of ResultRelInfo for the partition, BeginForeignInsert
sees that the ri_returningList is NULL and incorrectly recognizes that
the local query doesn't have a RETURNING list.  So, I moved the
ExecInitRoutingInfo call after building the ri_returningList (and before
handling ON CONFLICT because we reference the conversion map created by
that when handling that clause).  The first version of the patch called
BeginForeignInsert that order, but I updated the patch incorrectly.  :(
 Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
and added that to ExecInitPartitionInfo, right after the
InitResultRelInfo call, because it would be better to abort the
operation as soon as we find the partition to be non-routable.  Another
thing I noticed is the RT index of the foreign partition set to 1 in
postgresBeginForeignInsert to create a remote query; that would not work
for cases where the partitioned table has an RT index larger than 1 (eg,
the partitioned table is not the primary ModifyTable node); in which
cases, since the varno of Vars belonging to the foreign partition in the
RETURNING expressions is the same as the partitioned table, calling
deparseReturningList with the RT index set to 1 against the RETURNING
expressions would produce attrs_used to be NULL, leading to postgres_fdw
not retrieving actually inserted data from the remote, even if remote
triggers might change those data.  So, I fixed this as well, by setting
the RT index accordingly to the partitioned table.  Attached is a patch
for fixing these issues.  I'll add this to the open items list for PG11.
 (After addressing this, I'll post an updated version of the
fix-postgres_fdw-WCO-handling patch.)

Best regards,
Etsuro Fujita

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
Fujita-san,

On 2018/04/16 20:25, Etsuro Fujita wrote:
> Hi,
> 
> While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
> the patch for tuple routing for foreign partitions doesn't work well
> with remote triggers.  Here is an example:
> 
> postgres=# create table loct1 (a int check (a in (1)), b text);
> postgres=# create function br_insert_trigfunc() returns trigger as
> $$begin new.b := new.b || ' triggered !'; return new; end$$ language
> plpgsql;
> postgres=# create trigger loct1_br_insert_trigger before insert on loct1
> for each row execute procedure br_insert_trigfunc();
> postgres=# create table itrtest (a int, b text) partition by list (a);
> postgres=# create foreign table remp1 (a int check (a in (1)), b text)
> server loopback options (table_name 'loct1');
> postgres=# alter table itrtest attach partition remp1 for values in (1);
> 
> This behaves oddly:
> 
> postgres=# insert into itrtest values (1, 'foo') returning *;
>  a |  b
> ---+-----
>  1 | foo
> (1 row)
> 
> INSERT 0 1
> 
> The new value of b in the RETURNING result should be concatenated with '
> triggered !', as shown below:
> 
> postgres=# select tableoid::regclass, * from itrtest ;
>  tableoid | a |        b
> ----------+---+-----------------
>  remp1    | 1 | foo triggered !
> (1 row)
> 
> The reason for that is: since that in ExecInitPartitionInfo, the core
> calls BeginForeignInsert via ExecInitRoutingInfo before initializing
> ri_returningList of ResultRelInfo for the partition, BeginForeignInsert
> sees that the ri_returningList is NULL and incorrectly recognizes that
> the local query doesn't have a RETURNING list.

Good catch!

> So, I moved the
> ExecInitRoutingInfo call after building the ri_returningList (and before
> handling ON CONFLICT because we reference the conversion map created by
> that when handling that clause).  The first version of the patch called
> BeginForeignInsert that order, but I updated the patch incorrectly.  :(

I didn't notice that when reviewing either.  Actually, I was under the
impression that BeginForeignInsert consumes returningList from the
ModifyTable node itself, not the ResultRelInfo, but now I see that
ri_returningList was added exactly for BeginForeignInsert to consume.
Good thing about that is that we get a Var-translated version instead of
the original one that contains parent's attnos.

> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
> and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call, because it would be better to abort
the
> operation as soon as we find the partition to be non-routable.

Sounds fine.

> Another
> thing I noticed is the RT index of the foreign partition set to 1 in
> postgresBeginForeignInsert to create a remote query; that would not work
> for cases where the partitioned table has an RT index larger than 1 (eg,
> the partitioned table is not the primary ModifyTable node); in which
> cases, since the varno of Vars belonging to the foreign partition in the
> RETURNING expressions is the same as the partitioned table, calling
> deparseReturningList with the RT index set to 1 against the RETURNING
> expressions would produce attrs_used to be NULL, leading to postgres_fdw
> not retrieving actually inserted data from the remote, even if remote
> triggers might change those data.  So, I fixed this as well, by setting
> the RT index accordingly to the partitioned table.

Check.

> Attached is a patch
> for fixing these issues.  I'll add this to the open items list for PG11.
>  (After addressing this, I'll post an updated version of the
> fix-postgres_fdw-WCO-handling patch.)

Your patch seems to fix the issue and code changes seem fine to me.

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
Hi Amit,

(2018/04/17 10:10), Amit Langote wrote:
> On 2018/04/16 20:25, Etsuro Fujita wrote:
>> While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
>> the patch for tuple routing for foreign partitions doesn't work well
>> with remote triggers.  Here is an example:
>>
>> postgres=# create table loct1 (a int check (a in (1)), b text);
>> postgres=# create function br_insert_trigfunc() returns trigger as
>> $$begin new.b := new.b || ' triggered !'; return new; end$$ language
>> plpgsql;
>> postgres=# create trigger loct1_br_insert_trigger before insert on loct1
>> for each row execute procedure br_insert_trigfunc();
>> postgres=# create table itrtest (a int, b text) partition by list (a);
>> postgres=# create foreign table remp1 (a int check (a in (1)), b text)
>> server loopback options (table_name 'loct1');
>> postgres=# alter table itrtest attach partition remp1 for values in (1);
>>
>> This behaves oddly:
>>
>> postgres=# insert into itrtest values (1, 'foo') returning *;
>>   a |  b
>> ---+-----
>>   1 | foo
>> (1 row)
>>
>> INSERT 0 1
>>
>> The new value of b in the RETURNING result should be concatenated with '
>> triggered !', as shown below:
>>
>> postgres=# select tableoid::regclass, * from itrtest ;
>>   tableoid | a |        b
>> ----------+---+-----------------
>>   remp1    | 1 | foo triggered !
>> (1 row)
>>
>> The reason for that is: since that in ExecInitPartitionInfo, the core
>> calls BeginForeignInsert via ExecInitRoutingInfo before initializing
>> ri_returningList of ResultRelInfo for the partition, BeginForeignInsert
>> sees that the ri_returningList is NULL and incorrectly recognizes that
>> the local query doesn't have a RETURNING list.
> 
> Good catch!
> 
>> So, I moved the
>> ExecInitRoutingInfo call after building the ri_returningList (and before
>> handling ON CONFLICT because we reference the conversion map created by
>> that when handling that clause).  The first version of the patch called
>> BeginForeignInsert that order, but I updated the patch incorrectly.  :(
> 
> I didn't notice that when reviewing either.  Actually, I was under the
> impression that BeginForeignInsert consumes returningList from the
> ModifyTable node itself, not the ResultRelInfo, but now I see that
> ri_returningList was added exactly for BeginForeignInsert to consume.
> Good thing about that is that we get a Var-translated version instead of
> the original one that contains parent's attnos.
> 
>> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
>> and added that to ExecInitPartitionInfo right after the>  InitResultRelInfo call, because it would be better to
abortthe
 
>> operation as soon as we find the partition to be non-routable.
> 
> Sounds fine.
> 
>> Another
>> thing I noticed is the RT index of the foreign partition set to 1 in
>> postgresBeginForeignInsert to create a remote query; that would not work
>> for cases where the partitioned table has an RT index larger than 1 (eg,
>> the partitioned table is not the primary ModifyTable node); in which
>> cases, since the varno of Vars belonging to the foreign partition in the
>> RETURNING expressions is the same as the partitioned table, calling
>> deparseReturningList with the RT index set to 1 against the RETURNING
>> expressions would produce attrs_used to be NULL, leading to postgres_fdw
>> not retrieving actually inserted data from the remote, even if remote
>> triggers might change those data.  So, I fixed this as well, by setting
>> the RT index accordingly to the partitioned table.
> 
> Check.
> 
>> Attached is a patch
>> for fixing these issues.  I'll add this to the open items list for PG11.
>>   (After addressing this, I'll post an updated version of the
>> fix-postgres_fdw-WCO-handling patch.)
> 
> Your patch seems to fix the issue and code changes seem fine to me.

Thanks for the review!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
Hello.

At Tue, 17 Apr 2018 10:10:38 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<fefb9510-e304-27da-e984-a2b0ac77927a@lab.ntt.co.jp>
> Fujita-san,
> 
> On 2018/04/16 20:25, Etsuro Fujita wrote:
> > Hi,
> > 
> > While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
> > the patch for tuple routing for foreign partitions doesn't work well
> > with remote triggers.  Here is an example:
> > 
> > postgres=# create table loct1 (a int check (a in (1)), b text);
> > postgres=# create function br_insert_trigfunc() returns trigger as
> > $$begin new.b := new.b || ' triggered !'; return new; end$$ language
> > plpgsql;
> > postgres=# create trigger loct1_br_insert_trigger before insert on loct1
> > for each row execute procedure br_insert_trigfunc();
> > postgres=# create table itrtest (a int, b text) partition by list (a);
> > postgres=# create foreign table remp1 (a int check (a in (1)), b text)
> > server loopback options (table_name 'loct1');
> > postgres=# alter table itrtest attach partition remp1 for values in (1);
> > 
> > This behaves oddly:
> > 
> > postgres=# insert into itrtest values (1, 'foo') returning *;
> >  a |  b
> > ---+-----
> >  1 | foo
> > (1 row)
> > 
> > INSERT 0 1
> > 
> > The new value of b in the RETURNING result should be concatenated with '
> > triggered !', as shown below:
> > 
> > postgres=# select tableoid::regclass, * from itrtest ;
> >  tableoid | a |        b
> > ----------+---+-----------------
> >  remp1    | 1 | foo triggered !
> > (1 row)
> > 
> > The reason for that is: since that in ExecInitPartitionInfo, the core
> > calls BeginForeignInsert via ExecInitRoutingInfo before initializing
> > ri_returningList of ResultRelInfo for the partition, BeginForeignInsert
> > sees that the ri_returningList is NULL and incorrectly recognizes that
> > the local query doesn't have a RETURNING list.
> 
> Good catch!
> 
> > So, I moved the
> > ExecInitRoutingInfo call after building the ri_returningList (and before
> > handling ON CONFLICT because we reference the conversion map created by
> > that when handling that clause).  The first version of the patch called
> > BeginForeignInsert that order, but I updated the patch incorrectly.  :(
> 
> I didn't notice that when reviewing either.  Actually, I was under the
> impression that BeginForeignInsert consumes returningList from the
> ModifyTable node itself, not the ResultRelInfo, but now I see that
> ri_returningList was added exactly for BeginForeignInsert to consume.
> Good thing about that is that we get a Var-translated version instead of
> the original one that contains parent's attnos.
> 
> > Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
> > and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call, because it would be better to
abortthe
 
> > operation as soon as we find the partition to be non-routable.
> 
> Sounds fine.

If I'm reading this correctly, ExecInitParititionInfo calls
ExecInitRoutingInfo so currently CheckValidityResultRel is called
for the child when partrel is created in ExecPrepareTupleRouting.
But the move of CheckValidResultRel results in letting partrel
just created in ExecPrepareTupleRouting not be checked at all
since ri_ParititionReadyForRouting is always set true in
ExecInitPartitionInfo.

> > Another
> > thing I noticed is the RT index of the foreign partition set to 1 in
> > postgresBeginForeignInsert to create a remote query; that would not work
> > for cases where the partitioned table has an RT index larger than 1 (eg,
> > the partitioned table is not the primary ModifyTable node); in which
> > cases, since the varno of Vars belonging to the foreign partition in the
> > RETURNING expressions is the same as the partitioned table, calling
> > deparseReturningList with the RT index set to 1 against the RETURNING
> > expressions would produce attrs_used to be NULL, leading to postgres_fdw
> > not retrieving actually inserted data from the remote, even if remote
> > triggers might change those data.  So, I fixed this as well, by setting
> > the RT index accordingly to the partitioned table.
> 
> Check.

I'm not sure but is it true that the parent is always at
resultRelation - 1?

> > Attached is a patch
> > for fixing these issues.  I'll add this to the open items list for PG11.
> >  (After addressing this, I'll post an updated version of the
> > fix-postgres_fdw-WCO-handling patch.)
> 
> Your patch seems to fix the issue and code changes seem fine to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote:
>>> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
>>> and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call,
>>> because it would be better to abort the
>>> operation as soon as we find the partition to be non-routable.
>>
>> Sounds fine.
> 
> If I'm reading this correctly, ExecInitParititionInfo calls
> ExecInitRoutingInfo so currently CheckValidityResultRel is called
> for the child when partrel is created in ExecPrepareTupleRouting.
> But the move of CheckValidResultRel results in letting partrel
> just created in ExecPrepareTupleRouting not be checked at all
> since ri_ParititionReadyForRouting is always set true in
> ExecInitPartitionInfo.

I thought the same upon reading the email, but it seems that the patch
does add CheckValidResultRel() in ExecInitPartitionInfo() as well:

@@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
                       estate->es_instrument);

     /*
+     * Verify result relation is a valid target for an INSERT.  An UPDATE
of a
+     * partition-key becomes a DELETE+INSERT operation, so this check is
still
+     * required when the operation is CMD_UPDATE.
+     */
+    CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+    /*
      * Since we've just initialized this ResultRelInfo, it's not in any list
      * attached to the estate as yet.  Add it, so that it can be found later.
      *


>>> Another
>>> thing I noticed is the RT index of the foreign partition set to 1 in
>>> postgresBeginForeignInsert to create a remote query; that would not work
>>> for cases where the partitioned table has an RT index larger than 1 (eg,
>>> the partitioned table is not the primary ModifyTable node); in which
>>> cases, since the varno of Vars belonging to the foreign partition in the
>>> RETURNING expressions is the same as the partitioned table, calling
>>> deparseReturningList with the RT index set to 1 against the RETURNING
>>> expressions would produce attrs_used to be NULL, leading to postgres_fdw
>>> not retrieving actually inserted data from the remote, even if remote
>>> triggers might change those data.  So, I fixed this as well, by setting
>>> the RT index accordingly to the partitioned table.
>>
>> Check.
> 
> I'm not sure but is it true that the parent is always at
> resultRelation - 1?

Unless I'm missing something, EState.es_range_table contains *all* range
table entries that were created by the planner.  So, if the planner
assigned resultRelation as the RT index for the parent, then it seems we
can rely on getting the same entry back with
list_nth_cell(estate->es_range_table, resultRelation - 1).  That seems
true even if the parent wasn't the *primary* target relation.  For
example, the following works with the patch:

-- in addition to the tables shown in Fujita-san's email, define one more
-- local partition
create table locp2 partition of itrtest for values in (2);

-- simple case
insert into itrtest values (1, 'foo') returning *;
 a |        b
---+-----------------
 1 | foo triggered !
(1 row)

-- itrtest (the parent) appears as both the primary target relation and
-- otherwise.  The latter in the form of being mentioned in the with query

with ins (a, b) as (insert into itrtest values (1, 'foo') returning *)
insert into itrtest select a + 1, b from ins returning *;
 a |        b
---+-----------------
 2 | foo triggered !
(1 row)

But maybe I misunderstood your question.

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/17 16:10), Amit Langote wrote:
> On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote:
>> If I'm reading this correctly, ExecInitParititionInfo calls
>> ExecInitRoutingInfo so currently CheckValidityResultRel is called
>> for the child when partrel is created in ExecPrepareTupleRouting.
>> But the move of CheckValidResultRel results in letting partrel
>> just created in ExecPrepareTupleRouting not be checked at all
>> since ri_ParititionReadyForRouting is always set true in
>> ExecInitPartitionInfo.
>
> I thought the same upon reading the email, but it seems that the patch
> does add CheckValidResultRel() in ExecInitPartitionInfo() as well:
>
> @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
>                         estate->es_instrument);
>
>       /*
> +     * Verify result relation is a valid target for an INSERT.  An UPDATE
> of a
> +     * partition-key becomes a DELETE+INSERT operation, so this check is
> still
> +     * required when the operation is CMD_UPDATE.
> +     */
> +    CheckValidResultRel(leaf_part_rri, CMD_INSERT);
> +
> +    /*
>        * Since we've just initialized this ResultRelInfo, it's not in any list
>        * attached to the estate as yet.  Add it, so that it can be found later.
>        *

That's right.  So, the validity check would be applied to a partition 
created in ExecPrepareTupleRouting as well.

>>>> Another
>>>> thing I noticed is the RT index of the foreign partition set to 1 in
>>>> postgresBeginForeignInsert to create a remote query; that would not work
>>>> for cases where the partitioned table has an RT index larger than 1 (eg,
>>>> the partitioned table is not the primary ModifyTable node); in which
>>>> cases, since the varno of Vars belonging to the foreign partition in the
>>>> RETURNING expressions is the same as the partitioned table, calling
>>>> deparseReturningList with the RT index set to 1 against the RETURNING
>>>> expressions would produce attrs_used to be NULL, leading to postgres_fdw
>>>> not retrieving actually inserted data from the remote, even if remote
>>>> triggers might change those data.  So, I fixed this as well, by setting
>>>> the RT index accordingly to the partitioned table.
>>>
>>> Check.
>>
>> I'm not sure but is it true that the parent is always at
>> resultRelation - 1?
>
> Unless I'm missing something, EState.es_range_table contains *all* range
> table entries that were created by the planner.  So, if the planner
> assigned resultRelation as the RT index for the parent, then it seems we
> can rely on getting the same entry back with
> list_nth_cell(estate->es_range_table, resultRelation - 1).  That seems
> true even if the parent wasn't the *primary* target relation.  For
> example, the following works with the patch:
>
> -- in addition to the tables shown in Fujita-san's email, define one more
> -- local partition
> create table locp2 partition of itrtest for values in (2);
>
> -- simple case
> insert into itrtest values (1, 'foo') returning *;
>   a |        b
> ---+-----------------
>   1 | foo triggered !
> (1 row)
>
> -- itrtest (the parent) appears as both the primary target relation and
> -- otherwise.  The latter in the form of being mentioned in the with query
>
> with ins (a, b) as (insert into itrtest values (1, 'foo') returning *)
> insert into itrtest select a + 1, b from ins returning *;
>   a |        b
> ---+-----------------
>   2 | foo triggered !
> (1 row)

I agree with that.  Thanks for the explanation and the example, Amit!

Maybe my explanation (or the naming parent_rte/child_rte in the patch) 
was not necessarily good, so I guess that that confused horiguchi-san. 
Let me explain.  In the INSERT/COPY-tuple-routing case, as explained by 
Amit, the RTE at that position in the EState's range table is the one 
for the partitioned table of a given partition, so the statement would 
be true.  BUT in the UPDATE-tuple-routing case, the RTE is the one for 
the given partition, not for the partitioned table, so the statement 
would not be true.  In the latter case, we don't need to create a child 
RTE and replace the original RTE with it, but I handled both cases the 
same way for simplicity.

Thanks for the comments, Horiguchi-san!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/17 16:41, Etsuro Fujita wrote:
> In the INSERT/COPY-tuple-routing case, as explained by Amit, the
> RTE at that position in the EState's range table is the one for the
> partitioned table of a given partition, so the statement would be true. 
> BUT in the UPDATE-tuple-routing case, the RTE is the one for the given
> partition, not for the partitioned table, so the statement would not be
> true.  In the latter case, we don't need to create a child RTE and replace
> the original RTE with it, but I handled both cases the same way for
> simplicity.

Oh, I didn't really consider this part carefully.  That the resultRelInfo
received by BeginForeignInsert (when called by ExecInitRoutingInfo) could
be a reused UPDATE result relation.  It might be possible to justify the
parent_rte/child_rte terminology by explaining it a bit better.  I see
three cases that arise during tuple routing:

1. This is INSERT and so the resultRelation that's received in
   BeginForeignInsert has been freshly created in ExecInitPartitionInfo
   and it bears node->nominalRelation or 1 as its ri_RangeTableIndex

2. This is UPDATE and the resultRelInfo that's received in
   BeginForeignInsert has been freshly created in ExecInitPartitionInfo
   and it bears node->nominalRelation or 1 as its ri_RangeTableIndex

3. This is UPDATE and the resultRelInfo that's received in
   BeginForeignInsert is a reused one, in which case, it bears the planner
   assigned ri_RangeTableIndex

In all three cases, I think we can rely on using ri_RangeTableIndex to
fetch a valid "parent" RTE from es_range_table.

Do you think we need to clarify this in a comment?

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/18 14:44), Amit Langote wrote:
> On 2018/04/17 16:41, Etsuro Fujita wrote:
>> In the INSERT/COPY-tuple-routing case, as explained by Amit, the
>> RTE at that position in the EState's range table is the one for the
>> partitioned table of a given partition, so the statement would be true.
>> BUT in the UPDATE-tuple-routing case, the RTE is the one for the given
>> partition, not for the partitioned table, so the statement would not be
>> true.  In the latter case, we don't need to create a child RTE and replace
>> the original RTE with it, but I handled both cases the same way for
>> simplicity.
>
> Oh, I didn't really consider this part carefully.  That the resultRelInfo
> received by BeginForeignInsert (when called by ExecInitRoutingInfo) could
> be a reused UPDATE result relation.  It might be possible to justify the
> parent_rte/child_rte terminology by explaining it a bit better.

Yeah, I think that terminology would be confusing, so I changed it to 
old_rte/new_rte.

> I see
> three cases that arise during tuple routing:
>
> 1. This is INSERT and so the resultRelation that's received in
>     BeginForeignInsert has been freshly created in ExecInitPartitionInfo
>     and it bears node->nominalRelation or 1 as its ri_RangeTableIndex

Right.

> 2. This is UPDATE and the resultRelInfo that's received in
>     BeginForeignInsert has been freshly created in ExecInitPartitionInfo
>     and it bears node->nominalRelation or 1 as its ri_RangeTableIndex

In that case, we have a valid plan node, so it would only bear 
node->nominalRelation.

> 3. This is UPDATE and the resultRelInfo that's received in
>     BeginForeignInsert is a reused one, in which case, it bears the planner
>     assigned ri_RangeTableIndex

Right.

> In all three cases, I think we can rely on using ri_RangeTableIndex to
> fetch a valid "parent" RTE from es_range_table.

I slept on this, I noticed there is another bug in case #2.  Here is an 
example with the previous patch:

postgres=# create table utrtest (a int, b text) partition by list (a);
postgres=# create table loct (a int check (a in (1)), b text);
postgres=# create foreign table remp (a int check (a in (1)), b text) 
server loopback options (table_name 'loct');
postgres=# create table locp (a int check (a in (2)), b text);
postgres=# alter table utrtest attach partition remp for values in (1);
postgres=# alter table utrtest attach partition locp for values in (2);
postgres=# create trigger loct_br_insert_trigger before insert on loct 
for each row execute procedure br_insert_trigfunc();

where the trigger function is the one defined in an earlier email.

postgres=# insert into utrtest values (2, 'qux');
INSERT 0 1
postgres=# update utrtest set a = 1 where a = 2 returning *;
  a |  b
---+-----
  1 | qux
(1 row)

UPDATE 1

Wrong result!  The result should be concatenated with ' triggered !'.

In case #2, since we initialize expressions for the partition's 
ResultRelInfo including RETURNING by translating the attnos of the 
corresponding expressions in the ResultRelInfo for the first subplan 
target rel, I think we should replace the RTE for the first subplan 
target rel, not the RTE for the nominal rel, with the new one created 
for the foreign table.  Attached is an updated version for fixing this 
issue.

> Do you think we need to clarify this in a comment?

Yeah, but I updated comments about this a little bit different way in 
the attached.  Does that make sense?

Thanks for the comments!

Best regards,
Etsuro Fujita

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/18 21:55, Etsuro Fujita wrote:
> (2018/04/18 14:44), Amit Langote wrote:
>> Oh, I didn't really consider this part carefully.  That the resultRelInfo
>> received by BeginForeignInsert (when called by ExecInitRoutingInfo) could
>> be a reused UPDATE result relation.  It might be possible to justify the
>> parent_rte/child_rte terminology by explaining it a bit better.
> 
>> 2. This is UPDATE and the resultRelInfo that's received in
>>     BeginForeignInsert has been freshly created in ExecInitPartitionInfo
>>     and it bears node->nominalRelation or 1 as its ri_RangeTableIndex

[ ... ]

>> In all three cases, I think we can rely on using ri_RangeTableIndex to
>> fetch a valid "parent" RTE from es_range_table.
> 
> I slept on this, I noticed there is another bug in case #2.  Here is an
> example with the previous patch:
> 
> postgres=# create table utrtest (a int, b text) partition by list (a);
> postgres=# create table loct (a int check (a in (1)), b text);
> postgres=# create foreign table remp (a int check (a in (1)), b text)
> server loopback options (table_name 'loct');
> postgres=# create table locp (a int check (a in (2)), b text);
> postgres=# alter table utrtest attach partition remp for values in (1);
> postgres=# alter table utrtest attach partition locp for values in (2);
> postgres=# create trigger loct_br_insert_trigger before insert on loct for
> each row execute procedure br_insert_trigfunc();
> 
> where the trigger function is the one defined in an earlier email.
> 
> postgres=# insert into utrtest values (2, 'qux');
> INSERT 0 1
> postgres=# update utrtest set a = 1 where a = 2 returning *;
>  a |  b
> ---+-----
>  1 | qux
> (1 row)
> 
> UPDATE 1
> 
> Wrong result!  The result should be concatenated with ' triggered !'.
> 
> In case #2, since we initialize expressions for the partition's
> ResultRelInfo including RETURNING by translating the attnos of the
> corresponding expressions in the ResultRelInfo for the first subplan
> target rel, I think we should replace the RTE for the first subplan target
> rel, not the RTE for the nominal rel, with the new one created for the
> foreign table.

Ah, so in case #2, we use firstVarno (per ExecInitPartitionInfo terms) as
varno throughout.  So, we'd like to put the new RTE in that slot.

Would it be a good idea to explain *why* we need to replace the RTE in the
first place?  Afaics, it's for deparseColumnRef() to find the correct
relation when it uses planner_rt_fetch() to get the RTE.

> Attached is an updated version for fixing this issue.
> 
>> Do you think we need to clarify this in a comment?
> 
> Yeah, but I updated comments about this a little bit different way in the
> attached.  Does that make sense?

It looks generally good, although in the following:

+    /*
+     * If the foreign table is a partition, temporarily replace the parent's
+     * RTE in the range table with a new target RTE describing the foreign
+     * table for use by deparseInsertSql() and create_foreign_modify() below.
+     */

.. it could be mentioned that we don't switch either the RTE or the value
assigned to resultRelation if the RTE currently at resultRelation RT index
is the one created by the planner and refers to the same relation that
resultRelInfo does.

Beside that, I noticed that the patch has a stray white-space at the end
of the following line:

+                /* partition that isn't a subplan target rel */

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
At Tue, 17 Apr 2018 16:41:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5AD5A52B.7050206@lab.ntt.co.jp>
> (2018/04/17 16:10), Amit Langote wrote:
> > On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote:
> >> If I'm reading this correctly, ExecInitParititionInfo calls
> >> ExecInitRoutingInfo so currently CheckValidityResultRel is called
> >> for the child when partrel is created in ExecPrepareTupleRouting.
> >> But the move of CheckValidResultRel results in letting partrel
> >> just created in ExecPrepareTupleRouting not be checked at all
> >> since ri_ParititionReadyForRouting is always set true in
> >> ExecInitPartitionInfo.
> >
> > I thought the same upon reading the email, but it seems that the patch
> > does add CheckValidResultRel() in ExecInitPartitionInfo() as well:
> >
> > @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
> >                         estate->es_instrument);
> >
> >       /*
> > + * Verify result relation is a valid target for an INSERT.  An UPDATE
> > of a
> > + * partition-key becomes a DELETE+INSERT operation, so this check is
> > still
> > +     * required when the operation is CMD_UPDATE.
> > +     */
> > +    CheckValidResultRel(leaf_part_rri, CMD_INSERT);
> > +
> > +    /*
> >        * Since we've just initialized this ResultRelInfo, it's not in any
> >        * list
> >        * attached to the estate as yet.  Add it, so that it can be found
> >        * later.
> >        *
> 
> That's right.  So, the validity check would be applied to a partition
> created in ExecPrepareTupleRouting as well.

Yes, that's actually true but it seems quite bug-prone or at
least hard to understand. I understand that the
CheckValidResultRel(INSERT) is just a prerequisite for
ExecInitRoutingInfo but the relationship is obfucated after this
patch. If we have a strong reason to error-out as fast as
possible, the original code seems better to me..

> >>>> Another
> >>>> thing I noticed is the RT index of the foreign partition set to 1 in
> >>>> postgresBeginForeignInsert to create a remote query; that would not
> >>>> work
> >>>> for cases where the partitioned table has an RT index larger than 1
> >>>> (eg,
> >>>> the partitioned table is not the primary ModifyTable node); in which
> >>>> cases, since the varno of Vars belonging to the foreign partition in
> >>>> the
> >>>> RETURNING expressions is the same as the partitioned table, calling
> >>>> deparseReturningList with the RT index set to 1 against the RETURNING
> >>>> expressions would produce attrs_used to be NULL, leading to
> >>>> postgres_fdw
> >>>> not retrieving actually inserted data from the remote, even if remote
> >>>> triggers might change those data.  So, I fixed this as well, by
> >>>> setting
> >>>> the RT index accordingly to the partitioned table.
> >>>
> >>> Check.
> >>
> >> I'm not sure but is it true that the parent is always at
> >> resultRelation - 1?
> >
> > Unless I'm missing something, EState.es_range_table contains *all*
> > range
> > table entries that were created by the planner.  So, if the planner
> > assigned resultRelation as the RT index for the parent, then it seems
> > we
> > can rely on getting the same entry back with
> > list_nth_cell(estate->es_range_table, resultRelation - 1).  That seems
> > true even if the parent wasn't the *primary* target relation.  For
> > example, the following works with the patch:
> >
> > -- in addition to the tables shown in Fujita-san's email, define one more
> > -- local partition
> > create table locp2 partition of itrtest for values in (2);
> >
> > -- simple case
> > insert into itrtest values (1, 'foo') returning *;
> >   a |        b
> > ---+-----------------
> >   1 | foo triggered !
> > (1 row)
> >
> > -- itrtest (the parent) appears as both the primary target relation and
> > -- otherwise.  The latter in the form of being mentioned in the with
> > -- query
> >
> > with ins (a, b) as (insert into itrtest values (1, 'foo') returning *)
> > insert into itrtest select a + 1, b from ins returning *;
> >   a |        b
> > ---+-----------------
> >   2 | foo triggered !
> > (1 row)
> 
> I agree with that.  Thanks for the explanation and the example, Amit!
> 
> Maybe my explanation (or the naming parent_rte/child_rte in the patch)
> was not necessarily good, so I guess that that confused
> horiguchi-san. Let me explain.  In the INSERT/COPY-tuple-routing case,
> as explained by Amit, the RTE at that position in the EState's range
> table is the one for the partitioned table of a given partition, so
> the statement would be true.  BUT in the UPDATE-tuple-routing case,
> the RTE is the one for the given partition, not for the partitioned
> table, so the statement would not be true.  In the latter case, we
> don't need to create a child RTE and replace the original RTE with it,
> but I handled both cases the same way for simplicity.

Thank you, your understanding of my concern is right. Thanks for
the explanation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/19 16:43), Amit Langote wrote:
> On 2018/04/18 21:55, Etsuro Fujita wrote:
>> (2018/04/18 14:44), Amit Langote wrote:
>>> That the resultRelInfo
>>> received by BeginForeignInsert (when called by ExecInitRoutingInfo) could
>>> be a reused UPDATE result relation.

>>> 2. This is UPDATE and the resultRelInfo that's received in
>>>      BeginForeignInsert has been freshly created in ExecInitPartitionInfo
>>>      and it bears node->nominalRelation or 1 as its ri_RangeTableIndex

>>> In all three cases, I think we can rely on using ri_RangeTableIndex to
>>> fetch a valid "parent" RTE from es_range_table.
>>
>> I slept on this, I noticed there is another bug in case #2.

>> In case #2, since we initialize expressions for the partition's
>> ResultRelInfo including RETURNING by translating the attnos of the
>> corresponding expressions in the ResultRelInfo for the first subplan
>> target rel, I think we should replace the RTE for the first subplan target
>> rel, not the RTE for the nominal rel, with the new one created for the
>> foreign table.
>
> Ah, so in case #2, we use firstVarno (per ExecInitPartitionInfo terms) as
> varno throughout.  So, we'd like to put the new RTE in that slot.
>
> Would it be a good idea to explain *why* we need to replace the RTE in the
> first place?  Afaics, it's for deparseColumnRef() to find the correct
> relation when it uses planner_rt_fetch() to get the RTE.

That might be a good idea, but one thing I'm concerned about is that 
since we might use the RTE in different ways in future developments, 
such a comment might be obsolete rather sooner.  So, I just added *for 
use by deparseInsertSql() and create_foreign_modify() below* to the 
comments shown below.  But I'd like to leave this to the committer.

>> Attached is an updated version for fixing this issue.
>>
>>> Do you think we need to clarify this in a comment?
>>
>> Yeah, but I updated comments about this a little bit different way in the
>> attached.  Does that make sense?
>
> It looks generally good, although in the following:
>
> +    /*
> +     * If the foreign table is a partition, temporarily replace the parent's
> +     * RTE in the range table with a new target RTE describing the foreign
> +     * table for use by deparseInsertSql() and create_foreign_modify() below.
> +     */
>
> .. it could be mentioned that we don't switch either the RTE or the value
> assigned to resultRelation if the RTE currently at resultRelation RT index
> is the one created by the planner and refers to the same relation that
> resultRelInfo does.

Done.

> Beside that, I noticed that the patch has a stray white-space at the end
> of the following line:
>
> +                /* partition that isn't a subplan target rel */

Fixed.

Thanks for the review!  Attached is a new version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/19 19:03), Kyotaro HORIGUCHI wrote:
> At Tue, 17 Apr 2018 16:41:31 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5AD5A52B.7050206@lab.ntt.co.jp>
>> (2018/04/17 16:10), Amit Langote wrote:
>>> On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote:
>>>> If I'm reading this correctly, ExecInitParititionInfo calls
>>>> ExecInitRoutingInfo so currently CheckValidityResultRel is called
>>>> for the child when partrel is created in ExecPrepareTupleRouting.
>>>> But the move of CheckValidResultRel results in letting partrel
>>>> just created in ExecPrepareTupleRouting not be checked at all
>>>> since ri_ParititionReadyForRouting is always set true in
>>>> ExecInitPartitionInfo.
>>>
>>> I thought the same upon reading the email, but it seems that the patch
>>> does add CheckValidResultRel() in ExecInitPartitionInfo() as well:
>>>
>>> @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
>>>                          estate->es_instrument);
>>>
>>>        /*
>>> + * Verify result relation is a valid target for an INSERT.  An UPDATE
>>> of a
>>> + * partition-key becomes a DELETE+INSERT operation, so this check is
>>> still
>>> +     * required when the operation is CMD_UPDATE.
>>> +     */
>>> +    CheckValidResultRel(leaf_part_rri, CMD_INSERT);
>>> +
>>> +    /*
>>>         * Since we've just initialized this ResultRelInfo, it's not in any
>>>         * list
>>>         * attached to the estate as yet.  Add it, so that it can be found
>>>         * later.
>>>         *
>>
>> That's right.  So, the validity check would be applied to a partition
>> created in ExecPrepareTupleRouting as well.
>
> Yes, that's actually true but it seems quite bug-prone or at
> least hard to understand. I understand that the
> CheckValidResultRel(INSERT) is just a prerequisite for
> ExecInitRoutingInfo but the relationship is obfucated after this
> patch. If we have a strong reason to error-out as fast as
> possible, the original code seems better to me..

Actually, I think that change would make the initialization for a 
partition more consistent with that for a main target rel in 
ExecInitModifyTable, where we first perform the CheckValidResultRel 
check against a target rel, and if valid, then open indices, initializes 
the FDW, initialize expressions such as WITH CHECK OPTION and RETURNING, 
and so on.  That's reasonable, and I like consistency, so I'd vote for 
that change.

Thanks for the review!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
Fujita-san,

Thanks for the updated patch.

On 2018/04/19 21:42, Etsuro Fujita wrote:
> (2018/04/19 16:43), Amit Langote wrote:
>> Would it be a good idea to explain *why* we need to replace the RTE in the
>> first place?  Afaics, it's for deparseColumnRef() to find the correct
>> relation when it uses planner_rt_fetch() to get the RTE.
> 
> That might be a good idea, but one thing I'm concerned about is that since
> we might use the RTE in different ways in future developments, such a
> comment might be obsolete rather sooner.  So, I just added *for use by
> deparseInsertSql() and create_foreign_modify() below* to the comments
> shown below.  But I'd like to leave this to the committer.

OK, fine by me.

>> It looks generally good, although in the following:
>>
>> +    /*
>> +     * If the foreign table is a partition, temporarily replace the
>> parent's
>> +     * RTE in the range table with a new target RTE describing the foreign
>> +     * table for use by deparseInsertSql() and create_foreign_modify()
>> below.
>> +     */
>>
>> .. it could be mentioned that we don't switch either the RTE or the value
>> assigned to resultRelation if the RTE currently at resultRelation RT index
>> is the one created by the planner and refers to the same relation that
>> resultRelInfo does.
> 
> Done.
> 
>> Beside that, I noticed that the patch has a stray white-space at the end
>> of the following line:
>>
>> +                /* partition that isn't a subplan target rel */
> 
> Fixed.
> 
> Thanks for the review!  Attached is a new version of the patch.

Looks good.

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/20 9:48), Amit Langote wrote:
> On 2018/04/19 21:42, Etsuro Fujita wrote:
>> (2018/04/19 16:43), Amit Langote wrote:
>>> Would it be a good idea to explain *why* we need to replace the RTE in the
>>> first place?  Afaics, it's for deparseColumnRef() to find the correct
>>> relation when it uses planner_rt_fetch() to get the RTE.
>>
>> That might be a good idea, but one thing I'm concerned about is that since
>> we might use the RTE in different ways in future developments, such a
>> comment might be obsolete rather sooner.  So, I just added *for use by
>> deparseInsertSql() and create_foreign_modify() below* to the comments
>> shown below.  But I'd like to leave this to the committer.
>
> OK, fine by me.
>
>>> It looks generally good, although in the following:
>>>
>>> +    /*
>>> +     * If the foreign table is a partition, temporarily replace the
>>> parent's
>>> +     * RTE in the range table with a new target RTE describing the foreign
>>> +     * table for use by deparseInsertSql() and create_foreign_modify()
>>> below.
>>> +     */
>>>
>>> .. it could be mentioned that we don't switch either the RTE or the value
>>> assigned to resultRelation if the RTE currently at resultRelation RT index
>>> is the one created by the planner and refers to the same relation that
>>> resultRelInfo does.
>>
>> Done.
>>
>>> Beside that, I noticed that the patch has a stray white-space at the end
>>> of the following line:
>>>
>>> +                /* partition that isn't a subplan target rel */
>>
>> Fixed.
>>
>> Thanks for the review!  Attached is a new version of the patch.
>
> Looks good.

Thank you!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Alvaro Herrera
Date:
Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
into it?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Oddity in tuple routing for foreign partitions

From
Robert Haas
Date:
On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
> into it?

Thanks for the ping.  I just had a look over the proposed patch and I
guess I don't like it very much.  Temporarily modifying the range
table in place and then changing it back before we return seems ugly
and error-prone.  I hope we can come up with a solution that doesn't
involve needing to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Oddity in tuple routing for foreign partitions

From
Robert Haas
Date:
On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
>> into it?
>
> Thanks for the ping.  I just had a look over the proposed patch and I
> guess I don't like it very much.  Temporarily modifying the range
> table in place and then changing it back before we return seems ugly
> and error-prone.  I hope we can come up with a solution that doesn't
> involve needing to do that.

I have done some refactoring to avoid that.  See attached.

I didn't really get beyond the refactoring stage with this today.
This version still seems to work, but I don't really understand the
logic in postgresBeginForeignInsert which decides whether to use the
RTE from the range table or create our own.  We seem to need to do one
sometimes and the other sometimes, but I don't know why that is,
really.  Nor do I understand why we need the other changes in the
patch.  There's probably a good reason, but I haven't figured it out
yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
Hello.

At Tue, 24 Apr 2018 15:49:20 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+Tgmob+CiHt+9JEUiXzNVmUcUf1zBb-bUeffVmbZWJHV0YVtw@mail.gmail.com>
> On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
> > <alvherre@alvh.no-ip.org> wrote:
> >> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
> >> into it?
> >
> > Thanks for the ping.  I just had a look over the proposed patch and I
> > guess I don't like it very much.  Temporarily modifying the range
> > table in place and then changing it back before we return seems ugly
> > and error-prone.  I hope we can come up with a solution that doesn't
> > involve needing to do that.
> 
> I have done some refactoring to avoid that.  See attached.
> 
> I didn't really get beyond the refactoring stage with this today.
> This version still seems to work, but I don't really understand the
> logic in postgresBeginForeignInsert which decides whether to use the
> RTE from the range table or create our own.  We seem to need to do one
> sometimes and the other sometimes, but I don't know why that is,
> really.  Nor do I understand why we need the other changes in the
> patch.  There's probably a good reason, but I haven't figured it out
> yet.

FWIW, the refactoring drops the condition that the ForeignInsert
is a part of UPDATE.  The code exists just for avoiding temprary
RTE generation in 2 cases out of the 3 possible cases mentioned
in [1]. If it looks too complex for the gain, we can always
create an RTE for deparsing as Fujita-san's first patch in this
thread did. Anyway the condition for "dostuff" + "is update"
might be a bit too arbitrary.

[1] https://www.postgresql.org/message-id/f970d875-9711-b8cb-f270-965fa3e40334@lab.ntt.co.jp

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/25 4:49, Robert Haas wrote:
> On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
>> <alvherre@alvh.no-ip.org> wrote:
>>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
>>> into it?
>>
>> Thanks for the ping.  I just had a look over the proposed patch and I
>> guess I don't like it very much.  Temporarily modifying the range
>> table in place and then changing it back before we return seems ugly
>> and error-prone.  I hope we can come up with a solution that doesn't
>> involve needing to do that.
> 
> I have done some refactoring to avoid that.  See attached.

+1 for getting rid of the PlannerInfo argument of the many functions in
that code.  I have long wondered if we couldn't rid of it and especially
thought of it when reviewing this patch.

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
I forgot to mention that.

At Wed, 25 Apr 2018 10:17:02 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<f894af29-345f-678b-480a-9f6e8cc314bd@lab.ntt.co.jp>
> On 2018/04/25 4:49, Robert Haas wrote:
> > On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
> >> <alvherre@alvh.no-ip.org> wrote:
> >>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
> >>> into it?
> >>
> >> Thanks for the ping.  I just had a look over the proposed patch and I
> >> guess I don't like it very much.  Temporarily modifying the range
> >> table in place and then changing it back before we return seems ugly
> >> and error-prone.  I hope we can come up with a solution that doesn't
> >> involve needing to do that.
> > 
> > I have done some refactoring to avoid that.  See attached.
> 
> +1 for getting rid of the PlannerInfo argument of the many functions in
> that code.  I have long wondered if we couldn't rid of it and especially
> thought of it when reviewing this patch.

+1 from me. Thanks for making things simpler and easy to
understand. I feel the same as Amit:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
Oops, really meant to send the "+1 to the root -> rte refactoring" comment
and the rest in the same email.

On 2018/04/25 4:49, Robert Haas wrote:
> I have done some refactoring to avoid that.  See attached.
> 
> I didn't really get beyond the refactoring stage with this today.
> This version still seems to work, but I don't really understand the
> logic in postgresBeginForeignInsert which decides whether to use the
> RTE from the range table or create our own.  We seem to need to do one
> sometimes and the other sometimes, but I don't know why that is,
> really.  Nor do I understand why we need the other changes in the
> patch.  There's probably a good reason, but I haven't figured it out
> yet.

So, the main part of the patch that fixes the bug is the following per the
latest v4 patch.

+    if (resultRelInfo->ri_PartitionRoot && plan)
+    {
+        bool dostuff = true;
+
+        if (resultRelation != plan->nominalRelation)
+            dostuff = false;
+        else
+            resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+
+        if (dostuff)
+        {
+            rte = list_nth(estate->es_range_table, resultRelation - 1);
+            rte = copyObject(rte);
+            rte->relid = RelationGetRelid(rel);
+            rte->relkind = RELKIND_FOREIGN_TABLE;
+        }
+    }
+
+    if (rte == NULL)
+    {
+        rte = makeNode(RangeTblEntry);
+        rte->rtekind = RTE_RELATION;
+        rte->relid = RelationGetRelid(rel);
+        rte->relkind = RELKIND_FOREIGN_TABLE;
+    }

After the refactoring, it appears to me that we only need this much:

+    rte = makeNode(RangeTblEntry);
+    rte->rtekind = RTE_RELATION;
+    rte->relid = RelationGetRelid(rel);
+    rte->relkind = RELKIND_FOREIGN_TABLE;

that is, *after* teaching ExecInitPartitionInfo to correctly set the RT
index of the ResultRelInfo it creates.  After the refactoring, the only
thing this patch needs to do is to choose the RT index of the result
relation correctly.  We currently use this:

+    Index       resultRelation = resultRelInfo->ri_RangeTableIndex;

And the rest of the code the patch adds is only to adjust it based on
where the partition ResultRelInfo seems to have originated from.  If the
ri_RangeTableIndex was set correctly to begin with, we wouldn't need to
fiddle with that in the FDW code.  Regular ResultRelInfo's already have it
set correctly, it's only the ones that ExecInitPartitionInfo creates that
seem be creating the problem.

I tried to do that.  So, attached are two patches.

1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
   InitResultRelInfo

2. v5 of the patch to fix the bug of foreign partitions

Thoughts?

Thanks,
Amit

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<573e60cc-beeb-b534-d89a-7622fae35585@lab.ntt.co.jp>
> Oops, really meant to send the "+1 to the root -> rte refactoring" comment
> and the rest in the same email.
> 
> On 2018/04/25 4:49, Robert Haas wrote:
> > I have done some refactoring to avoid that.  See attached.
> > 
> > I didn't really get beyond the refactoring stage with this today.
> > This version still seems to work, but I don't really understand the
> > logic in postgresBeginForeignInsert which decides whether to use the
> > RTE from the range table or create our own.  We seem to need to do one
> > sometimes and the other sometimes, but I don't know why that is,
> > really.  Nor do I understand why we need the other changes in the
> > patch.  There's probably a good reason, but I haven't figured it out
> > yet.
> 
> So, the main part of the patch that fixes the bug is the following per the
> latest v4 patch.
> 
> +    if (resultRelInfo->ri_PartitionRoot && plan)
> +    {
> +        bool dostuff = true;
> +
> +        if (resultRelation != plan->nominalRelation)
> +            dostuff = false;
> +        else
> +            resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
> +
> +        if (dostuff)
> +        {
> +            rte = list_nth(estate->es_range_table, resultRelation - 1);
> +            rte = copyObject(rte);
> +            rte->relid = RelationGetRelid(rel);
> +            rte->relkind = RELKIND_FOREIGN_TABLE;
> +        }
> +    }
> +
> +    if (rte == NULL)
> +    {
> +        rte = makeNode(RangeTblEntry);
> +        rte->rtekind = RTE_RELATION;
> +        rte->relid = RelationGetRelid(rel);
> +        rte->relkind = RELKIND_FOREIGN_TABLE;
> +    }
> 
> After the refactoring, it appears to me that we only need this much:
> 
> +    rte = makeNode(RangeTblEntry);
> +    rte->rtekind = RTE_RELATION;
> +    rte->relid = RelationGetRelid(rel);
> +    rte->relkind = RELKIND_FOREIGN_TABLE;

Mmm.. That is, only relid is required to deparse (I don't mean
that it should be refactored so.). On the other hand
create_foreign_modify requires rte->checkAsUser as well. The
following query (probably) unexpectedly fails with the latest
patch. It succeeds with -3 patch.

=# create user usr1 login;
=# create view v1 as select * from itrtest;
=# revoke all ON itrtest FROM PUBLIC;
=# grant SELECT, INSERT ON v1 to usr1;
=> select * from v1;   -- OK
=> insert into v1 (select n, 'foo' from (values (1), (2)) as t(n)) returning *;
ERROR:  password is required
DETAIL:  Non-superusers must provide a password in the user mapping.

We need to read it of the parent?

> that is, *after* teaching ExecInitPartitionInfo to correctly set the RT
> index of the ResultRelInfo it creates.  After the refactoring, the only
> thing this patch needs to do is to choose the RT index of the result
> relation correctly.  We currently use this:
> 
> +    Index       resultRelation = resultRelInfo->ri_RangeTableIndex;
> 
> And the rest of the code the patch adds is only to adjust it based on
> where the partition ResultRelInfo seems to have originated from.  If the
> ri_RangeTableIndex was set correctly to begin with, we wouldn't need to
> fiddle with that in the FDW code.  Regular ResultRelInfo's already have it
> set correctly, it's only the ones that ExecInitPartitionInfo creates that
> seem be creating the problem.
> 
> I tried to do that.  So, attached are two patches.
> 
> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>    InitResultRelInfo
> 
> 2. v5 of the patch to fix the bug of foreign partitions
> 
> Thoughts?

Maybe, one reason that I feel uneasy is how the patch accesses
desired resultRelInfo.

+    Index    firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;

Looking ExecInitModifyTable, just one resultRelInfo has been
passed to BeginForeignModify so it should not access as an
array. I will feel at easy if the line were in the following
shape. Does it make sense?

+    Index    firstVarno = mtstate->resultRelInfo->ri_RangeTableIndex;


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
Horiguchi-san,

Thanks for taking a look at it.

On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote:
> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote:
>> After the refactoring, it appears to me that we only need this much:
>>
>> +    rte = makeNode(RangeTblEntry);
>> +    rte->rtekind = RTE_RELATION;
>> +    rte->relid = RelationGetRelid(rel);
>> +    rte->relkind = RELKIND_FOREIGN_TABLE;
> 
> Mmm.. That is, only relid is required to deparse (I don't mean
> that it should be refactored so.). On the other hand
> create_foreign_modify requires rte->checkAsUser as well. The
> following query (probably) unexpectedly fails with the latest
> patch. It succeeds with -3 patch.
> 
> =# create user usr1 login;
> =# create view v1 as select * from itrtest;
> =# revoke all ON itrtest FROM PUBLIC;
> =# grant SELECT, INSERT ON v1 to usr1;
> => select * from v1;   -- OK
> => insert into v1 (select n, 'foo' from (values (1), (2)) as t(n)) returning *;
> ERROR:  password is required
> DETAIL:  Non-superusers must provide a password in the user mapping.
> 
> We need to read it of the parent?

Hmm, I missed that we do need information from a proper RTE as well.  So,
I suppose we should be doing this instead of creating the RTE for foreign
partition from scratch.

+    rte = list_nth(estate->es_range_table, resultRelation - 1);
+    rte = copyObject(rte);
+    rte->relid = RelationGetRelid(rel);
+    rte->relkind = RELKIND_FOREIGN_TABLE;

If we apply the other patch I proposed, resultRelation always points to
the correct RTE.

>> I tried to do that.  So, attached are two patches.
>>
>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>    InitResultRelInfo
>>
>> 2. v5 of the patch to fix the bug of foreign partitions
>>
>> Thoughts?
> 
> Maybe, one reason that I feel uneasy is how the patch accesses
> desired resultRelInfo.
> 
> +    Index    firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
> 
> Looking ExecInitModifyTable, just one resultRelInfo has been
> passed to BeginForeignModify so it should not access as an
> array. I will feel at easy if the line were in the following
> shape. Does it make sense?
> 
> +    Index    firstVarno = mtstate->resultRelInfo->ri_RangeTableIndex;

This is the comment on
teach-ExecInitPartitionInfo-to-use-correct-RT-index.patch, right?  I
haven't seen either ExecInitModifyTable or BeginForeignModify being
involved in this discussion, but I see your point.  I see no problem with
doing it that way, I have updated that patch to do it that way.  Also,
changed the line above it that is unrelated to this patch just to be
consistent.

Attached updated patches:

1. teach-ExecInitPartitionInfo-to-use-correct-RT-index-2.patch
2. fix-tuple-routing-for-foreign-partitions-6.patch

Thanks,
Amit

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/25 4:49), Robert Haas wrote:
> On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
>> <alvherre@alvh.no-ip.org>  wrote:
>>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
>>> into it?
>>
>> Thanks for the ping.  I just had a look over the proposed patch and I
>> guess I don't like it very much.  Temporarily modifying the range
>> table in place and then changing it back before we return seems ugly
>> and error-prone.  I hope we can come up with a solution that doesn't
>> involve needing to do that.

I see your concern.

> I have done some refactoring to avoid that.  See attached.

I like this refactoring.

> I didn't really get beyond the refactoring stage with this today.
> This version still seems to work, but I don't really understand the
> logic in postgresBeginForeignInsert which decides whether to use the
> RTE from the range table or create our own.  We seem to need to do one
> sometimes and the other sometimes, but I don't know why that is,
> really.  Nor do I understand why we need the other changes in the
> patch.  There's probably a good reason, but I haven't figured it out
> yet.

To reduce the cost of creating RTEs, the original patch uses the RTE in 
the range table if the partition is an UPDATE subplan partition.  One 
thing I'm concerned about is this change to postgresBeginForeignInsert:

+    if (resultRelInfo->ri_PartitionRoot && plan)
+    {
+        bool dostuff = true;
+
+        if (resultRelation != plan->nominalRelation)
+            dostuff = false;
+        else
+            resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+
+        if (dostuff)
+        {
+            rte = list_nth(estate->es_range_table, resultRelation - 1);
+            rte = copyObject(rte);
+            rte->relid = RelationGetRelid(rel);
+            rte->relkind = RELKIND_FOREIGN_TABLE;
+        }
+    }
+
+    if (rte == NULL)
+    {
+        rte = makeNode(RangeTblEntry);
+        rte->rtekind = RTE_RELATION;
+        rte->relid = RelationGetRelid(rel);
+        rte->relkind = RELKIND_FOREIGN_TABLE;
+    }

IIUC, I think this creates a new RTE for an UPDATE subplan partition by 
makeNode, but I'm not sure if that would work well because we decide 
which user to do the remote access as by looking at rte->checkAsUser, in 
create_foreign_modify.

Thanks for working on this!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Robert Haas
Date:
On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I tried to do that.  So, attached are two patches.
>
> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>    InitResultRelInfo
>
> 2. v5 of the patch to fix the bug of foreign partitions
>
> Thoughts?

Are you splitting this into 2 patches here because there are 2
separate issues?  If so, what are those issues?  There seem to be a
bunch of interrelated changes here but I haven't seen a clear
explanation of which ones are needed for which reasons.

I agree that fixing ExecInitPartitionInfo to use the right RT index in
the first place sounds like a better idea than trying to piece
together which RT index we should be using after-the-fact, but I need
to better understand what problems we're trying to fix here before I
can be sure if that's the strategy I want to endorse...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/25 17:51), Etsuro Fujita wrote:
> (2018/04/25 4:49), Robert Haas wrote:
>> I didn't really get beyond the refactoring stage with this today.
>> This version still seems to work, but I don't really understand the
>> logic in postgresBeginForeignInsert which decides whether to use the
>> RTE from the range table or create our own. We seem to need to do one
>> sometimes and the other sometimes, but I don't know why that is,
>> really. Nor do I understand why we need the other changes in the
>> patch. There's probably a good reason, but I haven't figured it out
>> yet.
>
> To reduce the cost of creating RTEs, the original patch uses the RTE in
> the range table if the partition is an UPDATE subplan partition.

One thing to add: in addition to that case, the original patch uses the 
RTE in the range table if the foreign table is the target specified in a 
COPY command.

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/25 17:29), Amit Langote wrote:
> On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote:
>> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote:
>>> After the refactoring, it appears to me that we only need this much:
>>>
>>> +    rte = makeNode(RangeTblEntry);
>>> +    rte->rtekind = RTE_RELATION;
>>> +    rte->relid = RelationGetRelid(rel);
>>> +    rte->relkind = RELKIND_FOREIGN_TABLE;
>>
>> Mmm.. That is, only relid is required to deparse (I don't mean
>> that it should be refactored so.). On the other hand
>> create_foreign_modify requires rte->checkAsUser as well.

That's right.  I took care of this in my version, but unfortuneately, 
that was ignored in the updated versions.  Maybe the comments I added to 
the patch were not enough, though.

> Hmm, I missed that we do need information from a proper RTE as well.  So,
> I suppose we should be doing this instead of creating the RTE for foreign
> partition from scratch.
>
> +    rte = list_nth(estate->es_range_table, resultRelation - 1);
> +    rte = copyObject(rte);
> +    rte->relid = RelationGetRelid(rel);
> +    rte->relkind = RELKIND_FOREIGN_TABLE;

As I said upthread, we can use the RTE in the range table as-is if the 
foreign table is one of the UPDATE subplan partitions or the target 
specified in a COPY command.  So, I created the patch that way because 
that would save cycles.  Why not just use that RTE in those cases?

> If we apply the other patch I proposed, resultRelation always points to
> the correct RTE.
>
>>> I tried to do that.  So, attached are two patches.
>>>
>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>     InitResultRelInfo
>>>
>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>
>>> Thoughts?

Actually, I also thought the former when creating the patch, but I left 
that as-is because I'm not sure that would be really safe; 
ExecConstraints looks at the RT index via 
GetInsertedColumns/GetUpdatedColumns, so what I thought was: that might 
cause unexpected behavior.  Anyway, I think that the former is more like 
an improvement rather than a fix, so it would be better to leave that 
for another patch for PG12?

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/26 2:59), Robert Haas wrote:
> On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp>  wrote:
>> I tried to do that.  So, attached are two patches.
>>
>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>     InitResultRelInfo
>>
>> 2. v5 of the patch to fix the bug of foreign partitions
>>
>> Thoughts?
>
> Are you splitting this into 2 patches here because there are 2
> separate issues?  If so, what are those issues?  There seem to be a
> bunch of interrelated changes here but I haven't seen a clear
> explanation of which ones are needed for which reasons.

As I said in an earlier email, I'm wondering if it would be better to 
leave that for another patch for PG12.

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/26 12:43), Etsuro Fujita wrote:
> (2018/04/25 17:29), Amit Langote wrote:
>> On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote:
>>> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote:
>>>> After the refactoring, it appears to me that we only need this much:
>>>>
>>>> + rte = makeNode(RangeTblEntry);
>>>> + rte->rtekind = RTE_RELATION;
>>>> + rte->relid = RelationGetRelid(rel);
>>>> + rte->relkind = RELKIND_FOREIGN_TABLE;
>>>
>>> Mmm.. That is, only relid is required to deparse (I don't mean
>>> that it should be refactored so.). On the other hand
>>> create_foreign_modify requires rte->checkAsUser as well.
>
> That's right. I took care of this in my version, but unfortuneately,
> that was ignored in the updated versions. Maybe the comments I added to
> the patch were not enough, though.
>
>> Hmm, I missed that we do need information from a proper RTE as well. So,
>> I suppose we should be doing this instead of creating the RTE for foreign
>> partition from scratch.
>>
>> + rte = list_nth(estate->es_range_table, resultRelation - 1);
>> + rte = copyObject(rte);
>> + rte->relid = RelationGetRelid(rel);
>> + rte->relkind = RELKIND_FOREIGN_TABLE;
>
> As I said upthread, we can use the RTE in the range table as-is if the
> foreign table is one of the UPDATE subplan partitions or the target
> specified in a COPY command. So, I created the patch that way because
> that would save cycles. Why not just use that RTE in those cases?
>
>> If we apply the other patch I proposed, resultRelation always points to
>> the correct RTE.
>>
>>>> I tried to do that. So, attached are two patches.
>>>>
>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>> InitResultRelInfo
>>>>
>>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>>
>>>> Thoughts?
>
> Actually, I also thought the former when creating the patch, but I left
> that as-is because I'm not sure that would be really safe;
> ExecConstraints looks at the RT index via
> GetInsertedColumns/GetUpdatedColumns, so what I thought was: that might
> cause unexpected behavior. Anyway, I think that the former is more like
> an improvement rather than a fix, so it would be better to leave that
> for another patch for PG12?

Here is a new version I'd like to propose for fixing this issue without 
the former patch.

Thanks for working on this!

Best regards,
Etsuro Fujita

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/26 12:43, Etsuro Fujita wrote:
> (2018/04/25 17:29), Amit Langote wrote:
>> Hmm, I missed that we do need information from a proper RTE as well.  So,
>> I suppose we should be doing this instead of creating the RTE for foreign
>> partition from scratch.
>>
>> +    rte = list_nth(estate->es_range_table, resultRelation - 1);
>> +    rte = copyObject(rte);
>> +    rte->relid = RelationGetRelid(rel);
>> +    rte->relkind = RELKIND_FOREIGN_TABLE;
> 
> As I said upthread, we can use the RTE in the range table as-is if the
> foreign table is one of the UPDATE subplan partitions or the target
> specified in a COPY command.  So, I created the patch that way because
> that would save cycles.  Why not just use that RTE in those cases?

Yes, I agree it's better to reuse the RTE in the cases we can.  So, how
does this look:

+    /*
+     * If the foreign table is a partition, we need to create a new RTE
+     * describing the foreign table for use by deparseInsertSql and
+     * create_foreign_modify() below, after first copying the parent's
+     * RTE and modifying some fields to describe the foreign partition to
+     * work on. However, if this is invoked by UPDATE, the existing RTE
+     * may already correspond to this partition if it is one of the
+     * UPDATE subplan target rels; in that case, we can just use the
+     * existing RTE as-is.
+     */
+    rte = list_nth(estate->es_range_table, resultRelation - 1);
+    if (rte->relid != RelationGetRelid(rel))
+    {
+        rte = copyObject(rte);
+        rte->relid = RelationGetRelid(rel);
+        rte->relkind = RELKIND_FOREIGN_TABLE;
+
+        /*
+         * For UPDATE, we must use the RT index of the first subplan
+         * target rel's RTE, because the core code would have built
+         * expressions for the partition, such as RETURNING, using that
+         * RT index as varno of Vars contained in those expressions.
+         */
+        if (plan && plan->operation == CMD_UPDATE &&
+            resultRelation == plan->nominalRelation)
+            resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+    }

I added the block inside the if because I agree with your comment below
that the changes to ExecInitPartitionInfo are better kept for later to
think more carefully about the dependencies it might break.

>> If we apply the other patch I proposed, resultRelation always points to
>> the correct RTE.
>>
>>>> I tried to do that.  So, attached are two patches.
>>>>
>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>>     InitResultRelInfo
>>>>
>>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>>
>>>> Thoughts?
> 
> Actually, I also thought the former when creating the patch, but I left
> that as-is because I'm not sure that would be really safe; ExecConstraints
> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so what I
> thought was: that might cause unexpected behavior.

OK, I have to agree here that we better leave 1 to be looked at later.

After this change, GetInsertedColumns/GetUpdatedColumns will start
returning a different set of columns in some cases than it does now.
Currently, it *always* returns a set containing root table's attribute
numbers, even for UPDATE.  But with this change, for UPDATE, it will start
returning the set containing the first subplan target rel's attribute
numbers, which could be any partition with arbitrarily different attribute
numbers.

> Anyway, I think that
> the former is more like an improvement rather than a fix, so it would be
> better to leave that for another patch for PG12?

I agree, so I'm dropping the patch for 1.

See attached an updated version with changes as described above.

Thanks,
Amit

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/26 15:27, Etsuro Fujita wrote:
> (2018/04/26 12:43), Etsuro Fujita wrote:
>> (2018/04/25 17:29), Amit Langote wrote:
>>> On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote:
>>>> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote:
>>>>> After the refactoring, it appears to me that we only need this much:
>>>>>
>>>>> + rte = makeNode(RangeTblEntry);
>>>>> + rte->rtekind = RTE_RELATION;
>>>>> + rte->relid = RelationGetRelid(rel);
>>>>> + rte->relkind = RELKIND_FOREIGN_TABLE;
>>>>
>>>> Mmm.. That is, only relid is required to deparse (I don't mean
>>>> that it should be refactored so.). On the other hand
>>>> create_foreign_modify requires rte->checkAsUser as well.
>>
>> That's right. I took care of this in my version, but unfortuneately,
>> that was ignored in the updated versions. Maybe the comments I added to
>> the patch were not enough, though.
>>
>>> Hmm, I missed that we do need information from a proper RTE as well. So,
>>> I suppose we should be doing this instead of creating the RTE for foreign
>>> partition from scratch.
>>>
>>> + rte = list_nth(estate->es_range_table, resultRelation - 1);
>>> + rte = copyObject(rte);
>>> + rte->relid = RelationGetRelid(rel);
>>> + rte->relkind = RELKIND_FOREIGN_TABLE;
>>
>> As I said upthread, we can use the RTE in the range table as-is if the
>> foreign table is one of the UPDATE subplan partitions or the target
>> specified in a COPY command. So, I created the patch that way because
>> that would save cycles. Why not just use that RTE in those cases?
>>
>>> If we apply the other patch I proposed, resultRelation always points to
>>> the correct RTE.
>>>
>>>>> I tried to do that. So, attached are two patches.
>>>>>
>>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>>> InitResultRelInfo
>>>>>
>>>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>>>
>>>>> Thoughts?
>>
>> Actually, I also thought the former when creating the patch, but I left
>> that as-is because I'm not sure that would be really safe;
>> ExecConstraints looks at the RT index via
>> GetInsertedColumns/GetUpdatedColumns, so what I thought was: that might
>> cause unexpected behavior. Anyway, I think that the former is more like
>> an improvement rather than a fix, so it would be better to leave that
>> for another patch for PG12?
> 
> Here is a new version I'd like to propose for fixing this issue without
> the former patch.
> 
> Thanks for working on this!

Sorry, didn't notice this before sending my patch, which I also marked v7.

It's a bit different than yours and has comments with excerpts from your
earlier versions.  See if you find it helpful.

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/26 15:39), Amit Langote wrote:
> On 2018/04/26 15:27, Etsuro Fujita wrote:
>> Here is a new version I'd like to propose for fixing this issue without
>> the former patch.

> Sorry, didn't notice this before sending my patch, which I also marked v7.
>
> It's a bit different than yours and has comments with excerpts from your
> earlier versions.  See if you find it helpful.

OK, will review your version.

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/26 2:59, Robert Haas wrote:
> On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I tried to do that.  So, attached are two patches.
>>
>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>    InitResultRelInfo
>>
>> 2. v5 of the patch to fix the bug of foreign partitions
>>
>> Thoughts?
> 
> Are you splitting this into 2 patches here because there are 2
> separate issues?  If so, what are those issues?  There seem to be a
> bunch of interrelated changes here but I haven't seen a clear
> explanation of which ones are needed for which reasons.
> 
> I agree that fixing ExecInitPartitionInfo to use the right RT index in
> the first place sounds like a better idea than trying to piece
> together which RT index we should be using after-the-fact, but I need
> to better understand what problems we're trying to fix here before I
> can be sure if that's the strategy I want to endorse...

After considering Fujita-san's comment downthread, I think it might be
better to revisit 1 later, because it could break a certain things which
rely on ri_RangeTableIndex being set to the index of the root partitioned
table's RT entry.  Fujita-san pointed out GetInsertedColumns et al that
return a set of attribute numbers after looking up the RT entry using
ri_RangeTableIndex of a given ResultRelInfo.  They would start returning a
different set for partitions than previously in some cases due to this
change.  Maybe, there are other things that rely on what a partition's
ri_RangeTableEntry has been set to.

For now, I think we'll have to stick with a solution to determine which RT
index to pass to deparseInsertSql that involves a bit of
reverse-engineering, whereby we base that on where the partition's
ResultRelInfo seems to have originated from.  If it looks like it was
created by ExecInitModifyTable because the partition is one of the subplan
target rels, then use the RTE and the RT index as is.  If not,
ExecInitPartitionInfo would have created it because the partition was
chosen as the tuple routing target.  In that case, we have to freshly
create an RTE for the partition after copying most of the fields from the
parent's RTE.  We're done if the partition was selected as routing target
for an INSERT.  If for UPDATE, we also have to change the RT index to pass
to deparseInsertSql to that of the first subplan's target rel, because
that's what we use as varno of expressions contained in the partition's
ResultRelInfo.  Phew!

Thanks,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/26 15:35), Amit Langote wrote:
> On 2018/04/26 12:43, Etsuro Fujita wrote:
>> (2018/04/25 17:29), Amit Langote wrote:
>>> Hmm, I missed that we do need information from a proper RTE as well.  So,
>>> I suppose we should be doing this instead of creating the RTE for foreign
>>> partition from scratch.
>>>
>>> +    rte = list_nth(estate->es_range_table, resultRelation - 1);
>>> +    rte = copyObject(rte);
>>> +    rte->relid = RelationGetRelid(rel);
>>> +    rte->relkind = RELKIND_FOREIGN_TABLE;
>>
>> As I said upthread, we can use the RTE in the range table as-is if the
>> foreign table is one of the UPDATE subplan partitions or the target
>> specified in a COPY command.  So, I created the patch that way because
>> that would save cycles.  Why not just use that RTE in those cases?
>
> Yes, I agree it's better to reuse the RTE in the cases we can.  So, how
> does this look:
>
> +    /*
> +     * If the foreign table is a partition, we need to create a new RTE
> +     * describing the foreign table for use by deparseInsertSql and
> +     * create_foreign_modify() below, after first copying the parent's
> +     * RTE and modifying some fields to describe the foreign partition to
> +     * work on. However, if this is invoked by UPDATE, the existing RTE
> +     * may already correspond to this partition if it is one of the
> +     * UPDATE subplan target rels; in that case, we can just use the
> +     * existing RTE as-is.
> +     */
> +    rte = list_nth(estate->es_range_table, resultRelation - 1);
> +    if (rte->relid != RelationGetRelid(rel))
> +    {
> +        rte = copyObject(rte);
> +        rte->relid = RelationGetRelid(rel);
> +        rte->relkind = RELKIND_FOREIGN_TABLE;
> +
> +        /*
> +         * For UPDATE, we must use the RT index of the first subplan
> +         * target rel's RTE, because the core code would have built
> +         * expressions for the partition, such as RETURNING, using that
> +         * RT index as varno of Vars contained in those expressions.
> +         */
> +        if (plan&&  plan->operation == CMD_UPDATE&&
> +            resultRelation == plan->nominalRelation)
> +            resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
> +    }

Seems like a better change than mine; because this simplifies the logic.

> I added the block inside the if because I agree with your comment below
> that the changes to ExecInitPartitionInfo are better kept for later to
> think more carefully about the dependencies it might break.

OK

>>> If we apply the other patch I proposed, resultRelation always points to
>>> the correct RTE.
>>>
>>>>> I tried to do that.  So, attached are two patches.
>>>>>
>>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>>>      InitResultRelInfo
>>>>>
>>>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>>>
>>>>> Thoughts?
>>
>> Actually, I also thought the former when creating the patch, but I left
>> that as-is because I'm not sure that would be really safe; ExecConstraints
>> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so what I
>> thought was: that might cause unexpected behavior.
>
> OK, I have to agree here that we better leave 1 to be looked at later.
>
> After this change, GetInsertedColumns/GetUpdatedColumns will start
> returning a different set of columns in some cases than it does now.
> Currently, it *always* returns a set containing root table's attribute
> numbers, even for UPDATE.  But with this change, for UPDATE, it will start
> returning the set containing the first subplan target rel's attribute
> numbers, which could be any partition with arbitrarily different attribute
> numbers.
>
>> Anyway, I think that
>> the former is more like an improvement rather than a fix, so it would be
>> better to leave that for another patch for PG12?
>
> I agree, so I'm dropping the patch for 1.

OK, let's focus on #2!

> See attached an updated version with changes as described above.

Looks good to me.  Thanks for the updated version!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
It seems almost fine for me, but just one point..

At Thu, 26 Apr 2018 17:36:44 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5AE18F9C.6080805@lab.ntt.co.jp>
> (2018/04/26 15:35), Amit Langote wrote:
> > On 2018/04/26 12:43, Etsuro Fujita wrote:
> > +            resultRelation == plan->nominalRelation)
> > + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
> > +    }
> 
> Seems like a better change than mine; because this simplifies the
> logic.

Please rewrite it to use not array reference, but pointer
reference if one mtstate logically holds just one resultRelInfo.

> > I added the block inside the if because I agree with your comment
> > below
> > that the changes to ExecInitPartitionInfo are better kept for later to
> > think more carefully about the dependencies it might break.
> 
> OK
> 
> >>> If we apply the other patch I proposed, resultRelation always points
> >>> to
> >>> the correct RTE.
> >>>
> >>>>> I tried to do that.  So, attached are two patches.
> >>>>>
> >>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
> >>>>>      InitResultRelInfo
> >>>>>
> >>>>> 2. v5 of the patch to fix the bug of foreign partitions
> >>>>>
> >>>>> Thoughts?
> >>
> >> Actually, I also thought the former when creating the patch, but I
> >> left
> >> that as-is because I'm not sure that would be really safe;
> >> ExecConstraints
> >> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so
> >> what I
> >> thought was: that might cause unexpected behavior.
> >
> > OK, I have to agree here that we better leave 1 to be looked at later.
> >
> > After this change, GetInsertedColumns/GetUpdatedColumns will start
> > returning a different set of columns in some cases than it does now.
> > Currently, it *always* returns a set containing root table's attribute
> > numbers, even for UPDATE.  But with this change, for UPDATE, it will
> > start
> > returning the set containing the first subplan target rel's attribute
> > numbers, which could be any partition with arbitrarily different
> > attribute
> > numbers.
> >
> >> Anyway, I think that
> >> the former is more like an improvement rather than a fix, so it would
> >> be
> >> better to leave that for another patch for PG12?
> >
> > I agree, so I'm dropping the patch for 1.
> 
> OK, let's focus on #2!
> 
> > See attached an updated version with changes as described above.
> 
> Looks good to me.  Thanks for the updated version!

Agreed on all points above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/04/26 20:06), Kyotaro HORIGUCHI wrote:
> It seems almost fine for me, but just one point..
>
> At Thu, 26 Apr 2018 17:36:44 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5AE18F9C.6080805@lab.ntt.co.jp>
>> (2018/04/26 15:35), Amit Langote wrote:
>>> On 2018/04/26 12:43, Etsuro Fujita wrote:
>>> +            resultRelation == plan->nominalRelation)
>>> + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
>>> +    }
>>
>> Seems like a better change than mine; because this simplifies the
>> logic.
>
> Please rewrite it to use not array reference, but pointer
> reference if one mtstate logically holds just one resultRelInfo.

Maybe I don't understand your words correctky, but in that UPDATE case, 
I think that mtstate can have multiple ResultRelInfo.

>>>>>>> So, attached are two patches.
>>>>>>>
>>>>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>>>>>       InitResultRelInfo
>>>>>>>
>>>>>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>>>>>
>>>>>>> Thoughts?
>>>>
>>>> Actually, I also thought the former when creating the patch, but I
>>>> left
>>>> that as-is because I'm not sure that would be really safe;
>>>> ExecConstraints
>>>> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so
>>>> what I
>>>> thought was: that might cause unexpected behavior.
>>>
>>> OK, I have to agree here that we better leave 1 to be looked at later.
>>>
>>> After this change, GetInsertedColumns/GetUpdatedColumns will start
>>> returning a different set of columns in some cases than it does now.
>>> Currently, it *always* returns a set containing root table's attribute
>>> numbers, even for UPDATE.  But with this change, for UPDATE, it will
>>> start
>>> returning the set containing the first subplan target rel's attribute
>>> numbers, which could be any partition with arbitrarily different
>>> attribute
>>> numbers.
>>>
>>>> Anyway, I think that
>>>> the former is more like an improvement rather than a fix, so it would
>>>> be
>>>> better to leave that for another patch for PG12?
>>>
>>> I agree, so I'm dropping the patch for 1.
>>
>> OK, let's focus on #2!
>>
>>> See attached an updated version with changes as described above.
>>
>> Looks good to me.  Thanks for the updated version!
>
> Agreed on all points above.

Thanks for reviewing!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Kyotaro HORIGUCHI
Date:
At Thu, 26 Apr 2018 21:16:38 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5AE1C326.6040201@lab.ntt.co.jp>
> (2018/04/26 20:06), Kyotaro HORIGUCHI wrote:
> > Please rewrite it to use not array reference, but pointer
> > reference if one mtstate logically holds just one resultRelInfo.
> 
> Maybe I don't understand your words correctky, but in that UPDATE
> case, I think that mtstate can have multiple ResultRelInfo.

Ah, mtstate has the same number of resultRelInfo with subplans
and it is *written* in the comment just above:( And it is exactly
for the UPDATE case. Sorry for the silly comment.

> >>>> Anyway, I think that
> >>>> the former is more like an improvement rather than a fix, so it would
> >>>> be
> >>>> better to leave that for another patch for PG12?
> >>>
> >>> I agree, so I'm dropping the patch for 1.
> >>
> >> OK, let's focus on #2!
> >>
> >>> See attached an updated version with changes as described above.
> >>
> >> Looks good to me.  Thanks for the updated version!
> >
> > Agreed on all points above.
> 
> Thanks for reviewing!

I'm happy if it helps you.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/04/27 10:01, Kyotaro HORIGUCHI wrote:
> At Thu, 26 Apr 2018 21:16:38 +0900, Etsuro Fujita wrote:
>> (2018/04/26 20:06), Kyotaro HORIGUCHI wrote:
>>> Agreed on all points above.
>>
>> Thanks for reviewing!
> 
> I'm happy if it helps you.

Thank you both for reviewing.

Regards,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Robert Haas
Date:
On Thu, Apr 26, 2018 at 9:23 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> Thanks for reviewing!
>>
>> I'm happy if it helps you.
>
> Thank you both for reviewing.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Oddity in tuple routing for foreign partitions

From
Andres Freund
Date:
On 2018-05-01 13:41:32 -0400, Robert Haas wrote:
> On Thu, Apr 26, 2018 at 9:23 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >>> Thanks for reviewing!
> >>
> >> I'm happy if it helps you.
> >
> > Thank you both for reviewing.
> 
> Committed.

There's still an open items entry for this thread - has that been
resolved by this commit

Greetings,

Andres Freund


Re: Oddity in tuple routing for foreign partitions

From
Amit Langote
Date:
On 2018/05/02 6:09, Andres Freund wrote:
> On 2018-05-01 13:41:32 -0400, Robert Haas wrote:
>> On Thu, Apr 26, 2018 at 9:23 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>>> Thanks for reviewing!
>>>>
>>>> I'm happy if it helps you.
>>>
>>> Thank you both for reviewing.
>>
>> Committed.

Thank you.

> There's still an open items entry for this thread - has that been
> resolved by this commit

Afaik, yes.  So moved that to resolved items.

Regards,
Amit



Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/05/02 10:10), Amit Langote wrote:
> On 2018/05/02 6:09, Andres Freund wrote:
>> On 2018-05-01 13:41:32 -0400, Robert Haas wrote:
>>> Committed.
>
> Thank you.

Thanks for committing, Robert!

>> There's still an open items entry for this thread - has that been
>> resolved by this commit
>
> Afaik, yes.  So moved that to resolved items.

Yeah, thanks for that, Amit!

Best regards,
Etsuro Fujita


Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/05/02 15:45), Etsuro Fujita wrote:
> (2018/05/02 10:10), Amit Langote wrote:
>> On 2018/05/02 6:09, Andres Freund wrote:
>>> On 2018-05-01 13:41:32 -0400, Robert Haas wrote:
>>>> Committed.

Here is a small patch to remove a no-longer-needed cast in 
postgresBeginForeignInsert().

Best regards,
Etsuro Fujita

Attachment

Re: Oddity in tuple routing for foreign partitions

From
Robert Haas
Date:
On Wed, May 2, 2018 at 7:06 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Here is a small patch to remove a no-longer-needed cast in
> postgresBeginForeignInsert().

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Oddity in tuple routing for foreign partitions

From
Etsuro Fujita
Date:
(2018/05/03 9:29), Robert Haas wrote:
> On Wed, May 2, 2018 at 7:06 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Here is a small patch to remove a no-longer-needed cast in
>> postgresBeginForeignInsert().
>
> Committed.

Thanks!

Best regards,
Etsuro Fujita