Thread: Oddity in tuple routing for foreign partitions
Hi, While updating the fix-postgres_fdw-WCO-handling patch, I noticed that the patch for tuple routing for foreign partitions doesn't work well with remote triggers. Here is an example: postgres=# create table loct1 (a int check (a in (1)), b text); postgres=# create function br_insert_trigfunc() returns trigger as $$begin new.b := new.b || ' triggered !'; return new; end$$ language plpgsql; postgres=# create trigger loct1_br_insert_trigger before insert on loct1 for each row execute procedure br_insert_trigfunc(); postgres=# create table itrtest (a int, b text) partition by list (a); postgres=# create foreign table remp1 (a int check (a in (1)), b text) server loopback options (table_name 'loct1'); postgres=# alter table itrtest attach partition remp1 for values in (1); This behaves oddly: postgres=# insert into itrtest values (1, 'foo') returning *; a | b ---+----- 1 | foo (1 row) INSERT 0 1 The new value of b in the RETURNING result should be concatenated with ' triggered !', as shown below: postgres=# select tableoid::regclass, * from itrtest ; tableoid | a | b ----------+---+----------------- remp1 | 1 | foo triggered ! (1 row) The reason for that is: since that in ExecInitPartitionInfo, the core calls BeginForeignInsert via ExecInitRoutingInfo before initializing ri_returningList of ResultRelInfo for the partition, BeginForeignInsert sees that the ri_returningList is NULL and incorrectly recognizes that the local query doesn't have a RETURNING list. So, I moved the ExecInitRoutingInfo call after building the ri_returningList (and before handling ON CONFLICT because we reference the conversion map created by that when handling that clause). The first version of the patch called BeginForeignInsert that order, but I updated the patch incorrectly. :( Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo and added that to ExecInitPartitionInfo, right after the InitResultRelInfo call, because it would be better to abort the operation as soon as we find the partition to be non-routable. Another thing I noticed is the RT index of the foreign partition set to 1 in postgresBeginForeignInsert to create a remote query; that would not work for cases where the partitioned table has an RT index larger than 1 (eg, the partitioned table is not the primary ModifyTable node); in which cases, since the varno of Vars belonging to the foreign partition in the RETURNING expressions is the same as the partitioned table, calling deparseReturningList with the RT index set to 1 against the RETURNING expressions would produce attrs_used to be NULL, leading to postgres_fdw not retrieving actually inserted data from the remote, even if remote triggers might change those data. So, I fixed this as well, by setting the RT index accordingly to the partitioned table. Attached is a patch for fixing these issues. I'll add this to the open items list for PG11. (After addressing this, I'll post an updated version of the fix-postgres_fdw-WCO-handling patch.) Best regards, Etsuro Fujita
Attachment
Fujita-san, On 2018/04/16 20:25, Etsuro Fujita wrote: > Hi, > > While updating the fix-postgres_fdw-WCO-handling patch, I noticed that > the patch for tuple routing for foreign partitions doesn't work well > with remote triggers. Here is an example: > > postgres=# create table loct1 (a int check (a in (1)), b text); > postgres=# create function br_insert_trigfunc() returns trigger as > $$begin new.b := new.b || ' triggered !'; return new; end$$ language > plpgsql; > postgres=# create trigger loct1_br_insert_trigger before insert on loct1 > for each row execute procedure br_insert_trigfunc(); > postgres=# create table itrtest (a int, b text) partition by list (a); > postgres=# create foreign table remp1 (a int check (a in (1)), b text) > server loopback options (table_name 'loct1'); > postgres=# alter table itrtest attach partition remp1 for values in (1); > > This behaves oddly: > > postgres=# insert into itrtest values (1, 'foo') returning *; > a | b > ---+----- > 1 | foo > (1 row) > > INSERT 0 1 > > The new value of b in the RETURNING result should be concatenated with ' > triggered !', as shown below: > > postgres=# select tableoid::regclass, * from itrtest ; > tableoid | a | b > ----------+---+----------------- > remp1 | 1 | foo triggered ! > (1 row) > > The reason for that is: since that in ExecInitPartitionInfo, the core > calls BeginForeignInsert via ExecInitRoutingInfo before initializing > ri_returningList of ResultRelInfo for the partition, BeginForeignInsert > sees that the ri_returningList is NULL and incorrectly recognizes that > the local query doesn't have a RETURNING list. Good catch! > So, I moved the > ExecInitRoutingInfo call after building the ri_returningList (and before > handling ON CONFLICT because we reference the conversion map created by > that when handling that clause). The first version of the patch called > BeginForeignInsert that order, but I updated the patch incorrectly. :( I didn't notice that when reviewing either. Actually, I was under the impression that BeginForeignInsert consumes returningList from the ModifyTable node itself, not the ResultRelInfo, but now I see that ri_returningList was added exactly for BeginForeignInsert to consume. Good thing about that is that we get a Var-translated version instead of the original one that contains parent's attnos. > Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo > and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call, because it would be better to abort the > operation as soon as we find the partition to be non-routable. Sounds fine. > Another > thing I noticed is the RT index of the foreign partition set to 1 in > postgresBeginForeignInsert to create a remote query; that would not work > for cases where the partitioned table has an RT index larger than 1 (eg, > the partitioned table is not the primary ModifyTable node); in which > cases, since the varno of Vars belonging to the foreign partition in the > RETURNING expressions is the same as the partitioned table, calling > deparseReturningList with the RT index set to 1 against the RETURNING > expressions would produce attrs_used to be NULL, leading to postgres_fdw > not retrieving actually inserted data from the remote, even if remote > triggers might change those data. So, I fixed this as well, by setting > the RT index accordingly to the partitioned table. Check. > Attached is a patch > for fixing these issues. I'll add this to the open items list for PG11. > (After addressing this, I'll post an updated version of the > fix-postgres_fdw-WCO-handling patch.) Your patch seems to fix the issue and code changes seem fine to me. Thanks, Amit
Hi Amit, (2018/04/17 10:10), Amit Langote wrote: > On 2018/04/16 20:25, Etsuro Fujita wrote: >> While updating the fix-postgres_fdw-WCO-handling patch, I noticed that >> the patch for tuple routing for foreign partitions doesn't work well >> with remote triggers. Here is an example: >> >> postgres=# create table loct1 (a int check (a in (1)), b text); >> postgres=# create function br_insert_trigfunc() returns trigger as >> $$begin new.b := new.b || ' triggered !'; return new; end$$ language >> plpgsql; >> postgres=# create trigger loct1_br_insert_trigger before insert on loct1 >> for each row execute procedure br_insert_trigfunc(); >> postgres=# create table itrtest (a int, b text) partition by list (a); >> postgres=# create foreign table remp1 (a int check (a in (1)), b text) >> server loopback options (table_name 'loct1'); >> postgres=# alter table itrtest attach partition remp1 for values in (1); >> >> This behaves oddly: >> >> postgres=# insert into itrtest values (1, 'foo') returning *; >> a | b >> ---+----- >> 1 | foo >> (1 row) >> >> INSERT 0 1 >> >> The new value of b in the RETURNING result should be concatenated with ' >> triggered !', as shown below: >> >> postgres=# select tableoid::regclass, * from itrtest ; >> tableoid | a | b >> ----------+---+----------------- >> remp1 | 1 | foo triggered ! >> (1 row) >> >> The reason for that is: since that in ExecInitPartitionInfo, the core >> calls BeginForeignInsert via ExecInitRoutingInfo before initializing >> ri_returningList of ResultRelInfo for the partition, BeginForeignInsert >> sees that the ri_returningList is NULL and incorrectly recognizes that >> the local query doesn't have a RETURNING list. > > Good catch! > >> So, I moved the >> ExecInitRoutingInfo call after building the ri_returningList (and before >> handling ON CONFLICT because we reference the conversion map created by >> that when handling that clause). The first version of the patch called >> BeginForeignInsert that order, but I updated the patch incorrectly. :( > > I didn't notice that when reviewing either. Actually, I was under the > impression that BeginForeignInsert consumes returningList from the > ModifyTable node itself, not the ResultRelInfo, but now I see that > ri_returningList was added exactly for BeginForeignInsert to consume. > Good thing about that is that we get a Var-translated version instead of > the original one that contains parent's attnos. > >> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo >> and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call, because it would be better to abortthe >> operation as soon as we find the partition to be non-routable. > > Sounds fine. > >> Another >> thing I noticed is the RT index of the foreign partition set to 1 in >> postgresBeginForeignInsert to create a remote query; that would not work >> for cases where the partitioned table has an RT index larger than 1 (eg, >> the partitioned table is not the primary ModifyTable node); in which >> cases, since the varno of Vars belonging to the foreign partition in the >> RETURNING expressions is the same as the partitioned table, calling >> deparseReturningList with the RT index set to 1 against the RETURNING >> expressions would produce attrs_used to be NULL, leading to postgres_fdw >> not retrieving actually inserted data from the remote, even if remote >> triggers might change those data. So, I fixed this as well, by setting >> the RT index accordingly to the partitioned table. > > Check. > >> Attached is a patch >> for fixing these issues. I'll add this to the open items list for PG11. >> (After addressing this, I'll post an updated version of the >> fix-postgres_fdw-WCO-handling patch.) > > Your patch seems to fix the issue and code changes seem fine to me. Thanks for the review! Best regards, Etsuro Fujita
Hello. At Tue, 17 Apr 2018 10:10:38 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <fefb9510-e304-27da-e984-a2b0ac77927a@lab.ntt.co.jp> > Fujita-san, > > On 2018/04/16 20:25, Etsuro Fujita wrote: > > Hi, > > > > While updating the fix-postgres_fdw-WCO-handling patch, I noticed that > > the patch for tuple routing for foreign partitions doesn't work well > > with remote triggers. Here is an example: > > > > postgres=# create table loct1 (a int check (a in (1)), b text); > > postgres=# create function br_insert_trigfunc() returns trigger as > > $$begin new.b := new.b || ' triggered !'; return new; end$$ language > > plpgsql; > > postgres=# create trigger loct1_br_insert_trigger before insert on loct1 > > for each row execute procedure br_insert_trigfunc(); > > postgres=# create table itrtest (a int, b text) partition by list (a); > > postgres=# create foreign table remp1 (a int check (a in (1)), b text) > > server loopback options (table_name 'loct1'); > > postgres=# alter table itrtest attach partition remp1 for values in (1); > > > > This behaves oddly: > > > > postgres=# insert into itrtest values (1, 'foo') returning *; > > a | b > > ---+----- > > 1 | foo > > (1 row) > > > > INSERT 0 1 > > > > The new value of b in the RETURNING result should be concatenated with ' > > triggered !', as shown below: > > > > postgres=# select tableoid::regclass, * from itrtest ; > > tableoid | a | b > > ----------+---+----------------- > > remp1 | 1 | foo triggered ! > > (1 row) > > > > The reason for that is: since that in ExecInitPartitionInfo, the core > > calls BeginForeignInsert via ExecInitRoutingInfo before initializing > > ri_returningList of ResultRelInfo for the partition, BeginForeignInsert > > sees that the ri_returningList is NULL and incorrectly recognizes that > > the local query doesn't have a RETURNING list. > > Good catch! > > > So, I moved the > > ExecInitRoutingInfo call after building the ri_returningList (and before > > handling ON CONFLICT because we reference the conversion map created by > > that when handling that clause). The first version of the patch called > > BeginForeignInsert that order, but I updated the patch incorrectly. :( > > I didn't notice that when reviewing either. Actually, I was under the > impression that BeginForeignInsert consumes returningList from the > ModifyTable node itself, not the ResultRelInfo, but now I see that > ri_returningList was added exactly for BeginForeignInsert to consume. > Good thing about that is that we get a Var-translated version instead of > the original one that contains parent's attnos. > > > Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo > > and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call, because it would be better to abortthe > > operation as soon as we find the partition to be non-routable. > > Sounds fine. If I'm reading this correctly, ExecInitParititionInfo calls ExecInitRoutingInfo so currently CheckValidityResultRel is called for the child when partrel is created in ExecPrepareTupleRouting. But the move of CheckValidResultRel results in letting partrel just created in ExecPrepareTupleRouting not be checked at all since ri_ParititionReadyForRouting is always set true in ExecInitPartitionInfo. > > Another > > thing I noticed is the RT index of the foreign partition set to 1 in > > postgresBeginForeignInsert to create a remote query; that would not work > > for cases where the partitioned table has an RT index larger than 1 (eg, > > the partitioned table is not the primary ModifyTable node); in which > > cases, since the varno of Vars belonging to the foreign partition in the > > RETURNING expressions is the same as the partitioned table, calling > > deparseReturningList with the RT index set to 1 against the RETURNING > > expressions would produce attrs_used to be NULL, leading to postgres_fdw > > not retrieving actually inserted data from the remote, even if remote > > triggers might change those data. So, I fixed this as well, by setting > > the RT index accordingly to the partitioned table. > > Check. I'm not sure but is it true that the parent is always at resultRelation - 1? > > Attached is a patch > > for fixing these issues. I'll add this to the open items list for PG11. > > (After addressing this, I'll post an updated version of the > > fix-postgres_fdw-WCO-handling patch.) > > Your patch seems to fix the issue and code changes seem fine to me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote: >>> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo >>> and added that to ExecInitPartitionInfo right after the> InitResultRelInfo call, >>> because it would be better to abort the >>> operation as soon as we find the partition to be non-routable. >> >> Sounds fine. > > If I'm reading this correctly, ExecInitParititionInfo calls > ExecInitRoutingInfo so currently CheckValidityResultRel is called > for the child when partrel is created in ExecPrepareTupleRouting. > But the move of CheckValidResultRel results in letting partrel > just created in ExecPrepareTupleRouting not be checked at all > since ri_ParititionReadyForRouting is always set true in > ExecInitPartitionInfo. I thought the same upon reading the email, but it seems that the patch does add CheckValidResultRel() in ExecInitPartitionInfo() as well: @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, estate->es_instrument); /* + * Verify result relation is a valid target for an INSERT. An UPDATE of a + * partition-key becomes a DELETE+INSERT operation, so this check is still + * required when the operation is CMD_UPDATE. + */ + CheckValidResultRel(leaf_part_rri, CMD_INSERT); + + /* * Since we've just initialized this ResultRelInfo, it's not in any list * attached to the estate as yet. Add it, so that it can be found later. * >>> Another >>> thing I noticed is the RT index of the foreign partition set to 1 in >>> postgresBeginForeignInsert to create a remote query; that would not work >>> for cases where the partitioned table has an RT index larger than 1 (eg, >>> the partitioned table is not the primary ModifyTable node); in which >>> cases, since the varno of Vars belonging to the foreign partition in the >>> RETURNING expressions is the same as the partitioned table, calling >>> deparseReturningList with the RT index set to 1 against the RETURNING >>> expressions would produce attrs_used to be NULL, leading to postgres_fdw >>> not retrieving actually inserted data from the remote, even if remote >>> triggers might change those data. So, I fixed this as well, by setting >>> the RT index accordingly to the partitioned table. >> >> Check. > > I'm not sure but is it true that the parent is always at > resultRelation - 1? Unless I'm missing something, EState.es_range_table contains *all* range table entries that were created by the planner. So, if the planner assigned resultRelation as the RT index for the parent, then it seems we can rely on getting the same entry back with list_nth_cell(estate->es_range_table, resultRelation - 1). That seems true even if the parent wasn't the *primary* target relation. For example, the following works with the patch: -- in addition to the tables shown in Fujita-san's email, define one more -- local partition create table locp2 partition of itrtest for values in (2); -- simple case insert into itrtest values (1, 'foo') returning *; a | b ---+----------------- 1 | foo triggered ! (1 row) -- itrtest (the parent) appears as both the primary target relation and -- otherwise. The latter in the form of being mentioned in the with query with ins (a, b) as (insert into itrtest values (1, 'foo') returning *) insert into itrtest select a + 1, b from ins returning *; a | b ---+----------------- 2 | foo triggered ! (1 row) But maybe I misunderstood your question. Thanks, Amit
(2018/04/17 16:10), Amit Langote wrote: > On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote: >> If I'm reading this correctly, ExecInitParititionInfo calls >> ExecInitRoutingInfo so currently CheckValidityResultRel is called >> for the child when partrel is created in ExecPrepareTupleRouting. >> But the move of CheckValidResultRel results in letting partrel >> just created in ExecPrepareTupleRouting not be checked at all >> since ri_ParititionReadyForRouting is always set true in >> ExecInitPartitionInfo. > > I thought the same upon reading the email, but it seems that the patch > does add CheckValidResultRel() in ExecInitPartitionInfo() as well: > > @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, > estate->es_instrument); > > /* > + * Verify result relation is a valid target for an INSERT. An UPDATE > of a > + * partition-key becomes a DELETE+INSERT operation, so this check is > still > + * required when the operation is CMD_UPDATE. > + */ > + CheckValidResultRel(leaf_part_rri, CMD_INSERT); > + > + /* > * Since we've just initialized this ResultRelInfo, it's not in any list > * attached to the estate as yet. Add it, so that it can be found later. > * That's right. So, the validity check would be applied to a partition created in ExecPrepareTupleRouting as well. >>>> Another >>>> thing I noticed is the RT index of the foreign partition set to 1 in >>>> postgresBeginForeignInsert to create a remote query; that would not work >>>> for cases where the partitioned table has an RT index larger than 1 (eg, >>>> the partitioned table is not the primary ModifyTable node); in which >>>> cases, since the varno of Vars belonging to the foreign partition in the >>>> RETURNING expressions is the same as the partitioned table, calling >>>> deparseReturningList with the RT index set to 1 against the RETURNING >>>> expressions would produce attrs_used to be NULL, leading to postgres_fdw >>>> not retrieving actually inserted data from the remote, even if remote >>>> triggers might change those data. So, I fixed this as well, by setting >>>> the RT index accordingly to the partitioned table. >>> >>> Check. >> >> I'm not sure but is it true that the parent is always at >> resultRelation - 1? > > Unless I'm missing something, EState.es_range_table contains *all* range > table entries that were created by the planner. So, if the planner > assigned resultRelation as the RT index for the parent, then it seems we > can rely on getting the same entry back with > list_nth_cell(estate->es_range_table, resultRelation - 1). That seems > true even if the parent wasn't the *primary* target relation. For > example, the following works with the patch: > > -- in addition to the tables shown in Fujita-san's email, define one more > -- local partition > create table locp2 partition of itrtest for values in (2); > > -- simple case > insert into itrtest values (1, 'foo') returning *; > a | b > ---+----------------- > 1 | foo triggered ! > (1 row) > > -- itrtest (the parent) appears as both the primary target relation and > -- otherwise. The latter in the form of being mentioned in the with query > > with ins (a, b) as (insert into itrtest values (1, 'foo') returning *) > insert into itrtest select a + 1, b from ins returning *; > a | b > ---+----------------- > 2 | foo triggered ! > (1 row) I agree with that. Thanks for the explanation and the example, Amit! Maybe my explanation (or the naming parent_rte/child_rte in the patch) was not necessarily good, so I guess that that confused horiguchi-san. Let me explain. In the INSERT/COPY-tuple-routing case, as explained by Amit, the RTE at that position in the EState's range table is the one for the partitioned table of a given partition, so the statement would be true. BUT in the UPDATE-tuple-routing case, the RTE is the one for the given partition, not for the partitioned table, so the statement would not be true. In the latter case, we don't need to create a child RTE and replace the original RTE with it, but I handled both cases the same way for simplicity. Thanks for the comments, Horiguchi-san! Best regards, Etsuro Fujita
On 2018/04/17 16:41, Etsuro Fujita wrote: > In the INSERT/COPY-tuple-routing case, as explained by Amit, the > RTE at that position in the EState's range table is the one for the > partitioned table of a given partition, so the statement would be true. > BUT in the UPDATE-tuple-routing case, the RTE is the one for the given > partition, not for the partitioned table, so the statement would not be > true. In the latter case, we don't need to create a child RTE and replace > the original RTE with it, but I handled both cases the same way for > simplicity. Oh, I didn't really consider this part carefully. That the resultRelInfo received by BeginForeignInsert (when called by ExecInitRoutingInfo) could be a reused UPDATE result relation. It might be possible to justify the parent_rte/child_rte terminology by explaining it a bit better. I see three cases that arise during tuple routing: 1. This is INSERT and so the resultRelation that's received in BeginForeignInsert has been freshly created in ExecInitPartitionInfo and it bears node->nominalRelation or 1 as its ri_RangeTableIndex 2. This is UPDATE and the resultRelInfo that's received in BeginForeignInsert has been freshly created in ExecInitPartitionInfo and it bears node->nominalRelation or 1 as its ri_RangeTableIndex 3. This is UPDATE and the resultRelInfo that's received in BeginForeignInsert is a reused one, in which case, it bears the planner assigned ri_RangeTableIndex In all three cases, I think we can rely on using ri_RangeTableIndex to fetch a valid "parent" RTE from es_range_table. Do you think we need to clarify this in a comment? Thanks, Amit
(2018/04/18 14:44), Amit Langote wrote: > On 2018/04/17 16:41, Etsuro Fujita wrote: >> In the INSERT/COPY-tuple-routing case, as explained by Amit, the >> RTE at that position in the EState's range table is the one for the >> partitioned table of a given partition, so the statement would be true. >> BUT in the UPDATE-tuple-routing case, the RTE is the one for the given >> partition, not for the partitioned table, so the statement would not be >> true. In the latter case, we don't need to create a child RTE and replace >> the original RTE with it, but I handled both cases the same way for >> simplicity. > > Oh, I didn't really consider this part carefully. That the resultRelInfo > received by BeginForeignInsert (when called by ExecInitRoutingInfo) could > be a reused UPDATE result relation. It might be possible to justify the > parent_rte/child_rte terminology by explaining it a bit better. Yeah, I think that terminology would be confusing, so I changed it to old_rte/new_rte. > I see > three cases that arise during tuple routing: > > 1. This is INSERT and so the resultRelation that's received in > BeginForeignInsert has been freshly created in ExecInitPartitionInfo > and it bears node->nominalRelation or 1 as its ri_RangeTableIndex Right. > 2. This is UPDATE and the resultRelInfo that's received in > BeginForeignInsert has been freshly created in ExecInitPartitionInfo > and it bears node->nominalRelation or 1 as its ri_RangeTableIndex In that case, we have a valid plan node, so it would only bear node->nominalRelation. > 3. This is UPDATE and the resultRelInfo that's received in > BeginForeignInsert is a reused one, in which case, it bears the planner > assigned ri_RangeTableIndex Right. > In all three cases, I think we can rely on using ri_RangeTableIndex to > fetch a valid "parent" RTE from es_range_table. I slept on this, I noticed there is another bug in case #2. Here is an example with the previous patch: postgres=# create table utrtest (a int, b text) partition by list (a); postgres=# create table loct (a int check (a in (1)), b text); postgres=# create foreign table remp (a int check (a in (1)), b text) server loopback options (table_name 'loct'); postgres=# create table locp (a int check (a in (2)), b text); postgres=# alter table utrtest attach partition remp for values in (1); postgres=# alter table utrtest attach partition locp for values in (2); postgres=# create trigger loct_br_insert_trigger before insert on loct for each row execute procedure br_insert_trigfunc(); where the trigger function is the one defined in an earlier email. postgres=# insert into utrtest values (2, 'qux'); INSERT 0 1 postgres=# update utrtest set a = 1 where a = 2 returning *; a | b ---+----- 1 | qux (1 row) UPDATE 1 Wrong result! The result should be concatenated with ' triggered !'. In case #2, since we initialize expressions for the partition's ResultRelInfo including RETURNING by translating the attnos of the corresponding expressions in the ResultRelInfo for the first subplan target rel, I think we should replace the RTE for the first subplan target rel, not the RTE for the nominal rel, with the new one created for the foreign table. Attached is an updated version for fixing this issue. > Do you think we need to clarify this in a comment? Yeah, but I updated comments about this a little bit different way in the attached. Does that make sense? Thanks for the comments! Best regards, Etsuro Fujita
Attachment
On 2018/04/18 21:55, Etsuro Fujita wrote: > (2018/04/18 14:44), Amit Langote wrote: >> Oh, I didn't really consider this part carefully. That the resultRelInfo >> received by BeginForeignInsert (when called by ExecInitRoutingInfo) could >> be a reused UPDATE result relation. It might be possible to justify the >> parent_rte/child_rte terminology by explaining it a bit better. > >> 2. This is UPDATE and the resultRelInfo that's received in >> BeginForeignInsert has been freshly created in ExecInitPartitionInfo >> and it bears node->nominalRelation or 1 as its ri_RangeTableIndex [ ... ] >> In all three cases, I think we can rely on using ri_RangeTableIndex to >> fetch a valid "parent" RTE from es_range_table. > > I slept on this, I noticed there is another bug in case #2. Here is an > example with the previous patch: > > postgres=# create table utrtest (a int, b text) partition by list (a); > postgres=# create table loct (a int check (a in (1)), b text); > postgres=# create foreign table remp (a int check (a in (1)), b text) > server loopback options (table_name 'loct'); > postgres=# create table locp (a int check (a in (2)), b text); > postgres=# alter table utrtest attach partition remp for values in (1); > postgres=# alter table utrtest attach partition locp for values in (2); > postgres=# create trigger loct_br_insert_trigger before insert on loct for > each row execute procedure br_insert_trigfunc(); > > where the trigger function is the one defined in an earlier email. > > postgres=# insert into utrtest values (2, 'qux'); > INSERT 0 1 > postgres=# update utrtest set a = 1 where a = 2 returning *; > a | b > ---+----- > 1 | qux > (1 row) > > UPDATE 1 > > Wrong result! The result should be concatenated with ' triggered !'. > > In case #2, since we initialize expressions for the partition's > ResultRelInfo including RETURNING by translating the attnos of the > corresponding expressions in the ResultRelInfo for the first subplan > target rel, I think we should replace the RTE for the first subplan target > rel, not the RTE for the nominal rel, with the new one created for the > foreign table. Ah, so in case #2, we use firstVarno (per ExecInitPartitionInfo terms) as varno throughout. So, we'd like to put the new RTE in that slot. Would it be a good idea to explain *why* we need to replace the RTE in the first place? Afaics, it's for deparseColumnRef() to find the correct relation when it uses planner_rt_fetch() to get the RTE. > Attached is an updated version for fixing this issue. > >> Do you think we need to clarify this in a comment? > > Yeah, but I updated comments about this a little bit different way in the > attached. Does that make sense? It looks generally good, although in the following: + /* + * If the foreign table is a partition, temporarily replace the parent's + * RTE in the range table with a new target RTE describing the foreign + * table for use by deparseInsertSql() and create_foreign_modify() below. + */ .. it could be mentioned that we don't switch either the RTE or the value assigned to resultRelation if the RTE currently at resultRelation RT index is the one created by the planner and refers to the same relation that resultRelInfo does. Beside that, I noticed that the patch has a stray white-space at the end of the following line: + /* partition that isn't a subplan target rel */ Thanks, Amit
At Tue, 17 Apr 2018 16:41:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5AD5A52B.7050206@lab.ntt.co.jp> > (2018/04/17 16:10), Amit Langote wrote: > > On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote: > >> If I'm reading this correctly, ExecInitParititionInfo calls > >> ExecInitRoutingInfo so currently CheckValidityResultRel is called > >> for the child when partrel is created in ExecPrepareTupleRouting. > >> But the move of CheckValidResultRel results in letting partrel > >> just created in ExecPrepareTupleRouting not be checked at all > >> since ri_ParititionReadyForRouting is always set true in > >> ExecInitPartitionInfo. > > > > I thought the same upon reading the email, but it seems that the patch > > does add CheckValidResultRel() in ExecInitPartitionInfo() as well: > > > > @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, > > estate->es_instrument); > > > > /* > > + * Verify result relation is a valid target for an INSERT. An UPDATE > > of a > > + * partition-key becomes a DELETE+INSERT operation, so this check is > > still > > + * required when the operation is CMD_UPDATE. > > + */ > > + CheckValidResultRel(leaf_part_rri, CMD_INSERT); > > + > > + /* > > * Since we've just initialized this ResultRelInfo, it's not in any > > * list > > * attached to the estate as yet. Add it, so that it can be found > > * later. > > * > > That's right. So, the validity check would be applied to a partition > created in ExecPrepareTupleRouting as well. Yes, that's actually true but it seems quite bug-prone or at least hard to understand. I understand that the CheckValidResultRel(INSERT) is just a prerequisite for ExecInitRoutingInfo but the relationship is obfucated after this patch. If we have a strong reason to error-out as fast as possible, the original code seems better to me.. > >>>> Another > >>>> thing I noticed is the RT index of the foreign partition set to 1 in > >>>> postgresBeginForeignInsert to create a remote query; that would not > >>>> work > >>>> for cases where the partitioned table has an RT index larger than 1 > >>>> (eg, > >>>> the partitioned table is not the primary ModifyTable node); in which > >>>> cases, since the varno of Vars belonging to the foreign partition in > >>>> the > >>>> RETURNING expressions is the same as the partitioned table, calling > >>>> deparseReturningList with the RT index set to 1 against the RETURNING > >>>> expressions would produce attrs_used to be NULL, leading to > >>>> postgres_fdw > >>>> not retrieving actually inserted data from the remote, even if remote > >>>> triggers might change those data. So, I fixed this as well, by > >>>> setting > >>>> the RT index accordingly to the partitioned table. > >>> > >>> Check. > >> > >> I'm not sure but is it true that the parent is always at > >> resultRelation - 1? > > > > Unless I'm missing something, EState.es_range_table contains *all* > > range > > table entries that were created by the planner. So, if the planner > > assigned resultRelation as the RT index for the parent, then it seems > > we > > can rely on getting the same entry back with > > list_nth_cell(estate->es_range_table, resultRelation - 1). That seems > > true even if the parent wasn't the *primary* target relation. For > > example, the following works with the patch: > > > > -- in addition to the tables shown in Fujita-san's email, define one more > > -- local partition > > create table locp2 partition of itrtest for values in (2); > > > > -- simple case > > insert into itrtest values (1, 'foo') returning *; > > a | b > > ---+----------------- > > 1 | foo triggered ! > > (1 row) > > > > -- itrtest (the parent) appears as both the primary target relation and > > -- otherwise. The latter in the form of being mentioned in the with > > -- query > > > > with ins (a, b) as (insert into itrtest values (1, 'foo') returning *) > > insert into itrtest select a + 1, b from ins returning *; > > a | b > > ---+----------------- > > 2 | foo triggered ! > > (1 row) > > I agree with that. Thanks for the explanation and the example, Amit! > > Maybe my explanation (or the naming parent_rte/child_rte in the patch) > was not necessarily good, so I guess that that confused > horiguchi-san. Let me explain. In the INSERT/COPY-tuple-routing case, > as explained by Amit, the RTE at that position in the EState's range > table is the one for the partitioned table of a given partition, so > the statement would be true. BUT in the UPDATE-tuple-routing case, > the RTE is the one for the given partition, not for the partitioned > table, so the statement would not be true. In the latter case, we > don't need to create a child RTE and replace the original RTE with it, > but I handled both cases the same way for simplicity. Thank you, your understanding of my concern is right. Thanks for the explanation. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
(2018/04/19 16:43), Amit Langote wrote: > On 2018/04/18 21:55, Etsuro Fujita wrote: >> (2018/04/18 14:44), Amit Langote wrote: >>> That the resultRelInfo >>> received by BeginForeignInsert (when called by ExecInitRoutingInfo) could >>> be a reused UPDATE result relation. >>> 2. This is UPDATE and the resultRelInfo that's received in >>> BeginForeignInsert has been freshly created in ExecInitPartitionInfo >>> and it bears node->nominalRelation or 1 as its ri_RangeTableIndex >>> In all three cases, I think we can rely on using ri_RangeTableIndex to >>> fetch a valid "parent" RTE from es_range_table. >> >> I slept on this, I noticed there is another bug in case #2. >> In case #2, since we initialize expressions for the partition's >> ResultRelInfo including RETURNING by translating the attnos of the >> corresponding expressions in the ResultRelInfo for the first subplan >> target rel, I think we should replace the RTE for the first subplan target >> rel, not the RTE for the nominal rel, with the new one created for the >> foreign table. > > Ah, so in case #2, we use firstVarno (per ExecInitPartitionInfo terms) as > varno throughout. So, we'd like to put the new RTE in that slot. > > Would it be a good idea to explain *why* we need to replace the RTE in the > first place? Afaics, it's for deparseColumnRef() to find the correct > relation when it uses planner_rt_fetch() to get the RTE. That might be a good idea, but one thing I'm concerned about is that since we might use the RTE in different ways in future developments, such a comment might be obsolete rather sooner. So, I just added *for use by deparseInsertSql() and create_foreign_modify() below* to the comments shown below. But I'd like to leave this to the committer. >> Attached is an updated version for fixing this issue. >> >>> Do you think we need to clarify this in a comment? >> >> Yeah, but I updated comments about this a little bit different way in the >> attached. Does that make sense? > > It looks generally good, although in the following: > > + /* > + * If the foreign table is a partition, temporarily replace the parent's > + * RTE in the range table with a new target RTE describing the foreign > + * table for use by deparseInsertSql() and create_foreign_modify() below. > + */ > > .. it could be mentioned that we don't switch either the RTE or the value > assigned to resultRelation if the RTE currently at resultRelation RT index > is the one created by the planner and refers to the same relation that > resultRelInfo does. Done. > Beside that, I noticed that the patch has a stray white-space at the end > of the following line: > > + /* partition that isn't a subplan target rel */ Fixed. Thanks for the review! Attached is a new version of the patch. Best regards, Etsuro Fujita
Attachment
(2018/04/19 19:03), Kyotaro HORIGUCHI wrote: > At Tue, 17 Apr 2018 16:41:31 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5AD5A52B.7050206@lab.ntt.co.jp> >> (2018/04/17 16:10), Amit Langote wrote: >>> On 2018/04/17 11:13, Kyotaro HORIGUCHI wrote: >>>> If I'm reading this correctly, ExecInitParititionInfo calls >>>> ExecInitRoutingInfo so currently CheckValidityResultRel is called >>>> for the child when partrel is created in ExecPrepareTupleRouting. >>>> But the move of CheckValidResultRel results in letting partrel >>>> just created in ExecPrepareTupleRouting not be checked at all >>>> since ri_ParititionReadyForRouting is always set true in >>>> ExecInitPartitionInfo. >>> >>> I thought the same upon reading the email, but it seems that the patch >>> does add CheckValidResultRel() in ExecInitPartitionInfo() as well: >>> >>> @@ -335,6 +335,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, >>> estate->es_instrument); >>> >>> /* >>> + * Verify result relation is a valid target for an INSERT. An UPDATE >>> of a >>> + * partition-key becomes a DELETE+INSERT operation, so this check is >>> still >>> + * required when the operation is CMD_UPDATE. >>> + */ >>> + CheckValidResultRel(leaf_part_rri, CMD_INSERT); >>> + >>> + /* >>> * Since we've just initialized this ResultRelInfo, it's not in any >>> * list >>> * attached to the estate as yet. Add it, so that it can be found >>> * later. >>> * >> >> That's right. So, the validity check would be applied to a partition >> created in ExecPrepareTupleRouting as well. > > Yes, that's actually true but it seems quite bug-prone or at > least hard to understand. I understand that the > CheckValidResultRel(INSERT) is just a prerequisite for > ExecInitRoutingInfo but the relationship is obfucated after this > patch. If we have a strong reason to error-out as fast as > possible, the original code seems better to me.. Actually, I think that change would make the initialization for a partition more consistent with that for a main target rel in ExecInitModifyTable, where we first perform the CheckValidResultRel check against a target rel, and if valid, then open indices, initializes the FDW, initialize expressions such as WITH CHECK OPTION and RETURNING, and so on. That's reasonable, and I like consistency, so I'd vote for that change. Thanks for the review! Best regards, Etsuro Fujita
Fujita-san, Thanks for the updated patch. On 2018/04/19 21:42, Etsuro Fujita wrote: > (2018/04/19 16:43), Amit Langote wrote: >> Would it be a good idea to explain *why* we need to replace the RTE in the >> first place? Afaics, it's for deparseColumnRef() to find the correct >> relation when it uses planner_rt_fetch() to get the RTE. > > That might be a good idea, but one thing I'm concerned about is that since > we might use the RTE in different ways in future developments, such a > comment might be obsolete rather sooner. So, I just added *for use by > deparseInsertSql() and create_foreign_modify() below* to the comments > shown below. But I'd like to leave this to the committer. OK, fine by me. >> It looks generally good, although in the following: >> >> + /* >> + * If the foreign table is a partition, temporarily replace the >> parent's >> + * RTE in the range table with a new target RTE describing the foreign >> + * table for use by deparseInsertSql() and create_foreign_modify() >> below. >> + */ >> >> .. it could be mentioned that we don't switch either the RTE or the value >> assigned to resultRelation if the RTE currently at resultRelation RT index >> is the one created by the planner and refers to the same relation that >> resultRelInfo does. > > Done. > >> Beside that, I noticed that the patch has a stray white-space at the end >> of the following line: >> >> + /* partition that isn't a subplan target rel */ > > Fixed. > > Thanks for the review! Attached is a new version of the patch. Looks good. Thanks, Amit
(2018/04/20 9:48), Amit Langote wrote: > On 2018/04/19 21:42, Etsuro Fujita wrote: >> (2018/04/19 16:43), Amit Langote wrote: >>> Would it be a good idea to explain *why* we need to replace the RTE in the >>> first place? Afaics, it's for deparseColumnRef() to find the correct >>> relation when it uses planner_rt_fetch() to get the RTE. >> >> That might be a good idea, but one thing I'm concerned about is that since >> we might use the RTE in different ways in future developments, such a >> comment might be obsolete rather sooner. So, I just added *for use by >> deparseInsertSql() and create_foreign_modify() below* to the comments >> shown below. But I'd like to leave this to the committer. > > OK, fine by me. > >>> It looks generally good, although in the following: >>> >>> + /* >>> + * If the foreign table is a partition, temporarily replace the >>> parent's >>> + * RTE in the range table with a new target RTE describing the foreign >>> + * table for use by deparseInsertSql() and create_foreign_modify() >>> below. >>> + */ >>> >>> .. it could be mentioned that we don't switch either the RTE or the value >>> assigned to resultRelation if the RTE currently at resultRelation RT index >>> is the one created by the planner and refers to the same relation that >>> resultRelInfo does. >> >> Done. >> >>> Beside that, I noticed that the patch has a stray white-space at the end >>> of the following line: >>> >>> + /* partition that isn't a subplan target rel */ >> >> Fixed. >> >> Thanks for the review! Attached is a new version of the patch. > > Looks good. Thank you! Best regards, Etsuro Fujita
Robert, I think this is your turf, per 3d956d9562aa. Are you looking into it? Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Robert, I think this is your turf, per 3d956d9562aa. Are you looking > into it? Thanks for the ping. I just had a look over the proposed patch and I guess I don't like it very much. Temporarily modifying the range table in place and then changing it back before we return seems ugly and error-prone. I hope we can come up with a solution that doesn't involve needing to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: >> Robert, I think this is your turf, per 3d956d9562aa. Are you looking >> into it? > > Thanks for the ping. I just had a look over the proposed patch and I > guess I don't like it very much. Temporarily modifying the range > table in place and then changing it back before we return seems ugly > and error-prone. I hope we can come up with a solution that doesn't > involve needing to do that. I have done some refactoring to avoid that. See attached. I didn't really get beyond the refactoring stage with this today. This version still seems to work, but I don't really understand the logic in postgresBeginForeignInsert which decides whether to use the RTE from the range table or create our own. We seem to need to do one sometimes and the other sometimes, but I don't know why that is, really. Nor do I understand why we need the other changes in the patch. There's probably a good reason, but I haven't figured it out yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hello. At Tue, 24 Apr 2018 15:49:20 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+Tgmob+CiHt+9JEUiXzNVmUcUf1zBb-bUeffVmbZWJHV0YVtw@mail.gmail.com> > On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera > > <alvherre@alvh.no-ip.org> wrote: > >> Robert, I think this is your turf, per 3d956d9562aa. Are you looking > >> into it? > > > > Thanks for the ping. I just had a look over the proposed patch and I > > guess I don't like it very much. Temporarily modifying the range > > table in place and then changing it back before we return seems ugly > > and error-prone. I hope we can come up with a solution that doesn't > > involve needing to do that. > > I have done some refactoring to avoid that. See attached. > > I didn't really get beyond the refactoring stage with this today. > This version still seems to work, but I don't really understand the > logic in postgresBeginForeignInsert which decides whether to use the > RTE from the range table or create our own. We seem to need to do one > sometimes and the other sometimes, but I don't know why that is, > really. Nor do I understand why we need the other changes in the > patch. There's probably a good reason, but I haven't figured it out > yet. FWIW, the refactoring drops the condition that the ForeignInsert is a part of UPDATE. The code exists just for avoiding temprary RTE generation in 2 cases out of the 3 possible cases mentioned in [1]. If it looks too complex for the gain, we can always create an RTE for deparsing as Fujita-san's first patch in this thread did. Anyway the condition for "dostuff" + "is update" might be a bit too arbitrary. [1] https://www.postgresql.org/message-id/f970d875-9711-b8cb-f270-965fa3e40334@lab.ntt.co.jp regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2018/04/25 4:49, Robert Haas wrote: > On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera >> <alvherre@alvh.no-ip.org> wrote: >>> Robert, I think this is your turf, per 3d956d9562aa. Are you looking >>> into it? >> >> Thanks for the ping. I just had a look over the proposed patch and I >> guess I don't like it very much. Temporarily modifying the range >> table in place and then changing it back before we return seems ugly >> and error-prone. I hope we can come up with a solution that doesn't >> involve needing to do that. > > I have done some refactoring to avoid that. See attached. +1 for getting rid of the PlannerInfo argument of the many functions in that code. I have long wondered if we couldn't rid of it and especially thought of it when reviewing this patch. Thanks, Amit
I forgot to mention that. At Wed, 25 Apr 2018 10:17:02 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <f894af29-345f-678b-480a-9f6e8cc314bd@lab.ntt.co.jp> > On 2018/04/25 4:49, Robert Haas wrote: > > On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera > >> <alvherre@alvh.no-ip.org> wrote: > >>> Robert, I think this is your turf, per 3d956d9562aa. Are you looking > >>> into it? > >> > >> Thanks for the ping. I just had a look over the proposed patch and I > >> guess I don't like it very much. Temporarily modifying the range > >> table in place and then changing it back before we return seems ugly > >> and error-prone. I hope we can come up with a solution that doesn't > >> involve needing to do that. > > > > I have done some refactoring to avoid that. See attached. > > +1 for getting rid of the PlannerInfo argument of the many functions in > that code. I have long wondered if we couldn't rid of it and especially > thought of it when reviewing this patch. +1 from me. Thanks for making things simpler and easy to understand. I feel the same as Amit:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Oops, really meant to send the "+1 to the root -> rte refactoring" comment and the rest in the same email. On 2018/04/25 4:49, Robert Haas wrote: > I have done some refactoring to avoid that. See attached. > > I didn't really get beyond the refactoring stage with this today. > This version still seems to work, but I don't really understand the > logic in postgresBeginForeignInsert which decides whether to use the > RTE from the range table or create our own. We seem to need to do one > sometimes and the other sometimes, but I don't know why that is, > really. Nor do I understand why we need the other changes in the > patch. There's probably a good reason, but I haven't figured it out > yet. So, the main part of the patch that fixes the bug is the following per the latest v4 patch. + if (resultRelInfo->ri_PartitionRoot && plan) + { + bool dostuff = true; + + if (resultRelation != plan->nominalRelation) + dostuff = false; + else + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; + + if (dostuff) + { + rte = list_nth(estate->es_range_table, resultRelation - 1); + rte = copyObject(rte); + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; + } + } + + if (rte == NULL) + { + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; + } After the refactoring, it appears to me that we only need this much: + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; that is, *after* teaching ExecInitPartitionInfo to correctly set the RT index of the ResultRelInfo it creates. After the refactoring, the only thing this patch needs to do is to choose the RT index of the result relation correctly. We currently use this: + Index resultRelation = resultRelInfo->ri_RangeTableIndex; And the rest of the code the patch adds is only to adjust it based on where the partition ResultRelInfo seems to have originated from. If the ri_RangeTableIndex was set correctly to begin with, we wouldn't need to fiddle with that in the FDW code. Regular ResultRelInfo's already have it set correctly, it's only the ones that ExecInitPartitionInfo creates that seem be creating the problem. I tried to do that. So, attached are two patches. 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to InitResultRelInfo 2. v5 of the patch to fix the bug of foreign partitions Thoughts? Thanks, Amit
Attachment
At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <573e60cc-beeb-b534-d89a-7622fae35585@lab.ntt.co.jp> > Oops, really meant to send the "+1 to the root -> rte refactoring" comment > and the rest in the same email. > > On 2018/04/25 4:49, Robert Haas wrote: > > I have done some refactoring to avoid that. See attached. > > > > I didn't really get beyond the refactoring stage with this today. > > This version still seems to work, but I don't really understand the > > logic in postgresBeginForeignInsert which decides whether to use the > > RTE from the range table or create our own. We seem to need to do one > > sometimes and the other sometimes, but I don't know why that is, > > really. Nor do I understand why we need the other changes in the > > patch. There's probably a good reason, but I haven't figured it out > > yet. > > So, the main part of the patch that fixes the bug is the following per the > latest v4 patch. > > + if (resultRelInfo->ri_PartitionRoot && plan) > + { > + bool dostuff = true; > + > + if (resultRelation != plan->nominalRelation) > + dostuff = false; > + else > + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; > + > + if (dostuff) > + { > + rte = list_nth(estate->es_range_table, resultRelation - 1); > + rte = copyObject(rte); > + rte->relid = RelationGetRelid(rel); > + rte->relkind = RELKIND_FOREIGN_TABLE; > + } > + } > + > + if (rte == NULL) > + { > + rte = makeNode(RangeTblEntry); > + rte->rtekind = RTE_RELATION; > + rte->relid = RelationGetRelid(rel); > + rte->relkind = RELKIND_FOREIGN_TABLE; > + } > > After the refactoring, it appears to me that we only need this much: > > + rte = makeNode(RangeTblEntry); > + rte->rtekind = RTE_RELATION; > + rte->relid = RelationGetRelid(rel); > + rte->relkind = RELKIND_FOREIGN_TABLE; Mmm.. That is, only relid is required to deparse (I don't mean that it should be refactored so.). On the other hand create_foreign_modify requires rte->checkAsUser as well. The following query (probably) unexpectedly fails with the latest patch. It succeeds with -3 patch. =# create user usr1 login; =# create view v1 as select * from itrtest; =# revoke all ON itrtest FROM PUBLIC; =# grant SELECT, INSERT ON v1 to usr1; => select * from v1; -- OK => insert into v1 (select n, 'foo' from (values (1), (2)) as t(n)) returning *; ERROR: password is required DETAIL: Non-superusers must provide a password in the user mapping. We need to read it of the parent? > that is, *after* teaching ExecInitPartitionInfo to correctly set the RT > index of the ResultRelInfo it creates. After the refactoring, the only > thing this patch needs to do is to choose the RT index of the result > relation correctly. We currently use this: > > + Index resultRelation = resultRelInfo->ri_RangeTableIndex; > > And the rest of the code the patch adds is only to adjust it based on > where the partition ResultRelInfo seems to have originated from. If the > ri_RangeTableIndex was set correctly to begin with, we wouldn't need to > fiddle with that in the FDW code. Regular ResultRelInfo's already have it > set correctly, it's only the ones that ExecInitPartitionInfo creates that > seem be creating the problem. > > I tried to do that. So, attached are two patches. > > 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to > InitResultRelInfo > > 2. v5 of the patch to fix the bug of foreign partitions > > Thoughts? Maybe, one reason that I feel uneasy is how the patch accesses desired resultRelInfo. + Index firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex; Looking ExecInitModifyTable, just one resultRelInfo has been passed to BeginForeignModify so it should not access as an array. I will feel at easy if the line were in the following shape. Does it make sense? + Index firstVarno = mtstate->resultRelInfo->ri_RangeTableIndex; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Horiguchi-san, Thanks for taking a look at it. On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote: > At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote: >> After the refactoring, it appears to me that we only need this much: >> >> + rte = makeNode(RangeTblEntry); >> + rte->rtekind = RTE_RELATION; >> + rte->relid = RelationGetRelid(rel); >> + rte->relkind = RELKIND_FOREIGN_TABLE; > > Mmm.. That is, only relid is required to deparse (I don't mean > that it should be refactored so.). On the other hand > create_foreign_modify requires rte->checkAsUser as well. The > following query (probably) unexpectedly fails with the latest > patch. It succeeds with -3 patch. > > =# create user usr1 login; > =# create view v1 as select * from itrtest; > =# revoke all ON itrtest FROM PUBLIC; > =# grant SELECT, INSERT ON v1 to usr1; > => select * from v1; -- OK > => insert into v1 (select n, 'foo' from (values (1), (2)) as t(n)) returning *; > ERROR: password is required > DETAIL: Non-superusers must provide a password in the user mapping. > > We need to read it of the parent? Hmm, I missed that we do need information from a proper RTE as well. So, I suppose we should be doing this instead of creating the RTE for foreign partition from scratch. + rte = list_nth(estate->es_range_table, resultRelation - 1); + rte = copyObject(rte); + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; If we apply the other patch I proposed, resultRelation always points to the correct RTE. >> I tried to do that. So, attached are two patches. >> >> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >> InitResultRelInfo >> >> 2. v5 of the patch to fix the bug of foreign partitions >> >> Thoughts? > > Maybe, one reason that I feel uneasy is how the patch accesses > desired resultRelInfo. > > + Index firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex; > > Looking ExecInitModifyTable, just one resultRelInfo has been > passed to BeginForeignModify so it should not access as an > array. I will feel at easy if the line were in the following > shape. Does it make sense? > > + Index firstVarno = mtstate->resultRelInfo->ri_RangeTableIndex; This is the comment on teach-ExecInitPartitionInfo-to-use-correct-RT-index.patch, right? I haven't seen either ExecInitModifyTable or BeginForeignModify being involved in this discussion, but I see your point. I see no problem with doing it that way, I have updated that patch to do it that way. Also, changed the line above it that is unrelated to this patch just to be consistent. Attached updated patches: 1. teach-ExecInitPartitionInfo-to-use-correct-RT-index-2.patch 2. fix-tuple-routing-for-foreign-partitions-6.patch Thanks, Amit
Attachment
(2018/04/25 4:49), Robert Haas wrote: > On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas<robertmhaas@gmail.com> wrote: >> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera >> <alvherre@alvh.no-ip.org> wrote: >>> Robert, I think this is your turf, per 3d956d9562aa. Are you looking >>> into it? >> >> Thanks for the ping. I just had a look over the proposed patch and I >> guess I don't like it very much. Temporarily modifying the range >> table in place and then changing it back before we return seems ugly >> and error-prone. I hope we can come up with a solution that doesn't >> involve needing to do that. I see your concern. > I have done some refactoring to avoid that. See attached. I like this refactoring. > I didn't really get beyond the refactoring stage with this today. > This version still seems to work, but I don't really understand the > logic in postgresBeginForeignInsert which decides whether to use the > RTE from the range table or create our own. We seem to need to do one > sometimes and the other sometimes, but I don't know why that is, > really. Nor do I understand why we need the other changes in the > patch. There's probably a good reason, but I haven't figured it out > yet. To reduce the cost of creating RTEs, the original patch uses the RTE in the range table if the partition is an UPDATE subplan partition. One thing I'm concerned about is this change to postgresBeginForeignInsert: + if (resultRelInfo->ri_PartitionRoot && plan) + { + bool dostuff = true; + + if (resultRelation != plan->nominalRelation) + dostuff = false; + else + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; + + if (dostuff) + { + rte = list_nth(estate->es_range_table, resultRelation - 1); + rte = copyObject(rte); + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; + } + } + + if (rte == NULL) + { + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; + } IIUC, I think this creates a new RTE for an UPDATE subplan partition by makeNode, but I'm not sure if that would work well because we decide which user to do the remote access as by looking at rte->checkAsUser, in create_foreign_modify. Thanks for working on this! Best regards, Etsuro Fujita
On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > I tried to do that. So, attached are two patches. > > 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to > InitResultRelInfo > > 2. v5 of the patch to fix the bug of foreign partitions > > Thoughts? Are you splitting this into 2 patches here because there are 2 separate issues? If so, what are those issues? There seem to be a bunch of interrelated changes here but I haven't seen a clear explanation of which ones are needed for which reasons. I agree that fixing ExecInitPartitionInfo to use the right RT index in the first place sounds like a better idea than trying to piece together which RT index we should be using after-the-fact, but I need to better understand what problems we're trying to fix here before I can be sure if that's the strategy I want to endorse... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
(2018/04/25 17:51), Etsuro Fujita wrote: > (2018/04/25 4:49), Robert Haas wrote: >> I didn't really get beyond the refactoring stage with this today. >> This version still seems to work, but I don't really understand the >> logic in postgresBeginForeignInsert which decides whether to use the >> RTE from the range table or create our own. We seem to need to do one >> sometimes and the other sometimes, but I don't know why that is, >> really. Nor do I understand why we need the other changes in the >> patch. There's probably a good reason, but I haven't figured it out >> yet. > > To reduce the cost of creating RTEs, the original patch uses the RTE in > the range table if the partition is an UPDATE subplan partition. One thing to add: in addition to that case, the original patch uses the RTE in the range table if the foreign table is the target specified in a COPY command. Best regards, Etsuro Fujita
(2018/04/25 17:29), Amit Langote wrote: > On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote: >> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote: >>> After the refactoring, it appears to me that we only need this much: >>> >>> + rte = makeNode(RangeTblEntry); >>> + rte->rtekind = RTE_RELATION; >>> + rte->relid = RelationGetRelid(rel); >>> + rte->relkind = RELKIND_FOREIGN_TABLE; >> >> Mmm.. That is, only relid is required to deparse (I don't mean >> that it should be refactored so.). On the other hand >> create_foreign_modify requires rte->checkAsUser as well. That's right. I took care of this in my version, but unfortuneately, that was ignored in the updated versions. Maybe the comments I added to the patch were not enough, though. > Hmm, I missed that we do need information from a proper RTE as well. So, > I suppose we should be doing this instead of creating the RTE for foreign > partition from scratch. > > + rte = list_nth(estate->es_range_table, resultRelation - 1); > + rte = copyObject(rte); > + rte->relid = RelationGetRelid(rel); > + rte->relkind = RELKIND_FOREIGN_TABLE; As I said upthread, we can use the RTE in the range table as-is if the foreign table is one of the UPDATE subplan partitions or the target specified in a COPY command. So, I created the patch that way because that would save cycles. Why not just use that RTE in those cases? > If we apply the other patch I proposed, resultRelation always points to > the correct RTE. > >>> I tried to do that. So, attached are two patches. >>> >>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >>> InitResultRelInfo >>> >>> 2. v5 of the patch to fix the bug of foreign partitions >>> >>> Thoughts? Actually, I also thought the former when creating the patch, but I left that as-is because I'm not sure that would be really safe; ExecConstraints looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so what I thought was: that might cause unexpected behavior. Anyway, I think that the former is more like an improvement rather than a fix, so it would be better to leave that for another patch for PG12? Best regards, Etsuro Fujita
(2018/04/26 2:59), Robert Haas wrote: > On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> I tried to do that. So, attached are two patches. >> >> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >> InitResultRelInfo >> >> 2. v5 of the patch to fix the bug of foreign partitions >> >> Thoughts? > > Are you splitting this into 2 patches here because there are 2 > separate issues? If so, what are those issues? There seem to be a > bunch of interrelated changes here but I haven't seen a clear > explanation of which ones are needed for which reasons. As I said in an earlier email, I'm wondering if it would be better to leave that for another patch for PG12. Best regards, Etsuro Fujita
(2018/04/26 12:43), Etsuro Fujita wrote: > (2018/04/25 17:29), Amit Langote wrote: >> On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote: >>> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote: >>>> After the refactoring, it appears to me that we only need this much: >>>> >>>> + rte = makeNode(RangeTblEntry); >>>> + rte->rtekind = RTE_RELATION; >>>> + rte->relid = RelationGetRelid(rel); >>>> + rte->relkind = RELKIND_FOREIGN_TABLE; >>> >>> Mmm.. That is, only relid is required to deparse (I don't mean >>> that it should be refactored so.). On the other hand >>> create_foreign_modify requires rte->checkAsUser as well. > > That's right. I took care of this in my version, but unfortuneately, > that was ignored in the updated versions. Maybe the comments I added to > the patch were not enough, though. > >> Hmm, I missed that we do need information from a proper RTE as well. So, >> I suppose we should be doing this instead of creating the RTE for foreign >> partition from scratch. >> >> + rte = list_nth(estate->es_range_table, resultRelation - 1); >> + rte = copyObject(rte); >> + rte->relid = RelationGetRelid(rel); >> + rte->relkind = RELKIND_FOREIGN_TABLE; > > As I said upthread, we can use the RTE in the range table as-is if the > foreign table is one of the UPDATE subplan partitions or the target > specified in a COPY command. So, I created the patch that way because > that would save cycles. Why not just use that RTE in those cases? > >> If we apply the other patch I proposed, resultRelation always points to >> the correct RTE. >> >>>> I tried to do that. So, attached are two patches. >>>> >>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >>>> InitResultRelInfo >>>> >>>> 2. v5 of the patch to fix the bug of foreign partitions >>>> >>>> Thoughts? > > Actually, I also thought the former when creating the patch, but I left > that as-is because I'm not sure that would be really safe; > ExecConstraints looks at the RT index via > GetInsertedColumns/GetUpdatedColumns, so what I thought was: that might > cause unexpected behavior. Anyway, I think that the former is more like > an improvement rather than a fix, so it would be better to leave that > for another patch for PG12? Here is a new version I'd like to propose for fixing this issue without the former patch. Thanks for working on this! Best regards, Etsuro Fujita
Attachment
On 2018/04/26 12:43, Etsuro Fujita wrote: > (2018/04/25 17:29), Amit Langote wrote: >> Hmm, I missed that we do need information from a proper RTE as well. So, >> I suppose we should be doing this instead of creating the RTE for foreign >> partition from scratch. >> >> + rte = list_nth(estate->es_range_table, resultRelation - 1); >> + rte = copyObject(rte); >> + rte->relid = RelationGetRelid(rel); >> + rte->relkind = RELKIND_FOREIGN_TABLE; > > As I said upthread, we can use the RTE in the range table as-is if the > foreign table is one of the UPDATE subplan partitions or the target > specified in a COPY command. So, I created the patch that way because > that would save cycles. Why not just use that RTE in those cases? Yes, I agree it's better to reuse the RTE in the cases we can. So, how does this look: + /* + * If the foreign table is a partition, we need to create a new RTE + * describing the foreign table for use by deparseInsertSql and + * create_foreign_modify() below, after first copying the parent's + * RTE and modifying some fields to describe the foreign partition to + * work on. However, if this is invoked by UPDATE, the existing RTE + * may already correspond to this partition if it is one of the + * UPDATE subplan target rels; in that case, we can just use the + * existing RTE as-is. + */ + rte = list_nth(estate->es_range_table, resultRelation - 1); + if (rte->relid != RelationGetRelid(rel)) + { + rte = copyObject(rte); + rte->relid = RelationGetRelid(rel); + rte->relkind = RELKIND_FOREIGN_TABLE; + + /* + * For UPDATE, we must use the RT index of the first subplan + * target rel's RTE, because the core code would have built + * expressions for the partition, such as RETURNING, using that + * RT index as varno of Vars contained in those expressions. + */ + if (plan && plan->operation == CMD_UPDATE && + resultRelation == plan->nominalRelation) + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; + } I added the block inside the if because I agree with your comment below that the changes to ExecInitPartitionInfo are better kept for later to think more carefully about the dependencies it might break. >> If we apply the other patch I proposed, resultRelation always points to >> the correct RTE. >> >>>> I tried to do that. So, attached are two patches. >>>> >>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >>>> InitResultRelInfo >>>> >>>> 2. v5 of the patch to fix the bug of foreign partitions >>>> >>>> Thoughts? > > Actually, I also thought the former when creating the patch, but I left > that as-is because I'm not sure that would be really safe; ExecConstraints > looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so what I > thought was: that might cause unexpected behavior. OK, I have to agree here that we better leave 1 to be looked at later. After this change, GetInsertedColumns/GetUpdatedColumns will start returning a different set of columns in some cases than it does now. Currently, it *always* returns a set containing root table's attribute numbers, even for UPDATE. But with this change, for UPDATE, it will start returning the set containing the first subplan target rel's attribute numbers, which could be any partition with arbitrarily different attribute numbers. > Anyway, I think that > the former is more like an improvement rather than a fix, so it would be > better to leave that for another patch for PG12? I agree, so I'm dropping the patch for 1. See attached an updated version with changes as described above. Thanks, Amit
Attachment
On 2018/04/26 15:27, Etsuro Fujita wrote: > (2018/04/26 12:43), Etsuro Fujita wrote: >> (2018/04/25 17:29), Amit Langote wrote: >>> On 2018/04/25 16:42, Kyotaro HORIGUCHI wrote: >>>> At Wed, 25 Apr 2018 11:19:23 +0900, Amit Langote wrote: >>>>> After the refactoring, it appears to me that we only need this much: >>>>> >>>>> + rte = makeNode(RangeTblEntry); >>>>> + rte->rtekind = RTE_RELATION; >>>>> + rte->relid = RelationGetRelid(rel); >>>>> + rte->relkind = RELKIND_FOREIGN_TABLE; >>>> >>>> Mmm.. That is, only relid is required to deparse (I don't mean >>>> that it should be refactored so.). On the other hand >>>> create_foreign_modify requires rte->checkAsUser as well. >> >> That's right. I took care of this in my version, but unfortuneately, >> that was ignored in the updated versions. Maybe the comments I added to >> the patch were not enough, though. >> >>> Hmm, I missed that we do need information from a proper RTE as well. So, >>> I suppose we should be doing this instead of creating the RTE for foreign >>> partition from scratch. >>> >>> + rte = list_nth(estate->es_range_table, resultRelation - 1); >>> + rte = copyObject(rte); >>> + rte->relid = RelationGetRelid(rel); >>> + rte->relkind = RELKIND_FOREIGN_TABLE; >> >> As I said upthread, we can use the RTE in the range table as-is if the >> foreign table is one of the UPDATE subplan partitions or the target >> specified in a COPY command. So, I created the patch that way because >> that would save cycles. Why not just use that RTE in those cases? >> >>> If we apply the other patch I proposed, resultRelation always points to >>> the correct RTE. >>> >>>>> I tried to do that. So, attached are two patches. >>>>> >>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >>>>> InitResultRelInfo >>>>> >>>>> 2. v5 of the patch to fix the bug of foreign partitions >>>>> >>>>> Thoughts? >> >> Actually, I also thought the former when creating the patch, but I left >> that as-is because I'm not sure that would be really safe; >> ExecConstraints looks at the RT index via >> GetInsertedColumns/GetUpdatedColumns, so what I thought was: that might >> cause unexpected behavior. Anyway, I think that the former is more like >> an improvement rather than a fix, so it would be better to leave that >> for another patch for PG12? > > Here is a new version I'd like to propose for fixing this issue without > the former patch. > > Thanks for working on this! Sorry, didn't notice this before sending my patch, which I also marked v7. It's a bit different than yours and has comments with excerpts from your earlier versions. See if you find it helpful. Thanks, Amit
(2018/04/26 15:39), Amit Langote wrote: > On 2018/04/26 15:27, Etsuro Fujita wrote: >> Here is a new version I'd like to propose for fixing this issue without >> the former patch. > Sorry, didn't notice this before sending my patch, which I also marked v7. > > It's a bit different than yours and has comments with excerpts from your > earlier versions. See if you find it helpful. OK, will review your version. Best regards, Etsuro Fujita
On 2018/04/26 2:59, Robert Haas wrote: > On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> I tried to do that. So, attached are two patches. >> >> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >> InitResultRelInfo >> >> 2. v5 of the patch to fix the bug of foreign partitions >> >> Thoughts? > > Are you splitting this into 2 patches here because there are 2 > separate issues? If so, what are those issues? There seem to be a > bunch of interrelated changes here but I haven't seen a clear > explanation of which ones are needed for which reasons. > > I agree that fixing ExecInitPartitionInfo to use the right RT index in > the first place sounds like a better idea than trying to piece > together which RT index we should be using after-the-fact, but I need > to better understand what problems we're trying to fix here before I > can be sure if that's the strategy I want to endorse... After considering Fujita-san's comment downthread, I think it might be better to revisit 1 later, because it could break a certain things which rely on ri_RangeTableIndex being set to the index of the root partitioned table's RT entry. Fujita-san pointed out GetInsertedColumns et al that return a set of attribute numbers after looking up the RT entry using ri_RangeTableIndex of a given ResultRelInfo. They would start returning a different set for partitions than previously in some cases due to this change. Maybe, there are other things that rely on what a partition's ri_RangeTableEntry has been set to. For now, I think we'll have to stick with a solution to determine which RT index to pass to deparseInsertSql that involves a bit of reverse-engineering, whereby we base that on where the partition's ResultRelInfo seems to have originated from. If it looks like it was created by ExecInitModifyTable because the partition is one of the subplan target rels, then use the RTE and the RT index as is. If not, ExecInitPartitionInfo would have created it because the partition was chosen as the tuple routing target. In that case, we have to freshly create an RTE for the partition after copying most of the fields from the parent's RTE. We're done if the partition was selected as routing target for an INSERT. If for UPDATE, we also have to change the RT index to pass to deparseInsertSql to that of the first subplan's target rel, because that's what we use as varno of expressions contained in the partition's ResultRelInfo. Phew! Thanks, Amit
(2018/04/26 15:35), Amit Langote wrote: > On 2018/04/26 12:43, Etsuro Fujita wrote: >> (2018/04/25 17:29), Amit Langote wrote: >>> Hmm, I missed that we do need information from a proper RTE as well. So, >>> I suppose we should be doing this instead of creating the RTE for foreign >>> partition from scratch. >>> >>> + rte = list_nth(estate->es_range_table, resultRelation - 1); >>> + rte = copyObject(rte); >>> + rte->relid = RelationGetRelid(rel); >>> + rte->relkind = RELKIND_FOREIGN_TABLE; >> >> As I said upthread, we can use the RTE in the range table as-is if the >> foreign table is one of the UPDATE subplan partitions or the target >> specified in a COPY command. So, I created the patch that way because >> that would save cycles. Why not just use that RTE in those cases? > > Yes, I agree it's better to reuse the RTE in the cases we can. So, how > does this look: > > + /* > + * If the foreign table is a partition, we need to create a new RTE > + * describing the foreign table for use by deparseInsertSql and > + * create_foreign_modify() below, after first copying the parent's > + * RTE and modifying some fields to describe the foreign partition to > + * work on. However, if this is invoked by UPDATE, the existing RTE > + * may already correspond to this partition if it is one of the > + * UPDATE subplan target rels; in that case, we can just use the > + * existing RTE as-is. > + */ > + rte = list_nth(estate->es_range_table, resultRelation - 1); > + if (rte->relid != RelationGetRelid(rel)) > + { > + rte = copyObject(rte); > + rte->relid = RelationGetRelid(rel); > + rte->relkind = RELKIND_FOREIGN_TABLE; > + > + /* > + * For UPDATE, we must use the RT index of the first subplan > + * target rel's RTE, because the core code would have built > + * expressions for the partition, such as RETURNING, using that > + * RT index as varno of Vars contained in those expressions. > + */ > + if (plan&& plan->operation == CMD_UPDATE&& > + resultRelation == plan->nominalRelation) > + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; > + } Seems like a better change than mine; because this simplifies the logic. > I added the block inside the if because I agree with your comment below > that the changes to ExecInitPartitionInfo are better kept for later to > think more carefully about the dependencies it might break. OK >>> If we apply the other patch I proposed, resultRelation always points to >>> the correct RTE. >>> >>>>> I tried to do that. So, attached are two patches. >>>>> >>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >>>>> InitResultRelInfo >>>>> >>>>> 2. v5 of the patch to fix the bug of foreign partitions >>>>> >>>>> Thoughts? >> >> Actually, I also thought the former when creating the patch, but I left >> that as-is because I'm not sure that would be really safe; ExecConstraints >> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so what I >> thought was: that might cause unexpected behavior. > > OK, I have to agree here that we better leave 1 to be looked at later. > > After this change, GetInsertedColumns/GetUpdatedColumns will start > returning a different set of columns in some cases than it does now. > Currently, it *always* returns a set containing root table's attribute > numbers, even for UPDATE. But with this change, for UPDATE, it will start > returning the set containing the first subplan target rel's attribute > numbers, which could be any partition with arbitrarily different attribute > numbers. > >> Anyway, I think that >> the former is more like an improvement rather than a fix, so it would be >> better to leave that for another patch for PG12? > > I agree, so I'm dropping the patch for 1. OK, let's focus on #2! > See attached an updated version with changes as described above. Looks good to me. Thanks for the updated version! Best regards, Etsuro Fujita
It seems almost fine for me, but just one point.. At Thu, 26 Apr 2018 17:36:44 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5AE18F9C.6080805@lab.ntt.co.jp> > (2018/04/26 15:35), Amit Langote wrote: > > On 2018/04/26 12:43, Etsuro Fujita wrote: > > + resultRelation == plan->nominalRelation) > > + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; > > + } > > Seems like a better change than mine; because this simplifies the > logic. Please rewrite it to use not array reference, but pointer reference if one mtstate logically holds just one resultRelInfo. > > I added the block inside the if because I agree with your comment > > below > > that the changes to ExecInitPartitionInfo are better kept for later to > > think more carefully about the dependencies it might break. > > OK > > >>> If we apply the other patch I proposed, resultRelation always points > >>> to > >>> the correct RTE. > >>> > >>>>> I tried to do that. So, attached are two patches. > >>>>> > >>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to > >>>>> InitResultRelInfo > >>>>> > >>>>> 2. v5 of the patch to fix the bug of foreign partitions > >>>>> > >>>>> Thoughts? > >> > >> Actually, I also thought the former when creating the patch, but I > >> left > >> that as-is because I'm not sure that would be really safe; > >> ExecConstraints > >> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so > >> what I > >> thought was: that might cause unexpected behavior. > > > > OK, I have to agree here that we better leave 1 to be looked at later. > > > > After this change, GetInsertedColumns/GetUpdatedColumns will start > > returning a different set of columns in some cases than it does now. > > Currently, it *always* returns a set containing root table's attribute > > numbers, even for UPDATE. But with this change, for UPDATE, it will > > start > > returning the set containing the first subplan target rel's attribute > > numbers, which could be any partition with arbitrarily different > > attribute > > numbers. > > > >> Anyway, I think that > >> the former is more like an improvement rather than a fix, so it would > >> be > >> better to leave that for another patch for PG12? > > > > I agree, so I'm dropping the patch for 1. > > OK, let's focus on #2! > > > See attached an updated version with changes as described above. > > Looks good to me. Thanks for the updated version! Agreed on all points above. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
(2018/04/26 20:06), Kyotaro HORIGUCHI wrote: > It seems almost fine for me, but just one point.. > > At Thu, 26 Apr 2018 17:36:44 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5AE18F9C.6080805@lab.ntt.co.jp> >> (2018/04/26 15:35), Amit Langote wrote: >>> On 2018/04/26 12:43, Etsuro Fujita wrote: >>> + resultRelation == plan->nominalRelation) >>> + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex; >>> + } >> >> Seems like a better change than mine; because this simplifies the >> logic. > > Please rewrite it to use not array reference, but pointer > reference if one mtstate logically holds just one resultRelInfo. Maybe I don't understand your words correctky, but in that UPDATE case, I think that mtstate can have multiple ResultRelInfo. >>>>>>> So, attached are two patches. >>>>>>> >>>>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to >>>>>>> InitResultRelInfo >>>>>>> >>>>>>> 2. v5 of the patch to fix the bug of foreign partitions >>>>>>> >>>>>>> Thoughts? >>>> >>>> Actually, I also thought the former when creating the patch, but I >>>> left >>>> that as-is because I'm not sure that would be really safe; >>>> ExecConstraints >>>> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so >>>> what I >>>> thought was: that might cause unexpected behavior. >>> >>> OK, I have to agree here that we better leave 1 to be looked at later. >>> >>> After this change, GetInsertedColumns/GetUpdatedColumns will start >>> returning a different set of columns in some cases than it does now. >>> Currently, it *always* returns a set containing root table's attribute >>> numbers, even for UPDATE. But with this change, for UPDATE, it will >>> start >>> returning the set containing the first subplan target rel's attribute >>> numbers, which could be any partition with arbitrarily different >>> attribute >>> numbers. >>> >>>> Anyway, I think that >>>> the former is more like an improvement rather than a fix, so it would >>>> be >>>> better to leave that for another patch for PG12? >>> >>> I agree, so I'm dropping the patch for 1. >> >> OK, let's focus on #2! >> >>> See attached an updated version with changes as described above. >> >> Looks good to me. Thanks for the updated version! > > Agreed on all points above. Thanks for reviewing! Best regards, Etsuro Fujita
At Thu, 26 Apr 2018 21:16:38 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5AE1C326.6040201@lab.ntt.co.jp> > (2018/04/26 20:06), Kyotaro HORIGUCHI wrote: > > Please rewrite it to use not array reference, but pointer > > reference if one mtstate logically holds just one resultRelInfo. > > Maybe I don't understand your words correctky, but in that UPDATE > case, I think that mtstate can have multiple ResultRelInfo. Ah, mtstate has the same number of resultRelInfo with subplans and it is *written* in the comment just above:( And it is exactly for the UPDATE case. Sorry for the silly comment. > >>>> Anyway, I think that > >>>> the former is more like an improvement rather than a fix, so it would > >>>> be > >>>> better to leave that for another patch for PG12? > >>> > >>> I agree, so I'm dropping the patch for 1. > >> > >> OK, let's focus on #2! > >> > >>> See attached an updated version with changes as described above. > >> > >> Looks good to me. Thanks for the updated version! > > > > Agreed on all points above. > > Thanks for reviewing! I'm happy if it helps you. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2018/04/27 10:01, Kyotaro HORIGUCHI wrote: > At Thu, 26 Apr 2018 21:16:38 +0900, Etsuro Fujita wrote: >> (2018/04/26 20:06), Kyotaro HORIGUCHI wrote: >>> Agreed on all points above. >> >> Thanks for reviewing! > > I'm happy if it helps you. Thank you both for reviewing. Regards, Amit
On Thu, Apr 26, 2018 at 9:23 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> Thanks for reviewing! >> >> I'm happy if it helps you. > > Thank you both for reviewing. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-05-01 13:41:32 -0400, Robert Haas wrote: > On Thu, Apr 26, 2018 at 9:23 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: > >>> Thanks for reviewing! > >> > >> I'm happy if it helps you. > > > > Thank you both for reviewing. > > Committed. There's still an open items entry for this thread - has that been resolved by this commit Greetings, Andres Freund
On 2018/05/02 6:09, Andres Freund wrote: > On 2018-05-01 13:41:32 -0400, Robert Haas wrote: >> On Thu, Apr 26, 2018 at 9:23 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>>> Thanks for reviewing! >>>> >>>> I'm happy if it helps you. >>> >>> Thank you both for reviewing. >> >> Committed. Thank you. > There's still an open items entry for this thread - has that been > resolved by this commit Afaik, yes. So moved that to resolved items. Regards, Amit
(2018/05/02 10:10), Amit Langote wrote: > On 2018/05/02 6:09, Andres Freund wrote: >> On 2018-05-01 13:41:32 -0400, Robert Haas wrote: >>> Committed. > > Thank you. Thanks for committing, Robert! >> There's still an open items entry for this thread - has that been >> resolved by this commit > > Afaik, yes. So moved that to resolved items. Yeah, thanks for that, Amit! Best regards, Etsuro Fujita
(2018/05/02 15:45), Etsuro Fujita wrote: > (2018/05/02 10:10), Amit Langote wrote: >> On 2018/05/02 6:09, Andres Freund wrote: >>> On 2018-05-01 13:41:32 -0400, Robert Haas wrote: >>>> Committed. Here is a small patch to remove a no-longer-needed cast in postgresBeginForeignInsert(). Best regards, Etsuro Fujita
Attachment
On Wed, May 2, 2018 at 7:06 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Here is a small patch to remove a no-longer-needed cast in > postgresBeginForeignInsert(). Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
(2018/05/03 9:29), Robert Haas wrote: > On Wed, May 2, 2018 at 7:06 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Here is a small patch to remove a no-longer-needed cast in >> postgresBeginForeignInsert(). > > Committed. Thanks! Best regards, Etsuro Fujita