Thread: non-bulk inserts and tuple routing
Hi.

I have a patch that rearranges the code around partition tuple routing, such that allocation of per-partition objects (ResultRelInfo, TupleConversionMap, etc.) is delayed until a given partition is actually inserted into (i.e., a tuple is routed to it). I see a good win for non-bulk inserts with the patch, and it is implemented such that it doesn't affect the bulk-insert case much.

Performance numbers:

* Uses the following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...

* Non-bulk insert uses the following code (insert 100,000 rows one by one):

do $$
begin
  for i in 1..100000 loop
    insert into t1 values (i, i+1);
  end loop;
end; $$;

* Times in milliseconds:

#parts   HEAD        Patched

8        6216.300    4977.670
16       9061.388    6360.093
32       14081.656   8752.405
64       24887.110   13919.384
128      45926.251   24582.411
256      88088.084   45490.894

As you can see, the performance can be as much as 2x faster with the patch, although the time taken still increases as the number of partitions increases, because we still lock *all* partitions at the beginning.

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts   HEAD       Patched

8        458.301    450.875
16       409.271    510.723
32       500.960    612.003
64       430.687    795.046
128      449.314    565.786
256      493.171    490.187

Not much harm here, although the numbers are a bit noisy.

The patch is divided into 4 parts, the first 3 of which are refactoring patches.

I know this patch will conflict severely with [1] and [2], so it's fine if we consider applying these later. Will add this to the next CF.

Thanks,
Amit

[1] https://commitfest.postgresql.org/16/1023/
[2] https://commitfest.postgresql.org/16/1184/
Attachment
On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> * Bulk-inserting 100,000 rows using COPY:
>
> copy t1 from '/tmp/t1.csv' csv;
>
> * Times in milliseconds:
>
> #parts   HEAD       Patched
>
> 8        458.301    450.875
> 16       409.271    510.723
> 32       500.960    612.003
> 64       430.687    795.046
> 128      449.314    565.786
> 256      493.171    490.187

While the earlier numbers increased monotonically with the number of partitions, these don't. For example, the number on HEAD with 8 partitions is higher than that with 128 partitions. That's kind of weird -- maybe something is wrong with the measurement. Do we see similar instability when bulk inserting into an unpartitioned table? Also, the numbers for 64 partitions are really bad: almost 2x slower.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi Ashutosh.

On 2017/12/19 19:12, Ashutosh Bapat wrote:
> On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>
>> * Bulk-inserting 100,000 rows using COPY:
>>
>> copy t1 from '/tmp/t1.csv' csv;
>>
>> * Times in milliseconds:
>>
>> #parts   HEAD       Patched
>>
>> 8        458.301    450.875
>> 16       409.271    510.723
>> 32       500.960    612.003
>> 64       430.687    795.046
>> 128      449.314    565.786
>> 256      493.171    490.187
>
> While the earlier numbers increased monotonically with the number of
> partitions, these don't. For example, the number on HEAD with 8
> partitions is higher than that with 128 partitions. That's kind of
> weird -- maybe something is wrong with the measurement.

In the bulk-insert case, we initialize partitions only once, because the COPY that loads those 100,000 rows is executed just once. Whereas in the non-bulk insert case, we initialize partitions (lock, allocate various objects) 100,000 times, because that's how many times the INSERT is executed, once for each of the 100,000 rows to be inserted.

Without the patch, the object initialization occurs N times per statement, where N is the number of partitions. With the patch it occurs just once -- only for the partition to which the row was routed. The time required, although smaller with the patch, still increases monotonically, because the patch didn't do anything about locking all partitions.

Does that make sense?

> Do we see
> similar instability when bulk inserting into an unpartitioned table?
> Also, the numbers for 64 partitions are really bad: almost 2x slower.

Sorry, as I said, the numbers I initially posted were a bit noisy. I just re-ran that COPY against the patched version and got the following numbers:

#parts   Patched
8        441.852
16       417.510
32       435.276
64       486.497
128      436.473
256      446.312

Thanks,
Amit
On 2017/12/19 19:06, Amit Langote wrote:
> Hi.
>
> I have a patch that rearranges the code around partition tuple routing,
> such that allocation of per-partition objects (ResultRelInfo,
> TupleConversionMap, etc.) is delayed until a given partition is actually
> inserted into (i.e., a tuple is routed to it). I see a good win for
> non-bulk inserts with the patch, and it is implemented such that it
> doesn't affect the bulk-insert case much.
>
> Performance numbers:
>
> * Uses the following hash-partitioned table:
>
> create table t1 (a int, b int) partition by hash (a);
> create table t1_x partition of t1 for values with (modulus M, remainder R)
> ...
>
> * Non-bulk insert uses the following code (insert 100,000 rows one by one):
>
> do $$
> begin
>   for i in 1..100000 loop
>     insert into t1 values (i, i+1);
>   end loop;
> end; $$;
>
> * Times in milliseconds:
>
> #parts   HEAD        Patched
>
> 8        6216.300    4977.670
> 16       9061.388    6360.093
> 32       14081.656   8752.405
> 64       24887.110   13919.384
> 128      45926.251   24582.411
> 256      88088.084   45490.894
>
> As you can see, the performance can be as much as 2x faster with the patch,
> although the time taken still increases as the number of partitions
> increases, because we still lock *all* partitions at the beginning.
>
> * Bulk-inserting 100,000 rows using COPY:
>
> copy t1 from '/tmp/t1.csv' csv;
>
> * Times in milliseconds:
>
> #parts   HEAD       Patched
>
> 8        458.301    450.875
> 16       409.271    510.723
> 32       500.960    612.003
> 64       430.687    795.046
> 128      449.314    565.786
> 256      493.171    490.187
>
> Not much harm here, although the numbers are a bit noisy.
>
> The patch is divided into 4 parts, the first 3 of which are refactoring
> patches.
>
> I know this patch will conflict severely with [1] and [2], so it's fine if
> we consider applying these later. Will add this to the next CF.

I rebased the patches, since they started conflicting with a recently committed patch [1].

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cc6337d2fed5
Attachment
On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I rebased the patches, since they started conflicting with a recently
> committed patch [1].

I think that my latest commit has managed to break this pretty thoroughly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/01/20 7:07, Robert Haas wrote:
> On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I rebased the patches, since they started conflicting with a recently
>> committed patch [1].
>
> I think that my latest commit has managed to break this pretty thoroughly.

I rebased it. Here are the performance numbers again.

* Uses the following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...

* Non-bulk insert uses the following code (insert 100,000 rows one by one):

do $$
begin
  for i in 1..100000 loop
    insert into t1 values (i, i+1);
  end loop;
end; $$;

Times in milliseconds:

#parts   HEAD        Patched
8        6148.313    4938.775
16       8882.420    6203.911
32       14251.072   8595.068
64       24465.691   13718.161
128      45099.435   23898.026
256      87307.332   44428.126

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

Times in milliseconds:

#parts   HEAD      Patched

8        466.170   446.865
16       445.341   444.990
32       443.544   487.713
64       460.579   435.412
128      469.953   422.403
256      463.592   431.118

Thanks,
Amit
Attachment
On 2018/01/24 17:25, Amit Langote wrote:
> On 2018/01/20 7:07, Robert Haas wrote:
>> On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote wrote:
>>> I rebased the patches, since they started conflicting with a recently
>>> committed patch [1].
>>
>> I think that my latest commit has managed to break this pretty thoroughly.
>
> I rebased it. Here are the performance numbers again.
>
> * Uses the following hash-partitioned table:
>
> create table t1 (a int, b int) partition by hash (a);
> create table t1_x partition of t1 for values with (modulus M, remainder R)
> ...
>
> * Non-bulk insert uses the following code (insert 100,000 rows one by one):
>
> do $$
> begin
>   for i in 1..100000 loop
>     insert into t1 values (i, i+1);
>   end loop;
> end; $$;
>
> Times in milliseconds:
>
> #parts   HEAD        Patched
> 8        6148.313    4938.775
> 16       8882.420    6203.911
> 32       14251.072   8595.068
> 64       24465.691   13718.161
> 128      45099.435   23898.026
> 256      87307.332   44428.126
>
> * Bulk-inserting 100,000 rows using COPY:
>
> copy t1 from '/tmp/t1.csv' csv;
>
> Times in milliseconds:
>
> #parts   HEAD      Patched
>
> 8        466.170   446.865
> 16       445.341   444.990
> 32       443.544   487.713
> 64       460.579   435.412
> 128      469.953   422.403
> 256      463.592   431.118

Rebased again.

Thanks,
Amit
Attachment
(2018/01/25 11:11), Amit Langote wrote:
> Rebased again.

Thanks for the rebased patch!

The patches apply cleanly and compile successfully, but make check fails in an assert-enabled build.

Best regards,
Etsuro Fujita
Fujita-san,

Thanks for the review.

On 2018/01/25 18:30, Etsuro Fujita wrote:
> (2018/01/25 11:11), Amit Langote wrote:
>> Rebased again.
>
> Thanks for the rebased patch!
>
> The patches apply cleanly and compile successfully, but make check fails
> in an assert-enabled build.

Hmm, I can't seem to reproduce the failure with the v4 patches I posted earlier today.

=======================
 All 186 tests passed.
=======================

Can you please post the errors you're seeing?

Thanks,
Amit
(2018/01/25 18:52), Amit Langote wrote:
> On 2018/01/25 18:30, Etsuro Fujita wrote:
>> The patches apply cleanly and compile successfully, but make check fails
>> in an assert-enabled build.
>
> Hmm, I can't seem to reproduce the failure with the v4 patches I posted
> earlier today.
>
> =======================
>  All 186 tests passed.
> =======================
>
> Can you please post the errors you're seeing?

OK, will do.

Best regards,
Etsuro Fujita
(2018/01/25 18:52), Amit Langote wrote:
> On 2018/01/25 18:30, Etsuro Fujita wrote:
>> The patches apply cleanly and compile successfully, but make check fails
>> in an assert-enabled build.
>
> Hmm, I can't seem to reproduce the failure with the v4 patches I posted
> earlier today.
> Can you please post the errors you're seeing?

A quick debug showed that the failure was due to a segmentation fault caused by this change to ExecSetupPartitionTupleRouting (in patch v4-0003-During-tuple-routing-initialize-per-partition-obj):

-       bool            is_update = false;
+       bool            is_update;

I modified that patch to initialize is_update to false as before. With the modified version, make check passed successfully.

I'll review the patch in more detail!

Best regards,
Etsuro Fujita
Fujita-san,

On 2018/01/30 18:22, Etsuro Fujita wrote:
> (2018/01/25 18:52), Amit Langote wrote:
>> On 2018/01/25 18:30, Etsuro Fujita wrote:
>>> The patches apply cleanly and compile successfully, but make check fails
>>> in an assert-enabled build.
>>
>> Hmm, I can't seem to reproduce the failure with the v4 patches I posted
>> earlier today.
>
>> Can you please post the errors you're seeing?
>
> A quick debug showed that the failure was due to a segmentation fault
> caused by this change to ExecSetupPartitionTupleRouting (in patch
> v4-0003-During-tuple-routing-initialize-per-partition-obj):
>
> -       bool            is_update = false;
> +       bool            is_update;
>
> I modified that patch to initialize is_update to false as before. With
> the modified version, make check passed successfully.

Oops, my bad.

> I'll review the patch in more detail!

Thank you. Will wait for your comments before sending a new version then.

Regards,
Amit
(2018/01/30 18:39), Amit Langote wrote: > Will wait for your comments before sending a new version then. Ok, I'll post my comments as soon as possible. Best regards, Etsuro Fujita
(2018/01/30 18:52), Etsuro Fujita wrote:
> (2018/01/30 18:39), Amit Langote wrote:
>> Will wait for your comments before sending a new version then.
>
> Ok, I'll post my comments as soon as possible.

* ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we could call that another way: in ExecInsert/CopyFrom, call it after ExecFindPartition if the partition chosen by ExecFindPartition has not been initialized yet. Maybe either would be OK, but I like the latter because I think that would not only better divide the labor but better fit into the existing code in ExecInsert/CopyFrom IMO.

* In ExecInitPartitionResultRelInfo:

+       /*
+        * Note that the entries in this list appear in no predetermined
+        * order as result of initializing partition result rels as and when
+        * they're needed.
+        */
+       estate->es_leaf_result_relations =
+           lappend(estate->es_leaf_result_relations,
+                   leaf_part_rri);

Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

* In the same function:

+       /*
+        * Verify result relation is a valid target for INSERT.
+        */
+       CheckValidResultRel(leaf_part_rri, CMD_INSERT);

I think it would be better to leave the previous comment as-is here:

        /*
         * Verify result relation is a valid target for an INSERT.  An UPDATE
         * of a partition-key becomes a DELETE+INSERT operation, so this check
         * is still required when the operation is CMD_UPDATE.
         */

* ExecInitPartitionResultRelInfo does work other than the initialization of the ResultRelInfo for the chosen partition (eg, creating a tuple conversion map to convert a tuple routed to the partition from the parent's type to the partition's). So I'd propose to rename that function to, eg, ExecInitPartition.

* CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not sure that is a good idea; my concern is that it might become something of a headache in future development.
* Patches 0001 and 0002 are pretty small but can't be reviewed without patch 0003. The total size of the three patches isn't that large, so I think it would be better to put them together into a single patch.

That's all for now. I'll continue to review the patches!

Best regards,
Etsuro Fujita
Fujita-san,

Thank you for the review.

On 2018/02/02 19:56, Etsuro Fujita wrote:
> (2018/01/30 18:52), Etsuro Fujita wrote:
>> (2018/01/30 18:39), Amit Langote wrote:
>>> Will wait for your comments before sending a new version then.
>>
>> Ok, I'll post my comments as soon as possible.
>
> * ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we
> could call that another way: in ExecInsert/CopyFrom, call it after
> ExecFindPartition if the partition chosen by ExecFindPartition has not
> been initialized yet. Maybe either would be OK, but I like the latter
> because I think that would not only better divide the labor but better
> fit into the existing code in ExecInsert/CopyFrom IMO.

I see no problem with that, so done that way.

> * In ExecInitPartitionResultRelInfo:
>
> +       /*
> +        * Note that the entries in this list appear in no predetermined
> +        * order as result of initializing partition result rels as and when
> +        * they're needed.
> +        */
> +       estate->es_leaf_result_relations =
> +           lappend(estate->es_leaf_result_relations,
> +                   leaf_part_rri);
>
> Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

Good catch. I moved it outside the block. I was under the impression that leaf result relations that were reused from the mtstate->resultRelInfo array would have already been added to the list, but it seems they are not.

> * In the same function:
>
> +       /*
> +        * Verify result relation is a valid target for INSERT.
> +        */
> +       CheckValidResultRel(leaf_part_rri, CMD_INSERT);
>
> I think it would be better to leave the previous comment as-is here:
>
>         /*
>          * Verify result relation is a valid target for an INSERT.  An UPDATE
>          * of a partition-key becomes a DELETE+INSERT operation, so this check
>          * is still required when the operation is CMD_UPDATE.
>          */

Oops, my bad. Didn't notice that I had ended up removing the part about UPDATE.
> * ExecInitPartitionResultRelInfo does work other than the initialization
> of the ResultRelInfo for the chosen partition (eg, creating a tuple
> conversion map to convert a tuple routed to the partition from the
> parent's type to the partition's). So I'd propose to rename that
> function to, eg, ExecInitPartition.

I went with ExecInitPartitionInfo.

> * CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting
> and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not
> sure that is a good idea; my concern is that it might become something
> of a headache in future development.

OK, I removed those changes.

> * Patches 0001 and 0002 are pretty small but can't be reviewed without
> patch 0003. The total size of the three patches isn't that large, so I
> think it would be better to put them together into a single patch.

As I said above, I got rid of 0001. Then, I merged the ExecFindPartition() refactoring patch 0002 into 0003.

The code in tupconv_map_for_subplan() currently assumes that it can rely on all leaf partitions having been initialized. Since we're breaking that assumption with this proposal, that needed to be changed. So the patch contained some refactoring to make it not rely on that assumption. I carved that out into a separate patch which can be applied and tested before the main patch.

> That's all for now. I'll continue to review the patches!

Here is the updated version that contains two patches as described above.

Thanks,
Amit
Attachment
(2018/02/05 14:34), Amit Langote wrote:
> On 2018/02/02 19:56, Etsuro Fujita wrote:
>> * ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we
>> could call that another way: in ExecInsert/CopyFrom, call it after
>> ExecFindPartition if the partition chosen by ExecFindPartition has not
>> been initialized yet. Maybe either would be OK, but I like the latter
>> because I think that would not only better divide the labor but better
>> fit into the existing code in ExecInsert/CopyFrom IMO.
>
> I see no problem with that, so done that way.

Thanks.

>> * In ExecInitPartitionResultRelInfo:
>>
>> +       /*
>> +        * Note that the entries in this list appear in no predetermined
>> +        * order as result of initializing partition result rels as and when
>> +        * they're needed.
>> +        */
>> +       estate->es_leaf_result_relations =
>> +           lappend(estate->es_leaf_result_relations,
>> +                   leaf_part_rri);
>>
>> Is it OK to put this within the "if (leaf_part_rri == NULL)" block?
>
> Good catch. I moved it outside the block. I was under the impression
> that leaf result relations that were reused from the
> mtstate->resultRelInfo array would have already been added to the list,
> but it seems they are not.

I commented on this because the update-tuple-routing patch has added to the list ResultRelInfos reused from the mtstate->resultRelInfo array, but on reflection I noticed this would cause oddity in reporting execution stats for partitions' triggers in EXPLAIN ANALYZE.
Here is an example using the head:

postgres=# create table trigger_test (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table trigger_test1 partition of trigger_test for values in (1);
CREATE TABLE
postgres=# create table trigger_test2 partition of trigger_test for values in (2);
CREATE TABLE
postgres=# create trigger before_upd_row_trig before update on trigger_test1 for each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# create trigger before_del_row_trig before delete on trigger_test1 for each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# insert into trigger_test values (1, 'foo');
INSERT 0 1
postgres=# explain analyze update trigger_test set a = 2 where a = 1;
NOTICE:  before_upd_row_trig(23, skidoo) BEFORE ROW UPDATE ON trigger_test1
NOTICE:  OLD: (1,foo),NEW: (2,foo)
NOTICE:  before_del_row_trig(23, skidoo) BEFORE ROW DELETE ON trigger_test1
NOTICE:  OLD: (1,foo)
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual time=2.337..2.337 rows=0 loops=1)
   Update on trigger_test1
   ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) (actual time=0.009..0.011 rows=1 loops=1)
         Filter: (a = 1)
 Planning time: 0.186 ms
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Execution time: 2.396 ms
(10 rows)

The trigger stats for both the on-update and on-delete triggers are shown twice in the above output. The reason would be that ExplainPrintTriggers called report_triggers twice for trigger_test1's ResultRelInfo: once for it from queryDesc->estate->es_result_relations and once for it from queryDesc->estate->es_leaf_result_relations.
I don't think this is intended behavior, so I think we should fix this. We could probably address it by modifying ExecInitPartitionInfo in your patch so that it doesn't add to the es_leaf_result_relations list ResultRelInfos reused from the mtstate->resultRelInfo array, as in your previous version of the patch. (ExecGetTriggerResultRel looks at the list too, but it would probably work well with this change.) It might be better to address this in another patch, though.

>> * In the same function:
>>
>> +       /*
>> +        * Verify result relation is a valid target for INSERT.
>> +        */
>> +       CheckValidResultRel(leaf_part_rri, CMD_INSERT);
>>
>> I think it would be better to leave the previous comment as-is here:
>>
>>         /*
>>          * Verify result relation is a valid target for an INSERT.  An UPDATE
>>          * of a partition-key becomes a DELETE+INSERT operation, so this check
>>          * is still required when the operation is CMD_UPDATE.
>>          */
>
> Oops, my bad. Didn't notice that I had ended up removing the part about
> UPDATE.

OK.

>> * ExecInitPartitionResultRelInfo does work other than the initialization
>> of the ResultRelInfo for the chosen partition (eg, creating a tuple
>> conversion map to convert a tuple routed to the partition from the
>> parent's type to the partition's). So I'd propose to rename that
>> function to, eg, ExecInitPartition.
>
> I went with ExecInitPartitionInfo.

Fine with me.

>> * CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting
>> and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not
>> sure that is a good idea; my concern is that it might become something
>> of a headache in future development.
>
> OK, I removed those changes.

Thanks.

>> * Patches 0001 and 0002 are pretty small but can't be reviewed without
>> patch 0003. The total size of the three patches isn't that large, so I
>> think it would be better to put them together into a single patch.
>
> As I said above, I got rid of 0001.
> Then, I merged the
> ExecFindPartition() refactoring patch 0002 into 0003.
>
> The code in tupconv_map_for_subplan() currently assumes that it can rely
> on all leaf partitions having been initialized. Since we're breaking that
> assumption with this proposal, that needed to be changed. So the patch
> contained some refactoring to make it not rely on that assumption. I
> carved that out into a separate patch which can be applied and tested
> before the main patch.

OK, will review that patch separately.

> Here is the updated version that contains two patches as described above.

Thanks for updating the patches! I'll post my next comments in a few days.

Best regards,
Etsuro Fujita
On 2018/02/05 19:43, Etsuro Fujita wrote:
> (2018/02/05 14:34), Amit Langote wrote:
>> On 2018/02/02 19:56, Etsuro Fujita wrote:
>>> * In ExecInitPartitionResultRelInfo:
>>>
>>> +       /*
>>> +        * Note that the entries in this list appear in no predetermined
>>> +        * order as result of initializing partition result rels as and when
>>> +        * they're needed.
>>> +        */
>>> +       estate->es_leaf_result_relations =
>>> +           lappend(estate->es_leaf_result_relations,
>>> +                   leaf_part_rri);
>>>
>>> Is it OK to put this within the "if (leaf_part_rri == NULL)" block?
>>
>> Good catch. I moved it outside the block. I was under the impression
>> that leaf result relations that were reused from the
>> mtstate->resultRelInfo array would have already been added to the list,
>> but it seems they are not.
>
> I commented on this because the update-tuple-routing patch has added to
> the list ResultRelInfos reused from the mtstate->resultRelInfo array, but
> on reflection I noticed this would cause oddity in reporting execution
> stats for partitions' triggers in EXPLAIN ANALYZE.
> Here is an example using the head:
>
> postgres=# create table trigger_test (a int, b text) partition by list (a);
> CREATE TABLE
> postgres=# create table trigger_test1 partition of trigger_test for values in (1);
> CREATE TABLE
> postgres=# create table trigger_test2 partition of trigger_test for values in (2);
> CREATE TABLE
> postgres=# create trigger before_upd_row_trig before update on trigger_test1
> for each row execute procedure trigger_data (23, 'skidoo');
> CREATE TRIGGER
> postgres=# create trigger before_del_row_trig before delete on trigger_test1
> for each row execute procedure trigger_data (23, 'skidoo');
> CREATE TRIGGER
> postgres=# insert into trigger_test values (1, 'foo');
> INSERT 0 1
> postgres=# explain analyze update trigger_test set a = 2 where a = 1;
> NOTICE:  before_upd_row_trig(23, skidoo) BEFORE ROW UPDATE ON trigger_test1
> NOTICE:  OLD: (1,foo),NEW: (2,foo)
> NOTICE:  before_del_row_trig(23, skidoo) BEFORE ROW DELETE ON trigger_test1
> NOTICE:  OLD: (1,foo)
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual time=2.337..2.337 rows=0 loops=1)
>    Update on trigger_test1
>    ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) (actual time=0.009..0.011 rows=1 loops=1)
>          Filter: (a = 1)
>  Planning time: 0.186 ms
>  Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
>  Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
>  Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
>  Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
>  Execution time: 2.396 ms
> (10 rows)
>
> The trigger stats for both the on-update and on-delete triggers are shown
> twice in the above output.
> The reason would be that ExplainPrintTriggers
> called report_triggers twice for trigger_test1's ResultRelInfo: once for
> it from queryDesc->estate->es_result_relations and once for it from
> queryDesc->estate->es_leaf_result_relations. I don't think this is
> intended behavior, so I think we should fix this. We could probably
> address it by modifying ExecInitPartitionInfo in your patch so that it
> doesn't add to the es_leaf_result_relations list ResultRelInfos reused
> from the mtstate->resultRelInfo array, as in your previous version of
> the patch. (ExecGetTriggerResultRel looks at the list too, but it would
> probably work well with this change.) It might be better to address this
> in another patch, though.

I see. Thanks for the analysis and the explanation.

Seeing as this bug exists in HEAD, as you also seem to be saying, we'd need to fix it independently of the patches on this thread. I've posted a patch in another thread titled "update tuple routing and triggers".

Thanks,
Amit
(2018/02/05 19:43), Etsuro Fujita wrote:
> (2018/02/05 14:34), Amit Langote wrote:
>> The code in tupconv_map_for_subplan() currently assumes that it can rely
>> on all leaf partitions having been initialized.

On reflection I noticed this analysis is not 100% correct; I think what that function actually assumes is that all subplans' partitions have already been initialized, not all leaf partitions.

>> Since we're breaking that
>> assumption with this proposal, that needed to be changed. So the patch
>> contained some refactoring to make it not rely on that assumption.

I don't think we really need this refactoring: since, as in another patch you posted, we could initialize all subplans' partitions in ExecSetupPartitionTupleRouting, tupconv_map_for_subplan could be called without any changes to that function, because of what I said above.

>> Here is the updated version that contains two patches as described above.
>
> Thanks for updating the patches! I'll post my next comments in a few days.

Here are comments for the other patch (patch v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):

o On changes to ExecSetupPartitionTupleRouting:

* The comment below wouldn't be correct; there is no ExecInitResultRelInfo in the patch.

-       proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-                                                      sizeof(ResultRelInfo *));
+       /*
+        * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+        * ExecInitResultRelInfo().
+        */
+       proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+                                                       sizeof(ResultRelInfo *));

* The patch removes this from the initialization step for a subplan's partition, but I think it would be better to keep it here because I think it's a good thing to put the initialization stuff together into one place.

-                       /*
-                        * This is required in order to we convert the partition's
-                        * tuple to be compatible with the root partitioned table's
-                        * tuple descriptor.
-                        * When generating the per-subplan result
-                        * rels, this was not set.
-                        */
-                       leaf_part_rri->ri_PartitionRoot = rel;

* I think it would be better to keep this comment here.

-                       /* Remember the subplan offset for this ResultRelInfo */

* Why is this removed from that initialization?

-                       proute->partitions[i] = leaf_part_rri;

o On changes to ExecInitPartitionInfo:

* I don't understand the step starting from this, but I'm wondering if that step can be removed by keeping the above setup of proute->partitions for the subplan's partition in ExecSetupPartitionTupleRouting.

+       /*
+        * If we are doing tuple routing for update, try to reuse the
+        * per-subplan resultrel for this partition that ExecInitModifyTable()
+        * might already have created.
+        */
+       if (mtstate && mtstate->operation == CMD_UPDATE)

That's all I have for now.

Best regards,
Etsuro Fujita
(2018/02/07 19:36), Etsuro Fujita wrote:
> (2018/02/05 19:43), Etsuro Fujita wrote:
>> (2018/02/05 14:34), Amit Langote wrote:
>>> Here is the updated version that contains two patches as described
>>> above.

Here are some minor comments:

o On changes to ExecInsert:

* This might be just my taste, but I think it would be better to (1) change ExecInitPartitionInfo so that it creates and returns a newly-initialized ResultRelInfo for an initialized partition, and (2) rewrite this bit:

+       /* Initialize partition info, if not done already. */
+       ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+                             leaf_part_index);
+
        /*
         * Save the old ResultRelInfo and switch to the one corresponding to
         * the selected partition.
         */
        saved_resultRelInfo = resultRelInfo;
        resultRelInfo = proute->partitions[leaf_part_index];
+       Assert(resultRelInfo != NULL);

to something like this (I would say the same thing about the copy.c changes):

        /*
         * Save the old ResultRelInfo and switch to the one corresponding to
         * the selected partition.
         */
        saved_resultRelInfo = resultRelInfo;
        resultRelInfo = proute->partitions[leaf_part_index];
        if (resultRelInfo == NULL);
        {
            /* Initialize partition info. */
            resultRelInfo = ExecInitPartitionInfo(mtstate,
                                                  saved_resultRelInfo,
                                                  proute,
                                                  estate,
                                                  leaf_part_index);
        }

This would make ExecInitPartitionInfo simpler because it could assume that the given partition has not been initialized yet.

o On changes to execPartition.h:

* Please add a brief description about partition_oids to the comments for this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
        PartitionDispatch *partition_dispatch_info;
        int                     num_dispatch;
+       Oid                *partition_oids;

That's it.

Best regards,
Etsuro Fujita
On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > if (resultRelInfo == NULL); > { > /* Initialize partition info. */ > resultRelInfo = ExecInitPartitionInfo(mtstate, > saved_resultRelInfo, > proute, > estate, > leaf_part_index); > } I'm pretty sure that code has one semicolon too many. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Fujita-san, Thanks a lot for the review. I had mistakenly tagged these patches v24, but they were actually supposed to be v5. So the attached updated patch is tagged v6. On 2018/02/07 19:36, Etsuro Fujita wrote: >> (2018/02/05 14:34), Amit Langote wrote: >>> The code in tupconv_map_for_subplan() currently assumes that it can rely >>> on all leaf partitions having been initialized. > > On reflection I noticed this analysis is not 100% correct; I think what > that function actually assumes is that all sublplans' partitions have > already been initialized, not all leaf partitions. Yes, you're right. >>> Since we're breaking that >>> assumption with this proposal, that needed to be changed. So the patch >>> contained some refactoring to make it not rely on that assumption. > > I don't think we really need this refactoring because since that as in > another patch you posted, we could initialize all subplans' partitions in > ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be > called without any changes to that function because of what I said above. What my previous approach failed to consider is that in the update case, we'd already have ResultRelInfo's for some of the leaf partitions initialized, which could be saved into proute->partitions array right away. Updated patch does things that way, so all the changes I had proposed to tupconv_map_for_subplan are rendered unnecessary. > Here are comments for the other patch (patch > v24-0002-During-tuple-routing-initialize-per-partition-ob.patch): > > o On changes to ExecSetupPartitionTupleRouting: > > * The comment below wouldn't be correct; no ExecInitResultRelInfo in the > patch. > > - proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions * > - sizeof(ResultRelInfo *)); > + /* > + * Actual ResultRelInfo's and TupleConversionMap's are allocated in > + * ExecInitResultRelInfo(). 
> + */ > + proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions * > + sizeof(ResultRelInfo *)); I removed the comment altogether, as the comments elsewhere make the point clear. > * The patch removes this from the initialization step for a subplan's > partition, but I think it would be better to keep this here because I > think it's a good thing to put the initialization stuff together into one > place. > > - /* > - * This is required in order to we convert the partition's > - * tuple to be compatible with the root partitioned table's > - * tuple descriptor. When generating the per-subplan result > - * rels, this was not set. > - */ > - leaf_part_rri->ri_PartitionRoot = rel; It wasn't needed here with the previous approach, because we didn't touch any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it back in the updated patch. > * I think it would be better to keep this comment here. > > - /* Remember the subplan offset for this ResultRelInfo */ Fixed. > * Why is this removed from that initialization? > > - proute->partitions[i] = leaf_part_rri; Because of the old approach. Now it's back in. > o On changes to ExecInitPartitionInfo: > > * I don't understand the step starting from this, but I'm wondering if > that step can be removed by keeping the above setup of proute->partitions > for the subplan's partition in ExecSetupPartitionTupleRouting. > > + /* > + * If we are doing tuple routing for update, try to reuse the > + * per-subplan resultrel for this partition that ExecInitModifyTable() > + * might already have created. > + */ > + if (mtstate && mtstate->operation == CMD_UPDATE) Done, as mentioned above. 
On 2018/02/08 19:16, Etsuro Fujita wrote: > Here are some minor comments: > > o On changes to ExecInsert > > * This might be just my taste, but I think it would be better to (1) > change ExecInitPartitionInfo so that it creates and returns a > newly-initialized ResultRelInfo for an initialized partition, and (2) > rewrite this bit: > > + /* Initialize partition info, if not done already. */ > + ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate, > + leaf_part_index); > + > /* > * Save the old ResultRelInfo and switch to the one corresponding to > * the selected partition. > */ > saved_resultRelInfo = resultRelInfo; > resultRelInfo = proute->partitions[leaf_part_index]; > + Assert(resultRelInfo != NULL); > > to something like this (I would say the same thing to the copy.c changes): > > /* > * Save the old ResultRelInfo and switch to the one corresponding to > * the selected partition. > */ > saved_resultRelInfo = resultRelInfo; > resultRelInfo = proute->partitions[leaf_part_index]; > if (resultRelInfo == NULL); > { > /* Initialize partition info. */ > resultRelInfo = ExecInitPartitionInfo(mtstate, > saved_resultRelInfo, > proute, > estate, > leaf_part_index); > } > > This would make ExecInitPartitionInfo more simple because it can assume > that the given partition has not been initialized yet. Agree that it's much better to do it this way. Done. > o On changes to execPartition.h > > * Please add a brief decsription about partition_oids to the comments for > this struct. > > @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting > { > PartitionDispatch *partition_dispatch_info; > int num_dispatch; > + Oid *partition_oids; Done. Attached v6. Thanks, Amit
(2018/02/08 23:21), Robert Haas wrote: > On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> if (resultRelInfo == NULL); >> { >> /* Initialize partition info. */ >> resultRelInfo = ExecInitPartitionInfo(mtstate, >> saved_resultRelInfo, >> proute, >> estate, >> leaf_part_index); >> } > > I'm pretty sure that code has one semicolon too many. Good catch! Best regards, Etsuro Fujita
(2018/02/09 14:32), Amit Langote wrote: > I had mistakenly tagged these patches v24, but they were actually supposed > to be v5. So the attached updated patch is tagged v6. OK. > On 2018/02/07 19:36, Etsuro Fujita wrote: >>> (2018/02/05 14:34), Amit Langote wrote: >>>> The code in tupconv_map_for_subplan() currently assumes that it can rely >>>> on all leaf partitions having been initialized. >> >> On reflection I noticed this analysis is not 100% correct; I think what >> that function actually assumes is that all sublplans' partitions have >> already been initialized, not all leaf partitions. > > Yes, you're right. > >>>> Since we're breaking that >>>> assumption with this proposal, that needed to be changed. So the patch >>>> contained some refactoring to make it not rely on that assumption. >> >> I don't think we really need this refactoring because since that as in >> another patch you posted, we could initialize all subplans' partitions in >> ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be >> called without any changes to that function because of what I said above. > > What my previous approach failed to consider is that in the update case, > we'd already have ResultRelInfo's for some of the leaf partitions > initialized, which could be saved into proute->partitions array right away. > > Updated patch does things that way, so all the changes I had proposed to > tupconv_map_for_subplan are rendered unnecessary. OK, thanks for the updated patch! >> Here are comments for the other patch (patch >> v24-0002-During-tuple-routing-initialize-per-partition-ob.patch): >> >> o On changes to ExecSetupPartitionTupleRouting: >> >> * The comment below wouldn't be correct; no ExecInitResultRelInfo in the >> patch. >> >> - proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions * >> - sizeof(ResultRelInfo *)); >> + /* >> + * Actual ResultRelInfo's and TupleConversionMap's are allocated in >> + * ExecInitResultRelInfo(). 
>> + */ >> + proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions * >> + sizeof(ResultRelInfo > *)); > > I removed the comment altogether, as the comments elsewhere make the point > clear. > >> * The patch removes this from the initialization step for a subplan's >> partition, but I think it would be better to keep this here because I >> think it's a good thing to put the initialization stuff together into one >> place. >> >> - /* >> - * This is required in order to we convert the partition's >> - * tuple to be compatible with the root partitioned table's >> - * tuple descriptor. When generating the per-subplan result >> - * rels, this was not set. >> - */ >> - leaf_part_rri->ri_PartitionRoot = rel; > > It wasn't needed here with the previous approach, because we didn't touch > any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it > back in the updated patch. > >> * I think it would be better to keep this comment here. >> >> - /* Remember the subplan offset for this ResultRelInfo */ > > Fixed. > >> * Why is this removed from that initialization? >> >> - proute->partitions[i] = leaf_part_rri; > > Because of the old approach. Now it's back in. > >> o On changes to ExecInitPartitionInfo: >> >> * I don't understand the step starting from this, but I'm wondering if >> that step can be removed by keeping the above setup of proute->partitions >> for the subplan's partition in ExecSetupPartitionTupleRouting. >> >> + /* >> + * If we are doing tuple routing for update, try to reuse the >> + * per-subplan resultrel for this partition that ExecInitModifyTable() >> + * might already have created. >> + */ >> + if (mtstate&& mtstate->operation == CMD_UPDATE) > > Done, as mentioned above. 
> > On 2018/02/08 19:16, Etsuro Fujita wrote: >> Here are some minor comments: >> >> o On changes to ExecInsert >> >> * This might be just my taste, but I think it would be better to (1) >> change ExecInitPartitionInfo so that it creates and returns a >> newly-initialized ResultRelInfo for an initialized partition, and (2) >> rewrite this bit: >> >> + /* Initialize partition info, if not done already. */ >> + ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate, >> + leaf_part_index); >> + >> /* >> * Save the old ResultRelInfo and switch to the one corresponding to >> * the selected partition. >> */ >> saved_resultRelInfo = resultRelInfo; >> resultRelInfo = proute->partitions[leaf_part_index]; >> + Assert(resultRelInfo != NULL); >> >> to something like this (I would say the same thing to the copy.c changes): >> >> /* >> * Save the old ResultRelInfo and switch to the one corresponding to >> * the selected partition. >> */ >> saved_resultRelInfo = resultRelInfo; >> resultRelInfo = proute->partitions[leaf_part_index]; >> if (resultRelInfo == NULL); >> { >> /* Initialize partition info. */ >> resultRelInfo = ExecInitPartitionInfo(mtstate, >> saved_resultRelInfo, >> proute, >> estate, >> leaf_part_index); >> } >> >> This would make ExecInitPartitionInfo more simple because it can assume >> that the given partition has not been initialized yet. > > Agree that it's much better to do it this way. Done. Thanks for all those changes! >> o On changes to execPartition.h >> >> * Please add a brief decsription about partition_oids to the comments for >> this struct. >> >> @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting >> { >> PartitionDispatch *partition_dispatch_info; >> int num_dispatch; >> + Oid *partition_oids; > > Done. Thanks, but one thing I'm wondering is: do we really need this array? I think we could store into PartitionTupleRouting the list of oids returned by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting, instead. 
Sorry, I should have commented on this in a previous email, but what do you think about that? Here are other comments: o On changes to ExecSetupPartitionTupleRouting: * This is nitpicking, but it would be better to define partrel and part_tupdesc within the if (update_rri_index < num_update_rri && RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid) block. - ResultRelInfo *leaf_part_rri; + ResultRelInfo *leaf_part_rri = NULL; Relation partrel = NULL; TupleDesc part_tupdesc; Oid leaf_oid = lfirst_oid(cell); * Do we need this? For a leaf partition that is already present in the subplan resultrels, the partition's indices (if any) would have already been opened. + /* + * Open partition indices. We wouldn't + need speculative + * insertions though. + */ + if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex && + leaf_part_rri->ri_IndexRelationDescs == NULL) + ExecOpenIndices(leaf_part_rri, false); I'll look at the patch a bit more early next week, but other than that, the patch seems in fairly good shape to me. Best regards, Etsuro Fujita
Fujita-san, Thanks for the review. On 2018/02/09 21:20, Etsuro Fujita wrote: >>> * Please add a brief decsription about partition_oids to the comments for >>> this struct. >>> >>> @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting >>> { >>> PartitionDispatch *partition_dispatch_info; >>> int num_dispatch; >>> + Oid *partition_oids; >> >> Done. > > Thanks, but one thing I'm wondering is: do we really need this array? I > think we could store into PartitionTupleRouting the list of oids returned > by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting, > instead. Sorry, I should have commented this in a previous email, but > what do you think about that? ExecInitPartitionInfo() will have to iterate the list to get the OID of the partition to be initialized. Isn't it much cheaper with the array? > Here are other comments: > > o On changes to ExecSetupPartitionTupleRouting: > > * This is nitpicking, but it would be better to define partrel and > part_tupdesc within the if (update_rri_index < num_update_rri && > RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == > leaf_oid) block. > > - ResultRelInfo *leaf_part_rri; > + ResultRelInfo *leaf_part_rri = NULL; > Relation partrel = NULL; > TupleDesc part_tupdesc; > Oid leaf_oid = lfirst_oid(cell); Sure, done. > * Do we need this? For a leaf partition that is already present in the > subplan resultrels, the partition's indices (if any) would have already > been opened. > > + /* > + * Open partition indices. We wouldn't > need speculative > + * insertions though. > + */ > + if > (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex && > + leaf_part_rri->ri_IndexRelationDescs == NULL) > + ExecOpenIndices(leaf_part_rri, > false); You're right. Removed the call. Updated patch is attached. Thanks, Amit
(2018/02/13 10:12), Amit Langote wrote: > On 2018/02/09 21:20, Etsuro Fujita wrote: >>>> * Please add a brief decsription about partition_oids to the comments for >>>> this struct. >>>> >>>> @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting >>>> { >>>> PartitionDispatch *partition_dispatch_info; >>>> int num_dispatch; >>>> + Oid *partition_oids; >>> >>> Done. >> >> Thanks, but one thing I'm wondering is: do we really need this array? I >> think we could store into PartitionTupleRouting the list of oids returned >> by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting, >> instead. Sorry, I should have commented this in a previous email, but >> what do you think about that? > > ExecInitPartitionInfo() will have to iterate the list to get the OID of > the partition to be initialized. Isn't it much cheaper with the array? Good point! So I agree with adding that array. >> Here are other comments: >> >> o On changes to ExecSetupPartitionTupleRouting: >> >> * This is nitpicking, but it would be better to define partrel and >> part_tupdesc within the if (update_rri_index< num_update_rri&& >> RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == >> leaf_oid) block. >> >> - ResultRelInfo *leaf_part_rri; >> + ResultRelInfo *leaf_part_rri = NULL; >> Relation partrel = NULL; >> TupleDesc part_tupdesc; >> Oid leaf_oid = lfirst_oid(cell); > > Sure, done. > >> * Do we need this? For a leaf partition that is already present in the >> subplan resultrels, the partition's indices (if any) would have already >> been opened. >> >> + /* >> + * Open partition indices. We wouldn't >> need speculative >> + * insertions though. >> + */ >> + if >> (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex&& >> + leaf_part_rri->ri_IndexRelationDescs == NULL) >> + ExecOpenIndices(leaf_part_rri, >> false); > > You're right. Removed the call. Thanks for the above changes! > Updated patch is attached. 
Thanks, here are some minor comments: o On changes to ExecCleanupTupleRouting: - ExecCloseIndices(resultRelInfo); - heap_close(resultRelInfo->ri_RelationDesc, NoLock); + if (resultRelInfo) + { + ExecCloseIndices(resultRelInfo); + heap_close(resultRelInfo->ri_RelationDesc, NoLock); + } You might check at the top of the loop whether resultRelInfo is NULL and if so do continue. I think that would save cycles a bit. o On ExecInitPartitionInfo: + int firstVarno; + Relation firstResultRel; My old compiler got "variable may be used uninitialized" warnings. + /* + * Build the RETURNING projection if any for the partition. Note that + * we didn't build the returningList for each partition within the + * planner, but simple translation of the varattnos will suffice. + * This only occurs for the INSERT case; in the UPDATE/DELETE cases, + * ExecInitModifyTable() would've initialized this. + */ I think the last comment should be the same as for WCO lists: "This only occurs for the INSERT case or in the case of UPDATE for which we didn't find a result rel above to reuse." + /* + * Initialize result tuple slot and assign its rowtype using the first + * RETURNING list. We assume the rest will look the same. + */ + tupDesc = ExecTypeFromTL(returningList, false); + + /* Set up a slot for the output of the RETURNING projection(s) */ + ExecInitResultTupleSlot(estate, &mtstate->ps); + ExecAssignResultType(&mtstate->ps, tupDesc); + slot = mtstate->ps.ps_ResultTupleSlot; + + /* Need an econtext too */ + if (mtstate->ps.ps_ExprContext == NULL) + ExecAssignExprContext(estate, &mtstate->ps); + econtext = mtstate->ps.ps_ExprContext; Do we need this initialization? I think we would already have the slot and econtext initialized when we get here. Other than that, the patch looks good to me. Sorry for the delay. Best regards, Etsuro Fujita
Fujita-san, Thanks for the review. On 2018/02/15 21:10, Etsuro Fujita wrote: > (2018/02/13 10:12), Amit Langote wrote: >> Updated patch is attached. > > Thanks, here are some minor comments: > > o On changes to ExecCleanupTupleRouting: > > - ExecCloseIndices(resultRelInfo); > - heap_close(resultRelInfo->ri_RelationDesc, NoLock); > + if (resultRelInfo) > + { > + ExecCloseIndices(resultRelInfo); > + heap_close(resultRelInfo->ri_RelationDesc, NoLock); > + } > > You might check at the top of the loop whether resultRelInfo is NULL and > if so do continue. I think that would save cycles a bit. Good point, done. > o On ExecInitPartitionInfo: > > + int firstVarno; > + Relation firstResultRel; > > My old compiler got "variable may be used uninitialized" warnings. Fixed. Actually, I moved those declarations from out here into the blocks where they're actually needed. > + /* > + * Build the RETURNING projection if any for the partition. Note that > + * we didn't build the returningList for each partition within the > + * planner, but simple translation of the varattnos will suffice. > + * This only occurs for the INSERT case; in the UPDATE/DELETE cases, > + * ExecInitModifyTable() would've initialized this. > + */ > > I think the last comment should be the same as for WCO lists: "This only > occurs for the INSERT case or in the case of UPDATE for which we didn't > find a result rel above to reuse." Fixed. The "above" is no longer needed, because there is no code left in ExecInitPartitionInfo() to find UPDATE result rels to reuse. That code is now in ExecSetupPartitionTupleRouting(). > + /* > + * Initialize result tuple slot and assign its rowtype using the > first > + * RETURNING list. We assume the rest will look the same. 
> + */ > + tupDesc = ExecTypeFromTL(returningList, false); > + > + /* Set up a slot for the output of the RETURNING projection(s) */ > + ExecInitResultTupleSlot(estate, &mtstate->ps); > + ExecAssignResultType(&mtstate->ps, tupDesc); > + slot = mtstate->ps.ps_ResultTupleSlot; > + > + /* Need an econtext too */ > + if (mtstate->ps.ps_ExprContext == NULL) > + ExecAssignExprContext(estate, &mtstate->ps); > + econtext = mtstate->ps.ps_ExprContext; > > Do we need this initialization? I think we would already have the slot > and econtext initialized when we get here. I think you're right. If node->returningLists is non-NULL at all, ExecInitModifyTable() would've initialized the needed slot and expression context. I added Assert()s to that effect. > Other than that, the patch looks good to me. > > Sorry for the delay. No problem! Thanks again. Attached updated patch. Thanks, Amit
(2018/02/16 10:49), Amit Langote wrote: > On 2018/02/15 21:10, Etsuro Fujita wrote: >> here are some minor comments: >> >> o On changes to ExecCleanupTupleRouting: >> >> - ExecCloseIndices(resultRelInfo); >> - heap_close(resultRelInfo->ri_RelationDesc, NoLock); >> + if (resultRelInfo) >> + { >> + ExecCloseIndices(resultRelInfo); >> + heap_close(resultRelInfo->ri_RelationDesc, NoLock); >> + } >> >> You might check at the top of the loop whether resultRelInfo is NULL and >> if so do continue. I think that would save cycles a bit. > > Good point, done. Thanks. >> o On ExecInitPartitionInfo: >> >> + int firstVarno; >> + Relation firstResultRel; >> >> My old compiler got "variable may be used uninitialized" warnings. > > Fixed. Actually, I moved those declarations from out here into the blocks > where they're actually needed. OK, my compiler gets no warnings now. >> + /* >> + * Build the RETURNING projection if any for the partition. Note that >> + * we didn't build the returningList for each partition within the >> + * planner, but simple translation of the varattnos will suffice. >> + * This only occurs for the INSERT case; in the UPDATE/DELETE cases, >> + * ExecInitModifyTable() would've initialized this. >> + */ >> >> I think the last comment should be the same as for WCO lists: "This only >> occurs for the INSERT case or in the case of UPDATE for which we didn't >> find a result rel above to reuse." > > Fixed. The "above" is no longer needed, because there is no code left in > ExecInitPartitionInfo() to find UPDATE result rels to reuse. That code is > now in ExecSetupPartitionTupleRouting(). OK, that makes sense. >> + /* >> + * Initialize result tuple slot and assign its rowtype using the >> first >> + * RETURNING list. We assume the rest will look the same. 
>> + */ >> + tupDesc = ExecTypeFromTL(returningList, false); >> + >> + /* Set up a slot for the output of the RETURNING projection(s) */ >> + ExecInitResultTupleSlot(estate,&mtstate->ps); >> + ExecAssignResultType(&mtstate->ps, tupDesc); >> + slot = mtstate->ps.ps_ResultTupleSlot; >> + >> + /* Need an econtext too */ >> + if (mtstate->ps.ps_ExprContext == NULL) >> + ExecAssignExprContext(estate,&mtstate->ps); >> + econtext = mtstate->ps.ps_ExprContext; >> >> Do we need this initialization? I think we would already have the slot >> and econtext initialized when we get here. > > I think you're right. If node->returningLists is non-NULL at all, > ExecInitModifyTable() would've initialized the needed slot and expression > context. I added Assert()s to that affect. OK, but one thing I'd like to ask is: + /* + * Use the slot that would have been set up in ExecInitModifyTable() + * for the output of the RETURNING projection(s). Just make sure to + * assign its rowtype using the RETURNING list. + */ + Assert(mtstate->ps.ps_ResultTupleSlot != NULL); + tupDesc = ExecTypeFromTL(returningList, false); + ExecAssignResultType(&mtstate->ps, tupDesc); + slot = mtstate->ps.ps_ResultTupleSlot; Do we need that assignment here? > Attached updated patch. Thanks for updating the patch! Best regards, Etsuro Fujita
On 2018/02/16 12:41, Etsuro Fujita wrote: > (2018/02/16 10:49), Amit Langote wrote: >> I think you're right. If node->returningLists is non-NULL at all, >> ExecInitModifyTable() would've initialized the needed slot and expression >> context. I added Assert()s to that effect. > > OK, but one thing I'd like to ask is: > > + /* > + * Use the slot that would have been set up in ExecInitModifyTable() > + * for the output of the RETURNING projection(s). Just make sure to > + * assign its rowtype using the RETURNING list. > + */ > + Assert(mtstate->ps.ps_ResultTupleSlot != NULL); > + tupDesc = ExecTypeFromTL(returningList, false); > + ExecAssignResultType(&mtstate->ps, tupDesc); > + slot = mtstate->ps.ps_ResultTupleSlot; > > Do we need that assignment here? I guess you mean the assignment of rowtype, that is, the ExecAssignResultType() line. On looking at this some more, it looks like we don't need to call ExecAssignResultType() here, as you seem to be suspecting, because we want the RETURNING projection output to use the rowtype of the first of returningLists and that's what mtstate->ps.ps_ResultTupleSlot has been set to use in the first place. So, removed the ExecAssignResultType(). Attached v9. Thanks for the review! Regards, Amit
(2018/02/16 13:42), Amit Langote wrote: > On 2018/02/16 12:41, Etsuro Fujita wrote: >> (2018/02/16 10:49), Amit Langote wrote: >>> I think you're right. If node->returningLists is non-NULL at all, >>> ExecInitModifyTable() would've initialized the needed slot and expression >>> context. I added Assert()s to that effect. >> >> OK, but one thing I'd like to ask is: >> >> + /* >> + * Use the slot that would have been set up in ExecInitModifyTable() >> + * for the output of the RETURNING projection(s). Just make sure to >> + * assign its rowtype using the RETURNING list. >> + */ >> + Assert(mtstate->ps.ps_ResultTupleSlot != NULL); >> + tupDesc = ExecTypeFromTL(returningList, false); >> + ExecAssignResultType(&mtstate->ps, tupDesc); >> + slot = mtstate->ps.ps_ResultTupleSlot; >> >> Do we need that assignment here? > > I guess you mean the assignment of rowtype, that is, the > ExecAssignResultType() line. That's right. > On looking at this some more, it looks like > we don't need to call ExecAssignResultType() here, as you seem to be suspecting, > because we want the RETURNING projection output to use the rowtype of the > first of returningLists and that's what mtstate->ps.ps_ResultTupleSlot has > been set to use in the first place. Yeah, I think so, too. > So, removed the ExecAssignResultType(). OK, thanks. > Attached v9. Thanks for the review! Thanks for the updated patch! In the patch you added the comments: + wcoList = linitial(node->withCheckOptionLists); + + /* + * Convert Vars in it to contain this partition's attribute numbers. + * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a + * reference to calculate attno's for the returning expression of + * this partition. In the INSERT case, that refers to the root + * partitioned table, whereas in the UPDATE tuple routing case the + * first partition in the mtstate->resultRelInfo array. 
In any case, + * both that relation and this partition should have the same columns, + * so we should be able to map attributes successfully. + */ + wcoList = map_partition_varattnos(wcoList, firstVarno, + partrel, firstResultRel, NULL); This would be just nitpicking, but I think it would be better to arrange these comments, maybe by dividing these to something like this: /* * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a * reference to calculate attno's for the returning expression of * this partition. In the INSERT case, that refers to the root * partitioned table, whereas in the UPDATE tuple routing case the * first partition in the mtstate->resultRelInfo array. In any case, * both that relation and this partition should have the same columns, * so we should be able to map attributes successfully. */ wcoList = linitial(node->withCheckOptionLists); /* * Convert Vars in it to contain this partition's attribute numbers. */ wcoList = map_partition_varattnos(wcoList, firstVarno, partrel, firstResultRel, NULL); I'd say the same thing to the comments you added for RETURNING. Another thing I noticed about comments in the patch is: + * In the case of INSERT on partitioned tables, there is only one + * plan. Likewise, there is only one WCO list, not one per + * partition. For UPDATE, there would be as many WCO lists as + * there are plans, but we use the first one as reference. Note + * that if there are SubPlans in there, they all end up attached + * to the one parent Plan node. The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing WCO constraints, which would change the place to attach SubPlans in WCO constraints from the parent PlanState to the subplan's PlanState, which would make the last comment obsolete. Since this change would be more consistent with PG10, I don't think we need to update the comment as such; I would vote for just removing that comment from here. Best regards, Etsuro Fujita
Fujita-san, Thanks for the review. On 2018/02/16 18:12, Etsuro Fujita wrote: > (2018/02/16 13:42), Amit Langote wrote: >> Attached v9. Thanks a for the review! > > Thanks for the updated patch! In the patch you added the comments: > > + wcoList = linitial(node->withCheckOptionLists); > + > + /* > + * Convert Vars in it to contain this partition's attribute numbers. > + * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a > + * reference to calculate attno's for the returning expression of > + * this partition. In the INSERT case, that refers to the root > + * partitioned table, whereas in the UPDATE tuple routing case the > + * first partition in the mtstate->resultRelInfo array. In any case, > + * both that relation and this partition should have the same > columns, > + * so we should be able to map attributes successfully. > + */ > + wcoList = map_partition_varattnos(wcoList, firstVarno, > + partrel, firstResultRel, NULL); > > This would be just nitpicking, but I think it would be better to arrange > these comments, maybe by dividing these to something like this: > > /* > * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a > * reference to calculate attno's for the returning expression of > * this partition. In the INSERT case, that refers to the root > * partitioned table, whereas in the UPDATE tuple routing case the > * first partition in the mtstate->resultRelInfo array. In any case, > * both that relation and this partition should have the same columns, > * so we should be able to map attributes successfully. > */ > wcoList = linitial(node->withCheckOptionLists); > > /* > * Convert Vars in it to contain this partition's attribute numbers. > */ > wcoList = map_partition_varattnos(wcoList, firstVarno, > partrel, firstResultRel, NULL); > > I'd say the same thing to the comments you added for RETURNING. Good idea. Done. 
> Another thing I noticed about comments in the patch is: > > + * In the case of INSERT on partitioned tables, there is only one > + * plan. Likewise, there is only one WCO list, not one per > + * partition. For UPDATE, there would be as many WCO lists as > + * there are plans, but we use the first one as reference. Note > + * that if there are SubPlans in there, they all end up attached > + * to the one parent Plan node. > > The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing > WCO constraints, which would change the place to attach SubPlans in WCO > constraints from the parent PlanState to the subplan's PlanState, which > would make the last comment obsolete. Since this change would be more > consistent with PG10, I don't think we need to update the comment as such; > I would vote for just removing that comment from here. I have thought about removing it too, so done. Updated patch attached. Thanks, Amit
(2018/02/16 18:23), Amit Langote wrote:
> On 2018/02/16 18:12, Etsuro Fujita wrote:
>> In the patch you added the comments:
>>
>> +            wcoList = linitial(node->withCheckOptionLists);
>> +
>> +            /*
>> +             * Convert Vars in it to contain this partition's attribute numbers.
>> +             * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
>> +             * reference to calculate attno's for the returning expression of
>> +             * this partition.  In the INSERT case, that refers to the root
>> +             * partitioned table, whereas in the UPDATE tuple routing case the
>> +             * first partition in the mtstate->resultRelInfo array.  In any case,
>> +             * both that relation and this partition should have the same columns,
>> +             * so we should be able to map attributes successfully.
>> +             */
>> +            wcoList = map_partition_varattnos(wcoList, firstVarno,
>> +                                              partrel, firstResultRel, NULL);
>>
>> This would be just nitpicking, but I think it would be better to arrange
>> these comments, maybe by dividing these to something like this:
>>
>>     /*
>>      * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
>>      * reference to calculate attno's for the returning expression of
>>      * this partition.  In the INSERT case, that refers to the root
>>      * partitioned table, whereas in the UPDATE tuple routing case the
>>      * first partition in the mtstate->resultRelInfo array.  In any case,
>>      * both that relation and this partition should have the same columns,
>>      * so we should be able to map attributes successfully.
>>      */
>>     wcoList = linitial(node->withCheckOptionLists);
>>
>>     /*
>>      * Convert Vars in it to contain this partition's attribute numbers.
>>      */
>>     wcoList = map_partition_varattnos(wcoList, firstVarno,
>>                                       partrel, firstResultRel, NULL);
>>
>> I'd say the same thing to the comments you added for RETURNING.
>
> Good idea.  Done.

Thanks.  I fixed a typo (s/Converti/Convert/) and adjusted these comments a
bit further to match the preceding code/comments.  Attached is the updated
version.
>> Another thing I noticed about comments in the patch is:
>>
>> +         * In the case of INSERT on partitioned tables, there is only one
>> +         * plan.  Likewise, there is only one WCO list, not one per
>> +         * partition.  For UPDATE, there would be as many WCO lists as
>> +         * there are plans, but we use the first one as reference.  Note
>> +         * that if there are SubPlans in there, they all end up attached
>> +         * to the one parent Plan node.
>>
>> The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
>> WCO constraints, which would change the place to attach SubPlans in WCO
>> constraints from the parent PlanState to the subplan's PlanState, which
>> would make the last comment obsolete.  Since this change would be more
>> consistent with PG10, I don't think we need to update the comment as such;
>> I would vote for just removing that comment from here.
>
> I have thought about removing it too, so done.

OK.

> Updated patch attached.

Thanks, I think the patch is in good shape, so I'll mark this as ready for
committer.

Best regards,
Etsuro Fujita
Fujita-san,

On 2018/02/16 19:50, Etsuro Fujita wrote:
> (2018/02/16 18:23), Amit Langote wrote:
>> Good idea.  Done.
>
> Thanks.  I fixed a typo (s/Converti/Convert/) and adjusted these comments
> a bit further to match the preceding code/comments.  Attached is the
> updated version.

Thank you for updating the patch.

>> Updated patch attached.
>
> Thanks, I think the patch is in good shape, so I'll mark this as ready for
> committer.

Thanks a lot for reviewing!

Attached rebased patch.

Regards,
Amit
(2018/02/19 13:19), Amit Langote wrote:
> Attached rebased patch.

Thanks for the rebased patch!

One thing I noticed while updating the tuple-routing-for-foreign-partitions
patch on top of this is: we should switch into the per-query memory context
in ExecInitPartitionInfo.  Attached is an updated version for that.

Best regards,
Etsuro Fujita
Fujita-san,

On 2018/02/20 19:40, Etsuro Fujita wrote:
> (2018/02/19 13:19), Amit Langote wrote:
>> Attached rebased patch.
>
> Thanks for the rebased patch!
>
> One thing I noticed while updating the
> tuple-routing-for-foreign-partitions patch on top of this is: we should
> switch into the per-query memory context in ExecInitPartitionInfo.

Good catch!

> Attached is an updated version for that.

Thanks for updating the patch.

Thanks,
Amit
On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Attached is an updated version for that.
>
> Thanks for updating the patch.

Committed with a few changes.  The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting.  That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Attached is an updated version for that.
> >
> > Thanks for updating the patch.
>
> Committed with a few changes.

I propose to tweak a few comments to PartitionTupleRouting, as attached.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 22, 2018 at 11:53 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> >> Attached is an updated version for that.
>> >
>> > Thanks for updating the patch.
>>
>> Committed with a few changes.
>
> I propose to tweak a few comments to PartitionTupleRouting, as attached.

Sure, please go ahead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/02/23 1:53, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> Attached is an updated version for that.
>>>
>>> Thanks for updating the patch.
>>
>> Committed with a few changes.
>
> I propose to tweak a few comments to PartitionTupleRouting, as attached.

I'd missed those.  Thank you!

Regards,
Amit
On 2018/02/23 1:10, Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> Attached is an updated version for that.
>>
>> Thanks for updating the patch.
>
> Committed with a few changes.  The big one was that I got rid of the
> local variable is_update in ExecSetupPartitionTupleRouting.  That
> saved a level of indentation on a substantial chunk of code, and it
> turns out that test was redundant anyway.

Thank you!

Regards,
Amit
Hi,

On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Attached is an updated version for that.
> >
> > Thanks for updating the patch.
>
> Committed with a few changes.  The big one was that I got rid of the
> local variable is_update in ExecSetupPartitionTupleRouting.  That
> saved a level of indentation on a substantial chunk of code, and it
> turns out that test was redundant anyway.

I noticed that this patch broke my JIT patch when I force JITing to be
used everywhere (obviously pointless for perf reasons, but good for
testing).  Turns out to be a single line.

ExecInitPartitionInfo has the following block:

    foreach(ll, wcoList)
    {
        WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
        ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                           mtstate->mt_plans[0]);

        wcoExprs = lappend(wcoExprs, wcoExpr);
    }

note how it is passing mtstate->mt_plans[0] as the parent node for the
expression.  I don't quite know why mtstate->mt_plans[0] was chosen
here, it doesn't seem right.  The WCO will not be executed in that
context.  Note that the ExecBuildProjectionInfo() call a few lines below
also uses a different context.

For JITing that fails because the compiled deform will assume the tuples
look like mt_plans[0]'s scantuples.  But we're not dealing with those,
we're dealing with tuples returned by the relevant subplan.

Also note that the code before this commit used

    ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                       &mtstate->ps);

i.e. the ModifyTableState node, as I'd expect.

Greetings,

Andres Freund
On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Attached is an updated version for that.
> >
> > Thanks for updating the patch.
>
> Committed with a few changes.  The big one was that I got rid of the
> local variable is_update in ExecSetupPartitionTupleRouting.  That
> saved a level of indentation on a substantial chunk of code, and it
> turns out that test was redundant anyway.

Btw, are there cases where this could change explain output?  If there
are subplan references or such in any of the RETURNING / WCO expressions,
they'd not get added at explain time.  It's probably fine because we also
add the expressions "statically" in ExecInitModifyTable()?

Greetings,

Andres Freund
On 2018/03/03 13:38, Andres Freund wrote:
> Hi,
>
> On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> Attached is an updated version for that.
>>>
>>> Thanks for updating the patch.
>>
>> Committed with a few changes.  The big one was that I got rid of the
>> local variable is_update in ExecSetupPartitionTupleRouting.  That
>> saved a level of indentation on a substantial chunk of code, and it
>> turns out that test was redundant anyway.
>
> I noticed that this patch broke my JIT patch when I force JITing to be
> used everywhere (obviously pointless for perf reasons, but good for
> testing).  Turns out to be a single line.
>
> ExecInitPartitionInfo has the following block:
>
>     foreach(ll, wcoList)
>     {
>         WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
>         ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
>                                            mtstate->mt_plans[0]);
>
>         wcoExprs = lappend(wcoExprs, wcoExpr);
>     }
>
> note how it is passing mtstate->mt_plans[0] as the parent node for the
> expression.  I don't quite know why mtstate->mt_plans[0] was chosen
> here, it doesn't seem right.  The WCO will not be executed in that
> context.  Note that the ExecBuildProjectionInfo() call a few lines below
> also uses a different context.
>
> For JITing that fails because the compiled deform will assume the tuples
> look like mt_plans[0]'s scantuples.  But we're not dealing with those,
> we're dealing with tuples returned by the relevant subplan.
>
> Also note that the code before this commit used
>
>     ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
>                                        &mtstate->ps);
>
> i.e. the ModifyTableState node, as I'd expect.

I guess that it was an oversight in my patch.  Please find attached a
patch to fix that.

Thanks,
Amit
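[Editor's note: the attached fix itself is not reproduced in the archive, but from Andres's analysis upthread it presumably amounts to restoring the pre-commit parent node for the WCO qual. A sketch along those lines, not the actual attached patch (the file path and hunk location are assumptions):]

```diff
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ ExecInitPartitionInfo @@
         WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
         ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-                                           mtstate->mt_plans[0]);
+                                           &mtstate->ps);
```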
On 2018/03/03 13:48, Andres Freund wrote:
> On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> Attached is an updated version for that.
>>>
>>> Thanks for updating the patch.
>>
>> Committed with a few changes.  The big one was that I got rid of the
>> local variable is_update in ExecSetupPartitionTupleRouting.  That
>> saved a level of indentation on a substantial chunk of code, and it
>> turns out that test was redundant anyway.
>
> Btw, are there cases where this could change explain output?  If there
> are subplan references or such in any of the RETURNING / WCO expressions,
> they'd not get added at explain time.  It's probably fine because we also
> add the expressions "statically" in ExecInitModifyTable()?

Yes, I think so.  Afaics, explain.c only looks at the information that is
"statically" added to ModifyTableState by ExecInitModifyTable.  It considers
information added by the tuple routing code only when printing information
about invoked triggers, and that only in the case of EXPLAIN ANALYZE.

Thanks,
Amit