Thread: non-bulk inserts and tuple routing
Hi.

I have a patch that rearranges the code around partition tuple routing, such that allocation of per-partition objects (ResultRelInfo, TupleConversionMap, etc.) is delayed until a given partition is actually inserted into (i.e., a tuple is routed to it). I see a good win for non-bulk inserts with the patch, and it is implemented such that it doesn't affect the bulk-insert case much.

Performance numbers:

* Uses the following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...

* Non-bulk insert uses the following code (insert 100,000 rows one by one):

do $$
begin
  for i in 1..100000 loop
    insert into t1 values (i, i+1);
  end loop;
end; $$;

* Times in milliseconds:

#parts   HEAD        Patched

8        6216.300    4977.670
16       9061.388    6360.093
32       14081.656   8752.405
64       24887.110   13919.384
128      45926.251   24582.411
256      88088.084   45490.894

As you can see, the performance can be as much as 2x faster with the patch, although the time taken still increases as the number of partitions increases, because we still lock *all* partitions at the beginning.

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts   HEAD       Patched

8        458.301    450.875
16       409.271    510.723
32       500.960    612.003
64       430.687    795.046
128      449.314    565.786
256      493.171    490.187

Not much harm here, although the numbers are a bit noisy.

The patch is divided into 4 parts, the first 3 of which are refactoring patches.

I know this patch will conflict severely with [1] and [2], so it's fine if we consider applying these later. Will add this to the next CF.

Thanks,
Amit

[1] https://commitfest.postgresql.org/16/1023/
[2] https://commitfest.postgresql.org/16/1184/
Attachment
On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> * Bulk-inserting 100,000 rows using COPY:
>
> copy t1 from '/tmp/t1.csv' csv;
>
> * Times in milliseconds:
>
> #parts   HEAD       Patched
>
> 8        458.301    450.875
> 16       409.271    510.723
> 32       500.960    612.003
> 64       430.687    795.046
> 128      449.314    565.786
> 256      493.171    490.187

While the earlier numbers increased monotonically with the number of partitions, these don't. For example, the number on HEAD with 8 partitions is higher than that with 128 partitions. That's kind of weird -- maybe something is wrong with the measurement. Do we see similar instability when bulk inserting into an unpartitioned table? Also, the numbers for 64 partitions are really bad: almost 2x slower.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi Ashutosh.

On 2017/12/19 19:12, Ashutosh Bapat wrote:
> On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>
>> * Bulk-inserting 100,000 rows using COPY:
>>
>> copy t1 from '/tmp/t1.csv' csv;
>>
>> * Times in milliseconds:
>>
>> #parts   HEAD       Patched
>>
>> 8        458.301    450.875
>> 16       409.271    510.723
>> 32       500.960    612.003
>> 64       430.687    795.046
>> 128      449.314    565.786
>> 256      493.171    490.187
>
> While the earlier numbers increased monotonically with the number of
> partitions, these don't. For example, the number on HEAD with 8
> partitions is higher than that with 128 partitions. That's kind of
> weird -- maybe something is wrong with the measurement.

In the bulk-insert case, we initialize partitions only once, because the COPY that loads those 100,000 rows is executed just once. Whereas in the non-bulk insert case, we initialize partitions (lock, allocate various objects) 100,000 times, because that's how many times the INSERT is executed, once for each of the 100,000 rows to be inserted.

Without the patch, the object initialization occurs N times per statement, where N is the number of partitions. With the patch it occurs just once -- only for the partition to which the row was routed. The time required, although smaller with the patch, still increases monotonically, because the patch didn't do anything about locking all partitions.

Does that make sense?

> Do we see
> similar instability when bulk inserting into an unpartitioned table?
> Also, the numbers for 64 partitions are really bad: almost 2x slower.

Sorry, as I said, the numbers I initially posted were a bit noisy. I just re-ran that COPY against the patched version and got the following numbers:

#parts   Patched
8        441.852
16       417.510
32       435.276
64       486.497
128      436.473
256      446.312

Thanks,
Amit
On 2017/12/19 19:06, Amit Langote wrote:
> Hi.
>
> I have a patch that rearranges the code around partition tuple routing,
> such that allocation of per-partition objects (ResultRelInfo,
> TupleConversionMap, etc.) is delayed until a given partition is actually
> inserted into (i.e., a tuple is routed to it). I see a good win for
> non-bulk inserts with the patch, and it is implemented such that it
> doesn't affect the bulk-insert case much.
>
> Performance numbers:
>
> * Uses the following hash-partitioned table:
>
> create table t1 (a int, b int) partition by hash (a);
> create table t1_x partition of t1 for values with (modulus M, remainder R)
> ...
>
> * Non-bulk insert uses the following code (insert 100,000 rows one by one):
>
> do $$
> begin
>   for i in 1..100000 loop
>     insert into t1 values (i, i+1);
>   end loop;
> end; $$;
>
> * Times in milliseconds:
>
> #parts   HEAD        Patched
>
> 8        6216.300    4977.670
> 16       9061.388    6360.093
> 32       14081.656   8752.405
> 64       24887.110   13919.384
> 128      45926.251   24582.411
> 256      88088.084   45490.894
>
> As you can see, the performance can be as much as 2x faster with the patch,
> although the time taken still increases as the number of partitions
> increases, because we still lock *all* partitions at the beginning.
>
> * Bulk-inserting 100,000 rows using COPY:
>
> copy t1 from '/tmp/t1.csv' csv;
>
> * Times in milliseconds:
>
> #parts   HEAD       Patched
>
> 8        458.301    450.875
> 16       409.271    510.723
> 32       500.960    612.003
> 64       430.687    795.046
> 128      449.314    565.786
> 256      493.171    490.187
>
> Not much harm here, although the numbers are a bit noisy.
>
> The patch is divided into 4 parts, the first 3 of which are refactoring
> patches.
>
> I know this patch will conflict severely with [1] and [2], so it's fine if
> we consider applying these later. Will add this to the next CF.

I rebased the patches, since they started conflicting with a recently committed patch [1].

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cc6337d2fed5
Attachment
On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I rebased the patches, since they started conflicting with a recently
> committed patch [1].

I think that my latest commit has managed to break this pretty thoroughly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/01/20 7:07, Robert Haas wrote:
> On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I rebased the patches, since they started conflicting with a recently
>> committed patch [1].
>
> I think that my latest commit has managed to break this pretty thoroughly.

I rebased it. Here are the performance numbers again.

* Uses the following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...

* Non-bulk insert uses the following code (insert 100,000 rows one by one):

do $$
begin
  for i in 1..100000 loop
    insert into t1 values (i, i+1);
  end loop;
end; $$;

Times in milliseconds:

#parts   HEAD        Patched
8        6148.313    4938.775
16       8882.420    6203.911
32       14251.072   8595.068
64       24465.691   13718.161
128      45099.435   23898.026
256      87307.332   44428.126

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

Times in milliseconds:

#parts   HEAD      Patched

8        466.170   446.865
16       445.341   444.990
32       443.544   487.713
64       460.579   435.412
128      469.953   422.403
256      463.592   431.118

Thanks,
Amit
Attachment
On 2018/01/24 17:25, Amit Langote wrote:
> On 2018/01/20 7:07, Robert Haas wrote:
>> On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote wrote:
>>> I rebased the patches, since they started conflicting with a recently
>>> committed patch [1].
>>
>> I think that my latest commit has managed to break this pretty thoroughly.
>
> I rebased it. Here are the performance numbers again.
>
> * Uses the following hash-partitioned table:
>
> create table t1 (a int, b int) partition by hash (a);
> create table t1_x partition of t1 for values with (modulus M, remainder R)
> ...
>
> * Non-bulk insert uses the following code (insert 100,000 rows one by one):
>
> do $$
> begin
>   for i in 1..100000 loop
>     insert into t1 values (i, i+1);
>   end loop;
> end; $$;
>
> Times in milliseconds:
>
> #parts   HEAD        Patched
> 8        6148.313    4938.775
> 16       8882.420    6203.911
> 32       14251.072   8595.068
> 64       24465.691   13718.161
> 128      45099.435   23898.026
> 256      87307.332   44428.126
>
> * Bulk-inserting 100,000 rows using COPY:
>
> copy t1 from '/tmp/t1.csv' csv;
>
> Times in milliseconds:
>
> #parts   HEAD      Patched
>
> 8        466.170   446.865
> 16       445.341   444.990
> 32       443.544   487.713
> 64       460.579   435.412
> 128      469.953   422.403
> 256      463.592   431.118

Rebased again.

Thanks,
Amit
Attachment
(2018/01/25 11:11), Amit Langote wrote:
> Rebased again.

Thanks for the rebased patch!

The patches apply cleanly and compile successfully, but make check fails in an assert-enabled build.

Best regards,
Etsuro Fujita
Fujita-san,

Thanks for the review.

On 2018/01/25 18:30, Etsuro Fujita wrote:
> (2018/01/25 11:11), Amit Langote wrote:
>> Rebased again.
>
> Thanks for the rebased patch!
>
> The patches apply cleanly and compile successfully, but make check fails
> in an assert-enabled build.

Hmm, I can't seem to reproduce the failure with the v4 patches I posted earlier today.

=======================
 All 186 tests passed.
=======================

Can you please post the errors you're seeing?

Thanks,
Amit
(2018/01/25 18:52), Amit Langote wrote:
> On 2018/01/25 18:30, Etsuro Fujita wrote:
>> The patches apply cleanly and compile successfully, but make check fails
>> in an assert-enabled build.
>
> Hmm, I can't seem to reproduce the failure with the v4 patches I posted
> earlier today.
>
> =======================
>  All 186 tests passed.
> =======================
>
> Can you please post the errors you're seeing?

OK, will do.

Best regards,
Etsuro Fujita
(2018/01/25 18:52), Amit Langote wrote:
> On 2018/01/25 18:30, Etsuro Fujita wrote:
>> The patches apply cleanly and compile successfully, but make check fails
>> in an assert-enabled build.
>
> Hmm, I can't seem to reproduce the failure with the v4 patches I posted
> earlier today.
> Can you please post the errors you're seeing?

A quick debug showed that the failure was due to a segmentation fault caused by this change to ExecSetupPartitionTupleRouting (in patch v4-0003-During-tuple-routing-initialize-per-partition-obj):

-       bool            is_update = false;
+       bool            is_update;

I modified that patch to initialize is_update to false as before. With the modified version, make check passed successfully.

I'll review the patch in more detail!

Best regards,
Etsuro Fujita
Fujita-san,

On 2018/01/30 18:22, Etsuro Fujita wrote:
> (2018/01/25 18:52), Amit Langote wrote:
>> On 2018/01/25 18:30, Etsuro Fujita wrote:
>>> The patches apply cleanly and compile successfully, but make check fails
>>> in an assert-enabled build.
>>
>> Hmm, I can't seem to reproduce the failure with the v4 patches I posted
>> earlier today.
>
>> Can you please post the errors you're seeing?
>
> A quick debug showed that the failure was due to a segmentation fault
> caused by this change to ExecSetupPartitionTupleRouting (in patch
> v4-0003-During-tuple-routing-initialize-per-partition-obj):
>
> -       bool            is_update = false;
> +       bool            is_update;
>
> I modified that patch to initialize is_update to false as before. With
> the modified version, make check passed successfully.

Oops, my bad.

> I'll review the patch in more detail!

Thank you. Will wait for your comments before sending a new version then.

Regards,
Amit
(2018/01/30 18:39), Amit Langote wrote: > Will wait for your comments before sending a new version then. Ok, I'll post my comments as soon as possible. Best regards, Etsuro Fujita
(2018/01/30 18:52), Etsuro Fujita wrote:
> (2018/01/30 18:39), Amit Langote wrote:
>> Will wait for your comments before sending a new version then.
>
> Ok, I'll post my comments as soon as possible.

* ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we could call that another way: in ExecInsert/CopyFrom, call it after ExecFindPartition if the partition chosen by ExecFindPartition has not been initialized yet. Maybe either would be OK, but I like the latter because I think that would not only better divide the labor but better fit into the existing code in ExecInsert/CopyFrom IMO.

* In ExecInitPartitionResultRelInfo:

+       /*
+        * Note that the entries in this list appear in no predetermined
+        * order as result of initializing partition result rels as and when
+        * they're needed.
+        */
+       estate->es_leaf_result_relations =
+           lappend(estate->es_leaf_result_relations,
+                   leaf_part_rri);

Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

* In the same function:

+       /*
+        * Verify result relation is a valid target for INSERT.
+        */
+       CheckValidResultRel(leaf_part_rri, CMD_INSERT);

I think it would be better to leave the previous comment as-is here:

        /*
         * Verify result relation is a valid target for an INSERT.  An UPDATE
         * of a partition-key becomes a DELETE+INSERT operation, so this check
         * is still required when the operation is CMD_UPDATE.
         */

* ExecInitPartitionResultRelInfo does work other than the initialization of the ResultRelInfo for the chosen partition (eg, creating a tuple conversion map to convert a tuple routed to the partition from the parent's type to the partition's). So I'd propose to rename that function to, eg, ExecInitPartition.

* CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not sure that is a good idea; my concern is that it might become something of a headache in future development.
* Patches 0001 and 0002 are pretty small but can't be reviewed without patch 0003. The total size of the three patches isn't that large, so I think it would be better to put them together into a single patch.

That's all for now. I'll continue to review the patches!

Best regards,
Etsuro Fujita
Fujita-san,

Thank you for the review.

On 2018/02/02 19:56, Etsuro Fujita wrote:
> (2018/01/30 18:52), Etsuro Fujita wrote:
>> (2018/01/30 18:39), Amit Langote wrote:
>>> Will wait for your comments before sending a new version then.
>>
>> Ok, I'll post my comments as soon as possible.
>
> * ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we
> could call that another way: in ExecInsert/CopyFrom, call it after
> ExecFindPartition if the partition chosen by ExecFindPartition has not
> been initialized yet. Maybe either would be OK, but I like the latter
> because I think that would not only better divide the labor but better
> fit into the existing code in ExecInsert/CopyFrom IMO.

I see no problem with that, so done that way.

> * In ExecInitPartitionResultRelInfo:
>
> +       /*
> +        * Note that the entries in this list appear in no predetermined
> +        * order as result of initializing partition result rels as and when
> +        * they're needed.
> +        */
> +       estate->es_leaf_result_relations =
> +           lappend(estate->es_leaf_result_relations,
> +                   leaf_part_rri);
>
> Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

Good catch. I moved it outside the block. I was under the impression that leaf result relations that were reused from the mtstate->resultRelInfo array would have already been added to the list, but it seems they are not.

> * In the same function:
>
> +       /*
> +        * Verify result relation is a valid target for INSERT.
> +        */
> +       CheckValidResultRel(leaf_part_rri, CMD_INSERT);
>
> I think it would be better to leave the previous comment as-is here:
>
>         /*
>          * Verify result relation is a valid target for an INSERT.  An UPDATE
>          * of a partition-key becomes a DELETE+INSERT operation, so this check
>          * is still required when the operation is CMD_UPDATE.
>          */

Oops, my bad. Didn't notice that I had ended up removing the part about UPDATE.
> * ExecInitPartitionResultRelInfo does work other than the initialization
> of the ResultRelInfo for the chosen partition (eg, creating a tuple
> conversion map to convert a tuple routed to the partition from the
> parent's type to the partition's). So I'd propose to rename that
> function to, eg, ExecInitPartition.

I went with ExecInitPartitionInfo.

> * CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting
> and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not
> sure that is a good idea; my concern is that it might become something
> of a headache in future development.

OK, I removed those changes.

> * Patches 0001 and 0002 are pretty small but can't be reviewed without
> patch 0003. The total size of the three patches isn't that large, so I
> think it would be better to put them together into a single patch.

As I said above, I got rid of 0001. Then, I merged the ExecFindPartition() refactoring patch 0002 into 0003.

The code in tupconv_map_for_subplan() currently assumes that it can rely on all leaf partitions having been initialized. Since we're breaking that assumption with this proposal, that needed to be changed. So the patch contained some refactoring to make it not rely on that assumption. I carved that out into a separate patch which can be applied and tested before the main patch.

> That's all for now. I'll continue to review the patches!

Here is the updated version that contains two patches as described above.

Thanks,
Amit
Attachment
(2018/02/05 14:34), Amit Langote wrote:
> On 2018/02/02 19:56, Etsuro Fujita wrote:
>> * ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we
>> could call that another way: in ExecInsert/CopyFrom, call it after
>> ExecFindPartition if the partition chosen by ExecFindPartition has not
>> been initialized yet. Maybe either would be OK, but I like the latter
>> because I think that would not only better divide the labor but better
>> fit into the existing code in ExecInsert/CopyFrom IMO.
>
> I see no problem with that, so done that way.

Thanks.

>> * In ExecInitPartitionResultRelInfo:
>>
>> +       /*
>> +        * Note that the entries in this list appear in no predetermined
>> +        * order as result of initializing partition result rels as and when
>> +        * they're needed.
>> +        */
>> +       estate->es_leaf_result_relations =
>> +           lappend(estate->es_leaf_result_relations,
>> +                   leaf_part_rri);
>>
>> Is it OK to put this within the "if (leaf_part_rri == NULL)" block?
>
> Good catch. I moved it outside the block. I was under the impression
> that leaf result relations that were reused from the
> mtstate->resultRelInfo array would have already been added to the list,
> but it seems they are not.

I commented on this because the update-tuple-routing patch has added to the list ResultRelInfos reused from the mtstate->resultRelInfo array, but on reflection I noticed this would cause oddity in reporting execution stats for partitions' triggers in EXPLAIN ANALYZE.
Here is an example using the head:

postgres=# create table trigger_test (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table trigger_test1 partition of trigger_test for values in (1);
CREATE TABLE
postgres=# create table trigger_test2 partition of trigger_test for values in (2);
CREATE TABLE
postgres=# create trigger before_upd_row_trig before update on trigger_test1 for each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# create trigger before_del_row_trig before delete on trigger_test1 for each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# insert into trigger_test values (1, 'foo');
INSERT 0 1
postgres=# explain analyze update trigger_test set a = 2 where a = 1;
NOTICE:  before_upd_row_trig(23, skidoo) BEFORE ROW UPDATE ON trigger_test1
NOTICE:  OLD: (1,foo),NEW: (2,foo)
NOTICE:  before_del_row_trig(23, skidoo) BEFORE ROW DELETE ON trigger_test1
NOTICE:  OLD: (1,foo)
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual time=2.337..2.337 rows=0 loops=1)
   Update on trigger_test1
   ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) (actual time=0.009..0.011 rows=1 loops=1)
         Filter: (a = 1)
 Planning time: 0.186 ms
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Execution time: 2.396 ms
(10 rows)

The trigger stats for both the on-update and on-delete triggers are shown twice in the above output. The reason would be that ExplainPrintTriggers called report_triggers twice for trigger_test1's ResultRelInfo: once for it from queryDesc->estate->es_result_relations and once for it from queryDesc->estate->es_leaf_result_relations.
I don't think this is intended behavior, so I think we should fix this. We could probably address it by modifying ExecInitPartitionInfo in your patch so that it doesn't add to the es_leaf_result_relations list ResultRelInfos reused from the mtstate->resultRelInfo array, as in your previous version of the patch. (ExecGetTriggerResultRel looks at the list too, but it would probably work well with this change.) It might be better to address this in another patch, though.

>> * In the same function:
>>
>> +       /*
>> +        * Verify result relation is a valid target for INSERT.
>> +        */
>> +       CheckValidResultRel(leaf_part_rri, CMD_INSERT);
>>
>> I think it would be better to leave the previous comment as-is here:
>>
>>         /*
>>          * Verify result relation is a valid target for an INSERT.  An UPDATE
>>          * of a partition-key becomes a DELETE+INSERT operation, so this check
>>          * is still required when the operation is CMD_UPDATE.
>>          */
>
> Oops, my bad. Didn't notice that I had ended up removing the part about
> UPDATE.

OK.

>> * ExecInitPartitionResultRelInfo does work other than the initialization
>> of the ResultRelInfo for the chosen partition (eg, creating a tuple
>> conversion map to convert a tuple routed to the partition from the
>> parent's type to the partition's). So I'd propose to rename that
>> function to, eg, ExecInitPartition.
>
> I went with ExecInitPartitionInfo.

Fine with me.

>> * CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting
>> and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not
>> sure that is a good idea; my concern is that it might become something
>> of a headache in future development.
>
> OK, I removed those changes.

Thanks.

>> * Patches 0001 and 0002 are pretty small but can't be reviewed without
>> patch 0003. The total size of the three patches isn't that large, so I
>> think it would be better to put them together into a single patch.
>
> As I said above, I got rid of 0001.
> Then, I merged the
> ExecFindPartition() refactoring patch 0002 into 0003.
>
> The code in tupconv_map_for_subplan() currently assumes that it can rely
> on all leaf partitions having been initialized. Since we're breaking that
> assumption with this proposal, that needed to be changed. So the patch
> contained some refactoring to make it not rely on that assumption. I
> carved that out into a separate patch which can be applied and tested
> before the main patch.

OK, will review that patch separately.

> Here is the updated version that contains two patches as described above.

Thanks for updating the patches! I'll post my next comments in a few days.

Best regards,
Etsuro Fujita
On 2018/02/05 19:43, Etsuro Fujita wrote:
> (2018/02/05 14:34), Amit Langote wrote:
>> On 2018/02/02 19:56, Etsuro Fujita wrote:
>>> * In ExecInitPartitionResultRelInfo:
>>>
>>> +       /*
>>> +        * Note that the entries in this list appear in no predetermined
>>> +        * order as result of initializing partition result rels as and when
>>> +        * they're needed.
>>> +        */
>>> +       estate->es_leaf_result_relations =
>>> +           lappend(estate->es_leaf_result_relations,
>>> +                   leaf_part_rri);
>>>
>>> Is it OK to put this within the "if (leaf_part_rri == NULL)" block?
>>
>> Good catch. I moved it outside the block. I was under the impression
>> that leaf result relations that were reused from the
>> mtstate->resultRelInfo array would have already been added to the list,
>> but it seems they are not.
>
> I commented on this because the update-tuple-routing patch has added to
> the list ResultRelInfos reused from the mtstate->resultRelInfo array, but
> on reflection I noticed this would cause oddity in reporting execution
> stats for partitions' triggers in EXPLAIN ANALYZE.
> Here is an example using the head:
>
> postgres=# create table trigger_test (a int, b text) partition by list (a);
> CREATE TABLE
> postgres=# create table trigger_test1 partition of trigger_test for values in (1);
> CREATE TABLE
> postgres=# create table trigger_test2 partition of trigger_test for values in (2);
> CREATE TABLE
> postgres=# create trigger before_upd_row_trig before update on trigger_test1
> for each row execute procedure trigger_data (23, 'skidoo');
> CREATE TRIGGER
> postgres=# create trigger before_del_row_trig before delete on trigger_test1
> for each row execute procedure trigger_data (23, 'skidoo');
> CREATE TRIGGER
> postgres=# insert into trigger_test values (1, 'foo');
> INSERT 0 1
> postgres=# explain analyze update trigger_test set a = 2 where a = 1;
> NOTICE:  before_upd_row_trig(23, skidoo) BEFORE ROW UPDATE ON trigger_test1
> NOTICE:  OLD: (1,foo),NEW: (2,foo)
> NOTICE:  before_del_row_trig(23, skidoo) BEFORE ROW DELETE ON trigger_test1
> NOTICE:  OLD: (1,foo)
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual time=2.337..2.337 rows=0 loops=1)
>    Update on trigger_test1
>    ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) (actual time=0.009..0.011 rows=1 loops=1)
>          Filter: (a = 1)
>  Planning time: 0.186 ms
>  Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
>  Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
>  Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
>  Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
>  Execution time: 2.396 ms
> (10 rows)
>
> The trigger stats for both the on-update and on-delete triggers are shown
> twice in the above output.
> The reason would be that ExplainPrintTriggers
> called report_triggers twice for trigger_test1's ResultRelInfo: once for
> it from queryDesc->estate->es_result_relations and once for it from
> queryDesc->estate->es_leaf_result_relations. I don't think this is
> intended behavior, so I think we should fix this. We could probably
> address it by modifying ExecInitPartitionInfo in your patch so that it
> doesn't add to the es_leaf_result_relations list ResultRelInfos reused
> from the mtstate->resultRelInfo array, as in your previous version of
> the patch. (ExecGetTriggerResultRel looks at the list too, but it would
> probably work well with this change.) It might be better to address this
> in another patch, though.

I see. Thanks for the analysis and the explanation.

Seeing as this bug exists in HEAD, as you also seem to be saying, we'd need to fix it independently of the patches on this thread. I've posted a patch in another thread titled "update tuple routing and triggers".

Thanks,
Amit
(2018/02/05 19:43), Etsuro Fujita wrote:
> (2018/02/05 14:34), Amit Langote wrote:
>> The code in tupconv_map_for_subplan() currently assumes that it can rely
>> on all leaf partitions having been initialized.

On reflection I noticed this analysis is not 100% correct; I think what that function actually assumes is that all subplans' partitions have already been initialized, not all leaf partitions.

>> Since we're breaking that
>> assumption with this proposal, that needed to be changed. So the patch
>> contained some refactoring to make it not rely on that assumption.

I don't think we really need this refactoring: since, as in another patch you posted, we could initialize all subplans' partitions in ExecSetupPartitionTupleRouting, tupconv_map_for_subplan could be called without any changes to that function, because of what I said above.

>> Here is the updated version that contains two patches as described above.
>
> Thanks for updating the patches! I'll post my next comments in a few days.

Here are comments for the other patch (patch v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):

o On changes to ExecSetupPartitionTupleRouting:

* The comment below wouldn't be correct; there is no ExecInitResultRelInfo in the patch.

-       proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-                                                      sizeof(ResultRelInfo *));
+       /*
+        * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+        * ExecInitResultRelInfo().
+        */
+       proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+                                                       sizeof(ResultRelInfo *));

* The patch removes this from the initialization step for a subplan's partition, but I think it would be better to keep it here because I think it's a good thing to put the initialization stuff together into one place.

-                       /*
-                        * This is required in order to we convert the partition's
-                        * tuple to be compatible with the root partitioned table's
-                        * tuple descriptor.
-                        * When generating the per-subplan result
-                        * rels, this was not set.
-                        */
-                       leaf_part_rri->ri_PartitionRoot = rel;

* I think it would be better to keep this comment here.

-                       /* Remember the subplan offset for this ResultRelInfo */

* Why is this removed from that initialization?

-                       proute->partitions[i] = leaf_part_rri;

o On changes to ExecInitPartitionInfo:

* I don't understand the step starting from this, but I'm wondering if that step can be removed by keeping the above setup of proute->partitions for the subplan's partition in ExecSetupPartitionTupleRouting.

+       /*
+        * If we are doing tuple routing for update, try to reuse the
+        * per-subplan resultrel for this partition that ExecInitModifyTable()
+        * might already have created.
+        */
+       if (mtstate && mtstate->operation == CMD_UPDATE)

That's all I have for now.

Best regards,
Etsuro Fujita
(2018/02/07 19:36), Etsuro Fujita wrote:
> (2018/02/05 19:43), Etsuro Fujita wrote:
>> (2018/02/05 14:34), Amit Langote wrote:
>>> Here is the updated version that contains two patches as described
>>> above.

Here are some minor comments:

o On changes to ExecInsert:

* This might be just my taste, but I think it would be better to (1) change ExecInitPartitionInfo so that it creates and returns a newly-initialized ResultRelInfo for an initialized partition, and (2) rewrite this bit:

+       /* Initialize partition info, if not done already. */
+       ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+                             leaf_part_index);
+
        /*
         * Save the old ResultRelInfo and switch to the one corresponding to
         * the selected partition.
         */
        saved_resultRelInfo = resultRelInfo;
        resultRelInfo = proute->partitions[leaf_part_index];
+       Assert(resultRelInfo != NULL);

to something like this (I would say the same thing about the copy.c changes):

        /*
         * Save the old ResultRelInfo and switch to the one corresponding to
         * the selected partition.
         */
        saved_resultRelInfo = resultRelInfo;
        resultRelInfo = proute->partitions[leaf_part_index];
        if (resultRelInfo == NULL);
        {
            /* Initialize partition info. */
            resultRelInfo = ExecInitPartitionInfo(mtstate,
                                                  saved_resultRelInfo,
                                                  proute,
                                                  estate,
                                                  leaf_part_index);
        }

This would make ExecInitPartitionInfo simpler because it could assume that the given partition has not been initialized yet.

o On changes to execPartition.h:

* Please add a brief description about partition_oids to the comments for this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
        PartitionDispatch *partition_dispatch_info;
        int                     num_dispatch;
+       Oid                *partition_oids;

That's it.

Best regards,
Etsuro Fujita
On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > if (resultRelInfo == NULL); > { > /* Initialize partition info. */ > resultRelInfo = ExecInitPartitionInfo(mtstate, > saved_resultRelInfo, > proute, > estate, > leaf_part_index); > } I'm pretty sure that code has one semicolon too many. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Fujita-san, Thanks a lot for the review. I had mistakenly tagged these patches v24, but they were actually supposed to be v5. So the attached updated patch is tagged v6. On 2018/02/07 19:36, Etsuro Fujita wrote: >> (2018/02/05 14:34), Amit Langote wrote: >>> The code in tupconv_map_for_subplan() currently assumes that it can rely >>> on all leaf partitions having been initialized. > > On reflection I noticed this analysis is not 100% correct; I think what > that function actually assumes is that all sublplans' partitions have > already been initialized, not all leaf partitions. Yes, you're right. >>> Since we're breaking that >>> assumption with this proposal, that needed to be changed. So the patch >>> contained some refactoring to make it not rely on that assumption. > > I don't think we really need this refactoring because since that as in > another patch you posted, we could initialize all subplans' partitions in > ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be > called without any changes to that function because of what I said above. What my previous approach failed to consider is that in the update case, we'd already have ResultRelInfo's for some of the leaf partitions initialized, which could be saved into proute->partitions array right away. Updated patch does things that way, so all the changes I had proposed to tupconv_map_for_subplan are rendered unnecessary. > Here are comments for the other patch (patch > v24-0002-During-tuple-routing-initialize-per-partition-ob.patch): > > o On changes to ExecSetupPartitionTupleRouting: > > * The comment below wouldn't be correct; no ExecInitResultRelInfo in the > patch. > > - proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions * > - sizeof(ResultRelInfo *)); > + /* > + * Actual ResultRelInfo's and TupleConversionMap's are allocated in > + * ExecInitResultRelInfo(). 
> + */ > + proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions * > + sizeof(ResultRelInfo *)); I removed the comment altogether, as the comments elsewhere make the point clear. > * The patch removes this from the initialization step for a subplan's > partition, but I think it would be better to keep this here because I > think it's a good thing to put the initialization stuff together into one > place. > > - /* > - * This is required in order to we convert the partition's > - * tuple to be compatible with the root partitioned table's > - * tuple descriptor. When generating the per-subplan result > - * rels, this was not set. > - */ > - leaf_part_rri->ri_PartitionRoot = rel; It wasn't needed here with the previous approach, because we didn't touch any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it back in the updated patch. > * I think it would be better to keep this comment here. > > - /* Remember the subplan offset for this ResultRelInfo */ Fixed. > * Why is this removed from that initialization? > > - proute->partitions[i] = leaf_part_rri; Because of the old approach. Now it's back in. > o On changes to ExecInitPartitionInfo: > > * I don't understand the step starting from this, but I'm wondering if > that step can be removed by keeping the above setup of proute->partitions > for the subplan's partition in ExecSetupPartitionTupleRouting. > > + /* > + * If we are doing tuple routing for update, try to reuse the > + * per-subplan resultrel for this partition that ExecInitModifyTable() > + * might already have created. > + */ > + if (mtstate && mtstate->operation == CMD_UPDATE) Done, as mentioned above. 
On 2018/02/08 19:16, Etsuro Fujita wrote: > Here are some minor comments: > > o On changes to ExecInsert > > * This might be just my taste, but I think it would be better to (1) > change ExecInitPartitionInfo so that it creates and returns a > newly-initialized ResultRelInfo for an initialized partition, and (2) > rewrite this bit: > > + /* Initialize partition info, if not done already. */ > + ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate, > + leaf_part_index); > + > /* > * Save the old ResultRelInfo and switch to the one corresponding to > * the selected partition. > */ > saved_resultRelInfo = resultRelInfo; > resultRelInfo = proute->partitions[leaf_part_index]; > + Assert(resultRelInfo != NULL); > > to something like this (I would say the same thing to the copy.c changes): > > /* > * Save the old ResultRelInfo and switch to the one corresponding to > * the selected partition. > */ > saved_resultRelInfo = resultRelInfo; > resultRelInfo = proute->partitions[leaf_part_index]; > if (resultRelInfo == NULL); > { > /* Initialize partition info. */ > resultRelInfo = ExecInitPartitionInfo(mtstate, > saved_resultRelInfo, > proute, > estate, > leaf_part_index); > } > > This would make ExecInitPartitionInfo more simple because it can assume > that the given partition has not been initialized yet. Agree that it's much better to do it this way. Done. > o On changes to execPartition.h > > * Please add a brief decsription about partition_oids to the comments for > this struct. > > @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting > { > PartitionDispatch *partition_dispatch_info; > int num_dispatch; > + Oid *partition_oids; Done. Attached v6. Thanks, Amit
(2018/02/08 23:21), Robert Haas wrote: > On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> if (resultRelInfo == NULL); >> { >> /* Initialize partition info. */ >> resultRelInfo = ExecInitPartitionInfo(mtstate, >> saved_resultRelInfo, >> proute, >> estate, >> leaf_part_index); >> } > > I'm pretty sure that code has one semicolon too many. Good catch! Best regards, Etsuro Fujita
(2018/02/09 14:32), Amit Langote wrote: > I had mistakenly tagged these patches v24, but they were actually supposed > to be v5. So the attached updated patch is tagged v6. OK. > On 2018/02/07 19:36, Etsuro Fujita wrote: >>> (2018/02/05 14:34), Amit Langote wrote: >>>> The code in tupconv_map_for_subplan() currently assumes that it can rely >>>> on all leaf partitions having been initialized. >> >> On reflection I noticed this analysis is not 100% correct; I think what >> that function actually assumes is that all sublplans' partitions have >> already been initialized, not all leaf partitions. > > Yes, you're right. > >>>> Since we're breaking that >>>> assumption with this proposal, that needed to be changed. So the patch >>>> contained some refactoring to make it not rely on that assumption. >> >> I don't think we really need this refactoring because since that as in >> another patch you posted, we could initialize all subplans' partitions in >> ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be >> called without any changes to that function because of what I said above. > > What my previous approach failed to consider is that in the update case, > we'd already have ResultRelInfo's for some of the leaf partitions > initialized, which could be saved into proute->partitions array right away. > > Updated patch does things that way, so all the changes I had proposed to > tupconv_map_for_subplan are rendered unnecessary. OK, thanks for the updated patch! >> Here are comments for the other patch (patch >> v24-0002-During-tuple-routing-initialize-per-partition-ob.patch): >> >> o On changes to ExecSetupPartitionTupleRouting: >> >> * The comment below wouldn't be correct; no ExecInitResultRelInfo in the >> patch. >> >> - proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions * >> - sizeof(ResultRelInfo *)); >> + /* >> + * Actual ResultRelInfo's and TupleConversionMap's are allocated in >> + * ExecInitResultRelInfo(). 
>> + */ >> + proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions * >> + sizeof(ResultRelInfo > *)); > > I removed the comment altogether, as the comments elsewhere make the point > clear. > >> * The patch removes this from the initialization step for a subplan's >> partition, but I think it would be better to keep this here because I >> think it's a good thing to put the initialization stuff together into one >> place. >> >> - /* >> - * This is required in order to we convert the partition's >> - * tuple to be compatible with the root partitioned table's >> - * tuple descriptor. When generating the per-subplan result >> - * rels, this was not set. >> - */ >> - leaf_part_rri->ri_PartitionRoot = rel; > > It wasn't needed here with the previous approach, because we didn't touch > any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it > back in the updated patch. > >> * I think it would be better to keep this comment here. >> >> - /* Remember the subplan offset for this ResultRelInfo */ > > Fixed. > >> * Why is this removed from that initialization? >> >> - proute->partitions[i] = leaf_part_rri; > > Because of the old approach. Now it's back in. > >> o On changes to ExecInitPartitionInfo: >> >> * I don't understand the step starting from this, but I'm wondering if >> that step can be removed by keeping the above setup of proute->partitions >> for the subplan's partition in ExecSetupPartitionTupleRouting. >> >> + /* >> + * If we are doing tuple routing for update, try to reuse the >> + * per-subplan resultrel for this partition that ExecInitModifyTable() >> + * might already have created. >> + */ >> + if (mtstate&& mtstate->operation == CMD_UPDATE) > > Done, as mentioned above. 
> > On 2018/02/08 19:16, Etsuro Fujita wrote: >> Here are some minor comments: >> >> o On changes to ExecInsert >> >> * This might be just my taste, but I think it would be better to (1) >> change ExecInitPartitionInfo so that it creates and returns a >> newly-initialized ResultRelInfo for an initialized partition, and (2) >> rewrite this bit: >> >> + /* Initialize partition info, if not done already. */ >> + ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate, >> + leaf_part_index); >> + >> /* >> * Save the old ResultRelInfo and switch to the one corresponding to >> * the selected partition. >> */ >> saved_resultRelInfo = resultRelInfo; >> resultRelInfo = proute->partitions[leaf_part_index]; >> + Assert(resultRelInfo != NULL); >> >> to something like this (I would say the same thing to the copy.c changes): >> >> /* >> * Save the old ResultRelInfo and switch to the one corresponding to >> * the selected partition. >> */ >> saved_resultRelInfo = resultRelInfo; >> resultRelInfo = proute->partitions[leaf_part_index]; >> if (resultRelInfo == NULL); >> { >> /* Initialize partition info. */ >> resultRelInfo = ExecInitPartitionInfo(mtstate, >> saved_resultRelInfo, >> proute, >> estate, >> leaf_part_index); >> } >> >> This would make ExecInitPartitionInfo more simple because it can assume >> that the given partition has not been initialized yet. > > Agree that it's much better to do it this way. Done. Thanks for all those changes! >> o On changes to execPartition.h >> >> * Please add a brief decsription about partition_oids to the comments for >> this struct. >> >> @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting >> { >> PartitionDispatch *partition_dispatch_info; >> int num_dispatch; >> + Oid *partition_oids; > > Done. Thanks, but one thing I'm wondering is: do we really need this array? I think we could store into PartitionTupleRouting the list of oids returned by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting, instead. 
Sorry, I should have commented on this in a previous email, but what do you think about that? Here are other comments: o On changes to ExecSetupPartitionTupleRouting: * This is nitpicking, but it would be better to define partrel and part_tupdesc within the if (update_rri_index < num_update_rri && RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid) block. - ResultRelInfo *leaf_part_rri; + ResultRelInfo *leaf_part_rri = NULL; Relation partrel = NULL; TupleDesc part_tupdesc; Oid leaf_oid = lfirst_oid(cell); * Do we need this? For a leaf partition that is already present in the subplan resultrels, the partition's indices (if any) would have already been opened. + /* + * Open partition indices. We wouldn't + need speculative + * insertions though. + */ + if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex && + leaf_part_rri->ri_IndexRelationDescs == NULL) + ExecOpenIndices(leaf_part_rri, false); I'll look at the patch a bit more early next week, but other than that, the patch seems in fairly good shape to me. Best regards, Etsuro Fujita
Fujita-san, Thanks for the review. On 2018/02/09 21:20, Etsuro Fujita wrote: >>> * Please add a brief decsription about partition_oids to the comments for >>> this struct. >>> >>> @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting >>> { >>> PartitionDispatch *partition_dispatch_info; >>> int num_dispatch; >>> + Oid *partition_oids; >> >> Done. > > Thanks, but one thing I'm wondering is: do we really need this array? I > think we could store into PartitionTupleRouting the list of oids returned > by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting, > instead. Sorry, I should have commented this in a previous email, but > what do you think about that? ExecInitPartitionInfo() will have to iterate the list to get the OID of the partition to be initialized. Isn't it much cheaper with the array? > Here are other comments: > > o On changes to ExecSetupPartitionTupleRouting: > > * This is nitpicking, but it would be better to define partrel and > part_tupdesc within the if (update_rri_index < num_update_rri && > RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == > leaf_oid) block. > > - ResultRelInfo *leaf_part_rri; > + ResultRelInfo *leaf_part_rri = NULL; > Relation partrel = NULL; > TupleDesc part_tupdesc; > Oid leaf_oid = lfirst_oid(cell); Sure, done. > * Do we need this? For a leaf partition that is already present in the > subplan resultrels, the partition's indices (if any) would have already > been opened. > > + /* > + * Open partition indices. We wouldn't > need speculative > + * insertions though. > + */ > + if > (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex && > + leaf_part_rri->ri_IndexRelationDescs == NULL) > + ExecOpenIndices(leaf_part_rri, > false); You're right. Removed the call. Updated patch is attached. Thanks, Amit
(2018/02/13 10:12), Amit Langote wrote: > On 2018/02/09 21:20, Etsuro Fujita wrote: >>>> * Please add a brief decsription about partition_oids to the comments for >>>> this struct. >>>> >>>> @@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting >>>> { >>>> PartitionDispatch *partition_dispatch_info; >>>> int num_dispatch; >>>> + Oid *partition_oids; >>> >>> Done. >> >> Thanks, but one thing I'm wondering is: do we really need this array? I >> think we could store into PartitionTupleRouting the list of oids returned >> by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting, >> instead. Sorry, I should have commented this in a previous email, but >> what do you think about that? > > ExecInitPartitionInfo() will have to iterate the list to get the OID of > the partition to be initialized. Isn't it much cheaper with the array? Good point! So I agree with adding that array. >> Here are other comments: >> >> o On changes to ExecSetupPartitionTupleRouting: >> >> * This is nitpicking, but it would be better to define partrel and >> part_tupdesc within the if (update_rri_index< num_update_rri&& >> RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == >> leaf_oid) block. >> >> - ResultRelInfo *leaf_part_rri; >> + ResultRelInfo *leaf_part_rri = NULL; >> Relation partrel = NULL; >> TupleDesc part_tupdesc; >> Oid leaf_oid = lfirst_oid(cell); > > Sure, done. > >> * Do we need this? For a leaf partition that is already present in the >> subplan resultrels, the partition's indices (if any) would have already >> been opened. >> >> + /* >> + * Open partition indices. We wouldn't >> need speculative >> + * insertions though. >> + */ >> + if >> (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex&& >> + leaf_part_rri->ri_IndexRelationDescs == NULL) >> + ExecOpenIndices(leaf_part_rri, >> false); > > You're right. Removed the call. Thanks for the above changes! > Updated patch is attached. 
Thanks, here are some minor comments: o On changes to ExecCleanupTupleRouting: - ExecCloseIndices(resultRelInfo); - heap_close(resultRelInfo->ri_RelationDesc, NoLock); + if (resultRelInfo) + { + ExecCloseIndices(resultRelInfo); + heap_close(resultRelInfo->ri_RelationDesc, NoLock); + } You might check at the top of the loop whether resultRelInfo is NULL and if so do continue. I think that would save cycles a bit. o On ExecInitPartitionInfo: + int firstVarno; + Relation firstResultRel; My old compiler got "variable may be used uninitialized" warnings. + /* + * Build the RETURNING projection if any for the partition. Note that + * we didn't build the returningList for each partition within the + * planner, but simple translation of the varattnos will suffice. + * This only occurs for the INSERT case; in the UPDATE/DELETE cases, + * ExecInitModifyTable() would've initialized this. + */ I think the last comment should be the same as for WCO lists: "This only occurs for the INSERT case or in the case of UPDATE for which we didn't find a result rel above to reuse." + /* + * Initialize result tuple slot and assign its rowtype using the first + * RETURNING list. We assume the rest will look the same. + */ + tupDesc = ExecTypeFromTL(returningList, false); + + /* Set up a slot for the output of the RETURNING projection(s) */ + ExecInitResultTupleSlot(estate, &mtstate->ps); + ExecAssignResultType(&mtstate->ps, tupDesc); + slot = mtstate->ps.ps_ResultTupleSlot; + + /* Need an econtext too */ + if (mtstate->ps.ps_ExprContext == NULL) + ExecAssignExprContext(estate, &mtstate->ps); + econtext = mtstate->ps.ps_ExprContext; Do we need this initialization? I think we would already have the slot and econtext initialized when we get here. Other than that, the patch looks good to me. Sorry for the delay. Best regards, Etsuro Fujita
Fujita-san, Thanks for the review. On 2018/02/15 21:10, Etsuro Fujita wrote: > (2018/02/13 10:12), Amit Langote wrote: >> Updated patch is attached. > > Thanks, here are some minor comments: > > o On changes to ExecCleanupTupleRouting: > > - ExecCloseIndices(resultRelInfo); > - heap_close(resultRelInfo->ri_RelationDesc, NoLock); > + if (resultRelInfo) > + { > + ExecCloseIndices(resultRelInfo); > + heap_close(resultRelInfo->ri_RelationDesc, NoLock); > + } > > You might check at the top of the loop whether resultRelInfo is NULL and > if so do continue. I think that would save cycles a bit. Good point, done. > o On ExecInitPartitionInfo: > > + int firstVarno; > + Relation firstResultRel; > > My old compiler got "variable may be used uninitialized" warnings. Fixed. Actually, I moved those declarations from out here into the blocks where they're actually needed. > + /* > + * Build the RETURNING projection if any for the partition. Note that > + * we didn't build the returningList for each partition within the > + * planner, but simple translation of the varattnos will suffice. > + * This only occurs for the INSERT case; in the UPDATE/DELETE cases, > + * ExecInitModifyTable() would've initialized this. > + */ > > I think the last comment should be the same as for WCO lists: "This only > occurs for the INSERT case or in the case of UPDATE for which we didn't > find a result rel above to reuse." Fixed. The "above" is no longer needed, because there is no code left in ExecInitPartitionInfo() to find UPDATE result rels to reuse. That code is now in ExecSetupPartitionTupleRouting(). > + /* > + * Initialize result tuple slot and assign its rowtype using the > first > + * RETURNING list. We assume the rest will look the same. 
> + */ > + tupDesc = ExecTypeFromTL(returningList, false); > + > + /* Set up a slot for the output of the RETURNING projection(s) */ > + ExecInitResultTupleSlot(estate, &mtstate->ps); > + ExecAssignResultType(&mtstate->ps, tupDesc); > + slot = mtstate->ps.ps_ResultTupleSlot; > + > + /* Need an econtext too */ > + if (mtstate->ps.ps_ExprContext == NULL) > + ExecAssignExprContext(estate, &mtstate->ps); > + econtext = mtstate->ps.ps_ExprContext; > > Do we need this initialization? I think we would already have the slot > and econtext initialized when we get here. I think you're right. If node->returningLists is non-NULL at all, ExecInitModifyTable() would've initialized the needed slot and expression context. I added Assert()s to that effect. > Other than that, the patch looks good to me. > > Sorry for the delay. No problem! Thanks again. Attached updated patch. Thanks, Amit
(2018/02/16 10:49), Amit Langote wrote: > On 2018/02/15 21:10, Etsuro Fujita wrote: >> here are some minor comments: >> >> o On changes to ExecCleanupTupleRouting: >> >> - ExecCloseIndices(resultRelInfo); >> - heap_close(resultRelInfo->ri_RelationDesc, NoLock); >> + if (resultRelInfo) >> + { >> + ExecCloseIndices(resultRelInfo); >> + heap_close(resultRelInfo->ri_RelationDesc, NoLock); >> + } >> >> You might check at the top of the loop whether resultRelInfo is NULL and >> if so do continue. I think that would save cycles a bit. > > Good point, done. Thanks. >> o On ExecInitPartitionInfo: >> >> + int firstVarno; >> + Relation firstResultRel; >> >> My old compiler got "variable may be used uninitialized" warnings. > > Fixed. Actually, I moved those declarations from out here into the blocks > where they're actually needed. OK, my compiler gets no warnings now. >> + /* >> + * Build the RETURNING projection if any for the partition. Note that >> + * we didn't build the returningList for each partition within the >> + * planner, but simple translation of the varattnos will suffice. >> + * This only occurs for the INSERT case; in the UPDATE/DELETE cases, >> + * ExecInitModifyTable() would've initialized this. >> + */ >> >> I think the last comment should be the same as for WCO lists: "This only >> occurs for the INSERT case or in the case of UPDATE for which we didn't >> find a result rel above to reuse." > > Fixed. The "above" is no longer needed, because there is no code left in > ExecInitPartitionInfo() to find UPDATE result rels to reuse. That code is > now in ExecSetupPartitionTupleRouting(). OK, that makes sense. >> + /* >> + * Initialize result tuple slot and assign its rowtype using the >> first >> + * RETURNING list. We assume the rest will look the same. 
>> + */ >> + tupDesc = ExecTypeFromTL(returningList, false); >> + >> + /* Set up a slot for the output of the RETURNING projection(s) */ >> + ExecInitResultTupleSlot(estate,&mtstate->ps); >> + ExecAssignResultType(&mtstate->ps, tupDesc); >> + slot = mtstate->ps.ps_ResultTupleSlot; >> + >> + /* Need an econtext too */ >> + if (mtstate->ps.ps_ExprContext == NULL) >> + ExecAssignExprContext(estate,&mtstate->ps); >> + econtext = mtstate->ps.ps_ExprContext; >> >> Do we need this initialization? I think we would already have the slot >> and econtext initialized when we get here. > > I think you're right. If node->returningLists is non-NULL at all, > ExecInitModifyTable() would've initialized the needed slot and expression > context. I added Assert()s to that affect. OK, but one thing I'd like to ask is: + /* + * Use the slot that would have been set up in ExecInitModifyTable() + * for the output of the RETURNING projection(s). Just make sure to + * assign its rowtype using the RETURNING list. + */ + Assert(mtstate->ps.ps_ResultTupleSlot != NULL); + tupDesc = ExecTypeFromTL(returningList, false); + ExecAssignResultType(&mtstate->ps, tupDesc); + slot = mtstate->ps.ps_ResultTupleSlot; Do we need that assignment here? > Attached updated patch. Thanks for updating the patch! Best regards, Etsuro Fujita
On 2018/02/16 12:41, Etsuro Fujita wrote: > (2018/02/16 10:49), Amit Langote wrote: >> I think you're right. If node->returningLists is non-NULL at all, >> ExecInitModifyTable() would've initialized the needed slot and expression >> context. I added Assert()s to that effect. > > OK, but one thing I'd like to ask is: > > + /* > + * Use the slot that would have been set up in ExecInitModifyTable() > + * for the output of the RETURNING projection(s). Just make sure to > + * assign its rowtype using the RETURNING list. > + */ > + Assert(mtstate->ps.ps_ResultTupleSlot != NULL); > + tupDesc = ExecTypeFromTL(returningList, false); > + ExecAssignResultType(&mtstate->ps, tupDesc); > + slot = mtstate->ps.ps_ResultTupleSlot; > > Do we need that assignment here? I guess you mean the assignment of rowtype, that is, the ExecAssignResultType() line. On looking at this some more, it looks like we don't need to call ExecAssignResultType() here, as you seem to be suspecting, because we want the RETURNING projection output to use the rowtype of the first of returningLists and that's what mtstate->ps.ps_ResultTupleSlot has been set to use in the first place. So, removed the ExecAssignResultType(). Attached v9. Thanks for the review! Regards, Amit
(2018/02/16 13:42), Amit Langote wrote: > On 2018/02/16 12:41, Etsuro Fujita wrote: >> (2018/02/16 10:49), Amit Langote wrote: >>> I think you're right. If node->returningLists is non-NULL at all, >>> ExecInitModifyTable() would've initialized the needed slot and expression >>> context. I added Assert()s to that effect. >> >> OK, but one thing I'd like to ask is: >> >> + /* >> + * Use the slot that would have been set up in ExecInitModifyTable() >> + * for the output of the RETURNING projection(s). Just make sure to >> + * assign its rowtype using the RETURNING list. >> + */ >> + Assert(mtstate->ps.ps_ResultTupleSlot != NULL); >> + tupDesc = ExecTypeFromTL(returningList, false); >> + ExecAssignResultType(&mtstate->ps, tupDesc); >> + slot = mtstate->ps.ps_ResultTupleSlot; >> >> Do we need that assignment here? > > I guess you mean the assignment of rowtype, that is, the > ExecAssignResultType() line. That's right. > On looking at this some more, it looks like > we don't need to call ExecAssignResultType() here, as you seem to be suspecting, > because we want the RETURNING projection output to use the rowtype of the > first of returningLists and that's what mtstate->ps.ps_ResultTupleSlot has > been set to use in the first place. Yeah, I think so, too. > So, removed the ExecAssignResultType(). OK, thanks. > Attached v9. Thanks for the review! Thanks for the updated patch! In the patch you added the comments: + wcoList = linitial(node->withCheckOptionLists); + + /* + * Convert Vars in it to contain this partition's attribute numbers. + * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a + * reference to calculate attno's for the returning expression of + * this partition. In the INSERT case, that refers to the root + * partitioned table, whereas in the UPDATE tuple routing case the + * first partition in the mtstate->resultRelInfo array. 
In any case, + * both that relation and this partition should have the same columns, + * so we should be able to map attributes successfully. + */ + wcoList = map_partition_varattnos(wcoList, firstVarno, + partrel, firstResultRel, NULL); This would be just nitpicking, but I think it would be better to arrange these comments, maybe by dividing these to something like this: /* * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a * reference to calculate attno's for the returning expression of * this partition. In the INSERT case, that refers to the root * partitioned table, whereas in the UPDATE tuple routing case the * first partition in the mtstate->resultRelInfo array. In any case, * both that relation and this partition should have the same columns, * so we should be able to map attributes successfully. */ wcoList = linitial(node->withCheckOptionLists); /* * Convert Vars in it to contain this partition's attribute numbers. */ wcoList = map_partition_varattnos(wcoList, firstVarno, partrel, firstResultRel, NULL); I'd say the same thing to the comments you added for RETURNING. Another thing I noticed about comments in the patch is: + * In the case of INSERT on partitioned tables, there is only one + * plan. Likewise, there is only one WCO list, not one per + * partition. For UPDATE, there would be as many WCO lists as + * there are plans, but we use the first one as reference. Note + * that if there are SubPlans in there, they all end up attached + * to the one parent Plan node. The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing WCO constraints, which would change the place to attach SubPlans in WCO constraints from the parent PlanState to the subplan's PlanState, which would make the last comment obsolete. Since this change would be more consistent with PG10, I don't think we need to update the comment as such; I would vote for just removing that comment from here. Best regards, Etsuro Fujita
Fujita-san, Thanks for the review. On 2018/02/16 18:12, Etsuro Fujita wrote: > (2018/02/16 13:42), Amit Langote wrote: >> Attached v9. Thanks a for the review! > > Thanks for the updated patch! In the patch you added the comments: > > + wcoList = linitial(node->withCheckOptionLists); > + > + /* > + * Convert Vars in it to contain this partition's attribute numbers. > + * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a > + * reference to calculate attno's for the returning expression of > + * this partition. In the INSERT case, that refers to the root > + * partitioned table, whereas in the UPDATE tuple routing case the > + * first partition in the mtstate->resultRelInfo array. In any case, > + * both that relation and this partition should have the same > columns, > + * so we should be able to map attributes successfully. > + */ > + wcoList = map_partition_varattnos(wcoList, firstVarno, > + partrel, firstResultRel, NULL); > > This would be just nitpicking, but I think it would be better to arrange > these comments, maybe by dividing these to something like this: > > /* > * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a > * reference to calculate attno's for the returning expression of > * this partition. In the INSERT case, that refers to the root > * partitioned table, whereas in the UPDATE tuple routing case the > * first partition in the mtstate->resultRelInfo array. In any case, > * both that relation and this partition should have the same columns, > * so we should be able to map attributes successfully. > */ > wcoList = linitial(node->withCheckOptionLists); > > /* > * Convert Vars in it to contain this partition's attribute numbers. > */ > wcoList = map_partition_varattnos(wcoList, firstVarno, > partrel, firstResultRel, NULL); > > I'd say the same thing to the comments you added for RETURNING. Good idea. Done. 
> Another thing I noticed about comments in the patch is: > > + * In the case of INSERT on partitioned tables, there is only one > + * plan. Likewise, there is only one WCO list, not one per > + * partition. For UPDATE, there would be as many WCO lists as > + * there are plans, but we use the first one as reference. Note > + * that if there are SubPlans in there, they all end up attached > + * to the one parent Plan node. > > The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing > WCO constraints, which would change the place to attach SubPlans in WCO > constraints from the parent PlanState to the subplan's PlanState, which > would make the last comment obsolete. Since this change would be more > consistent with PG10, I don't think we need to update the comment as such; > I would vote for just removing that comment from here. I have thought about removing it too, so done. Updated patch attached. Thanks, Amit
(2018/02/16 18:23), Amit Langote wrote:
> On 2018/02/16 18:12, Etsuro Fujita wrote:
>> In the patch you added the comments:
>>
>> +            wcoList = linitial(node->withCheckOptionLists);
>> +
>> +            /*
>> +             * Convert Vars in it to contain this partition's attribute numbers.
>> +             * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
>> +             * reference to calculate attno's for the returning expression of
>> +             * this partition.  In the INSERT case, that refers to the root
>> +             * partitioned table, whereas in the UPDATE tuple routing case the
>> +             * first partition in the mtstate->resultRelInfo array.  In any case,
>> +             * both that relation and this partition should have the same columns,
>> +             * so we should be able to map attributes successfully.
>> +             */
>> +            wcoList = map_partition_varattnos(wcoList, firstVarno,
>> +                                              partrel, firstResultRel, NULL);
>>
>> This would be just nitpicking, but I think it would be better to arrange
>> these comments, maybe by dividing these to something like this:
>>
>>     /*
>>      * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
>>      * reference to calculate attno's for the returning expression of
>>      * this partition.  In the INSERT case, that refers to the root
>>      * partitioned table, whereas in the UPDATE tuple routing case the
>>      * first partition in the mtstate->resultRelInfo array.  In any case,
>>      * both that relation and this partition should have the same columns,
>>      * so we should be able to map attributes successfully.
>>      */
>>     wcoList = linitial(node->withCheckOptionLists);
>>
>>     /*
>>      * Convert Vars in it to contain this partition's attribute numbers.
>>      */
>>     wcoList = map_partition_varattnos(wcoList, firstVarno,
>>                                       partrel, firstResultRel, NULL);
>>
>> I'd say the same thing to the comments you added for RETURNING.
>
> Good idea.  Done.

Thanks.  I fixed a typo (s/Converti/Convert/) and adjusted these comments a
bit further to match the preceding code/comments.  Attached is the updated
version.
>> Another thing I noticed about comments in the patch is:
>>
>> +         * In the case of INSERT on partitioned tables, there is only one
>> +         * plan.  Likewise, there is only one WCO list, not one per
>> +         * partition.  For UPDATE, there would be as many WCO lists as
>> +         * there are plans, but we use the first one as reference.  Note
>> +         * that if there are SubPlans in there, they all end up attached
>> +         * to the one parent Plan node.
>>
>> The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
>> WCO constraints, which would change the place to attach SubPlans in WCO
>> constraints from the parent PlanState to the subplan's PlanState, which
>> would make the last comment obsolete.  Since this change would be more
>> consistent with PG10, I don't think we need to update the comment as such;
>> I would vote for just removing that comment from here.
>
> I have thought about removing it too, so done.

OK.

> Updated patch attached.

Thanks, I think the patch is in good shape, so I'll mark this as ready for
committer.

Best regards,
Etsuro Fujita
Fujita-san,

On 2018/02/16 19:50, Etsuro Fujita wrote:
> (2018/02/16 18:23), Amit Langote wrote:
>> Good idea.  Done.
>
> Thanks.  I fixed a typo (s/Converti/Convert/) and adjusted these comments
> a bit further to match the preceding code/comments.  Attached is the
> updated version.

Thank you for updating the patch.

>> Updated patch attached.
>
> Thanks, I think the patch is in good shape, so I'll mark this as ready for
> committer.

Thanks a lot for reviewing!

Attached rebased patch.

Regards,
Amit
(2018/02/19 13:19), Amit Langote wrote:
> Attached rebased patch.

Thanks for the rebased patch!

One thing I noticed while updating the tuple-routing-for-foreign-partitions
patch on top of this is: we should switch into the per-query memory context
in ExecInitPartitionInfo.  Attached is an updated version for that.

Best regards,
Etsuro Fujita
Fujita-san,

On 2018/02/20 19:40, Etsuro Fujita wrote:
> (2018/02/19 13:19), Amit Langote wrote:
>> Attached rebased patch.
>
> Thanks for the rebased patch!
>
> One thing I noticed while updating the
> tuple-routing-for-foreign-partitions patch on top of this is: we should
> switch into the per-query memory context in ExecInitPartitionInfo.

Good catch!

> Attached is an updated version for that.

Thanks for updating the patch.

Thanks,
Amit
On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Attached is an updated version for that.
>
> Thanks for updating the patch.

Committed with a few changes.  The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting.  That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Attached is an updated version for that.
> >
> > Thanks for updating the patch.
>
> Committed with a few changes.

I propose to tweak a few comments to PartitionTupleRouting, as attached.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 22, 2018 at 11:53 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> >> Attached is an updated version for that.
>> >
>> > Thanks for updating the patch.
>>
>> Committed with a few changes.
>
> I propose to tweak a few comments to PartitionTupleRouting, as attached.

Sure, please go ahead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/02/23 1:53, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> Attached is an updated version for that.
>>>
>>> Thanks for updating the patch.
>>
>> Committed with a few changes.
>
> I propose to tweak a few comments to PartitionTupleRouting, as attached.

I'd missed those.  Thank you!

Regards,
Amit
On 2018/02/23 1:10, Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> Attached is an updated version for that.
>>
>> Thanks for updating the patch.
>
> Committed with a few changes.  The big one was that I got rid of the
> local variable is_update in ExecSetupPartitionTupleRouting.  That
> saved a level of indentation on a substantial chunk of code, and it
> turns out that test was redundant anyway.

Thank you!

Regards,
Amit
Hi,

On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Attached is an updated version for that.
> >
> > Thanks for updating the patch.
>
> Committed with a few changes.  The big one was that I got rid of the
> local variable is_update in ExecSetupPartitionTupleRouting.  That
> saved a level of indentation on a substantial chunk of code, and it
> turns out that test was redundant anyway.

I noticed that this patch broke my JIT patch when I force JITing to be
used everywhere (obviously pointless for perf reasons, but good for
testing).  Turns out to be a single line.

ExecInitPartitionInfo has the following block:

    foreach(ll, wcoList)
    {
        WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
        ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                           mtstate->mt_plans[0]);

        wcoExprs = lappend(wcoExprs, wcoExpr);
    }

note how it is passing mtstate->mt_plans[0] as the parent node for the
expression.  I don't quite know why mtstate->mt_plans[0] was chosen
here, it doesn't seem right.  The WCO will not be executed in that
context.  Note that the ExecBuildProjectionInfo() call a few lines below
also uses a different context.

For JITing that fails because the compiled deform will assume the tuples
look like mt_plans[0]'s scantuples.  But we're not dealing with those,
we're dealing with tuples returned by the relevant subplan.

Also note that the code before this commit used

    ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                       &mtstate->ps);

i.e. the ModifyTableState node, as I'd expect.

Greetings,

Andres Freund
On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Attached is an updated version for that.
> >
> > Thanks for updating the patch.
>
> Committed with a few changes.  The big one was that I got rid of the
> local variable is_update in ExecSetupPartitionTupleRouting.  That
> saved a level of indentation on a substantial chunk of code, and it
> turns out that test was redundant anyway.

Btw, are there cases where this could change explain output?  If there
are subplan references or such in any of the RETURNING / WCO expressions,
they'd not get added at explain time.  It's probably fine because we also
add the expressions "statically" in ExecInitModifyTable()?

Greetings,

Andres Freund
On 2018/03/03 13:38, Andres Freund wrote:
> Hi,
>
> On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> Attached is an updated version for that.
>>>
>>> Thanks for updating the patch.
>>
>> Committed with a few changes.  The big one was that I got rid of the
>> local variable is_update in ExecSetupPartitionTupleRouting.  That
>> saved a level of indentation on a substantial chunk of code, and it
>> turns out that test was redundant anyway.
>
> I noticed that this patch broke my JIT patch when I force JITing to be
> used everywhere (obviously pointless for perf reasons, but good for
> testing).  Turns out to be a single line.
>
> ExecInitPartitionInfo has the following block:
>
>     foreach(ll, wcoList)
>     {
>         WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
>         ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
>                                            mtstate->mt_plans[0]);
>
>         wcoExprs = lappend(wcoExprs, wcoExpr);
>     }
>
> note how it is passing mtstate->mt_plans[0] as the parent node for the
> expression.  I don't quite know why mtstate->mt_plans[0] was chosen
> here, it doesn't seem right.  The WCO will not be executed in that
> context.  Note that the ExecBuildProjectionInfo() call a few lines below
> also uses a different context.
>
> For JITing that fails because the compiled deform will assume the tuples
> look like mt_plans[0]'s scantuples.  But we're not dealing with those,
> we're dealing with tuples returned by the relevant subplan.
>
> Also note that the code before this commit used
>
>     ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
>                                        &mtstate->ps);
>
> i.e. the ModifyTableState node, as I'd expect.

I guess that it was an oversight in my patch.  Please find attached a
patch to fix that.

Thanks,
Amit
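[Editor's note: the attached fix itself is not reproduced in the archive, but from Andres's analysis upthread it presumably amounts to restoring the pre-commit parent node for the WCO qual. A sketch along those lines, not the actual attached patch (the file path and hunk location are assumptions):]

```diff
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ ExecInitPartitionInfo @@
         WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
         ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-                                           mtstate->mt_plans[0]);
+                                           &mtstate->ps);
```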
On 2018/03/03 13:48, Andres Freund wrote:
> On 2018-02-22 11:10:57 -0500, Robert Haas wrote:
>> On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> Attached is an updated version for that.
>>>
>>> Thanks for updating the patch.
>>
>> Committed with a few changes.  The big one was that I got rid of the
>> local variable is_update in ExecSetupPartitionTupleRouting.  That
>> saved a level of indentation on a substantial chunk of code, and it
>> turns out that test was redundant anyway.
>
> Btw, are there cases where this could change explain output?  If there
> are subplan references or such in any of the RETURNING / WCO expressions,
> they'd not get added at explain time.  It's probably fine because we also
> add the expressions "statically" in ExecInitModifyTable()?

Yes, I think so.  Afaics, explain.c only looks at the information that is
"statically" added to ModifyTableState by ExecInitModifyTable.  It considers
information added by the tuple routing code only when printing information
about invoked triggers, and that only in the case of EXPLAIN ANALYZE.

Thanks,
Amit