Thread: [HACKERS] expanding inheritance in partition bound order

[HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

04 August 2017, 10:38:46

The current way to expand inherited tables, including partitioned tables,
is to use either find_all_inheritors() or find_inheritance_children()
depending on the context. They return child table OIDs in the (ascending)
order of those OIDs, which means the callers that need to lock the child
tables can do so without worrying about the possibility of deadlock in
some concurrent execution of that piece of code. That's good.
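
The deadlock-avoidance argument rests on every backend acquiring locks in one agreed order. A minimal sketch of that idea, assuming a plain array of child OIDs (the `Oid` typedef is a stand-in and `sort_children_for_locking` is a hypothetical helper, not a PostgreSQL function):

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* qsort comparator giving ascending OID order (no subtraction, so no overflow) */
static int
oid_ascending_cmp(const void *a, const void *b)
{
    Oid         lhs = *(const Oid *) a;
    Oid         rhs = *(const Oid *) b;

    return (lhs > rhs) - (lhs < rhs);
}

/*
 * Hypothetical helper: put child OIDs into one canonical (ascending) order.
 * If every backend takes its locks in this order, two backends cannot each
 * hold a lock the other is waiting for within this piece of code.
 */
void
sort_children_for_locking(Oid *children, size_t nchildren)
{
    assert(children != NULL || nchildren == 0);
    if (nchildren > 1)
        qsort(children, nchildren, sizeof(Oid), oid_ascending_cmp);
}
```

A caller would sort first and then lock each child while walking the array from front to back.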

For partitioned tables, there is a possibility of returning child table
(partition) OIDs in the partition bound order, which in addition to
preventing the possibility of deadlocks during concurrent locking, seems
potentially useful for other caller-specific optimizations. For example,
tuple-routing code can utilize that fact to implement a binary-search-based
partition-searching algorithm. For one more example, refer to the "UPDATE
partition key" thread where it's becoming clear that it would be nice if
the planner had put the partitions in bound order in the ModifyTable that
it creates for UPDATE of partitioned tables [1].
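
To make the binary-search point concrete, here is a minimal sketch, assuming integer range partitions described only by their exclusive upper bounds, already sorted in bound order. `find_partition_for_key` and the bounds layout are hypothetical and are not taken from the patches or from the executor's tuple-routing code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: partition i accepts keys in
 * [upper_bounds[i - 1], upper_bounds[i]), with partition 0 unbounded below.
 * upper_bounds must be sorted ascending, i.e. in partition bound order.
 * Returns the partition index, or -1 if the key is at or beyond the last bound.
 */
int
find_partition_for_key(const int *upper_bounds, size_t nparts, int key)
{
    size_t      lo = 0;
    size_t      hi = nparts;

    assert(upper_bounds != NULL || nparts == 0);

    /* Binary search for the first bound strictly greater than key. */
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (key < upper_bounds[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return (lo < nparts) ? (int) lo : -1;
}
```

The search is only valid because the partitions are kept in bound order; with OID order the bounds array would not be sorted and a linear scan would be needed.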

So attached are two WIP patches:

0001 implements two interface functions:

List *get_all_partition_oids(Oid, LOCKMODE)
List *get_partition_oids(Oid, LOCKMODE)

that resemble find_all_inheritors() and find_inheritance_children(),
respectively, but expect that users call them only for partitioned tables.
Needless to mention, OIDs are returned in a canonical order determined by
the partition bounds and the way the partition tree structure is
traversed (top-down, breadth-first, left-to-right). The patch replaces all the
calls of the old interface functions with the respective new ones for
partitioned table parents. That means expand_inherited_rtentry (among
others) now calls get_all_partition_oids() if the RTE is for a partitioned
table and find_all_inheritors() otherwise.
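
The traversal order described above (top-down, breadth-first, left to right) can be sketched with a small queue. This is an illustration only: `PartNode`, `MAX_NODES`, and `collect_oids_breadth_first` are hypothetical names, and each node's children are assumed to be stored in bound order already.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64            /* arbitrary capacity for this sketch */

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

typedef struct PartNode
{
    Oid         oid;
    int         nchildren;
    const struct PartNode **children;   /* assumed to be in bound order */
} PartNode;

/*
 * Write the OIDs of root and all of its descendants into out[], visiting
 * the tree level by level and each level from left to right.
 * Returns the number of OIDs written.
 */
int
collect_oids_breadth_first(const PartNode *root, Oid *out, int out_size)
{
    const PartNode *queue[MAX_NODES];
    int         head = 0;
    int         tail = 0;
    int         nout = 0;
    int         i;

    assert(root != NULL && out != NULL);

    queue[tail++] = root;
    while (head < tail)
    {
        const PartNode *node = queue[head++];

        assert(nout < out_size);
        out[nout++] = node->oid;

        /* Enqueue children left to right so bound order is preserved. */
        for (i = 0; i < node->nchildren; i++)
        {
            assert(tail < MAX_NODES);
            queue[tail++] = node->children[i];
        }
    }
    return nout;
}
```

With this order, all partitions directly under the root come before any partition of a sub-partitioned child.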

In its implementation, get_all_partition_oids() calls
RelationGetPartitionDispatchInfo(), which is useful to generate the result
list in the desired partition bound order. But the current interface and
implementation of RelationGetPartitionDispatchInfo() needs some rework,
because it's too closely coupled with the executor's tuple routing code.

Applying just 0001 will satisfy the requirements stated in [1], but it
won't look pretty as is for too long.

Re: [HACKERS] expanding inheritance in partition bound order

From

Beena Emerson

Date:

10 August 2017, 12:52:55

Hi Amit,

On Thu, Aug 10, 2017 at 7:41 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/08/05 2:25, Robert Haas wrote:
>> Concretely, my proposal is:
>>
>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>> is a very good idea.
>
> I put this patch ahead in the list and so it's now 0001.
>

FYI, 0001 patch throws the warning:

execMain.c: In function ‘ExecSetupPartitionTupleRouting’:
execMain.c:3342:16: warning: ‘next_ptinfo’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
     next_ptinfo->parentid != ptinfo->parentid)
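
For readers unfamiliar with this class of warning: the compiler reports it when it cannot prove that a variable is assigned on every path before it is read. The usual remedy is to give the variable a definite value on every path. The sketch below is not the code from execMain.c; `PtInfoSketch` and `count_parent_groups` are hypothetical and only mirror the shape of the flagged comparison.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PtInfoSketch
{
    unsigned int parentid;
} PtInfoSketch;

/*
 * Count runs of consecutive entries that share the same parentid.
 * next_ptinfo is initialized at its declaration and assigned on both
 * branches, so no path can read an indeterminate pointer.
 */
int
count_parent_groups(const PtInfoSketch *infos, int n)
{
    const PtInfoSketch *next_ptinfo = NULL;
    int         groups = 0;
    int         i;

    assert(infos != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        const PtInfoSketch *ptinfo = &infos[i];

        if (i + 1 < n)
            next_ptinfo = &infos[i + 1];
        else
            next_ptinfo = NULL;

        /* A group ends at the last entry or where the parent changes. */
        if (next_ptinfo == NULL ||
            next_ptinfo->parentid != ptinfo->parentid)
            groups++;
    }
    return groups;
}
```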

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Robert Haas

Date:

15 August 2017, 20:27:30

On Wed, Aug 9, 2017 at 10:11 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>> is a very good idea.
>
> I put this patch ahead in the list and so it's now 0001.

I think what you've currently got as
0003-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch is a
bug fix that probably needs to be back-patched into v10, so it should
come first.

I think 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch and
0005-Store-in-pg_inherits-if-a-child-is-a-partitioned-tab.patch should
be merged into one patch and that should come next, followed by
0004-Teach-expand_inherited_rtentry-to-use-partition-boun.patch and
finally what you now have as
0001-Decouple-RelationGetPartitionDispatchInfo-from-execu.patch.

This patch series is blocking a bunch of other things, so it would be
nice if you could press forward with this quickly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

16 August 2017, 04:26:13

On 2017/08/10 18:52, Beena Emerson wrote:
> Hi Amit,
> 
> On Thu, Aug 10, 2017 at 7:41 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/08/05 2:25, Robert Haas wrote:
>>> Concretely, my proposal is:
>>>
>>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>>> is a very good idea.
>>
>> I put this patch ahead in the list and so it's now 0001.
>>
> 
> FYI, 0001 patch throws the warning:
> 
> execMain.c: In function ‘ExecSetupPartitionTupleRouting’:
> execMain.c:3342:16: warning: ‘next_ptinfo’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>      next_ptinfo->parentid != ptinfo->parentid)

Thanks for the review.  Will fix in the updated version of the patch I
will post sometime later today.

Regards,
Amit

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

16 August 2017, 08:36:15

Thanks for the review.

On 2017/08/16 2:27, Robert Haas wrote:
> On Wed, Aug 9, 2017 at 10:11 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>>> is a very good idea.
>>
>> I put this patch ahead in the list and so it's now 0001.
> 
> I think what you've currently got as
> 0003-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch is a
> bug fix that probably needs to be back-patched into v10, so it should
> come first.

That makes sense.  That patch is now 0001.  Checked that it can be
back-patched to REL_10_STABLE.

> I think 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch and
> 0005-Store-in-pg_inherits-if-a-child-is-a-partitioned-tab.patch should
> be merged into one patch and that should come next,

Merged the two into one: attached 0002.

> followed by
> 0004-Teach-expand_inherited_rtentry-to-use-partition-boun.patch and

This one is now 0003.

> finally what you now have as
> 0001-Decouple-RelationGetPartitionDispatchInfo-from-execu.patch.

And 0004.

> This patch series is blocking a bunch of other things, so it would be
> nice if you could press forward with this quickly.

Attached updated patches.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Khandekar

Date:

16 August 2017, 14:30:19

On 16 August 2017 at 11:06, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

> Attached updated patches.

Thanks Amit for the patches.

I too agree with the overall approach taken for keeping the locking
order consistent: it's best to do the locking with the existing
find_all_inheritors() since it is much cheaper than to lock them in
partition-bound order, the latter being expensive since it requires
opening partitioned tables.

> I haven't yet done anything about changing the timing of opening and
> locking leaf partitions, because it will require some more thinking about
> the required planner changes.  But the above set of patches will get us
> far enough to get leaf partition sub-plans appear in the partition bound
> order (same order as what partition tuple-routing uses in the executor).

So, I believe none of the changes done in pg_inherits.c are essential
for expanding the inheritance tree in bound order, right? I am not
sure whether we are planning to commit these two things together or
incrementally:
1. Expand in bound order
2. Allow for locking only the partitioned tables first.

For #1, the changes in pg_inherits.c are not required (viz, keeping
partitioned tables at the head of the list, adding inhchildparted
column, etc).

If we are going to do #2 together with #1, then in the patch set there
is no one using the capability introduced by #2. That is, there are no
callers of find_all_inheritors() that make use of the new
num_partitioned_children parameter. Also, there is no boolean
parameter  for find_all_inheritors() to be used to lock only the
partitioned tables.

I feel we should think about
0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch later, and
first get the review done for the other patches.

-------

I see that RelationGetPartitionDispatchInfo() now returns quite a
small subset of what it used to return, which is good. But I feel for
expand_inherited_rtentry(), RelationGetPartitionDispatchInfo() is
still a bit heavy. We only require the oids, so the
PartitionedTableInfo data structure (including the pd->indexes array)
gets discarded.

Also, RelationGetPartitionDispatchInfo() has to call get_rel_relkind()
for each child. In expand_inherited_rtentry(), we anyway have to open
all the child tables, so we get the partition descriptors for each of
the children for free. So how about, in expand_inherited_rtentry(), we
traverse the partition tree using these descriptors similar to how it
is traversed in RelationGetPartitionDispatchInfo()? Maybe, to avoid
duplicating the traversal code, we can have a common API.

Still looking at RelationGetPartitionDispatchInfo() changes ...

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Ashutosh Bapat

Date:

16 August 2017, 15:48:31

On Wed, Aug 16, 2017 at 11:06 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

>
>> This patch series is blocking a bunch of other things, so it would be
>> nice if you could press forward with this quickly.
>
> Attached updated patches.
>

Review for 0001. The attached patch has some minor changes to the
comments and code.

+ * All the relations in the partition tree (including 'rel') must have been
+ * locked (using at least the AccessShareLock) by the caller.
It would be good if we can Assert this in the function. But I couldn't find a
way to check whether the backend holds a lock of required strength. Is there
any?

        /*
-        * We locked all the partitions above including the leaf partitions.
-        * Note that each of the relations in *partitions are eventually
-        * closed by the caller.
+        * All the partitions were locked above.  Note that the relcache
+        * entries will be closed by ExecEndModifyTable().
         */
I don't see much value in this hunk, so removed it in the attached patch.

+   list_free(all_parts);
ExecSetupPartitionTupleRouting() will be called only once per DML statement.
Leaking the memory for the duration of the DML may be preferable to spending
time traversing the list and freeing each cell individually. So I removed the
hunk in the attached set.

0002 review
+
+     <row>
+      <entry><structfield>inhchildparted</structfield></entry>
+      <entry><type>bool</type></entry>
+      <entry></entry>
+      <entry>
+       This is <literal>true</> if the child table is a partitioned table,
+       <literal>false</> otherwise
+      </entry>
+     </row>
In the catalogs we use the full word "partitioned", e.g. pg_partitioned_table.
Maybe we should name the column "inhchildpartitioned".

+#define OID_CMP(o1, o2) \
+       ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0));
Instead of duplicating the logic in this macro and oid_cmp(), we may want to
call this macro in oid_cmp()? Or simply call oid_cmp() from inhchildinfo_cmp()
passing pointers to the OIDs; a pointer indirection would be costly, but still
maintainable.
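
One way the first option could look is sketched below, assuming a stand-in `Oid` typedef. `oid_cmp_sketch` is a hypothetical name for a qsort-style comparator built on the macro; it is not the existing oid_cmp(). The macro here is written without a trailing semicolon so that it can be used inside a larger expression.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Three-way comparison of two OIDs; usable as an expression. */
#define OID_CMP(o1, o2) \
    ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0))

/*
 * qsort-style comparator that reuses the macro, so the comparison logic
 * lives in exactly one place.
 */
int
oid_cmp_sketch(const void *p1, const void *p2)
{
    Oid         v1;
    Oid         v2;

    assert(p1 != NULL && p2 != NULL);
    v1 = *(const Oid *) p1;
    v2 = *(const Oid *) p2;

    return OID_CMP(v1, v2);
}
```

Callers that already hold the two OID values can use the macro directly and skip the pointer indirection, while qsort() callers use the function.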

+   if (num_partitioned_children)
+       *num_partitioned_children = my_num_partitioned_children;
+
Instead of this conditional, why not make every caller pass a pointer to an
integer? The callers can just ignore the value if they don't want it. If we make
this change, we can get rid of the my_num_partitioned_children variable and
directly update the passed-in pointer variable.

        inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
-       if (numoids >= maxoids)
+       is_partitioned = ((Form_pg_inherits)
+                               GETSTRUCT(inheritsTuple))->inhchildparted;
Now that we are fetching two members from the Form_pg_inherits structure, maybe
we should use a local variable
Form_pg_inherits inherits_tuple = (Form_pg_inherits) GETSTRUCT(inheritsTuple);
and use that to fetch its members.

I am still reviewing changes in this patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

0001-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

17 August 2017, 04:09:26

Hi Amit,

Thanks for the comments.

On 2017/08/16 20:30, Amit Khandekar wrote:
> On 16 August 2017 at 11:06, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> 
>> Attached updated patches.
> 
> Thanks Amit for the patches.
> 
> I too agree with the overall approach taken for keeping the locking
> order consistent: it's best to do the locking with the existing
> find_all_inheritors() since it is much cheaper than to lock them in
> partition-bound order, the latter being expensive since it requires
> opening partitioned tables.

Yeah.  Per Robert's design idea, we will always do the *locking* in
the order determined by find_all_inheritors/find_inheritance_children.

>> I haven't yet done anything about changing the timing of opening and
>> locking leaf partitions, because it will require some more thinking about
>> the required planner changes.  But the above set of patches will get us
>> far enough to get leaf partition sub-plans appear in the partition bound
>> order (same order as what partition tuple-routing uses in the executor).
> 
> So, I believe none of the changes done in pg_inherits.c are essential
> for expanding the inheritance tree in bound order, right?

Right.

The changes to pg_inherits.c are only about recognizing partitioned tables
in an inheritance hierarchy and putting them ahead in the returned list.
Now that I think of it, the title of the patch (teach pg_inherits.c about
"partitioning") sounds a bit confusing.  In particular, the patch does not
teach it things like partition bound order, just that some tables in the
hierarchy could be partitioned tables.
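
As a rough illustration of "putting them ahead in the returned list", the ordering can be expressed as a two-level sort key: partitioned tables first, then ascending OID within each group. The sketch below assumes a hypothetical `ChildInfo` struct and comparator; it is not the inhchildinfo_cmp() from the patch, whose details are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

typedef struct ChildInfo
{
    Oid         relid;
    bool        is_partitioned;
} ChildInfo;

/* Partitioned children sort first; ties are broken by ascending OID. */
static int
childinfo_cmp_sketch(const void *p1, const void *p2)
{
    const ChildInfo *c1 = (const ChildInfo *) p1;
    const ChildInfo *c2 = (const ChildInfo *) p2;

    if (c1->is_partitioned != c2->is_partitioned)
        return c1->is_partitioned ? -1 : 1;
    return (c1->relid > c2->relid) - (c1->relid < c2->relid);
}

/* Hypothetical helper: reorder children in place using the key above. */
void
order_children_partitioned_first(ChildInfo *children, size_t n)
{
    assert(children != NULL || n == 0);
    if (n > 1)
        qsort(children, n, sizeof(ChildInfo), childinfo_cmp_sketch);
}
```

The order stays deterministic across backends, which is what the locking argument needs, and a caller that wants only the partitioned tables can stop after the leading group.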

> I am not
> sure whether we are planning to commit these two things together or
> incrementally :
> 1. Expand in bound order
> 2. Allow for locking only the partitioned tables first.
> 
> For #1, the changes in pg_inherits.c are not required (viz, keeping
> partitioned tables at the head of the list, adding inhchildparted
> column, etc).

Yes.  Changes to pg_inherits.c can be committed completely independently
of either 1 or 2, although then there would be nobody using that capability.

About 2: I think the capability to lock only the partitioned tables in
expand_inherited_rtentry() will only be used once we have the patch to do
the necessary planner restructuring that will allow us to defer child
table locking to some place that is not expand_inherited_rtentry().  I am
working on that patch now and should be able to show something soon.

> If we are going to do #2 together with #1, then in the patch set there
> is no one using the capability introduced by #2. That is, there are no
> callers of find_all_inheritors() that make use of the new
> num_partitioned_children parameter. Also, there is no boolean
> parameter  for find_all_inheritors() to be used to lock only the
> partitioned tables.
> 
> I feel we should think about
> 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch later, and
> first get the review done for the other patches.

I think that makes sense.

> I see that RelationGetPartitionDispatchInfo() now returns quite a
> small subset of what it used to return, which is good. But I feel for
> expand_inherited_rtentry(), RelationGetPartitionDispatchInfo() is
> still a bit heavy. We only require the oids, so the
> PartitionedTableInfo data structure (including the pd->indexes array)
> gets discarded.

Maybe we could make the output argument optional, but I don't see much
point in being too conservative here.  Building the indexes array does not
cost us that much and if a not-too-distant-in-future patch could use that
information somehow, it could do so for free.

> Also, RelationGetPartitionDispatchInfo() has to call get_rel_relkind()
> for each child. In expand_inherited_rtentry(), we anyway have to open
> all the child tables, so we get the partition descriptors for each of
> the children for free. So how about, in expand_inherited_rtentry(), we
> traverse the partition tree using these descriptors similar to how it
> is traversed in RelationGetPartitionDispatchInfo() ? May be to avoid
> code duplication for traversing, we can have a common API.

As mentioned, one goal I'm seeking is to avoid having to open the child
tables in expand_inherited_rtentry().

Thanks,
Amit

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

17 August 2017, 05:12:56

Hi Ashutosh,

Thanks for the review and the updated patch.

On 2017/08/16 21:48, Ashutosh Bapat wrote:
> On Wed, Aug 16, 2017 at 11:06 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> 
>>
>>> This patch series is blocking a bunch of other things, so it would be
>>> nice if you could press forward with this quickly.
>>
>> Attached updated patches.
>>
> 
> Review for 0001. The attached patch has some minor changes to the
> comments and code.
> 
> + * All the relations in the partition tree (including 'rel') must have been
> + * locked (using at least the AccessShareLock) by the caller.
>
> It would be good if we can Assert this in the function. But I couldn't find a
> way to check whether the backend holds a lock of required strength. Is there
> any?

Currently there isn't.  Robert suggested a RelationLockHeldByMe(Oid) [1],
which is still a TODO on my plate.

>         /*
> -        * We locked all the partitions above including the leaf partitions.
> -        * Note that each of the relations in *partitions are eventually
> -        * closed by the caller.
> +        * All the partitions were locked above.  Note that the relcache
> +        * entries will be closed by ExecEndModifyTable().
>          */
> I don't see much value in this hunk,

I thought the new text was a bit clearer, but maybe that's just me.  Will
remove.

> +   list_free(all_parts);
> ExecSetupPartitionTupleRouting() will be called only once per DML statement.
> Leaking the memory for the duration of DML may be worth the time spent
> in the traversing
> the list and freeing each cell independently.

Fair enough, will remove the list_free().

> 0002 review
> +
> +     <row>
> +      <entry><structfield>inhchildparted</structfield></entry>
> +      <entry><type>bool</type></entry>
> +      <entry></entry>
> +      <entry>
> +       This is <literal>true</> if the child table is a partitioned table,
> +       <literal>false</> otherwise
> +      </entry>
> +     </row>
> In the catalogs we are using full "partitioned" e.g. pg_partitioned_table. May
> be we should name the column as "inhchildpartitioned".

Sure.

> +#define OID_CMP(o1, o2) \
> +       ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0));
> Instead of duplicating the logic in this macro and oid_cmp(), we may want to
> call this macro in oid_cmp()? Or simply call oid_cmp() from inhchildinfo_cmp()
> passing pointers to the OIDs; a pointer indirection would be costly, but still
> maintainable.

Actually, I avoided using oid_cmp exactly for that reason.

> +   if (num_partitioned_children)
> +       *num_partitioned_children = my_num_partitioned_children;
> +
> Instead of this conditional, why not to make every caller pass a pointer to
> integer. The callers will just ignore the value if they don't want it. If we do
> this change, we can get rid of my_num_partitioned_children variable and
> directly update the passed in pointer variable.

There are a bunch of callers of find_all_inheritors() and
find_inheritance_children.  Changes to make them all declare a pointless
variable seemed off to me.  The conditional in question doesn't seem to be
that expensive.  (To be fair, the one introduced in find_all_inheritors()
kind of is as implemented by the patch, because it's executed for every
iteration of the foreach(l, rels_list) loop, which I will fix.)
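
A minimal sketch of the optional out-parameter pattern under discussion, with the NULL check done once after the loop instead of on every iteration. `ChildInfo` and `count_children` are hypothetical stand-ins, not the actual find_all_inheritors() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ChildInfo
{
    unsigned int relid;
    bool        is_partitioned;
} ChildInfo;

/*
 * Returns the number of children.  If num_partitioned_children is not
 * NULL, it also receives the number of partitioned children; callers
 * that do not care simply pass NULL.
 */
int
count_children(const ChildInfo *children, int nchildren,
               int *num_partitioned_children)
{
    int         my_num_partitioned_children = 0;
    int         i;

    assert(children != NULL || nchildren == 0);

    for (i = 0; i < nchildren; i++)
    {
        if (children[i].is_partitioned)
            my_num_partitioned_children++;
    }

    /* Single check after the loop, not one per iteration. */
    if (num_partitioned_children)
        *num_partitioned_children = my_num_partitioned_children;

    return nchildren;
}
```

Existing callers keep working by passing NULL, so none of them has to declare a variable it never reads.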

> 
>         inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
> -       if (numoids >= maxoids)
> +       is_partitioned = ((Form_pg_inherits)
> +                               GETSTRUCT(inheritsTuple))->inhchildparted;
> Now that we are fetching two members from Form_pg_inherits structure, may be we
> should use a local variable
> Form_pg_inherits inherits_tuple = GETSTRUCT(inheritsTuple);
> and use that to fetch its members.

Sure, will do.

> I am still reviewing changes in this patch.

Okay, will wait for more comments before sending the updated patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmobwbh12OJerqAGyPEjb_%2B2y7T0nqRKTcjed6L4NTET6Fg%40mail.gmail.com

Re: [HACKERS] expanding inheritance in partition bound order

From

Robert Haas

Date:

17 August 2017, 05:22:13

On Wed, Aug 16, 2017 at 10:12 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> In the catalogs we are using full "partitioned" e.g. pg_partitioned_table. May
>> be we should name the column as "inhchildpartitioned".
>
> Sure.

I suggest inhpartitioned or inhispartition.  inhchildpartitioned seems too long.

> There are a bunch of callers of find_all_inheritors() and
> find_inheritance_children.  Changes to make them all declare a pointless
> variable seemed off to me.  The conditional in question doesn't seem to be
> that expensive.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

17 August 2017, 05:36:04

On 2017/08/17 11:22, Robert Haas wrote:
> On Wed, Aug 16, 2017 at 10:12 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> In the catalogs we are using full "partitioned" e.g. pg_partitioned_table. May
>>> be we should name the column as "inhchildpartitioned".
>>
>> Sure.
> 
> I suggest inhpartitioned or inhispartition.  inhchildpartitioned seems too long.

inhchildpartitioned indeed seems long.

Since we are storing whether the child table (one with the OID inhrelid) is
partitioned, inhpartitioned seems best to me.  Will implement that.

Thanks,
Amit

Re: [HACKERS] expanding inheritance in partition bound order

From

Ashutosh Bapat

Date:

17 August 2017, 07:56:53

On Thu, Aug 17, 2017 at 8:06 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/08/17 11:22, Robert Haas wrote:
>> On Wed, Aug 16, 2017 at 10:12 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> In the catalogs we are using full "partitioned" e.g. pg_partitioned_table. May
>>>> be we should name the column as "inhchildpartitioned".
>>>
>>> Sure.
>>
>> I suggest inhpartitioned or inhispartition.  inhchildpartitioned seems too long.
>
> inhchildpartitioned indeed seems long.
>
> Since we storing if the child table (one with the OID inhrelid) is
> partitioned, inhpartitioned seems best to me.  Will implement that.

inhchildpartitioned is long but clearly tells that the child table is
partitioned, not the parent. pg_inherits can have parents which are not
partitioned, so it's better to have a self-explanatory catalog column name. I
am fine with some other name as long as it's clear.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

17 August 2017, 08:24:14

On 2017/08/17 13:56, Ashutosh Bapat wrote:
> On Thu, Aug 17, 2017 at 8:06 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/08/17 11:22, Robert Haas wrote:
>>> On Wed, Aug 16, 2017 at 10:12 PM, Amit Langote
>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>>> In the catalogs we are using full "partitioned" e.g. pg_partitioned_table. May
>>>>> be we should name the column as "inhchildpartitioned".
>>>>
>>>> Sure.
>>>
>>> I suggest inhpartitioned or inhispartition.  inhchildpartitioned seems too long.
>>
>> inhchildpartitioned indeed seems long.
>>
>> Since we storing if the child table (one with the OID inhrelid) is
>> partitioned, inhpartitioned seems best to me.  Will implement that.
> 
> inhchildpartitioned is long but clearly tells that the child table is
> partitioned, not the parent. pg_inherit can have parents which are not
> partitioned, so it's better to have self-explanatory catalog name. I
> am fine with some other name as long as it's clear.

OTOH, the pg_inherits field that stores the OID of the child table does
not mention "child" in its name (inhrelid), although you are right that
inhpartitioned can be taken to mean that the inheritance parent
(inhparent) is partitioned.  In any case, system catalog documentation
which clearly states what's what might be the best guide for the confused.

Of course, we can add a comment in pg_inherits.h next to the field
explaining what it is for those reading the source code and confused about
what inhpartitioned means.

Thanks,
Amit

Re: [HACKERS] expanding inheritance in partition bound order

From

Ashutosh Bapat

Date:

17 August 2017, 08:28:14

On Thu, Aug 17, 2017 at 10:54 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/08/17 13:56, Ashutosh Bapat wrote:
>> On Thu, Aug 17, 2017 at 8:06 AM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> On 2017/08/17 11:22, Robert Haas wrote:
>>>> On Wed, Aug 16, 2017 at 10:12 PM, Amit Langote
>>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>>>> In the catalogs we are using full "partitioned" e.g. pg_partitioned_table. May
>>>>>> be we should name the column as "inhchildpartitioned".
>>>>>
>>>>> Sure.
>>>>
>>>> I suggest inhpartitioned or inhispartition.  inhchildpartitioned seems too long.
>>>
>>> inhchildpartitioned indeed seems long.
>>>
>>> Since we storing if the child table (one with the OID inhrelid) is
>>> partitioned, inhpartitioned seems best to me.  Will implement that.
>>
>> inhchildpartitioned is long but clearly tells that the child table is
>> partitioned, not the parent. pg_inherit can have parents which are not
>> partitioned, so it's better to have self-explanatory catalog name. I
>> am fine with some other name as long as it's clear.
>
> OTOH, the pg_inherits field that stores the OID of the child table does
> not mention "child" in its name (inhrelid), although you are right that
> inhpartitioned can be taken to mean that the inheritance parent
> (inhparent) is partitioned.  In any case, system catalog documentation
> which clearly states what's what might be the best guide for the confused.
>
Sorry, I overlooked this detail. To me it means that the table is
driven by the child and inhpartitioned looks good then.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

17 August 2017, 10:29:18

On 2017/08/17 10:09, Amit Langote wrote:
> On 2017/08/16 20:30, Amit Khandekar wrote:
>> On 16 August 2017 at 11:06, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I am not
>> sure whether we are planning to commit these two things together or
>> incrementally :
>> 1. Expand in bound order
>> 2. Allow for locking only the partitioned tables first.
>>
>> For #1, the changes in pg_inherits.c are not required (viz, keeping
>> partitioned tables at the head of the list, adding inhchildparted
>> column, etc).
> 
> Yes.  Changes to pg_inherits.c can be committed completely independently
> of either 1 or 2, although then there would be nobody using that capability.
> 
> About 2: I think the capability to lock only the partitioned tables in
> expand_inherited_rtentry() will only be used once we have the patch to do
> the necessary planner restructuring that will allow us to defer child
> table locking to some place that is not expand_inherited_rtentry().  I am
> working on that patch now and should be able to show something soon.
> 
>> If we are going to do #2 together with #1, then in the patch set there
>> is no one using the capability introduced by #2. That is, there are no
>> callers of find_all_inheritors() that make use of the new
>> num_partitioned_children parameter. Also, there is no boolean
>> parameter  for find_all_inheritors() to be used to lock only the
>> partitioned tables.
>>
>> I feel we should think about
>> 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch later, and
>> first get the review done for the other patches.
> 
> I think that makes sense.

After thinking some more on this, I think Amit's suggestion to put this
patch at the end of the priority list is good (that is, the patch that
teaches pg_inherits infrastructure to list partitioned tables ahead in the
list.)  Its purpose is mainly to fulfill the requirement that partitioned
tables be able to be locked ahead of any leaf partitions in the list (per
the design idea Robert suggested [1]).  The patch that requires that
capability is not in the picture yet.  Perhaps, we could review and commit
this patch only when that future patch shows up.  So, I will hold that
patch for now.

Thoughts?

Attached are the rest of the patches.  0001 has changes per Ashutosh's review
comments [2]:

0001: Relieve RelationGetPartitionDispatchInfo() of doing any locking

0002: Teach expand_inherited_rtentry to use partition bound order

0003: Decouple RelationGetPartitionDispatchInfo() from executor

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmobwbh12OJerqAGyPEjb_%2B2y7T0nqRKTcjed6L4NTET6Fg%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/CAFjFpRdXn7w7nVKv4qN9fa%2BtzRi_mJFNCsBWy%3Dbd0SLbYczUfA%40mail.gmail.com

On 2017/08/18 4:54, Robert Haas wrote:
> On Thu, Aug 17, 2017 at 8:39 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> [2] had a patch with some changes to the original patch you posted. I
>> didn't describe those changes in my mail, since they rearranged the
>> comments. Those changes are not part of this patch and you haven't
>> commented on those changes either. If you have intentionally
>> excluded those changes, it's fine. In case, you haven't reviewed them,
>> please see if they are good to be incorporated.
> 
> I took a quick look at your version but I think I like Amit's fine the
> way it is, so committed that and back-patched it to v10.

Thanks for committing.

> I find 0002 pretty ugly as things stand.  We get a bunch of tuple maps
> that we don't really need, only to turn around and free them.  We get
> a bunch of tuple slots that we don't need, only to turn around and
> drop them.  We don't really need the PartitionDispatch objects either,
> except for the OIDs they contain.  There's a lot of extra stuff being
> computed here that is really irrelevant for this purpose.  I think we
> should try to clean that up somehow.

One way to do that might be to reverse the order of the remaining patches
and put the patch to refactor RelationGetPartitionDispatchInfo() first.
With that refactoring, PartitionDispatch itself has become much simpler in
that it does not contain a relcache reference to be closed eventually by
the caller, the tuple map, and the tuple table slot.  Since those things
are required for tuple-routing, the refactoring makes
ExecSetupPartitionTupleRouting itself create them from the (minimal)
information returned by RelationGetPartitionDispatchInfo and ultimately
destroy them when done.  I kept the indexes field in
PartitionDispatchData though, because it's essentially free to create
while we are walking the partition tree in
RelationGetPartitionDispatchInfo() and it seems undesirable to make the
caller compute that information (indexes) by traversing the partition tree
all over again, if it doesn't otherwise have to.  I am still considering
some counter-arguments raised by Amit Khandekar about this last assertion.

Thoughts?

Thanks,
Amit


Hi Amit,

On 2017/08/17 21:18, Amit Khandekar wrote:
> Anyways, some more comments :
> 
> In ExecSetupPartitionTupleRouting(), not sure why ptrinfos array is an
> array of pointers. Why can't it be an array of
> PartitionTupleRoutingInfo structure  rather than pointer to that
> structure ?

AFAIK, assigning pointers is less expensive than assigning structs, and we
end up doing a lot of assigning of the members of that array to a local
variable in get_partition_for_tuple(), for example.  Perhaps, we could
avoid those assignments and implement it the way you suggest.

> diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
> + * Close all the leaf partitions and their indices.
> *
> Above comment needs to be shifted a bit down to the subsequent "for"
> loop where it's actually applicable.

That's right, done.

> * node->mt_partition_dispatch_info[0] corresponds to the root partitioned
> * table, for which we didn't create tupslot.
> Above : node->mt_partition_dispatch_info[0] => node->mt_ptrinfos[0]

Oops, fixed.

> /*
>  * XXX- do we need a pinning mechanism for partition descriptors
>  * so that their references can be managed independently of
>  * the parent relcache entry? Like PinPartitionDesc(partdesc)?
>  */
> pd->partdesc = partdesc;
> 
> Any idea if the above can be handled ? I am not too sure.

A similar mechanism exists for TupleDesc ref-counting (see the usage of
PinTupleDesc and ReleaseTupleDesc across the backend code.)  I too am
currently unsure if such an elaborate mechanism is actually *necessary*
for rd_partdesc.

Attached updated patches.

Thanks,
Amit


On 2017/08/26 3:28, Robert Haas wrote:
> On Mon, Aug 21, 2017 at 2:10 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> [ new patches ]
> 
> I am failing to understand the point of separating PartitionDispatch
> into PartitionDispatch and PartitionTableInfo.  That seems like an
> unnecessary multiplication of entities, as does the introduction of
> PartitionKeyInfo.  I also think that replacing reldesc with reloid is
> not really an improvement; any place that gets the relid has to go
> open the relation to get the reldesc, whereas without that it has a
> direct pointer to the information it needs.

I am worried about the open relcache reference in PartitionDispatch when
we start using it in the planner.  Whereas there is an ExecEndModifyTable()
as a suitable place to close that reference, there doesn't seem to exist
one within the planner, but I guess we will have to figure something out.
For the time being, the second patch closes the same in
expand_inherited_rtentry() right after picking up the OID using
RelationGetRelid(pd->reldesc).

> I suggest that this patch just focus on removing the following things
> from PartitionDispatchData: keystate, tupslot, tupmap.  Those things
> are clearly executor-specific stuff that makes sense to move to a
> different structure, what you're calling PartitionTupleRoutingInfo
> (not sure that's the best name).  The other stuff all seems fine.
> You're going to have to open the relation anyway, so keeping the
> reldesc around seems like an optimization, if anything.  The
> PartitionKey and PartitionDesc pointers may not really be needed --
> they're just pointers into reldesc -- but they're trivial to compute,
> so it doesn't hurt anything to have them either as a
> micro-optimization for performance or even just for readability.

OK, done this way in the attached updated patch.  Any suggestions about a
better name for what the patch calls PartitionTupleRoutingInfo?

> That just leaves indexes.  In a world where keystate, tupslot, and
> tupmap are removed from the PartitionDispatchData, you must need
> indexes or there would be no point in constructing a
> PartitionDispatchData object in the first place; any application that
> needs neither indexes nor the executor-specific stuff could just use
> the Relation directly.

Agreed.

> Regarding your XXX comments, note that if you've got a lock on a
> relation, the pointers to the PartitionKey and PartitionDesc are
> stable.  The PartitionKey can't change once it's established, and the
> PartitionDesc can't change while we've got a lock on the relation
> unless we change it ourselves (and any places that do should have
> CheckTableNotInUse checks).  The keep_partkey and keep_partdesc
> handling in relcache.c exists exactly so that we can guarantee that
> the pointer won't go stale under us.  Now, if we *don't* have a lock
> on the relation, then those pointers can easily be invalidated -- so
> you can't hang onto a PartitionDispatch for longer than you hang onto
> the lock on the Relation.  But that shouldn't be a problem.  I think
> you only need to hang onto PartitionDispatch pointers for the lifetime
> of a single query.  One can imagine optimizations where we try to
> avoid rebuilding that for subsequent queries but I'm not sure there's
> any demonstrated need for such a system at present.

Here too.

Attached are the updated patches.

Thanks,
Amit


Re: [HACKERS] expanding inheritance in partition bound order

From

Robert Haas

Date:

28 August 2017, 22:26:58

On Mon, Aug 28, 2017 at 6:38 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I am worried about the open relcache reference in PartitionDispatch when
> we start using it in the planner.  Whereas there is an ExecEndModifyTable()
> as a suitable place to close that reference, there doesn't seem to exist
> one within the planner, but I guess we will have to figure something out.

Yes, I think there's no real way to avoid having to figure that out.

> OK, done this way in the attached updated patch.  Any suggestions about a
> better name for what the patch calls PartitionTupleRoutingInfo?

I think this patch could be further simplified by continuing to use
the existing function signature for RelationGetPartitionDispatchInfo
instead of changing it to return a List * rather than an array.  I
don't see any benefit to such a change.  The current system is more
efficient.

I keep having the feeling that this is a big patch with a small patch
struggling to get out.  Is it really necessary to change
RelationGetPartitionDispatchInfo so much or could you just do a really
minimal surgery to remove the code that sets the stuff we don't need?
Like this:

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 96a64ce6b2..4fabcf9f32 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1089,29 +1089,7 @@ RelationGetPartitionDispatchInfo(Relation rel,
         pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
         pd[i]->reldesc = partrel;
         pd[i]->key = partkey;
-        pd[i]->keystate = NIL;
         pd[i]->partdesc = partdesc;
-        if (parent != NULL)
-        {
-            /*
-             * For every partitioned table other than root, we must store a
-             * tuple table slot initialized with its tuple descriptor and a
-             * tuple conversion map to convert a tuple from its parent's
-             * rowtype to its own. That is to make sure that we are looking at
-             * the correct row using the correct tuple descriptor when
-             * computing its partition key for tuple routing.
-             */
-            pd[i]->tupslot = MakeSingleTupleTableSlot(tupdesc);
-            pd[i]->tupmap = convert_tuples_by_name(RelationGetDescr(parent),
-                                                   tupdesc,
-                                                   gettext_noop("could not convert row type"));
-        }
-        else
-        {
-            /* Not required for the root partitioned table */
-            pd[i]->tupslot = NULL;
-            pd[i]->tupmap = NULL;
-        }
         pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
 
         /*


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

30 August 2017, 05:36:19

On 2017/08/29 4:26, Robert Haas wrote:
> I think this patch could be further simplified by continuing to use
> the existing function signature for RelationGetPartitionDispatchInfo
> instead of changing it to return a List * rather than an array.  I
> don't see any benefit to such a change.  The current system is more
> efficient.

OK, restored the array way.

> I keep having the feeling that this is a big patch with a small patch
> struggling to get out.  Is it really necessary to change
> RelationGetPartitionDispatchInfo so much or could you just do a really
> minimal surgery to remove the code that sets the stuff we don't need?
> Like this:

Sure, done in the attached updated patch.

Thanks,
Amit


On Wed, Aug 30, 2017 at 12:47 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> +1. I think we should just pull out the OIDs from partition descriptor.

Like this?  The first patch refactors the expansion of a single child
out into a separate function, and the second patch implements EIBO on
top of it.

I realized while doing this that we really want to expand the
partitioning hierarchy depth-first, not breadth-first.  For some
things, like partition-wise join in the case where all bounds match
exactly, we really only need a *predictable* ordering that will be the
same for two equi-partitioned tables.  A breadth-first expansion will
give us that.  But it's not actually in bound order.  For example:

create table foo (a int, b text) partition by list (a);
create table foo1 partition of foo for values in (2);
create table foo2 partition of foo for values in (1) partition by range (b);
create table foo2a partition of foo2 for values from ('b') to ('c');
create table foo2b partition of foo2 for values from ('a') to ('b');
create table foo3 partition of foo for values in (3);

The correct bound-order expansion of this is foo2b - foo2a - foo1 -
foo3, which is indeed what you get with the attached patch.  But if we
did the expansion in breadth-first fashion, we'd get foo1 - foo3 -
foo2a, foo2b, which is, well, not in bound order.  If the idea is that
you see a > 2 and rule out all partitions that appear before the first
one with an a-value >= 2, it's not going to work.

Mind you, that idea has some problems anyway in the face of default
partitions, null partitions, and list partitions which accept
non-contiguous values (e.g. one partition for 1, 3, 5; another for 2,
4, 6).  We might need to mark the PartitionDesc to indicate whether
PartitionDesc-order is in fact bound-order in a particular instance,
or something like that.
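
The ordering difference described in this message can be reproduced with a
small stand-alone model. The sketch below is not PostgreSQL code: the Part
struct and the emit(), expand_depth_first() and expand_breadth_first()
helpers are hypothetical names, and each table's children are assumed to be
stored already sorted by bound, the way a partition descriptor lists them.

```c
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_QUEUE 16

/* Hypothetical in-memory model of a partition tree; not a PostgreSQL type. */
typedef struct Part
{
    const char *name;
    int         nparts;                     /* 0 means leaf partition */
    const struct Part *parts[MAX_CHILDREN]; /* children, sorted by bound */
} Part;

/* foo2's children in bound order: ['a','b') sorts before ['b','c'). */
static const Part foo2b = {"foo2b", 0, {NULL}};
static const Part foo2a = {"foo2a", 0, {NULL}};
static const Part foo1 = {"foo1", 0, {NULL}};
static const Part foo3 = {"foo3", 0, {NULL}};
static const Part foo2 = {"foo2", 2, {&foo2b, &foo2a}};
/* foo's children in bound order: 1 (foo2), 2 (foo1), 3 (foo3). */
static const Part foo = {"foo", 3, {&foo2, &foo1, &foo3}};

/* Append a name to a space-separated, NUL-terminated output buffer. */
static void
emit(char *out, size_t outsz, const char *name)
{
    if (out[0] != '\0')
        strncat(out, " ", outsz - strlen(out) - 1);
    strncat(out, name, outsz - strlen(out) - 1);
}

/* Depth-first: descend into a partitioned child before its next sibling. */
static void
expand_depth_first(const Part *p, char *out, size_t outsz)
{
    for (int i = 0; i < p->nparts; i++)
    {
        const Part *child = p->parts[i];

        if (child->nparts == 0)
            emit(out, outsz, child->name);
        else
            expand_depth_first(child, out, outsz);
    }
}

/* Breadth-first: finish one table's partitions, queue partitioned children. */
static void
expand_breadth_first(const Part *root, char *out, size_t outsz)
{
    const Part *queue[MAX_QUEUE];
    int         head = 0;
    int         tail = 0;

    queue[tail++] = root;
    while (head < tail)
    {
        const Part *p = queue[head++];

        for (int i = 0; i < p->nparts; i++)
        {
            const Part *child = p->parts[i];

            if (child->nparts == 0)
                emit(out, outsz, child->name);
            else if (tail < MAX_QUEUE)
                queue[tail++] = child;
        }
    }
}
```

Called on foo with an empty buffer, the depth-first walk yields
"foo2b foo2a foo1 foo3", while the breadth-first walk emits foo1 and foo3
first and only then foo2's two leaves, so the leaf for a = 1 ends up after
the leaves for a = 2 and a = 3.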

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] expanding inheritance in partition bound order

From

Ashutosh Bapat

Date:

31 August 2017, 09:56:16

On Thu, Aug 31, 2017 at 1:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 30, 2017 at 12:47 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> +1. I think we should just pull out the OIDs from partition descriptor.
>
> Like this?  The first patch refactors the expansion of a single child
> out into a separate function, and the second patch implements EIBO on
> top of it.
>
> I realized while doing this that we really want to expand the
> partitioning hierarchy depth-first, not breadth-first.  For some
> things, like partition-wise join in the case where all bounds match
> exactly, we really only need a *predictable* ordering that will be the
> same for two equi-partitioned tables.

+1. Spotted right!

> A breadth-first expansion will
> give us that.  But it's not actually in bound order.  For example:
>
> create table foo (a int, b text) partition by list (a);
> create table foo1 partition of foo for values in (2);
> create table foo2 partition of foo for values in (1) partition by range (b);
> create table foo2a partition of foo2 for values from ('b') to ('c');
> create table foo2b partition of foo2 for values from ('a') to ('b');
> create table foo3 partition of foo for values in (3);
>
> The correct bound-order expansion of this is foo2b - foo2a - foo1 -
> foo3, which is indeed what you get with the attached patch.  But if we
> did the expansion in breadth-first fashion, we'd get foo1 - foo3 -
> foo2a, foo2b, which is, well, not in bound order.  If the idea is that
> you see a > 2 and rule out all partitions that appear before the first
> one with an a-value >= 2, it's not going to work.

Here are the patches, revised a bit. In particular, I have changed the
variable names and arguments to reflect their true role in the functions.
I also updated the prologue of expand_single_inheritance_child() to mention
"has_child". Let me know if those changes look good.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


On 2017/09/05 14:11, Amit Khandekar wrote:
> Great, thanks. Just wanted to make sure someone is working on that,
> because, as you said, it is no longer an EIBO patch. Since you are
> doing that, I won't work on that.

Here is that patch (actually two patches). Sorry it took me a bit.

Description:

[PATCH 1/2] Decouple RelationGetPartitionDispatchInfo() from executor

Currently it and the structure it generates, viz. PartitionDispatch
objects, are too tightly coupled with the executor's tuple-routing code.
In particular, it's pretty undesirable that it makes it the responsibility
of the caller to release some resources, such as executor tuple table
slots, tuple-conversion maps, etc. After this refactoring,
ExecSetupPartitionTupleRouting() now needs to do some of the work that
was previously done in RelationGetPartitionDispatchInfo().

[PATCH 2/2] Make RelationGetPartitionDispatch expansion order
depth-first

This is so that it matches what the planner is doing with partitioning
inheritance expansion. Matching the planner's order makes it easier
to match the executor's per-partition objects with planner-created
per-partition nodes.

Actually, I'm coming to the conclusion that we should keep any
whole-partition-tree stuff out of partition.c and its interface, as Robert
has also alluded to in an earlier message on this thread [1]. But since
that's a different topic, I'll shut up about it on this thread and start a
new thread to discuss what kind of code rearrangement is possible.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoafr%3DhUrM%3Dcbx-k%3DBDHOF2OfXaw95HQSNAK4mHBwmSjtw%40mail.gmail.com


Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Khandekar

Date:

11 September 2017, 10:16:06

Thanks Amit for the patch. I am still reviewing it, but meanwhile
below are a few comments so far ...

On 8 September 2017 at 15:53, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> [PATCH 2/2] Make RelationGetPartitionDispatch expansion order
>  depth-first
>
> This is so as it matches what the planner is doing with partitioning
> inheritance expansion.  Matching with planner order helps because
> it helps ease matching the executor's per-partition objects with
> planner-created per-partition nodes.
>
>

+   next_parted_idx += (list_length(*pds) - next_parted_idx - 1);

I think this can be replaced just by :
+   next_parted_idx = list_length(*pds) - 1;
Or, how about removing this variable next_parted_idx altogether ?
Instead, we can just do this :
pd->indexes[i] = -(1 + list_length(*pds));
If that is not possible, I may be missing something.

-----------

+ next_leaf_idx += (list_length(*leaf_part_oids) - next_leaf_idx);

Didn't understand why next_leaf_idx needs to be updated in case when
the current partrelid is partitioned. I think it should be incremented
only for leaf partitions, no ? Or for that matter, again, how about
removing the variable 'next_leaf_idx' and doing this :
*leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
pd->indexes[i] = list_length(*leaf_part_oids) - 1;

-----------

* For every partitioned table in the tree, starting with the root
* partitioned table, add its relcache entry to parted_rels, while also
* queuing its partitions (in the order in which they appear in the
* partition descriptor) to be looked at later in the same loop.  This is
* a bit tricky but works because the foreach() macro doesn't fetch the
* next list element until the bottom of the loop.

I think the above comment needs to be modified to explain the relevant
changed code. E.g., there is no parted_rels any more, and the "tricky"
part was there earlier because the list was being iterated over and
appended to at the same time.

------------

I couldn't see the existing comments like "Indexes corresponding to
the internal partitions are multiplied by" anywhere in the patch. I
think those comments are still valid, and important.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

11 September 2017, 12:56:22

Hi Amit,

On 2017/09/11 16:16, Amit Khandekar wrote:
> Thanks Amit for the patch. I am still reviewing it, but meanwhile
> below are a few comments so far ...

Thanks for the review.

> +   next_parted_idx += (list_length(*pds) - next_parted_idx - 1);
> 
> I think this can be replaced just by :
> +   next_parted_idx = list_length(*pds) - 1;
> Or, how about removing this variable next_parted_idx altogether ?
> Instead, we can just do this :
> pd->indexes[i] = -(1 + list_length(*pds));

That seems like the simplest possible way to do it.

> + next_leaf_idx += (list_length(*leaf_part_oids) - next_leaf_idx);
> 
> Didn't understand why next_leaf_idx needs to be updated in case when
> the current partrelid is partitioned. I think it should be incremented
> only for leaf partitions, no ? Or for that matter, again, how about
> removing the variable 'next_leaf_idx' and doing this :
> *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
> pd->indexes[i] = list_length(*leaf_part_oids) - 1;

Yep.

Attached updated patch does it that way for both partitioned table indexes
and leaf partition indexes.  Thanks for pointing it out.
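
The indexes convention discussed here can be sketched as follows. This is a
simplified model, not the patch itself: Node, Dispatch, Walk, walk_init() and
dispatch_recurse() are hypothetical names, fixed-size arrays stand in for
List, and a table with no partitions is treated as a leaf. The sketch assumes
a table's own entry is appended before its partitions are visited, so a
partitioned child's position equals the current entry count and is stored
negated; if the entry were appended at a different point, the offset would
differ.

```c
#include <assert.h>

#define MAX_PARTS 8
#define MAX_NODES 16

/* Hypothetical stand-ins for the thread's structures; not PostgreSQL's own. */
typedef struct Node
{
    int         id;                         /* stands in for an OID */
    int         nparts;                     /* 0 means leaf partition */
    const struct Node *parts[MAX_PARTS];    /* children in bound order */
} Node;

typedef struct Dispatch
{
    const Node *rel;
    int         indexes[MAX_PARTS];     /* >= 0: leaf index; < 0: -(pds index) */
} Dispatch;

typedef struct Walk
{
    Dispatch    pds[MAX_NODES];         /* partitioned tables, in visit order */
    int         npds;
    int         leaf_ids[MAX_NODES];    /* leaf partitions, in visit order */
    int         nleaves;
} Walk;

static void
walk_init(Walk *w)
{
    w->npds = 0;
    w->nleaves = 0;
}

static void
dispatch_recurse(const Node *rel, Walk *w)
{
    int         my_idx;

    assert(w->npds < MAX_NODES);
    assert(rel->nparts <= MAX_PARTS);

    /* Append this table's entry first, so npds already counts it. */
    my_idx = w->npds++;
    w->pds[my_idx].rel = rel;

    for (int i = 0; i < rel->nparts; i++)
    {
        const Node *child = rel->parts[i];

        if (child->nparts == 0)
        {
            /* Leaf: append it, then record the position it landed at. */
            assert(w->nleaves < MAX_NODES);
            w->leaf_ids[w->nleaves++] = child->id;
            w->pds[my_idx].indexes[i] = w->nleaves - 1;
        }
        else
        {
            /* Partitioned child: it is appended next, at index npds. */
            w->pds[my_idx].indexes[i] = -w->npds;
            dispatch_recurse(child, w);
        }
    }
}
```

Because the root always occupies position 0 and never appears as anyone's
child, every negated value is strictly negative, so the sign alone tells a
leaf index from a partitioned-table index, with no separate running counters.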


> -----------
> 
> * For every partitioned table in the tree, starting with the root
> * partitioned table, add its relcache entry to parted_rels, while also
> * queuing its partitions (in the order in which they appear in the
> * partition descriptor) to be looked at later in the same loop.  This is
> * a bit tricky but works because the foreach() macro doesn't fetch the
> * next list element until the bottom of the loop.
> 
> I think the above comment needs to be modified with something
> explaining the relevant changed code. For e.g. there is no
> parted_rels, and the "tricky" part was there earlier because of the
> list being iterated and at the same time being appended.
> 
> ------------

I think I forgot to update this comment.

> I couldn't see the existing comments like "Indexes corresponding to
> the internal partitions are multiplied by" anywhere in the patch. I
> think those comments are still valid, and important.

Again, I failed to keep this comment.  Anyway, I reworded the comments a
bit to describe what the code is doing more clearly.  Hope you find it so too.

Thanks,
Amit


Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

13 September 2017, 13:02:27

On 2017/09/11 18:56, Amit Langote wrote:
> Attached updated patch does it that way for both partitioned table indexes
> and leaf partition indexes.  Thanks for pointing it out.

It seems to me we don't really need the first patch all that much.  That
is, let's keep PartitionDispatchData the way it is for now, since we don't
really have any need for it besides tuple-routing (EIBO as committed didn't
need it, for one).  So, let's forget about the "decoupling
RelationGetPartitionDispatchInfo() from the executor" thing for now and
move on.

So, attached is just the patch to make RelationGetPartitionDispatchInfo()
traverse the partition tree in a depth-first manner, to be applied on HEAD.

Thoughts?

Thanks,
Amit


Attachment

0001-Make-RelationGetPartitionDispatch-expansion-order-de.patch

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Khandekar

Date:

13 September 2017, 13:16:27

On 13 September 2017 at 15:32, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/09/11 18:56, Amit Langote wrote:
>> Attached updated patch does it that way for both partitioned table indexes
>> and leaf partition indexes.  Thanks for pointing it out.
>
> It seems to me we don't really need the first patch all that much.  That
> is, let's keep PartitionDispatchData the way it is for now, since we don't
> really have any need for it beside tuple-routing (EIBO as committed didn't
> need it for one).  So, let's forget about "decoupling
> RelationGetPartitionDispatchInfo() from the executor" thing for now and
> move on.
>
> So, attached is just the patch to make RelationGetPartitionDispatchInfo()
> traverse the partition tree in depth-first manner to be applied on HEAD.
>
> Thoughts?

+1. If we do need the decoupling later for some reason, we can do
that incrementally.

Will review your latest patch by tomorrow.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] expanding inheritance in partition bound order

From

Robert Haas

Date:

13 September 2017, 19:42:14

On Wed, Sep 13, 2017 at 6:02 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> It seems to me we don't really need the first patch all that much.  That
> is, let's keep PartitionDispatchData the way it is for now, since we don't
> really have any need for it beside tuple-routing (EIBO as committed didn't
> need it for one).  So, let's forget about "decoupling
> RelationGetPartitionDispatchInfo() from the executor" thing for now and
> move on.
>
> So, attached is just the patch to make RelationGetPartitionDispatchInfo()
> traverse the partition tree in depth-first manner to be applied on HEAD.

I like this patch.  Not only does it improve the behavior, but it's
actually less code than we have now, and in my opinion, the new code
is easier to understand, too.

A few suggestions:

- I think get_partition_dispatch_recurse() should get a check_stack_depth()
call just like expand_partitioned_rtentry() did, and for the same
reasons: if the catalog contents are corrupted so that we have an
infinite loop in the partitioning hierarchy, we want to error out, not
crash.

- I think we should add a comment explaining that we're careful to do
this in the same order as expand_partitioned_rtentry() so that callers
can assume that the N'th entry in the leaf_part_oids array will also
be the N'th child of an Append node.

+         * For every partitioned table other than root, we must store a

other than the root

+     * partitioned table.  The value multiplied back by -1 is returned as the

multiplied by -1, not multiplied back by -1

+     * tables in the tree, using which, search is continued further down the
+     * partition tree.

Period after "in the tree".  Then continue: "This value is used to
continue the search in the next level of the partition tree."
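
The first suggestion above, erroring out instead of crashing on a cyclic
hierarchy, can be illustrated with a stand-alone sketch. It is only an
analogy: PostgreSQL's check_stack_depth() checks actual stack usage and
raises an error, whereas this model uses an explicit depth counter. The Rel
struct, MAX_DEPTH and count_leaves() are hypothetical names.

```c
#include <stdbool.h>

#define MAX_DEPTH 64    /* hypothetical limit standing in for a stack check */

/* Hypothetical model of a partition hierarchy; not a PostgreSQL type. */
typedef struct Rel
{
    int         nparts;     /* 0 means leaf partition */
    struct Rel *parts[4];
} Rel;

/*
 * Count the leaf partitions under rel.  Returns false, instead of recursing
 * until the stack overflows, if the hierarchy is nested deeper than
 * MAX_DEPTH, for example because corrupted data makes it cyclic.
 */
static bool
count_leaves(const Rel *rel, int depth, int *nleaves)
{
    if (depth > MAX_DEPTH)
        return false;       /* report an error rather than crash */

    for (int i = 0; i < rel->nparts; i++)
    {
        const Rel *child = rel->parts[i];

        if (child->nparts == 0)
            (*nleaves)++;
        else if (!count_leaves(child, depth + 1, nleaves))
            return false;
    }
    return true;
}
```

On a well-formed tree the guard never fires; on two tables that list each
other as partitions, the walk stops with a failure once the limit is hit.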

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

14 September 2017, 04:13:45

On 2017/09/14 1:42, Robert Haas wrote:
> On Wed, Sep 13, 2017 at 6:02 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> It seems to me we don't really need the first patch all that much.  That
>> is, let's keep PartitionDispatchData the way it is for now, since we don't
>> really have any need for it beside tuple-routing (EIBO as committed didn't
>> need it for one).  So, let's forget about "decoupling
>> RelationGetPartitionDispatchInfo() from the executor" thing for now and
>> move on.
>>
>> So, attached is just the patch to make RelationGetPartitionDispatchInfo()
>> traverse the partition tree in depth-first manner to be applied on HEAD.
> 
> I like this patch.  Not only does it improve the behavior, but it's
> actually less code than we have now, and in my opinion, the new code
> is easier to understand, too.
> 
> A few suggestions:

Thanks for the review.

> - I think get_partition_dispatch_recurse() should get a check_stack_depth()
> call just like expand_partitioned_rtentry() did, and for the same
> reasons: if the catalog contents are corrupted so that we have an
> infinite loop in the partitioning hierarchy, we want to error out, not
> crash.

Ah, missed that.  Done.

> - I think we should add a comment explaining that we're careful to do
> this in the same order as expand_partitioned_rtentry() so that callers
> can assume that the N'th entry in the leaf_part_oids array will also
> be the N'th child of an Append node.

Done.  Since the Append/ModifyTable may skip some leaf partitions due to
pruning, added a note about that too.

> +         * For every partitioned table other than root, we must store a
> 
> other than the root
> 
> +     * partitioned table.  The value multiplied back by -1 is returned as the
> 
> multiplied by -1, not multiplied back by -1
> 
> +     * tables in the tree, using which, search is continued further down the
> +     * partition tree.
> 
> Period after "in the tree".  Then continue: "This value is used to
> continue the search in the next level of the partition tree."

Fixed.

Attached updated patch.

Thanks,
Amit


Attachment

0001-Make-RelationGetPartitionDispatch-expansion-order-de.patch

Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Khandekar

Date:

14 September 2017, 14:56:12

On 14 September 2017 at 06:43, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Attached updated patch.

@@ -1222,151 +1209,130 @@ PartitionDispatch *
 RelationGetPartitionDispatchInfo(Relation rel,
                                  int *num_parted, List **leaf_part_oids)
 {
+       List   *pdlist;
        PartitionDispatchData **pd;

+       get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids);

Above, pdlist is passed uninitialized. And then inside
get_partition_dispatch_recurse(), it is used here :
*pds = lappend(*pds, pd);
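
The hazard can be shown with a stand-alone model. PostgreSQL's lappend()
treats NIL (a null pointer) as the empty list, so the usual remedy is to
start from NIL; reading an uninitialized pointer instead is undefined
behavior. The IntList type and the append_int() and collect() helpers below
are hypothetical stand-ins, not the real List API.

```c
#include <stdlib.h>

/* Hypothetical minimal list; an empty list is a NULL pointer, like NIL. */
typedef struct IntList
{
    int        *items;
    int         length;
    int         capacity;
} IntList;

/* Append value, allocating the list header if the list is still empty. */
static IntList *
append_int(IntList *list, int value)
{
    if (list == NULL)
    {
        list = malloc(sizeof(IntList));
        if (list == NULL)
            abort();
        list->items = NULL;
        list->length = 0;
        list->capacity = 0;
    }
    if (list->length == list->capacity)
    {
        int         newcap = list->capacity ? list->capacity * 2 : 4;
        int        *p = realloc(list->items, newcap * sizeof(int));

        if (p == NULL)
            abort();
        list->items = p;
        list->capacity = newcap;
    }
    list->items[list->length++] = value;
    return list;
}

/*
 * Like the recursive helper in the quoted hunk, this reads *out before
 * writing it, so the caller must pass a pointer to an initialized
 * (possibly NULL) list.
 */
static void
collect(int n, IntList **out)
{
    for (int i = 0; i < n; i++)
        *out = append_int(*out, i);
}
```

A caller that declares `IntList *list = NULL;` before calling
`collect(3, &list)` gets a three-element list; leaving off the initializer
would hand append_int() whatever happened to be on the stack.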

--------

pgindent says more alignment is needed. E.g., the gettext_noop() call
below needs to be aligned:
pd->tupmap = convert_tuples_by_name(RelationGetDescr(parent),
tupdesc,
gettext_noop("could not convert row type"));

--------

Other than that, the patch looks good to me. I verified that the leaf
oids are ordered exactly in the order of the UPDATE subplans output.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] expanding inheritance in partition bound order

From

Robert Haas

Date:

14 September 2017, 19:37:00

On Thu, Sep 14, 2017 at 7:56 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 14 September 2017 at 06:43, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Attached updated patch.
>
> @@ -1222,151 +1209,130 @@ PartitionDispatch *
>  RelationGetPartitionDispatchInfo(Relation rel,
>                                                                  int
> *num_parted, List **leaf_part_oids)
>  {
> +       List   *pdlist;
>         PartitionDispatchData **pd;
>
> +       get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids);
>
> Above, pdlist is passed uninitialized. And then inside
> get_partition_dispatch_recurse(), it is used here :
> *pds = lappend(*pds, pd);
>
> --------
>
> pg_indent says more alignments needed. For e.g. gettext_noop() call
> below needs to be aligned:
> pd->tupmap = convert_tuples_by_name(RelationGetDescr(parent),
> tupdesc,
> gettext_noop("could not convert row type"));
>
> --------
>
> Other than that, the patch looks good to me. I verified that the leaf
> oids are ordered exactly in the order of the UPDATE subplans output.

Committed with fixes for those issues and a few other cosmetic changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] expanding inheritance in partition bound order

From

Amit Langote

Date:

15 September 2017, 03:20:07

On 2017/09/15 1:37, Robert Haas wrote:
> On Thu, Sep 14, 2017 at 7:56 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 14 September 2017 at 06:43, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> Attached updated patch.
>>
>> @@ -1222,151 +1209,130 @@ PartitionDispatch *
>>  RelationGetPartitionDispatchInfo(Relation rel,
>>                                                                  int
>> *num_parted, List **leaf_part_oids)
>>  {
>> +       List   *pdlist;
>>         PartitionDispatchData **pd;
>>
>> +       get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids);
>>
>> Above, pdlist is passed uninitialized. And then inside
>> get_partition_dispatch_recurse(), it is used here :
>> *pds = lappend(*pds, pd);
>>
>> --------
>>
>> pg_indent says more alignments needed. For e.g. gettext_noop() call
>> below needs to be aligned:
>> pd->tupmap = convert_tuples_by_name(RelationGetDescr(parent),
>> tupdesc,
>> gettext_noop("could not convert row type"));
>>
>> --------
>>
>> Other than that, the patch looks good to me. I verified that the leaf
>> oids are ordered exactly in the order of the UPDATE subplans output.
> 
> Committed with fixes for those issues and a few other cosmetic changes.

Thanks Amit for the review and Robert for committing.

Regards,
Amit


