Thread: [HACKERS] paths in partitions of a dummy partitioned table

[HACKERS] paths in partitions of a dummy partitioned table

From
Ashutosh Bapat
Date:
If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
partition relations dummy and thus doesn't set any (dummy) paths in the
partition relations. The lack of paths in the partitions means that we can
not use partition-wise join while joining this table with some other similarly
partitioned table as the partitions do not have any paths that child-joins can
use. This means that we can not create partition relations for such a join and
thus can not consider the join to be partitioned. This doesn't matter much when
the dummy relation is the outer relation, since the resultant join is also
dummy. But when the dummy relation is inner relation, the resultant join is not
dummy and can be considered to be partitioned with same partitioning scheme as
the outer relation to be joined with other similarly partitioned table. Not
having paths in the partitions deprives us of this future optimization.
Without partition-wise join, not setting dummy paths in the partition relations
makes sense, since those do not have any use. But with partition-wise join that
changes.

If we always mark the partitions dummy, that effort may get wasted if the
partitioned table doesn't participate in the partition-wise join. A possible
solution would be to set the partitions dummy when only the partitioned table
participates in the join, but that happens during make_join_rel(), much after
we have set up the simple relations. So, I am not sure, whether that's a good
option. But I am open to suggestions.

What if we always mark the partition relations of a dummy partitioned table
dummy? I tried attached path on a thousand partition table, the planning time
increased by 3-5%. That doesn't look that bad for a thousand partitions.

Any other options/comments?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] paths in partitions of a dummy partitioned table

From
Kyotaro HORIGUCHI
Date:
Hello,

At Thu, 6 Jul 2017 21:05:21 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRd5+zroxY7UMGTR2M=rjBV4aBOCxQg3+1rBmTPLK5mpDg@mail.gmail.com>
> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
> partition relations dummy and thus doesn't set any (dummy) paths in the
> partition relations. The lack of paths in the partitions means that we can

A parent won't be proven dummy directly but be a dummy rel (means
IS_DUMMY_REL(rel) returns true) if no child is attached as the
result of constarint exlusion.

> not use partition-wise join while joining this table with some other similarly
> partitioned table as the partitions do not have any paths that child-joins can
> use. This means that we can not create partition relations for such a join and
> thus can not consider the join to be partitioned. This doesn't matter much when
> the dummy relation is the outer relation, since the resultant join is also
> dummy. But when the dummy relation is inner relation, the resultant join is not
> dummy and can be considered to be partitioned with same partitioning scheme as
> the outer relation to be joined with other similarly partitioned table. Not
> having paths in the partitions deprives us of this future optimization.
> Without partition-wise join, not setting dummy paths in the partition relations
> makes sense, since those do not have any use. But with partition-wise join that
> changes.

Of course for inner-joins and even for outer-joins, there's no
point in considering the children of a dummy parent as a side of
a join. For what reason, or in what case does partition-wise join
need to consider children of a proven-dummy parent?

> If we always mark the partitions dummy, that effort may get wasted if the
> partitioned table doesn't participate in the partition-wise join. A possible
> solution would be to set the partitions dummy when only the partitioned table
> participates in the join, but that happens during make_join_rel(), much after
> we have set up the simple relations. So, I am not sure, whether that's a good
> option. But I am open to suggestions.
> 
> What if we always mark the partition relations of a dummy partitioned table
> dummy? I tried attached path on a thousand partition table, the planning time
> increased by 3-5%. That doesn't look that bad for a thousand partitions.
> 
> Any other options/comments?

Since I don't understand the meaning of the patch, the comments
below are just from a micro-viewpoint.

-    if (IS_DUMMY_REL(rel))
+    if (IS_DUMMY_REL(rel) && !rte->inh)

In set_rel_pathlist, if want to exlude the inh==true case from
the first if, just inverting the first and second if caluses
would be enough, like this.

|  if (rte->inh)
|    set_append_rel_pathlist(root, rel, rti, rte);
|  else if (IS_DUMMY_REL(rel))
|    /* We already proved the relation empty, so nothing more to do */


+        if (IS_DUMMY_REL(rel))
+            set_dummy_rel_pathlist(childrel);

This is equivalent of just returning before looking over
append_rel_list when rel is a dummy. It doesn't seem to me to add
marked-as-dummy children to a dummy parent. (I understand that is
the objective of this patch.)

# Maybe I'm looking a different version of the source code, or
# I'm terriblly misunderstanding something..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] paths in partitions of a dummy partitioned table

From
Ashutosh Bapat
Date:
On Mon, Jul 10, 2017 at 2:59 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,

Thanks for the review.

>> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
>> partition relations dummy and thus doesn't set any (dummy) paths in the
>> partition relations. The lack of paths in the partitions means that we can
>
> A parent won't be proven dummy directly but be a dummy rel (means
> IS_DUMMY_REL(rel) returns true) if no child is attached as the
> result of constarint exlusion.

In a query like SELECT * FROM parent WHERE 1 = 2; the parent is marked
dummy in set_rel_size()
 319 set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 320              Index rti, RangeTblEntry *rte)
 321 {
 322     if (rel->reloptkind == RELOPT_BASEREL &&
 323         relation_excluded_by_constraints(root, rel, rte))
 324     {
 325         /*
 326          * We proved we don't need to scan the rel via constraint
exclusion,
 327          * so set up a single dummy path for it.  Here we only
check this for
 328          * regular baserels; if it's an otherrel, CE was already checked in
 329          * set_append_rel_size().
 330          *
 331          * In this case, we go ahead and set up the relation's
path right away
 332          * instead of leaving it for set_rel_pathlist to do.
This is because
 333          * we don't have a convention for marking a rel as dummy except by
 334          * assigning a dummy path to it.
 335          */
 336         set_dummy_rel_pathlist(rel);
 337     }

It's in such cases, that a parent is proven dummy without looking at
the child relations.

>
>> not use partition-wise join while joining this table with some other similarly
>> partitioned table as the partitions do not have any paths that child-joins can
>> use. This means that we can not create partition relations for such a join and
>> thus can not consider the join to be partitioned. This doesn't matter much when
>> the dummy relation is the outer relation, since the resultant join is also
>> dummy. But when the dummy relation is inner relation, the resultant join is not
>> dummy and can be considered to be partitioned with same partitioning scheme as
>> the outer relation to be joined with other similarly partitioned table. Not
>> having paths in the partitions deprives us of this future optimization.
>> Without partition-wise join, not setting dummy paths in the partition relations
>> makes sense, since those do not have any use. But with partition-wise join that
>> changes.
>
> Of course for inner-joins and even for outer-joins, there's no
> point in considering the children of a dummy parent as a side of
> a join. For what reason, or in what case does partition-wise join
> need to consider children of a proven-dummy parent?

Consider a join A LEFT JOIN B LEFT JOIN C where B is proven dummy and
A, B and C are all partitioned with the same partitioning scheme. The
result of A LEFT JOIN B is same as appending Ak LEFT JOIN Bk where Ak
and Bk are matching partitions of A and B resp. When we join this with
C, the result would be same as appending Ak LEFT JOIN Bk LEFT JOIN Ck
where Ck is partition matching Ak and Bk. So, even though B is proven
dummy, we can execute A LEFT JOIN B LEFT JOIN C as a partition-wise
join, if we can execute A LEFT JOIN B as a partition-wise join.

Does that answer your question?

>
>> If we always mark the partitions dummy, that effort may get wasted if the
>> partitioned table doesn't participate in the partition-wise join. A possible
>> solution would be to set the partitions dummy when only the partitioned table
>> participates in the join, but that happens during make_join_rel(), much after
>> we have set up the simple relations. So, I am not sure, whether that's a good
>> option. But I am open to suggestions.
>>
>> What if we always mark the partition relations of a dummy partitioned table
>> dummy? I tried attached path on a thousand partition table, the planning time
>> increased by 3-5%. That doesn't look that bad for a thousand partitions.
>>
>> Any other options/comments?
>
> Since I don't understand the meaning of the patch, the comments
> below are just from a micro-viewpoint.
>
> -       if (IS_DUMMY_REL(rel))
> +       if (IS_DUMMY_REL(rel) && !rte->inh)
>
> In set_rel_pathlist, if want to exlude the inh==true case from
> the first if, just inverting the first and second if caluses
> would be enough, like this.
>
> |  if (rte->inh)
> |    set_append_rel_pathlist(root, rel, rti, rte);
> |  else if (IS_DUMMY_REL(rel))
> |    /* We already proved the relation empty, so nothing more to do *

Right. Done in the attached patch.

>
> +               if (IS_DUMMY_REL(rel))
> +                       set_dummy_rel_pathlist(childrel);
>
> This is equivalent of just returning before looking over
> append_rel_list when rel is a dummy. It doesn't seem to me to add
> marked-as-dummy children to a dummy parent. (I understand that is
> the objective of this patch.)

Not really. It sets the dummy path list in child, which won't be done
if we return without traversing append_rel_list, esp. when the parent
is marked dummy without marking any of its children dummy as explained
above.

But we could avoid marking a child dummy twice, in case the parent was
marked dummy because all of its children were marked dummy.  I have
fixed this case in the attached patch.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] paths in partitions of a dummy partitioned table

From
Robert Haas
Date:
On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
> partition relations dummy and thus doesn't set any (dummy) paths in the
> partition relations. The lack of paths in the partitions means that we can
> not use partition-wise join while joining this table with some other similarly
> partitioned table as the partitions do not have any paths that child-joins can
> use. This means that we can not create partition relations for such a join and
> thus can not consider the join to be partitioned. This doesn't matter much when
> the dummy relation is the outer relation, since the resultant join is also
> dummy. But when the dummy relation is inner relation, the resultant join is not
> dummy and can be considered to be partitioned with same partitioning scheme as
> the outer relation to be joined with other similarly partitioned table. Not
> having paths in the partitions deprives us of this future optimization.

I think it's wrong for any code to be examining the path list for a
rel marked dummy, so I would suggest approaching this from a different
direction.  Given A LEFT JOIN B where Bk is dummy, I suggest
constructing the path for (AB)k by taking a path from Ak and applying
an appropriate PathTarget.  You don't really need a join path at all;
a path for the non-dummy input is fine - and, in fact, better, since
it will be cheaper to execute.  One problem is that it may not produce
the correct output columns.  (AB) may drop some columns that were
being generated by A because they were only needed to perform the
join, and it may add additional columns taken from B.  But I think
that if you take the default PathTarget for (AB) and replace
references to columns of B with NULL constants, you should be able to
apply the resulting PathTarget to a path for Ak and get a valid path
for (AB)k.  Maybe there is some reason why that won't work exactly,
but I think it is probably more promising to attack the problem from
this angle than to do what you propose.  Sticking dummy joins into the
query plan that are really just projecting out NULLs is not appealing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] paths in partitions of a dummy partitioned table

From
Ashutosh Bapat
Date:
On Fri, Aug 25, 2017 at 10:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
>> partition relations dummy and thus doesn't set any (dummy) paths in the
>> partition relations. The lack of paths in the partitions means that we can
>> not use partition-wise join while joining this table with some other similarly
>> partitioned table as the partitions do not have any paths that child-joins can
>> use. This means that we can not create partition relations for such a join and
>> thus can not consider the join to be partitioned. This doesn't matter much when
>> the dummy relation is the outer relation, since the resultant join is also
>> dummy. But when the dummy relation is inner relation, the resultant join is not
>> dummy and can be considered to be partitioned with same partitioning scheme as
>> the outer relation to be joined with other similarly partitioned table. Not
>> having paths in the partitions deprives us of this future optimization.
>
> I think it's wrong for any code to be examining the path list for a
> rel marked dummy, so I would suggest approaching this from a different
> direction.

Me and Robert had an offline discussion about this. I am summarizing
it here for the sake of completeness.

A dummy relation is identified by the only dummy path that exists in
its pathlist. There is no flag in RelOptInfo which tells whether a
given relation is dummy or not, it's the dummy path which tells that.
A dummy path is an Append path with no subpaths. Planner doesn't treat
dummy relations any different from other relations when it comes to
using paths. When a dummy relation participates in a join, the dummy
path is used as one of the joining paths and converted to a Result
plan at the time of planning. So, for a partition-wise join where one
of the joining relations is dummy, its every child must have dummy
path which can be used to construct child-join paths.

But we don't need to mark partition relations dummy (if their parent
is dummy) even when it's not going to participate in partition-wise
join. The partition relations will be marked dummy when we know that
they will be required for partition-wise join. I was worried that we
might mark base relation dummy during join planning this way, but we
already have a precedence for that in add_paths_to_join_rel(). So,
shouldn't be a problem. So, I have now added a patch in partition-wise
join set to mark partition relations dummy when their parent is dummy.

> Given A LEFT JOIN B where Bk is dummy, I suggest
> constructing the path for (AB)k by taking a path from Ak and applying
> an appropriate PathTarget.  You don't really need a join path at all;
> a path for the non-dummy input is fine - and, in fact, better, since
> it will be cheaper to execute.  One problem is that it may not produce
> the correct output columns.  (AB) may drop some columns that were
> being generated by A because they were only needed to perform the
> join, and it may add additional columns taken from B.  But I think
> that if you take the default PathTarget for (AB) and replace
> references to columns of B with NULL constants, you should be able to
> apply the resulting PathTarget to a path for Ak and get a valid path
> for (AB)k.  Maybe there is some reason why that won't work exactly,
> but I think it is probably more promising to attack the problem from
> this angle than to do what you propose.  Sticking dummy joins into the
> query plan that are really just projecting out NULLs is not appealing.
>

This might help in the cases when the RelOptInfo itself is missing
e.g. missing partitions in partition matching as discussed in [1]. I
will discuss this approach on that thread.

[1] https://www.postgresql.org/message-id/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company