Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled. - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
Date
Msg-id 5B3F4BAA.3050601@lab.ntt.co.jp
Whole thread Raw
In response to Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
(2018/07/04 21:37), Ashutosh Bapat wrote:
> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/07/04 19:04), Ashutosh Bapat wrote:
>>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>>>> +       if (enable_partitionwise_join&&
>>>>> rel->top_parent_is_partitioned)
>>>>> +       {
>>>>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>>>> +       }
>>>>>
>>>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>>>> partition of some other partitioned table. In either case this
>>>>> condition will be true.

>>>> This would be needed to avoid unnecessarily applying build_childrel_tlist
>>>> to
>>>> child rels of a partitioned table for which we don't consider
>>>> partitionwise
>>>> join.  Consider:
>>>>
>>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>>>> CREATE TABLE
>>>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>>>> CREATE TABLE
>>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>>>> CREATE TABLE
>>>> postgres=# create table test (c1 int, c2 text);
>>>> CREATE TABLE
>>>> postgres=# explain verbose select * from (select * from lpt union all
>>>> select
>>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

>> I might misunderstand your words, but in the above example the patch doesn't
>> apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
>> because we can avoid adjusting the tlists for the corresponding subplans at
>> plan creation time so that whole-row Vars in the tlists are transformed into
>> CREs.  I think the overhead of the adjustment is not that big, but not zero,
>> so it would be worth avoiding applying build_childrel_tlist to partitions if
>> the top parent won't participate in a partitionwise-join at all.
>
> I don't think that answers my question. When we join lpt with test,
> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
> when join between lpt and test is not going to use partition-wise
> join. Why?

Maybe my explanation including the example was not good.  Sorry about 
that, but my patch will *not* apply build_childrel_tlist to lpt_p1 and 
lpt_p2 since the top parent of lpt_p1 and lpt_p2 is the UNION ALL 
subquery and hence not a partitioned table (ie, we have 
rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2).

> As per your explanation, the condition "if
> (enable_partitionwise_join&&   rel->top_parent_is_partitioned)" is
> used to avoid applying build_childrel_tlist() when partition-wise join
> won't be possible. But it's not covering all the cases.

Let me explain about that: 1) my patch won't apply that function to a 
child if its top parent is an appendrel built from a UNION ALL subquery, 
even though the child is a partition of a partitioned table pulled up 
from a leaf subquery into the parent query, like lpt_p1, and 2) my patch 
will apply that function to a child if its top parent is a partitioned 
table, whether or not the partitioned table is involved in a 
partitionwise join.  By #1, we avoid the adjustment work at plan 
creation time, as explained above.  It might be worth trying to be 
smarter about #2 (for example, in the case of a join of a partitioned 
table and a non-partitioned table, since we don't consider a 
partitionwise join for that join, it's better to not apply that function 
to partitions of the partitioned table, to avoid the adjustment work at 
plan creation time), but ISTM we don't have enough information to be 
smarter.

>>> An
>>> in-between state will produce a hell lot of confusion for any further
>>> optimization. Whenever we change the code around partition-wise
>>> operations in future, we will have to check whether or not a given
>>> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
>>> have mentioned earlier, I am also not comfortable with the targetlist
>>> of child relations being type inconsistent with that of the parent,
>>> which is a fundamental rule in inheritance. Worst keep them
>>> inconsistent during path creation and make them consistent at the time
>>> of creating plans. A child's whole-row Var has datatype of the child
>>> where as that of parent has datatype of parent.
>>
>>
>> I don't see any critical issue here.  Could you elaborate a bit more on that
>> point?
>
> I think breaking a fundamental rule like this itself is critical. But
> interestingly I am not able to find a case where it becomes a problem.
> But may be I haven't tried enough. And the question is if it's not
> required to have the targetlists type consistent, why even bother with
> ConvertRowtypeExpr addition there? We can use your approach of adding
> ConvertRowtypeExpr at the end in all the cases.

I think that the tlist of a (not-the-topmost) child relation doesn't 
need to be type-consistent with that of the parent; it has only to 
include all Vars that are needed for higher joinquals and final output 
to the parent appendrel.  (In other words, I think we can build the 
tlist in the same manner as we build tlists of base or join relations in 
the main join tree.)  On the other hand, other expressions such as WHERE 
quals need to be type-consistent with those of the parent; else we would 
create a plan that produces the wrong result.  So, with my patch we 
carry out special handling for the tlist; it includes a child whole-row 
Var, if needed, instead of a CRE converting the Var to the parent 
rowtype.  That allows us to create/process child-join plans using the 
existing planner functions as-is.

>>> A ConvertRowtypeExpr
>>> is used to fix this inconsistency. That's why I chose to use
>>> pull_var_clause() as a place to fix the problem and not fix
>>> ConvertRowtypeExpr in targetlist itself.
>>
>>
>> I think the biggest issue in the original patch you proposed is that we
>> spend extra cycles where partitioning is not involved, which is the biggest
>> reason why I think the original patch isn't the right way to go.
>
> When there are no partitions involved, there won't be any
> ConvertRowtypeExprs there which means the function
> is_converted_whole_row_reference() would just return from the first
> line checking IsA() and nullness of node.

Really?  IIRC, with the original patch we would spend extra cycles in a 
non-partitioned inheritance processing [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5AFC0865.8050802%40lab.ntt.co.jp


pgsql-hackers by date:

Previous
From: "Kato, Sho"
Date:
Subject: RE: Speeding up INSERTs and UPDATEs to partitioned tables
Next
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] Crash on promotion when recovery.conf is renamed