Home > mailing lists

Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date	April 20, 2017 16:16:50
Msg-id	CAFjFpRfxaCSaLOxUx4Ou9bSHYt12+yNjZpOGoggm8hqo6Z3X_g@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List	pgsql-hackers

Tree view

On Thu, Apr 20, 2017 at 3:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/04/20 15:45, Ashutosh Bapat wrote:
>> On Thu, Apr 20, 2017 at 10:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I don't understand why you think that partition-wise join needs any
>>> new logic here; if this were a non-partitionwise join, we'd similarly
>>> need to use the correct operator, but the existing code handles that
>>> just fine.  If the join is performed partition-wise, it should use the
>>> same operators that would have been used by a non-partitionwise join
>>> between the same tables.
>>>
>>> I think the choice of operator depends only on the column types, and
>>> that the "width" of those types has nothing to do with it.  For
>>> example, if the user writes .WHERE A.x = B.x AND B.x = C.x, the
>>> operator for an A/B join or a B/C join will be the one that appears in
>>> the query; parse analysis will have identified which specific operator
>>> is meant based on the types of the columns.  If the optimizer
>>> subsequently decides to reorder the joins and perform the A/C join
>>> first, it will go hunt down the operator with the same strategy number
>>> in the same operator family that takes the type of A.x on one side and
>>> the type of C.x on the other side.  No problem.  A partition-wise join
>>> between A and C will use that same operator; again, no problem.
>>>
>>> Your example involves joining the output of a join between i4 and i8
>>> against i2, so it seems there is some ambiguity about what the input
>>> type should be.  But, again, the planner already copes with this
>>> problem.  In fact, the join is performed either using i4.x or i8.x --
>>> I don't know what happens, or whether it depends on other details of
>>> the query or the plan -- and the operator which can accept that value
>>> on one side and i2.x on the other side is the one that gets used.
>>
>> I think you are confusing join condition application and partition
>> bounds of a join relation. What you have described above is how
>> operators are chosen to apply join conditions - it picks up the
>> correct operator from the operator family based on the column types
>> being used in join condition. That it can do because the columns being
>> joined are both present the relations being joined, irrespective of
>> which pair of relations is being joined. In your example, A.x, B.x and
>> C.x are all present on one of the sides of join irrespective of
>> whether the join is executed as (AB)C, A(BC) or (AC)B.
>>
>> But the problem we are trying to solve here about partition bounds of
>> the join relation: what should be the partition bounds of AB, BC or
>> AC? When we compare partition bounds of and intermediate join with
>> other intermediate join (e.g. AB with those of C) what operator should
>> be used? You seem to be suggesting that we keep as many sets of
>> partition bounds as there are base relations participating in the join
>> and then use appropriate partition bounds based on the columns in the
>> join conditions, so that we can use the same operator as used in the
>> join condition. That doesn't seem to be a good option since the
>> partition bounds will all have same values, only differing in their
>> binary representation because of differences in data types. I am of
>> the opinion that we save a single set of partition bounds. We have to
>> then associate a data type with bounds to know binary representation
>> of partition bound datums. That datatype would be one of the partition
>> key types of joining relations. I may be wrong in using term "wider"
>> since its associated with the length of binary reprentation. But we
>> need some logic to coalesce the two data types based on the type of
>> join and key type on the outer side.
>
> FWIW, I think that using any one of the partition bounds of the baserels
> being partitionwise-joined should suffice as the partition bound of any
> combination of joins involving two or more of those baserels, as long as
> the partitioning operator of each of the baserels is in the same operator
> family (I guess that *is* checked somewhere in the partitionwise join
> consideration flow).  IOW, partopfamily[] of all of the baserels should
> match and then the join clause operators involved should belong to the
> same respective operator families.

The partition bounds of different base rels may be different and we
have to compare them. Even we say that we join two tables with same
partition bounds using partitio-wise join, we need to make sure that
those partition bounds are indeed same, thus requiring to compare. And
to compare any datum we need to know its type.

>
> ISTM, the question here is about how to derive the partitioning properties
> of joinrels from those of the baserels involved.  Even if the join
> conditions refer to columns of different types on two sides, as long as
> the partitioning and joining is known to occur using operators of
> compatible semantics, I don't understand what more needs to be considered
> or done.  Although, I haven't studied things in enough detail to say
> anything confidently about whether join being INNER or OUTER has any
> bearing on the semantics of the partitioning of the joinrels in question.
> IIUC, using partitioning properties to apply partitionwise join technique
> at successive join levels will be affected by the OUTER considerations
> similar to how they affect what levels a give EquivalenceClass clause
> could be applied without causing any semantics violations.  As already
> mentioned upthread, it would be a good idea to have some integration of
> the partitioning considerations with the equivalence class mechanism (how
> ForeignKeyOptInfo contains links to ECs comes to mind).

This has been already discussed. I have showed earlier why equivalence
classes are not useful in this case.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

From: Amit Langote
Date: 20 April 2017, 16:05:14
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables

From: Suraj Kharage
Date: 20 April 2017, 16:38:18
Subject: [HACKERS] statement_timeout is not working as expected with postgres_fdw

Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

Previous

Next