Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
Date
Msg-id CAFjFpRdEawGntehu9oKNFZvm2JAS5wJ_qT+5vrvwWt=h8GKUbw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Jul 26, 2018 at 8:37 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>
>> On Fri, Jul 20, 2018 at 3:13 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> >
>> > It's of course wrong, it's going to be O(max(m, n)) as you said, but
>> > the point is still valid - if we have partitions A1, A2 from one side
>> > and B1, ..., BN on another side, we can skip necessary the
>> > partitions from B that are between e.g. A1 and A2 faster with
>> > binary search.
>>
>> That's possible only when there is no default partition and the join
>> is an inner join. For an outer join, we need to include all the
>> partitions on the outer side, so we can't just skip over some
>> partitions. In case of a default partition, it can take place of a
>> missing partition, so we can't skip partitions using binary search.
>
> But in this case I described above (when we have a partition A3 on one side,
> and another matching partition B3 from other side, but separated by some other
> partitions, e.g. B1, B2) it means that we will merge A3 with a default
> partition from other side without actually needing that, am I right? In this
> sense it's an overhead out of nothing.

No. We will join A3 with B3 since they have matching bounds. We will
compare B1 and B2's bounds with A3 (assuming that there are no bounds
before A3). Since they won't be compatible we will match default of A
to B1 and B2. That will of-course fail since we will try to match
multiple partitions to a single partition, but that's not the point of
your question I think. You are right that we could skip comparing A3
with B1 and B2 and directly land on B3. Any partitions skipped in
between will be matched with A's default partition. But as I have said
this would be rare and complicating the logic for a rare case doesn't
seem practical at this stage.

Apart from the complexity there's also a possibility that this
skipping will reduce the efficiency actually in normal cases. Consider
a case where A and B have exactly matching partitions. Current
partition matching algorithm compare any given range/bound only once
and we will have O(n) algorithm. If we use a binary search, however,
for every range comparison, it will be O(n log n) algorithm. There
will be unnecessary comparisons during binary search. The current
algorithm is always O(n), whereas binary search would be O(n log(n))
with a possibility of having sub-O(n) complexity in some cases. I
would go for an algorithm which has a consistent time complexity
always and which is efficient in normal cases, rather than worrying
about some cases which are not practical.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Speeding up INSERTs and UPDATEs to partitioned tables
Next
From: David Fetter
Date:
Subject: Re: Early WIP/PoC for inlining CTEs