Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
Date
Msg-id CAFjFpRfA35neUCYfptkdhEFobdrSnbobcknfXsMpjJsYXDokWg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
List pgsql-hackers
On Thu, Aug 23, 2018 at 4:01 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> On Fri, 27 Jul 2018 at 20:13, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Fri, Jul 27, 2018 at 3:17 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> > Apart from the complexity there's also a possibility that this
>> > skipping will reduce the efficiency actually in normal cases. Consider
>> > a case where A and B have exactly matching partitions. Current
>> > partition matching algorithm compare any given range/bound only once
>> > and we will have O(n) algorithm. If we use a binary search, however,
>> > for every range comparison, it will be O(n log n) algorithm. There
>> > will be unnecessary comparisons during binary search. The current
>> > algorithm is always O(n), whereas binary search would be O(n log(n))
>> > with a possibility of having sub-O(n) complexity in some cases. I
>> > would go for an algorithm which has a consistent time complexity
>> > always and which is efficient in normal cases, rather than worrying
>> > about some cases which are not practical.
>>
>> Yeah, I think that's a good point.  The normal case here will be that
>> the partition bounds are equal, or that there are a few extra
>> partitions on one side that don't exist on the other.  We don't want
>> other cases to be crazily inefficient, but I suspect in practice that
>> if the partitioning bounds aren't pretty close to matching up exactly,
>> we're going to end up failing to be able to do a partition-wise join
>> at all.  It's not very likely that somebody happens to have a case
>> where they've partitioned two tables in randomly different ways, but
>> then they decide to join them anyway, but then it turns out that the
>> partition bounds happen to be compatible enough that this algorithm
>> works.
>
>> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>> 1. those cases are rare enough that we can ignore those right now. How
>> many times we would encounter the case you have quoted, for example?
>> Usually the ranges will be continuous only differing in the first or
>> last partition e.g time-series data.
>
> Ok, but what about list partitioning? I believe the area of application for it
> can be more diverse than mostly just for time-series, and in the patch almost
> the same approach is used to merge list partitions.

Same arguments hold for list partitioning as well. For a list
partitioned table the bounds are ordered by list datums and not
partitions, so it will be rather odd to have large runs of mismatching
datums, the case where binary search fares, from one of side of join.
So, my following argument still holds true

---
>> > I would go for an algorithm which has a consistent time complexity
>> > always and which is efficient in normal cases, rather than worrying
>> > about some cases which are not practical.
---


>
> Other than that everything seems fine from functionality point of view, and so
> far I couldn't find any situation that produces a wrong plan. Some of the joins
> were not that intuitive from the first glance, but at the end everything was
> according to the documentation.

Thanks a lot for your tests and I am glad that you didn't find any failures.

> Taking this into account I wonder if it's possible somehow to give any hints in
> an explain result about why it wasn't possible to apply partition wise join? If
> something wasn't clear for me I ended up looking at the value of "merged" flag,
> and to figure out actual reason one needs to trace the entire algorithm.

That's kind of tricky in PostgreSQL. The optimizer doesn't always
report why a particular path was not chosen. Said that we could add
trace logs printing that information, however, the main difficulty is
reporting the relations being joined.See 0004 for example.

>
> Besides that I checked the performance in some simple cases, no problems on
> this side (but I'll also do some checks for more complicated joins).

Thanks a lot.

>
> And I still have some complaints about readability, although I can imagine that
> it's just me:
>
> * Many functions carry some unrelated arguments just to pass them through,
>   which obscures the purpose of a function.

Can you please provide some examples?

>
> * I want to suggest to introduce a new data structure for partitions mapping
>   instead of just keeping them in arrays (was it discussed already before?).

The best I could think of was a structure with just two arrays. So,
instead of encapsulating the arrays within a structure, I thought it
best to pass bare arrays around. If you have any other ideas, please
let me know.

>
> * What is the reason that almost everywhere in the patch there is a naming
>   convention "outer/inner" partition maps, and sometimes (e.g. in
>   generate_matching_part_pairs) it's "map1/map2"?

You will find that the optimizer in general uses outer/inner,
rel1/rel2 nomenclature interchangeably. This patch-set just inherits
that. But yes, we should do more to straighten it out.

I won't be working on this actively in the next commitfest. I will be
glad if somebody else wants to take this up. If there's nobody,
probably we should mark this entry as "returned with feedback" in the
next commitfest.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Mariel Cherkassky
Date:
Subject: Re: Catalog corruption
Next
From: Christoph Berg
Date:
Subject: Re: logical decoding: ABI break in 10.5 et al