Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | Jesper Pedersen
---|---
Subject | Re: [HACKERS] path toward faster partition pruning
Date |
Msg-id | 08062340-3e9a-302a-8b63-c4c27aa8196b@redhat.com
In response to | Re: [HACKERS] path toward faster partition pruning (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: [HACKERS] path toward faster partition pruning
List | pgsql-hackers
On 09/26/2017 10:33 AM, Robert Haas wrote:
> On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen
> <jesper.pedersen@redhat.com> wrote:
>> Could you share your thoughts on the usage of PartitionAppendInfo's
>> min_datum_idx / max_datum_idx? Especially in relation to hash partitions.
>
> This brings up something that I've kind of been thinking about. There
> are sort of four cases when it comes to partition pruning:
>
> 1. There is exactly one matching partition. For example, this happens
> when there is an equality constraint on every partition column.
>
> 2. There are multiple matching partitions which are consecutive. For
> example, there is a single level of range partitioning with no default
> partition and the single partitioning column is constrained by <, <=,
> > or >=.
>
> 3. There are multiple matching partitions which are not consecutive.
> This case is probably rare, but it can happen if there is a default
> partition, if there are list partitions with multiple bounds that are
> interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5),
> and the query allows values >= 4 and <= 5), if the query involves OR
> conditions, or if there are multiple levels of partitioning (e.g.
> partition by a, subpartition by b, put a range constraint on a and an
> equality constraint on b).
>
> 4. There are no matching partitions.
>
> One of the goals of this algorithm is to be fast. The obvious way to
> cater to case (3) is to iterate through all partitions and test
> whether each one works, returning a Bitmapset, but that is O(n).
> Admittedly, it might be O(n) with a pretty small constant factor, but
> it still seems like exactly the sort of thing that we want to avoid
> given the desire to scale to higher partition counts.
>
> I propose that we create a structure that looks like this:
>
> struct foo {
>     int min_partition;
>     int max_partition;
>     Bitmapset *extra_partitions;
> };
>
> This indicates that all partitions from min_partition to max_partition
> need to be scanned, and in addition any partitions in extra_partitions
> need to be scanned. Assuming that we only consider cases where all
> partition keys or a leading subset of the partition keys are
> constrained, we'll generally be able to get by with just setting
> min_partition and max_partition, but extra_partitions can be used to
> handle default partitions and interleaved list bounds. For equality
> on all partitioning columns, we can do a single bsearch of the bounds
> to identify the target partition at a given partitioning level, and
> the same thing works for a single range-bound. If there are two
> range-bounds (< and > or <= and >= or whatever) we need to bsearch
> twice. The default partition, if any and if matched, must also be
> included. When there are multiple levels of partitioning things get a
> bit more complex -- if someone wants to knock out a partition that
> breaks up the range, we might need to shrink the main range to cover
> part of it and kick the other indexes out to extra_partitions.
>
> But the good thing is that in common cases with only O(lg n) effort we
> can return O(1) data that describes what will be scanned. In cases
> where that's not practical we expend more effort but still prune with
> maximal effectiveness.

For OLTP-style applications, 1) would be the common case, and with hash
partitions it would be one equality constraint. So, changing the method
signature to use a data type as you described above instead of the
explicit min_datum_idx / max_datum_idx output parameters would be
clearer.
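To make that concrete, here is a rough, self-contained sketch of how such
a structure could be filled in and consumed for case 2. All names here
are placeholders (like "foo" above), a plain uint64 mask stands in for
the backend's Bitmapset so the example compiles outside the backend, and
detecting case 4 (no matching partitions) is ignored for brevity:

/*
 * Sketch only: names are placeholders ("foo" above), and a uint64 mask
 * stands in for Bitmapset so this compiles as a standalone program.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct PruneResult
{
    int         min_partition;      /* first partition index to scan */
    int         max_partition;      /* last partition index to scan */
    uint64_t    extra_partitions;   /* stand-in for Bitmapset of extras */
} PruneResult;

/*
 * Given the lower bounds of range partitions sorted ascending, return
 * the first partition that can contain rows satisfying "key >= val",
 * i.e. the last partition whose lower bound is <= val (or partition 0
 * when val sorts below every bound).  This is O(lg n) instead of
 * testing all n partitions.
 */
static int
first_partition_for_ge(const int *lower_bounds, int nparts, int val)
{
    int     lo = 0;
    int     hi = nparts - 1;
    int     result = 0;

    while (lo <= hi)
    {
        int     mid = (lo + hi) / 2;

        if (lower_bounds[mid] <= val)
        {
            result = mid;       /* candidate; a later one may also fit */
            lo = mid + 1;
        }
        else
            hi = mid - 1;
    }
    return result;
}

int
main(void)
{
    /* four range partitions: [0,10), [10,20), [20,30), [30,40) */
    int         lower_bounds[] = {0, 10, 20, 30};
    int         nparts = 4;
    PruneResult res;

    /* WHERE key >= 15: case 2, a consecutive run of partitions */
    res.min_partition = first_partition_for_ge(lower_bounds, nparts, 15);
    res.max_partition = nparts - 1;
    res.extra_partitions = 0;

    /* a matched default partition would be added here, e.g.
     *   res.extra_partitions |= UINT64_C(1) << default_index; */

    for (int i = res.min_partition; i <= res.max_partition; i++)
        printf("scan partition %d\n", i);   /* prints partitions 1..3 */
    return 0;
}

With hash partitioning and equality constraints on all key columns -- the
common OLTP case above -- the same structure would simply come back with
min_partition == max_partition and an empty extra_partitions.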
One could advocate (*cough*) that the hash partition patch [1] should be
merged first in order to find other instances where other CommitFest
entries don't currently account for hash partitions in their method
signatures; Beena noted something similar in [2]. I know that you said
otherwise [3], but this is CommitFest 1, so there is time for a revert
later, and hash partitions are already useful in internal testing.

[1] https://commitfest.postgresql.org/14/1059/
[2] https://www.postgresql.org/message-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A%40mail.gmail.com
[3] http://rhaas.blogspot.com/2017/08/plans-for-partitioning-in-v11.html

Best regards,
Jesper