Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAEepm=16SDdJ9yy7Dmoc+Y6yxsv+f2RDbgOoNHa_8Yz3m61DPA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
On Tue, Aug 8, 2017 at 8:51 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > Updated patches attached. Hi, I started reviewing this. It's nicely commented, but it's also very complicated, and it's going to take me several rounds to understand what all the parts do, but here's some assorted feedback after reading some parts of the patches, some tinkering and quite a bit of time spent trying to break it (so far unsuccessfully). On my computer it took ~1.5 seconds to plan a 1000 partition join, ~7.1 seconds to plan a 2000 partition join, and ~50 seconds to plan a 4000 partition join. I poked around in a profiler a bit and saw that for the 2000 partition case I spent almost half the time in create_plan->...->prepare_sort_from_pathkeys->find_ec_member_for_tle, and about half of that was in bms_is_subset. The other half the time was in query_planner->make_one_rel which spent 2/3 of its time in set_rel_size->add_child_rel_equivalences->bms_overlap and the other 1/3 in standard_join_search. One micro-optimisation opportunity I noticed was in those bms_is_subset and bms_overlap calls. The Bitmapsets don't tend to have trailing words but often have hundreds of empty leading words. If I hack bitmapset.{c,h} so that it tracks first_non_empty_wordnum and then adjust bms_is_subset and bms_overlap so that they start their searches at Min(a->first_non_empty_wordnum, b->first_non_empty_wordnum) then the planning time improves measurably: 1000 partitions: ~1.5s -> 1.3s 2000 partitions: ~7.1s -> 5.8s 4000 partitions: ~50s -> ~44s When using list-based partitions, it must be possible to omit the part of a join key that is implied by the partition because the partition has only one list value. For example, if I create a two level hierarchy with one partition per US state and then time-based range partitions under that, the state part of this merge condition is redundant: Merge Cond: ((sales_wy_2017_10.state = purchases_wy_2017_10.state) AND (sales_wy_2017_10.created = purchases_wy_2017_10.created)) 0003-Refactor-partition_bounds_equal-to-be-used-without-P.patch -partition_bounds_equal(PartitionKey key, +partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval, PartitionBoundInfob1, PartitionBoundInfo b2) I wonder is there any value in creating a struct to represent the common part of PartitionKey and PartitionScheme that functions like this and others need? Maybe not. Perhaps you didn't want to make PartitionKey contain a PartitionScheme because then you'd lose the property that every pointer to PartitionScheme in the system must be a pointer to an interned (canonical) PartitionScheme, so it's always safe to compare pointers to test for equality? 0005-Canonical-partition-scheme.patch: +/* + * get_relation_partition_info + * + * Retrieves partitioning information for a given relation. + * + * For a partitioned table it sets partitioning scheme, partition key + * expressions, number of partitions and OIDs of partitions in the given + * RelOptInfo. + */ +static void +get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, + Relation relation) Would this be better called "set_relation_partition_info"? It doesn't really "retrieve" the information, it "installs" it. +{ + PartitionDesc part_desc; + + /* No partitioning information for an unpartitioned relation. */ + if (relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE || + !(rel->part_scheme = find_partition_scheme(root, relation))) + return; Here and elsewhere you use the idiom !(foo = bar), which is perfectly good C in my book but I understand the project convention is to avoid implicit pointer->boolean conversion and to prefer expressions like (foo = bar) != NULL and there is certainly a lot more code like that. 0007-Partition-wise-join-implementation.patch + {"enable_partition_wise_join", PGC_USERSET, QUERY_TUNING_METHOD, This GUC should appear in postgresql.conf.sample. I'm chewing on 0007. More soon. -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: