Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAFjFpRfkr7igCGBBWH1PQ__W-XPy1O79Y-qxCmJc6FizLqFz7Q@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
On Wed, Aug 9, 2017 at 7:09 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > > I started reviewing this. It's nicely commented, but it's also very > complicated, and it's going to take me several rounds to understand > what all the parts do, but here's some assorted feedback after reading > some parts of the patches, some tinkering and quite a bit of time > spent trying to break it (so far unsuccessfully). > Thanks for testing the patch. Good to know it has withstood your testing. > On my computer it took ~1.5 seconds to plan a 1000 partition join, > ~7.1 seconds to plan a 2000 partition join, and ~50 seconds to plan a > 4000 partition join. I poked around in a profiler a bit and saw that > for the 2000 partition case I spent almost half the time in > create_plan->...->prepare_sort_from_pathkeys->find_ec_member_for_tle, > and about half of that was in bms_is_subset. The other half the time > was in query_planner->make_one_rel which spent 2/3 of its time in > set_rel_size->add_child_rel_equivalences->bms_overlap and the other > 1/3 in standard_join_search. Thanks for profiling. I have separately mailed about bitmapset improvements. Equivalence classes contain all the expressions which are known to be equal in EquivalenceClass::ec_members. For a partitioned table, there will be as many expressions as the number of children. The child expressions are marked as em_is_child and are looked at only when child relids are available to the function scanning the members. The number of equivalence members increases linearly with the number of partitions, and the number of words in the bitmaps increases linearly with the number of partitions, effectively the the number of words scanned increases quadratically. Hence the superlinear increase in time with the number of partitions. When I took separate profiles with 1000, 2000 and 4000 partitions resp. I see that 15%, 29% and 40% time spent in bms_is_subset() resp. I am not sure how much we can do in this patchset to reduce this problem. Apart from your bitmapset optimization, we could perhaps use some more efficient data structure other than list to search members based on the relids OR re-use parent's expressions for child somehow. I have been thinking about the second option, but never got a chance to work on it. > > When using list-based partitions, it must be possible to omit the part > of a join key that is implied by the partition because the partition > has only one list value. For example, if I create a two level > hierarchy with one partition per US state and then time-based range > partitions under that, the state part of this merge condition is > redundant: > > Merge Cond: ((sales_wy_2017_10.state = > purchases_wy_2017_10.state) AND (sales_wy_2017_10.created = > purchases_wy_2017_10.created)) That's a good idea. In fact, we could use a similar trick when the condition is sales_wy_2017_10.state = 'state'. We can not use the trick in case of DML or when there are locking clauses, since we need to evaluate the qual in case the row underneath changes while locking it. We also can not do this when one of the keys being compared is a nullable partition key (a concept explained in partition-wise join implementation patch), since a partition can have also have rows with NULL values for such partition keys in that partition. I think the idea has merit, although, I think we should handle it targetting more generic cases like the one stated above. > > 0003-Refactor-partition_bounds_equal-to-be-used-without-P.patch > > -partition_bounds_equal(PartitionKey key, > +partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval, > PartitionBoundInfo b1, > PartitionBoundInfo b2) > > I wonder is there any value in creating a struct to represent the > common part of PartitionKey and PartitionScheme that functions like > this and others need? Maybe not. Perhaps you didn't want to make > PartitionKey contain a PartitionScheme because then you'd lose the > property that every pointer to PartitionScheme in the system must be a > pointer to an interned (canonical) PartitionScheme, so it's always > safe to compare pointers to test for equality? Right. Other reason to keep those two separate, is we might change the contents of PartitionScheme as we move forward with the reviews. May be we should revisit it after we have finalised the design. > > 0005-Canonical-partition-scheme.patch: > > +/* > + * get_relation_partition_info > + * > + * Retrieves partitioning information for a given relation. > + * > + * For a partitioned table it sets partitioning scheme, partition key > + * expressions, number of partitions and OIDs of partitions in the given > + * RelOptInfo. > + */ > +static void > +get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, > + Relation relation) > > Would this be better called "set_relation_partition_info"? It doesn't > really "retrieve" the information, it "installs" it. Yes. Done. > > +{ > + PartitionDesc part_desc; > + > + /* No partitioning information for an unpartitioned relation. */ > + if (relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE || > + !(rel->part_scheme = find_partition_scheme(root, relation))) > + return; > > Here and elsewhere you use the idiom !(foo = bar), which is perfectly > good C in my book but I understand the project convention is to avoid > implicit pointer->boolean conversion and to prefer expressions like > (foo = bar) != NULL and there is certainly a lot more code like that. PG code uses both the styles, search "if (!" in execExpr.c, createplan.c for example. I find this style useful, when I want to code, say "if this relation does not have a partitioning scheme" rather than "if this relation have NULL partitioning scheme". > > 0007-Partition-wise-join-implementation.patch > > + {"enable_partition_wise_join", PGC_USERSET, QUERY_TUNING_METHOD, > > This GUC should appear in postgresql.conf.sample. Done. Attached patches with the comments addressed. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: