Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: [HACKERS] advanced partition matching algorithm forpartition-wise join |
Date | |
Msg-id | 48CB1153-65B0-4BC9-843D-C8935A2CA44C@enterprisedb.com Whole thread Raw |
In response to | Re: [HACKERS] advanced partition matching algorithm forpartition-wise join (Etsuro Fujita <etsuro.fujita@gmail.com>) |
Responses |
Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
|
List | pgsql-hackers |
> On Feb 5, 2020, at 4:51 AM, Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Jan 29, 2020 at 9:15 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: >> On Tue, Jan 28, 2020 at 1:39 PM Mark Dilger >> <mark.dilger@enterprisedb.com> wrote: >>> I have added tests checking correctness and showing some partition pruning limitations. Find three patches, attached. >>> >>> The v31-0001-… patch merely applies your patches as a starting point for the next two patches. It is your work, notmine. >>> >>> The v31-0002-… patch adds more regression tests for range partitioning. The commit message contains my comments aboutthat. >>> >>> The v31-0003-… patch adds more regression tests for list partitioning, and again, the commit message contains my commentsabout that. >> >> I'll dig into it more closely. > > I tested the new test patches. The patches are applied to the > partition_join.sql regression test cleanly, but the new tests failed > in my environment (even with make check LANG=C). I think I should set > the right locale for the testing. Which one did you use? I did not set a locale in the tests. My environment has: LANG="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_CTYPE="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_ALL= > Maybe I'm > missing something, but let me comment on the new tests. This is the > one proposed in the v31-0002 patch: > > +EXPLAIN (COSTS OFF) > +SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) > WHERE t1.a IN ('äbç', 'ὀδυσσεύς'); > + QUERY PLAN > +------------------------------------------------------------------ > + Hash Join > + Hash Cond: (t2.a = t1.a) > + -> Append > + -> Seq Scan on beta_a t2_1 > + -> Seq Scan on beta_b t2_2 > + -> Seq Scan on beta_c t2_3 > + -> Seq Scan on beta_d t2_4 > + -> Seq Scan on beta_e t2_5 > + -> Seq Scan on beta_f t2_6 > + -> Seq Scan on beta_g t2_7 > + -> Seq Scan on beta_h t2_8 > + -> Seq Scan on beta_default t2_9 > + -> Hash > + -> Append > + -> Seq Scan on alpha_e t1_1 > + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[])) > + -> Seq Scan on alpha_default t1_2 > + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[])) > +(18 rows) > > The commit message says: > > When joining with > > alpha.a = beta.a and alpha.a IN ('äbç', 'ὀδυσσεύς') > > the planner does the right thing for one side of the query, but > hits all partitions for the other side, which it doesn't need to > do. > > Yeah, I agree that the resulting plan is not efficient. The reason > for that would be that the planner doesn't generate a qual on the > outer side matching the ScalarArrayOpExpr qual "a = ANY > ('{äbç,ὀδυσσεύς}'::text[])" on the inner side, which I think would be > a restriction caused by the equivalence machinery not by the > partitionwise join logic IIUC. It’s fine if this is beyond the scope of the patch. > I think the critique would be useful, > so I don't object to adding this test case, but the critique would be > more about query planning that is actually not related to the > partitionwise join logic, so I'm not sure that the partition_join.sql > regression test is the best place to add it. I guess that there would > be much better places than partition_join.sql. You don’t need to add the test anywhere. It’s enough for me that you looked at it and considered whether the partition-wisejoin patch should do anything differently in this case. Again, it sounds like this is beyond the scope ofthe patch. > (This is nitpicking; > but another thing I noticed about this test case is that the join > query contains only a single join condition "t1.a = t2.a", but the > queried tables alpha and beta are range-partitioned by multiple > columns a and b, so the query should have a join condition for each of > the columns like "t1.a = t2.a AND t1.b = t2.b" if adding this as a > test case for partitionwise join.) Well, it is important that partition-wise join does not break such queries. I added the column ‘b’ to the partitioning logicto verify that did not confuse the code in your patch. > I feel almost the same to other > test cases in the patch (and the v31-0003 patch), except this one > proposed in the v31-0003 patch: > > +CREATE TABLE alpha (a TEXT) PARTITION BY LIST(a); > +CREATE TABLE alpha_a PARTITION OF alpha FOR VALUES IN ('Turkiye', 'TURKIYE'); > +CREATE TABLE alpha_b PARTITION OF alpha FOR VALUES IN ('b?t', 'BIT'); > +CREATE TABLE alpha_c PARTITION OF alpha FOR VALUES IN ('abc', 'ABC'); > +CREATE TABLE alpha_d PARTITION OF alpha FOR VALUES IN ('aaa', 'cote', 'Gotz'); > +CREATE TABLE alpha_e PARTITION OF alpha FOR VALUES IN ('?δυσσε??', '?ΔΥΣΣΕ?Σ'); > +CREATE TABLE alpha_f PARTITION OF alpha FOR VALUES IN ('を読み取り用', > 'にオープンできませんでした', NULL); > +CREATE TABLE alpha_default PARTITION OF alpha DEFAULT; > +CREATE TABLE beta (a TEXT) PARTITION BY LIST(a); > +CREATE TABLE beta_a PARTITION OF beta FOR VALUES IN ('Turkiye', > 'cote', '?ΔΥΣΣΕ?Σ'); > +CREATE TABLE beta_b PARTITION OF beta FOR VALUES IN ('b?t', 'TURKIYE'); > +CREATE TABLE beta_c PARTITION OF beta FOR VALUES IN ('abc', 'を読み取り用', > 'にオープンできませんでした'); > +CREATE TABLE beta_d PARTITION OF beta FOR VALUES IN ('aaa', 'Gotz', > 'BIT', '?δυσσε??', 'ABC', NULL); > +CREATE TABLE beta_default PARTITION OF beta DEFAULT; > > +EXPLAIN (COSTS OFF) > +SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) > WHERE t1.a IS NULL; > + QUERY PLAN > +------------------------------------------- > + Hash Join > + Hash Cond: (t2.a = t1.a) > + -> Append > + -> Seq Scan on beta_d t2_1 > + -> Seq Scan on beta_b t2_2 > + -> Seq Scan on beta_a t2_3 > + -> Seq Scan on beta_c t2_4 > + -> Seq Scan on beta_default t2_5 > + -> Hash > + -> Seq Scan on alpha_f t1 > + Filter: (a IS NULL) > +(11 rows) > > Which made me notice an issue in the partitionwise join logic: > partition_bounds_merge() assumes that all partitions of given > relations are always present; in other words, it doesn't consider > cases where some of the partitions have been pruned entirely. :-( If > that function considered the cases, the above query would use > partitionwise join, because alpha_f only matches beta_c. I don't > think this is a bug, but it causes the planner not only to fail to > chose partitionwise join but to waste CPU cycles to process pruned > partitions, so I'll propose to address it. Attached is a patch for > that (the v31-0004 patch) created on top of the main patch (the > v31-0001 patch), which is also attached. With the attached, the plan > for the above query would be changed to something like this: > > EXPLAIN (COSTS OFF) > SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = > t2.a) WHERE t1.a IS NULL; > QUERY PLAN > ------------------------------ > Nested Loop > Join Filter: (t1.a = t2.a) > -> Seq Scan on alpha_f t1 > Filter: (a IS NULL) > -> Seq Scan on beta_c t2 > (5 rows) > > Thanks again, Mark! Thank you for working on this important patch! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: