Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series) - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series) |
Date | |
Msg-id | CAKU4AWoP6aJ+4GHzXm=aCFwiO2o_4Q0dpxw58ad0qwnskT=9NA@mail.gmail.com Whole thread Raw |
In response to | Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series) (Andy Fan <zhihui.fan1213@gmail.com>) |
Responses |
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
|
List | pgsql-hackers |
Hi: I have finished the parts for baserel, joinrel, subquery, distinctrel. I think the hardest ones have been verified. Here is the design overview. 1. Use EC instead of expr to cover more UniqueKey cases. 2. Redesign the UniqueKey as below: @@ -246,6 +246,7 @@ struct PlannerInfo List *eq_classes; /* list of active EquivalenceClasses */ + List *unique_exprs; /* List of unique expr */ bool ec_merging_done; /* set true once ECs are canonical */ +typedef struct UniqueKey +{ + NodeTag type; + Bitmapset *unique_expr_indexes; + bool multi_nulls; +} UniqueKey; + PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of EquivalenceClass. for example: CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key); CREATE UNIQUE INDEX ON t1(a, b); SELECT DISTINCT * FROM T1 WHERE a = c; Then we would have PlannerInfo.unique_exprs as below [ [EC(a, c), EC(b)], [EC(pk)] ] RelOptInfo(t1) would have 2 UniqueKeys. UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false] UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false] The design will benefit many table joins cases. For instance a 10- tables join, each table has a primary key (a, b). Then we would have a UniqueKey like this. JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b} JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b} When more tables are joined, the list would be longer and longer, build the list consumes both CPU cycles and memory. With the method as above, we can store it as: root->unique_exprs = /* All the UniqueKey is stored once */ [ [t1.a, t1.b], -- EC is ignored in document. [t2.a, t2.b], [t3.a, t3.b], [t4.a, t4.b], [t5.a, t5.b], [t6.a, t6.b], [t7.a, t7.b], [t8.a, t8.b], [t9.a, t9.b], [t10.a, t10.b], ] JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword. JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword. The member in the bitmap is the index of root->unique_exprs rather than root->eq_classes because we need to store the SingleRow case in root->unqiue_exprs as well. 3. Define a new SingleRow node and use it in joinrel as well. +typedef struct SingleRow +{ + NodeTag type; + Index relid; +} SingleRow; SELECT * FROM t1, t2 WHERE t2.pk = 1; root->unique_exprs [ [t1.a, t1.b], SingleRow{relid=2} ] JoinRel{t1} - Bitmapset{0} JoinRel{t2} - Bitmapset{1} JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated exprs. 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible. The overall idea is similar with PathKey, I distinguish the UniqueKey for distinct purpose and useful_for_merging purpose. SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like root->query_pathkey. SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is use before t1 join t2, but not useful after it. Currently the known issue I didn't pass the "interesting UniqueKey" info to subquery well [2], I'd like to talk more about this when we discuss about UnqiueKey on subquery part. 5. relation_is_distinct_for Now I design the function as + bool + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *distinct_pathkey) It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey has EC in it, and all the UniqueKey has EC as well. if so, the subset-like checking is very effective. As for the distinct/group as no-op case, we have pathkey all the time. The only drawback of it is some clauses are not-sortable, in this case, the root->distinct_pathkey and root->group_pathkey is not set. However it should be rare by practice, so I ignore this part for now. After all, I can have a relation_is_disticnt_for version for Exprs. I just not implemented it so far. 6. EC overhead in UnqiueKey & UNION case. Until now I didn't create any new EC for the UniqueKey case, I just used the existing ones. However I can't handle the case like SELECT a, b FROM t1 UNION SELECT x, y FROM t2; In this case, there is no EC created with existing code. and I don't want to create them for the UniqueKey case as well. so my plan is just not to handle the case for UNION. Since we need some big effort from the reviewer, I split the patch into many smaller chunks. Patch 1 / Patch 2: I just split some code which UniqueKey uses but not very interrelated. Splitting them out to reduce the core patch size. Patch 3: still the notnull stuff. This one doesn't play a big role overall, even if the design is changed at last, we can just modify some small stuff. IMO, I don't think it is a blocker issue to move on. Patch 4: Support the UniqueKey for baserel. Patch 5: Support the UniqueKey for joinrel. Patch 6: Support the UniqueKey for some upper relation, like distinctrel, groupby rel. I'd suggest moving on like this: 1. Have an overall review to see if any outstanding issues the patch has. 2. If not, we can review and commit patch 1 & patch 2 to reduce the patch size. 3. Decide which method to take for not null stuff. IMO Tom's method would be a bit complex and the benefit is not very clear to me[1]. So the choices include: a). move on UniqueKey stuff until Tom's method is ready. b). Move on the UniqueKey with my notnull way, and changes to Tom's method when necessary. I prefer method b). 4. Review & Commit the UniqueKey for BaseRel part. 5. Review & Commit the UniqueKey for JoinRel part. 6. Review & Commit the UniqueKey for SubQuery part *without* the Interesting UniqueKey well handled. 7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting UniqueKey well handled. 8. Discuss about the UniqueKey on partitioned rel case and develop/review/commit it. 9. Apply UniqueKey stuff on more user cases rather than DISTINCT. What do you think about this? [1] https://www.postgresql.org/message-id/CAApHDvo5b2pYX%2BdbPy%2BysGBSarezRSfXthX32TZNFm0wPdfKGQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAKU4AWo6-%3D9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN%3D%2Bhb_w%40mail.gmail.com Thanks
Attachment
- v3-0001-Just-refactor-pathkeys_useful_for_merging-split-a.patch
- v3-0002-Just-some-utils-functions.patch
- v3-0005-Support-UniqueKey-on-JoinRel.patch
- v3-0003-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patch
- v3-0004-Support-UniqueKey-on-BaseRel.patch
- v3-0006-Maintain-the-UniqueKey-on-Subquery-and-UpperRel-l.patch
pgsql-hackers by date: