Re: Removing unneeded self joins - Mailing list pgsql-hackers
From | Ronan Dunklau |
---|---|
Subject | Re: Removing unneeded self joins |
Date | |
Msg-id | 2537176.Lt9SDvczpP@aivenronan Whole thread Raw |
In response to | Re: Removing unneeded self joins (Andrey Lepikhov <a.lepikhov@postgrespro.ru>) |
Responses |
Re: Removing unneeded self joins
|
List | pgsql-hackers |
Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit : > New version of the feature. > Here a minor bug with RowMarks is fixed. A degenerated case is fixed, > when uniqueness of an inner deduced not from join quals, but from a > baserestrictinfo clauses 'x=const', where x - unique field. > Code, dedicated to solve second issue is controversial, so i attached > delta.txt for quick observation. > Maybe we should return to previous version of code, when we didn't split > restriction list into join quals and base quals? Hello, I tried to find problematic cases, which would make the planning time grow unacceptably, and couldn't devise it. The worst case scenario I could think of was as follows: - a query with many different self joins - an abundance of unique indexes on combinations of this table columns to consider - additional predicates on the where clause on columns. The base table I used for this was a table with 40 integers. 39 unique indexes were defined on every combination of (c1, cX) with cX being columns c2 to c40. I turned geqo off, set from_collapse_limit and join_collapse_limit to unreasonably high values (30), and tried to run queries of the form: SELECT * FROM test_table t1 JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX ... JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX. So no self join can be eliminated in that case. The performance was very similar with or without the GUC enabled. I tested the same thing without the patch, since the test for uniqueness has been slightly altered and incurs a new allocation, but it doesn't seem to change. One interesting side effect of this patch, is that removing any unneeded self join cuts down the planification time very significantly, as we lower the number of combinations to consider. As for the code: - Comments on relation_has_unique_index_ext and relation_has_unique_index_for should be rewritten, as relation_has_unique_index_for is now just a special case of relation_has_unique_index_ext. By the way, the comment would probably be better read as: "but if extra_clauses isn't NULL". - The whole thing about "extra_clauses", ie, baserestrictinfos which were used to determine uniqueness, is not very clear. Most functions where the new argument has been added have not seen an update in their comments, and the name itself doesn't really convey the intented meaning: perhaps required_non_join_clauses ? The way this works should be explained a bit more thoroughly, for example in remove_self_joins_one_group the purpose of uclauses should be explained. The fact that degenerate_case returns true when we don't have any additional base restrict info is also confusing, as well as the degenerate_case name. I'll update if I think of more interesting things to add. -- Ronan Dunklau
pgsql-hackers by date: