Re: Removing unneeded self joins - Mailing list pgsql-hackers
From | Ronan Dunklau |
---|---|
Subject | Re: Removing unneeded self joins |
Date | |
Msg-id | 13021914.uLZWGnKmhe@aivenronan Whole thread Raw |
In response to | Re: Removing unneeded self joins (Andrey Lepikhov <a.lepikhov@postgrespro.ru>) |
Responses |
Re: Removing unneeded self joins
|
List | pgsql-hackers |
Le jeudi 19 mai 2022, 12:48:18 CEST Andrey Lepikhov a écrit : > On 5/17/22 19:14, Ronan Dunklau wrote: > > Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit : > >> New version of the feature. > >> Here a minor bug with RowMarks is fixed. A degenerated case is fixed, > >> when uniqueness of an inner deduced not from join quals, but from a > >> baserestrictinfo clauses 'x=const', where x - unique field. > >> Code, dedicated to solve second issue is controversial, so i attached > >> delta.txt for quick observation. > >> Maybe we should return to previous version of code, when we didn't split > >> restriction list into join quals and base quals? > > > > Hello, > > > > I tried to find problematic cases, which would make the planning time grow > > unacceptably, and couldn't devise it. > > > > The worst case scenario I could think of was as follows: > > - a query with many different self joins > > - an abundance of unique indexes on combinations of this table columns > > to > > > > consider > > > > - additional predicates on the where clause on columns. > > Looking into the patch I can imagine, that the most difficult case is > when a set of relations with the same OID is huge, but only small part > of them (or nothing) can be removed. > Also, removing a clause from restrictinfo list or from equivalence class > adds non-linear complexity. So, you can dig this way ). > > > The base table I used for this was a table with 40 integers. 39 unique > > indexes were defined on every combination of (c1, cX) with cX being > > columns c2 to c40. > > > > I turned geqo off, set from_collapse_limit and join_collapse_limit to > > unreasonably high values (30), and tried to run queries of the form: > > > > SELECT * FROM test_table t1 > > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX > > ... > > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX. > > > > So no self join can be eliminated in that case. > > I think, you should compare t1.cX with tX.cX to eliminate self-join. > Cross-unique-index proof isn't supported now. Yes, that's the point. I wanted to try to introduce as much complexity as I could, without actually performing any self join elimination. The idea was to try to come up with the worst case scenario. > > > The performance was very similar with or without the GUC enabled. I tested > > the same thing without the patch, since the test for uniqueness has been > > slightly altered and incurs a new allocation, but it doesn't seem to > > change. > > > > One interesting side effect of this patch, is that removing any unneeded > > self join cuts down the planification time very significantly, as we > > lower the number of combinations to consider. > > Even more - removing a join we improve cardinality estimation. > > > As for the code: > > - Comments on relation_has_unique_index_ext and > > relation_has_unique_index_for> > > should be rewritten, as relation_has_unique_index_for is now just a > > special > > case of relation_has_unique_index_ext. By the way, the comment would > > probably be better read as: "but if extra_clauses isn't NULL". > > > > - The whole thing about "extra_clauses", ie, baserestrictinfos which > > were > > > > used to determine uniqueness, is not very clear. Most functions where the > > new argument has been added have not seen an update in their comments, > > and the name itself doesn't really convey the intented meaning: perhaps > > required_non_join_clauses ? > > > > The way this works should be explained a bit more thoroughly, for example > > in remove_self_joins_one_group the purpose of uclauses should be > > explained. The fact that degenerate_case returns true when we don't have > > any additional base restrict info is also confusing, as well as the > > degenerate_case name. > Agree, > but after this case thoughts wander in my head: should we make one step > back to pre-[1] approach? It looks like we have quite similar changes, > but without special function for a 'degenerate case' detection and > restrictlist splitting. I'll take a look at that one. > > > I'll update if I think of more interesting things to add. > > Thank you for your efforts! > > See in attachment next version which fixes mistakes detected by > zyu@yugabyte.com. > > [1] > https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW > 5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com -- Ronan Dunklau
pgsql-hackers by date: