EquivalenceClasses and subqueries and PlaceHolderVars, oh my - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | EquivalenceClasses and subqueries and PlaceHolderVars, oh my |
Date | |
Msg-id | 10461.1331774989@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: EquivalenceClasses and subqueries and PlaceHolderVars,
oh my
|
List | pgsql-hackers |
I looked into the problem complained of here: http://archives.postgresql.org/pgsql-bugs/2012-03/msg00016.php It's not at all specific to custom types; you can exhibit it with this query in the regression database: explain select * from(select 1 as t, unique1 from tenk1 a union all select 2 as t, unique1 from tenk1 b) c where t = 2; 9.0 successfully optimizes away the excluded subquery: QUERY PLAN -------------------------------------------------------------------------Result (cost=0.00..458.00 rows=10000 width=8) -> Append (cost=0.00..458.00 rows=10000 width=8) -> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=8) (3 rows) but 9.1 and HEAD not so much: QUERY PLAN ----------------------------------------------------------------------Result (cost=0.00..966.00 rows=100 width=8) -> Append (cost=0.00..966.00 rows=100 width=8) -> Seq Scan on tenk1 a (cost=0.00..483.00 rows=50 width=8) Filter: (1 = 2) -> Seq Scan on tenk1 b (cost=0.00..483.00 rows=50 width=8) Filter: (2 = 2) (6 rows) This is a consequence of commit 57664ed25e5dea117158a2e663c29e60b3546e1c, which was already known to cause some issues as per commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c. This gripe is basically the same problem: when we push the "t = 2" condition down into the subqueries, "t" is no longer replaced with just constant 1 or constant 2, but with those constants wrapped in PlaceHolderVar nodes. In this case that prevents constant-folding from realizing it can simplify "1 = 2" or "2 = 2" to constant false or true, whereas in the previous complaint the PHV wrappers were interfering with matching expressions to indexes. I spent a fair amount of time thinking about whether we could revert that patch and solve the original problem some other way, but with no real success. The original problem was reported here: http://archives.postgresql.org/pgsql-hackers/2011-11/msg00419.php with an example equivalent to this variant of the previous query: explain select * from(select thousand as t1, tenthous as t2 from tenk1 a union all select 42 as t1, 42 as t2 from tenk1 b)c order by t1, t2; There is an EquivalenceClass for each of "t1" and "t2", and if we don't do something like wrapping the constants with distinct PHVs, then add_child_rel_equivalences will end up pushing identical constants into both ECs, thus totally bollixing the fundamental rule that any expression should match at most one EC. Another variant of this is where there are identical Vars rather than constants in one of the subqueries: explain select * from(select thousand as t1, tenthous as t2 from tenk1 a union all select unique2 as t1, unique2 as t2 fromtenk1 b) c order by t1, t2; I chose this example to match existing indexes in the regression database: the ideal plan would do an indexscan on the (thousand, tenthous) index for the first arm, and an indexscan on the (unique2) index for the second arm, and MergeAppend them together. In general the planner is aware that "ORDER BY x, x" is the same as "ORDER BY x", so you'd think it could apply that principle to the second arm of this union ... but it can't. To do that, it would have to realize that the unique2 index matches both of the EquivalenceClasses in this query, and that's totally outside its model of reality. It seems to me that to do a really nice job with this sort of situation, we would need some more general concept than EquivalenceClasses. I'm not sure what, though I have a vague feeling that it might look like EquivalenceClasses that are only valid within some sub-area of a query. Now, this is a sufficiently weird corner case that I'm not desirous of making major planner design changes just to improve this particular outcome (and in any case that doesn't sound like a backpatchable bug fix). But down the road we may think of more reasons why we need a better idea than EquivalenceClasses. In the meantime, the best solution I've been able to think of goes like this: continue to add PHVs on to duplicated or non-Var subquery outputs when propagating those outputs into the outer query, but then strip them off again when propagating transformed outer expressions down into the sub-query. There are basically only two places where we do the latter --- set_append_rel_pathlist in allpaths.c propagates the inheritance parent's baserestrictlist and other attachments to child rels, and match_eclass_clauses_to_index extracts modified join clauses from EquivalenceClasses. So it's a bit ugly but should be a localized fix, and it would allow us to revert b28ffd0fcc583c1811e5295279e7d4366c3cae6c because the problem would be taken care of at a higher level. This would not fix the problem shown in the last example, that ideally we should be able to match an index to more than one EquivalenceClass; the planner would just take the first match. In just about all of the non-contrived cases I've been able to think of, this is all right because there's only one plausible match anyhow, so I'm not terribly concerned. Comments? regards, tom lane
pgsql-hackers by date: