Unknown constants vs UNION - Mailing list pgsql-hackers

From Tom Lane
Subject Unknown constants vs UNION
Date
Msg-id 29148.1300133143@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
I looked into the performance complaint noted here:
http://archives.postgresql.org/pgsql-performance/2011-03/msg00135.php
in which 8.4 and up produce a worse plan than 8.3.

The reason for the change in behavior is that the newer releases
transform the WHERE EXISTS construct to a semi-join.  The best-case
scenario for the old implementation is that the sub-select can be
implemented like a parameterized inner indexscan.  If that happens to
be actually the best plan, you'll only get it out of 8.4+ if the
pulled-up EXISTS can be implemented as an inner indexscan, and there
are restrictions on how complicated a relation we can handle that way.
(I've been hoping for more than a year now to fix that, but the project
is stalled on how to do costing without slowing the planner down a lot.)

So there are variants of this problem where 8.4+ really is stupider than
previous releases, and it's not easy to fix.  But this isn't one of
them.  The reason that the particular case here fails to get optimized
is that flatten_simple_union_all() refuses to flatten the UNION ALL
because there are unknown-literal constants in it, and those don't match
the resolved output type of the UNION, which was kluged to be TEXT way
back in the parser.  The parser however does not try to patch up the
unknown literals themselves to match its decision about the output type.
So the planner punts and doesn't get to the query form that it could
optimize into a plan similar to 8.3's.

So we could fix this either by changing the parser to make the types
match, or by kluging flatten_simple_union_all (and in particular
tlist_same_datatypes) to treat an UNKNOWN-type Const as a match.
The latter is a lot uglier-looking but seems a bit less likely to break
anything: fixing the constants at parse time would require letting
transformSetOperationTree scribble on the leaf SELECTs.  The reason we
have not done that in the past is fear of changing the semantics of the
leaf SELECTs --- for example, if the leaf SELECT includes DISTINCT or
GROUP BY on that output value, changing its type would be bad news.

But reflecting on that now, it seems like it'd be perfectly safe to
coerce UNKNOWN Consts to the resolved type, because if the leaf SELECT
depended in any way at all on the type of that targetlist entry, it'd
have resolved the UNKNOWN to something else (probably TEXT) already.

So I'm thinking it'd be OK to fix this in the parser, but wanted to see
if anyone felt differently.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Stefan Huehner
Date:
Subject: locale operation to be invoked, but no collation was derived (in trigger)
Next
From: Pavel Stehule
Date:
Subject: dependency between numbers keywords and parser speed