On Wed, 18 Dec 2024 at 12:17, Dmytro Astapov <dastapov@gmail.com> wrote:
> Surprisingly, this does SeqScan on partB in NestedLoops over some_ids with a filter `some_ids.id = NULL::bigint`:
>
> Nested Loop (cost=0.29..359.16 rows=200 width=24)
> -> Seq Scan on some_ids i (cost=0.00..1.02 rows=2 width=8)
> -> Append (cost=0.29..178.56 rows=51 width=16)
> -> Index Scan using parta_payload_idx on parta (cost=0.29..8.30 rows=1 width=16)
> Index Cond: (payload = i.id)
> -> Seq Scan on partb (cost=0.00..170.00 rows=50 width=16)
> Filter: (i.id = NULL::bigint)
>
> At the same time `explain select * from vw where payload = 1` correctly skips over partB entirely (the node is
eliminatedfrom execution plan), and so does:
> explain select * from vw where payload in (1,2);
>
> However, any query that does not use explicit literal values still leads to SeqScan access on partB, such as:
> explain select * from vw where payload in (select id from some_ids);
> explain select * from vw where payload = ANY(ARRAY(select id from some_ids));
> or various forms of joins
>
> Do you know if this is expected/documented, or is this a bug?
TL;DR is it's not a bug and expected behaviour.
We tend not to do much in terms of documentation about which
optimisations the query planner does, so it's probably not documented
anywhere aside from perhaps the source code. It might be possible for
us to eliminate the scan to "partb" for the first of the plans shown
above. However, the code that applies in your example case where the
planner does manage to eliminate the scan does so using "base" quals,
i.e. quals that are pushed down into the scan level. See
apply_child_basequals(). For the Nested Loop example, the i.id =
NULL::bigint isn't a base qual, so it does not work for that case.
When we're building paramerised paths, as per what's used in your
Nested Loop example above, we've already done the work to eliminate
non-matching union children. We don't really have any concept of "this
union child does not match for this specific parameterisation", so
we'd need to invent something to do that (which perhaps is just
removing or not adding the particular unneeded subpath from the Append
pathlist.)
For the other cases that depend on the results from subqueries, it's
more tricky and in many cases not possible to eliminate the scans
during query planner for those cases as the planner does not have
information to know what will be returned by the subqueries. There
might be something very limited we can do in terms of looking to see
if the operator is strict or not so that we at least know that NULLs
will never match, but that might be quite a corner case that it might
not be worth the complexity to make that work. Someone might need to
write it and see how complex it is to implement before we'd know if it
was a worthwhile optimisation or not.
David