Re: [BUGS] BUG #14811: Nested IN SUBQERY that returns empty results executed multiple times. - Mailing list pgsql-bugs
From:	Tom Lane
Subject:	Re: [BUGS] BUG #14811: Nested IN SUBQERY that returns empty results executed multiple times.
Date:	
Msg-id:	9311.1505171327@sss.pgh.pa.us
In response to:	[BUGS] BUG #14811: Nested IN SUBQERY that returns empty results executed multiple times. (serovov@gmail.com)
Responses:	Re: [BUGS] BUG #14811: Nested IN SUBQERY that returns empty results executed multiple times.
List:	pgsql-bugs
serovov@gmail.com writes:
> I have a query planner bug that executes the same subquery multiple times.

This is not a bug, it's an RFE, and not really a planner RFE either.

AFAICS, the issue is that the scan on betta2zetta returns zero rows.
The plan chosen for ANY(ARRAY()) happens to exploit that case
successfully:

 Limit  (cost=4155.01..4183.81 rows=6 width=22) (actual time=0.122..0.122 rows=0 loops=1)
   InitPlan 1 (returns $1)
     ->  Nested Loop  (cost=134.14..4154.58 rows=6639 width=4) (actual time=0.078..0.078 rows=0 loops=1)
           ->  Seq Scan on betta2zetta  (cost=0.00..5.75 rows=1 width=4) (actual time=0.077..0.077 rows=0 loops=1)
                 Filter: (zetta_id = 3001)
                 Rows Removed by Filter: 300
           ->  Bitmap Heap Scan on alpha2betta  (cost=134.14..4079.52 rows=6931 width=8) (never executed)
                 Recheck Cond: (betta_id = betta2zetta.betta_id)
                 ->  Bitmap Index Scan on alpha2betta_betta_id_idx  (cost=0.00..132.41 rows=6931 width=0) (never executed)
                       Index Cond: (betta_id = betta2zetta.betta_id)
   ->  Index Scan using alpha_pkey on alpha  (cost=0.42..48.43 rows=10 width=22) (actual time=0.118..0.118 rows=0 loops=1)
         Index Cond: (id = ANY ($1))
 Planning time: 0.944 ms
 Execution time: 0.294 ms

Note that alpha2betta isn't read at all, and neither is alpha, because
the ANY doesn't get any values to scan for.

But the plan used for IN doesn't optimize this case as well:

 Limit  (cost=0.85..48.91 rows=6 width=22) (actual time=272.964..272.964 rows=0 loops=1)
   ->  Merge Semi Join  (cost=0.85..53179.99 rows=6639 width=22) (actual time=272.962..272.962 rows=0 loops=1)
         Merge Cond: (alpha.id = alpha2betta.alpha_id)
         ->  Index Scan using alpha_pkey on alpha  (cost=0.42..22654.42 rows=700000 width=22) (actual time=0.017..0.017 rows=1 loops=1)
         ->  Nested Loop  (cost=0.42..28694.18 rows=6639 width=4) (actual time=272.939..272.939 rows=0 loops=1)
               Join Filter: (alpha2betta.betta_id = betta2zetta.betta_id)
               ->  Index Only Scan using alpha2betta_alpha_id_betta_id_key on alpha2betta  (cost=0.42..18188.42 rows=700000 width=8) (actual time=0.100..117.954 rows=700000 loops=1)
                     Heap Fetches: 0
               ->  Materialize  (cost=0.00..5.75 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=700000)
                     ->  Seq Scan on betta2zetta  (cost=0.00..5.75 rows=1 width=4) (actual time=0.071..0.071 rows=0 loops=1)
                           Filter: (zetta_id = 3001)
                           Rows Removed by Filter: 300
 Planning time: 1.423 ms
 Execution time: 273.148 ms

We do stop short on the alpha scan, but alpha2betta gets read to the end,
looking for a joinable row that it won't ever find.

The planner doesn't ever optimize on the basis that a subquery will
return zero rows (unless that's provably true), and I do not think it
would be wise to do so.  We have enough trouble with plans being
optimized on the assumption of one row out and that proving to be an
underestimate.  So I would not like to make it prefer the form with an
initplan to the form without, even if it were to believe that there are
zero rows with zetta_id = 3001.  If it's wrong, that will be a horribly
bad choice, as the estimated costs indicate.

We could, however, imagine optimizing this case at runtime.  If
nodeNestloop.c were to note that it got no rows from the inner scan on
the first iteration, and that it isn't passing any nestloop parameters
into the inner side, then it could plausibly assume that all future
executions of the inner plan will also give zero rows, and so it can't
produce any join rows [if it's a standard inner join] and it can stop
scanning the outer side.  In that case we'd stop the alpha2betta
indexscan after the first tuple and win.
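To make that control flow concrete, here is a minimal standalone C sketch
of the idea.  This is only a toy model, not actual nodeNestloop.c code:
the struct and field names (nestloop_state, got_inner_tuple, and so on)
are made up for illustration.  It just shows the shape of the check:
if the inner side takes no parameters from the outer row and its first
complete scan returned no tuples, stop reading the outer side.

/*
 * Toy model (not PostgreSQL source) of a nested-loop inner join whose
 * inner side is not parameterized by the outer row.  If the first full
 * scan of the inner side yields zero rows, later rescans must also yield
 * zero rows, so the join can stop reading the outer side immediately.
 */
#include <stdbool.h>
#include <stdio.h>

struct nestloop_state
{
	bool	first_inner_scan_done;	/* finished one complete inner scan? */
	bool	got_inner_tuple;		/* did any inner scan return a row? */
};

/* Join outer[] and inner[] on equality; returns the number of join rows. */
static int
nestloop_join(const int *outer, int n_outer,
			  const int *inner, int n_inner)
{
	struct nestloop_state st = {false, false};
	int			matches = 0;

	for (int o = 0; o < n_outer; o++)
	{
		/*
		 * Proposed optimization: the inner side takes no parameters from
		 * the outer row, and its first complete scan produced nothing, so
		 * no outer row can ever find a join partner -- stop early instead
		 * of scanning the rest of the outer relation.
		 */
		if (st.first_inner_scan_done && !st.got_inner_tuple)
			break;

		for (int i = 0; i < n_inner; i++)
		{
			st.got_inner_tuple = true;	/* the per-tuple bookkeeping cost */
			if (outer[o] == inner[i])
				matches++;
		}
		st.first_inner_scan_done = true;
	}
	return matches;
}

int
main(void)
{
	int			outer[] = {1, 2, 3, 4, 5};

	/* Empty inner side: the outer loop stops after its first iteration. */
	printf("join rows = %d\n", nestloop_join(outer, 5, NULL, 0));
	return 0;
}

Compiled and run, this prints "join rows = 0" after one pass over the
(empty) inner side, rather than rescanning it for every outer row; the
real change would of course live in the executor's per-node code rather
than in a self-contained loop like this.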
Merge and hash joins both have optimizations of this ilk for empty input
relations, so it's reasonable for nestloop to do so also.

Now a possible objection to this argument is that if the inner side
contains volatile quals or set-returning functions, it might produce some
rows in a future execution even though it didn't this time.  But we've
already decided that we aren't making any guarantees about such behavior.
In this example, we broke any chance of behaving "correctly" for such an
inner scan by sticking a Material node on top of it.  More generally, if
the inner scan doesn't have any parameter dependency on the outer, we're
free to implement the join as a merge or hash join, which would read the
inner scan only once anyway.  So I think that objection can be rejected.

Whether this is worth doing, given that it hasn't really come up before,
I'm not sure.  It would add a little bit of overhead to nestloops: we'd
need at least something like "node->nl_gotInnerTuple = true" added to
each successful iteration.  That's probably negligible, but maybe not.
The equivalent optimizations for merge and hash joins add only one-time
tests, not once-per-tuple overhead, so it's hard to argue that they cost
much.

			regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs