Thread: Bad cost estimate with FALSE filter condition

Bad cost estimate with FALSE filter condition

From
Josh Berkus
Date:
All:

This got posted to pgsql-bugs, but got no attention there[1], so I'm
sending it to this list.

Test case:

createdb bench
pgbench -i -s bench
\c bench

bench=# explain select * from pgbench_accounts where aid = 2;
                                          QUERY PLAN
---------------------------------------------------------------
 Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.42..8.44 rows=1 width=97)
   Index Cond: (aid = 2)
(2 rows)

bench=# explain select * from pgbench_accounts where aid = 2 and false;

                                QUERY PLAN
-------------------------------------------------
 Result  (cost=0.00..26394.00 rows=1 width=97)
   One-Time Filter: false
   ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1 width=97)
(3 rows)

This seems like a special case of the "aborted plan cost", that is, when
the planner expects to abort a plan early, it nevertheless returns the
full cost for the non-aborted version of the query, rather than the
working cost, which is based on the abort.

For example:

bench=# create index on pgbench_accounts(bid);
CREATE INDEX
bench=# explain select * from pgbench_accounts where bid = 2;
                                                QUERY PLAN

----------------------------------------------------------------------------------------------------------
 Index Scan using pgbench_accounts_bid_idx on pgbench_accounts
(cost=0.42..4612.10 rows=102667 width=97)
   Index Cond: (bid = 2)
(2 rows)

bench=# explain select * from pgbench_accounts where bid = 2 limit 1;
                                   QUERY PLAN

--------------------------------------------------------------------------------
 Limit  (cost=0.00..0.28 rows=1 width=97)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..28894.00 rows=102667
width=97)
         Filter: (bid = 2)
(3 rows)

So in this case, the top-level node returns a lower cost because the
planner knows that it will find a row with bid=2 fairly quickly in the
seq scan.  But in the WHERE FALSE example, that scan *is* the top-level
node, so the planner returns a fictitious cost for the whole query.

Or is there something else at work here?

[1]
http://www.postgresql.org/message-id/20150225194953.2546.86836@wrigleys.postgresql.org

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Bad cost estimate with FALSE filter condition

From
Josh Berkus
Date:
So ... should I assume my diagnosis is correct?  Haven't heard any other
suggestions.

On 02/27/2015 05:28 PM, Josh Berkus wrote:
> All:
>
> This got posted to pgsql-bugs, but got no attention there[1], so I'm
> sending it to this list.
>
> Test case:
>
> createdb bench
> pgbench -i -s bench
> \c bench
>
> bench=# explain select * from pgbench_accounts where aid = 2;
>                                           QUERY PLAN
> ---------------------------------------------------------------
>  Index Scan using pgbench_accounts_pkey on pgbench_accounts
> (cost=0.42..8.44 rows=1 width=97)
>    Index Cond: (aid = 2)
> (2 rows)
>
> bench=# explain select * from pgbench_accounts where aid = 2 and false;
>
>                                 QUERY PLAN
> -------------------------------------------------
>  Result  (cost=0.00..26394.00 rows=1 width=97)
>    One-Time Filter: false
>    ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1 width=97)
> (3 rows)
>
> This seems like a special case of the "aborted plan cost", that is, when
> the planner expects to abort a plan early, it nevertheless returns the
> full cost for the non-aborted version of the query, rather than the
> working cost, which is based on the abort.
>
> For example:
>
> bench=# create index on pgbench_accounts(bid);
> CREATE INDEX
> bench=# explain select * from pgbench_accounts where bid = 2;
>                                                 QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------
>  Index Scan using pgbench_accounts_bid_idx on pgbench_accounts
> (cost=0.42..4612.10 rows=102667 width=97)
>    Index Cond: (bid = 2)
> (2 rows)
>
> bench=# explain select * from pgbench_accounts where bid = 2 limit 1;
>                                    QUERY PLAN
>
> --------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.28 rows=1 width=97)
>    ->  Seq Scan on pgbench_accounts  (cost=0.00..28894.00 rows=102667
> width=97)
>          Filter: (bid = 2)
> (3 rows)
>
> So in this case, the top-level node returns a lower cost because the
> planner knows that it will find a row with bid=2 fairly quickly in the
> seq scan.  But in the WHERE FALSE example, that scan *is* the top-level
> node, so the planner returns a fictitious cost for the whole query.
>
> Or is there something else at work here?
>
> [1]
> http://www.postgresql.org/message-id/20150225194953.2546.86836@wrigleys.postgresql.org
>


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Bad cost estimate with FALSE filter condition

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> So ... should I assume my diagnosis is correct?  Haven't heard any other
> suggestions.

I don't see any reason to think this is worth worrying about, or worth
spending planner cycles on to produce a cosmetically nicer cost estimate.
One-time filters always apply at the top plan level so they're unlikely
to change any planner choices.  Moreover, for any case other than the
not-terribly-interesting constant FALSE case, we're better off assuming
that the filter condition will be true (and so there's nothing to adjust).

            regards, tom lane


Re: Bad cost estimate with FALSE filter condition

From
Josh Berkus
Date:
On 03/16/2015 11:26 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> So ... should I assume my diagnosis is correct?  Haven't heard any other
>> suggestions.
>
> I don't see any reason to think this is worth worrying about, or worth
> spending planner cycles on to produce a cosmetically nicer cost estimate.

I wouldn't say it's critical, but there's two issues:

1) users are confused when they see the plan, especially if it's chosen
in preference to a lower-cost plan.  It's counter-intuitive for EXPLAIN
to not display the "real" estimated cost.

2) Tools which attempt to do some kind of useful aggregation or event
handling around estimated plan cost have to write special workarounds
for these cases.

Is there anything *useful* about the existing behavior such that we'd
like to preserve it?  Or is it just a matter of Nobody's Submitted A
Patch Yet?

I ask because I'm thinking about a patch, so if changing this will break
a lot of stuff, that's a good thing to know.

> One-time filters always apply at the top plan level so they're unlikely
> to change any planner choices.  Moreover, for any case other than the
> not-terribly-interesting constant FALSE case, we're better off assuming
> that the filter condition will be true (and so there's nothing to adjust).

Except that we *don't* get back the same estimate for a TRUE filter
condition.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com