Thread: sunquery and estimated rows

sunquery and estimated rows

From

Litao Wu

Date:

16 April 2004, 18:45:33

Hi,

When I included a subquery, the estimated rows (1240)
is way too high as shown in the following example.
Can someone explain why? Because of this behavior,
some of our queries use hash join instead of nested
loop.

Thanks,

select version();
                           version
-------------------------------------------------------------
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by
GCC 2.96
(1 row)
\d test
              Table "public.test"
 Column  |           Type           | Modifiers
---------+--------------------------+-----------
 id      | integer                  |
 name    | character varying(255)   |
 d_id    | integer                  |
 c_id    | integer                  |
 r_id    | integer                  |
 u_id    | integer                  |
 scope   | integer                  |
 active  | integer                  |
 created | timestamp with time zone |
 typ     | integer                  |
Indexes: test_scope_idx btree (scope)

reindex table test;
vacuum full analyze test;

select count(*) from test;
 count
-------
  4959
(1 row)
select count(*) from test where scope=10;
 count
-------
    10
(1 row)

explain analyze
select * from test
where scope=10; -- so far so good, estimate 12 rows,
actual 10 rows

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using test_scope_idx on test
(cost=0.00..4.35 rows=12 width=59) (actual
time=0.04..0.11 rows=10 loops=1)
   Index Cond: (scope = 10)
 Total runtime: 0.23 msec
(3 rows)

explain analyze
select * from test
where scope=(select 10); -- estimate rows is way too
high, do not why????

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_scope_idx on test
(cost=0.00..40.74 rows=1240 width=59) (actual
time=0.06..0.13 rows=10 loops=1)
   Index Cond: (scope = $0)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
(actual time=0.01..0.01 rows=1 loops=1)
 Total runtime: 0.22 msec
(5 rows)






__________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online by April 15th
http://taxes.yahoo.com/filing.html

Re: sunquery and estimated rows

From

Tom Lane

Date:

16 April 2004, 20:45:02

Litao Wu <litaowu@yahoo.com> writes:
> When I included a subquery, the estimated rows (1240)
> is way too high as shown in the following example.

> select * from test
> where scope=(select 10);

The planner sees that as "where scope = <some complicated expression>"
and falls back to a default estimate.  It won't simplify a sub-select
to a constant.  (Some people consider that a feature ;-).)

The estimate should still be derived from the statistics for the
scope column, but it will just depend on the number of distinct
values for the column and not on the specific comparison constant.

            regards, tom lane

Re: sunquery and estimated rows

From

Markus Bertheau

Date:

18 April 2004, 16:22:35

В Сбт, 17.04.2004, в 01:45, Tom Lane пишет:

> The planner sees that as "where scope = <some complicated expression>"
> and falls back to a default estimate.  It won't simplify a sub-select
> to a constant.  (Some people consider that a feature ;-).)

Why?

Thanks

--
Markus Bertheau <twanger@bluetwanger.de>

Re: sunquery and estimated rows

From

Tom Lane

Date:

18 April 2004, 20:09:28

Markus Bertheau <twanger@bluetwanger.de> writes:
> В Сбт, 17.04.2004, в 01:45, Tom Lane пишет:
>> The planner sees that as "where scope = <some complicated expression>"
>> and falls back to a default estimate.  It won't simplify a sub-select
>> to a constant.  (Some people consider that a feature ;-).)

> Why?

It's the only way to prevent it from simplifying when you don't want it
to.

            regards, tom lane

Re: sunquery and estimated rows

From

Tom Lane

Date:

18 April 2004, 23:16:52

Rod Taylor <rbt@rbt.ca> writes:
>> It's the only way to prevent it from simplifying when you don't want it
>> to.

> I'm having a difficult time coming up with a circumstance where that is
> beneficial except when stats are out of whack.

Try trawling the archives --- I recall several cases in which people
were using sub-selects for this purpose.

In any case, I don't see the value of having the planner check to see if
a sub-select is just a trivial arithmetic expression.  The cases where
people write that and expect it to be simplified are so few and far
between that I can't believe it'd be a good use of planner cycles.

            regards, tom lane

Re: sunquery and estimated rows

From

Litao Wu

Date:

20 April 2004, 00:11:13

Well, the example shown is simplified version.
Now, let's see a little 'real' example (still
simplified version):

Table test is same as before:
\d test
              Table "public.test"
 Column  |           Type           | Modifiers
---------+--------------------------+-----------
 id      | integer                  |
 ...
 scope   | integer                  |
 ...
Indexes: test_scope_idx btree (scope)

select count(*) from test;
 count
-------
  4959
(1 row)
select count(*) from test where scope=10;
 count
-------
    10
(1 row)

create table scope_def (scope int primary key, name
varchar(30) unique);
insert into scope_def values (10, 'TEST_SCOPE');

-- This is not a trivial arithmetic expression
explain analyze
select * from test
where scope=(select scope from scope_def where name =
'TEST_SCOPE');

-- estimated row is 1653, returned rows is 10


     QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_scope_idx on test
(cost=0.00..49.91 rows=1653 width=59) (actual
time=0.08..0.15 rows=10 loops=1)
   Index Cond: (scope = $0)
   InitPlan
     ->  Index Scan using scope_def_name_key on
scope_def  (cost=0.00..4.82 rows=1 width=4) (actual
time=0.04..0.04 rows=1 loops=1)
           Index Cond: (name = 'TEST_SCOPE'::character
varying)
 Total runtime: 0.22 msec
(6 rows)


-- trivial arithmetic expression
-- estimated row is 1653, returned rows is 10
explain analyze
select * from test
where scope=(select 10);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_scope_idx on test
(cost=0.00..49.91 rows=1653 width=59) (actual
time=0.06..0.14 rows=10 loops=1)
   Index Cond: (scope = $0)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
(actual time=0.01..0.01 rows=1 loops=1)
 Total runtime: 0.20 msec
(5 rows)

-- This is the plan I expect to see: estimated rows is
-- close the actual returned rows.
-- Do I have to devide the sub-select into two
-- queries?

explain analyze
select * from test
where scope=10;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using test_scope_idx on test
(cost=0.00..3.77 rows=10 width=59) (actual
time=0.05..0.12 rows=10 loops=1)
   Index Cond: (scope = 10)
 Total runtime: 0.18 msec
(3 rows)

-- Rewritten query using join in this case
explain analyze
select test.* from test JOIN scope_def using (scope)
where scope_def.name = 'TEST_SCOPE';

     QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..75.39 rows=5 width=63)
(actual time=0.07..0.19 rows=10 loops=1)
   ->  Index Scan using scope_def_name_key on
scope_def  (cost=0.00..4.82 rows=1 width=4) (actual
time=0.04..0.04 rows=1 loops=1)
         Index Cond: (name = 'TEST_SCOPE'::character
varying)
   ->  Index Scan using test_scope_idx on test
(cost=0.00..49.91 rows=1653 width=59) (actual
time=0.02..0.09 rows=10 loops=1)
         Index Cond: (test.scope = "outer".scope)
 Total runtime: 0.28 msec
(6 rows)





__________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online by April 15th
http://taxes.yahoo.com/filing.html

Re: sunquery and estimated rows

From

Rod Taylor

Date:

20 April 2004, 10:58:29

On Sun, 2004-04-18 at 19:09, Tom Lane wrote:
> Markus Bertheau <twanger@bluetwanger.de> writes:
> >  , 17.04.2004,  01:45, Tom Lane :
> >> The planner sees that as "where scope = <some complicated expression>"
> >> and falls back to a default estimate.  It won't simplify a sub-select
> >> to a constant.  (Some people consider that a feature ;-).)
>
> > Why?
>
> It's the only way to prevent it from simplifying when you don't want it
> to.

I'm having a difficult time coming up with a circumstance where that is
beneficial except when stats are out of whack.

Doesn't a prepared statement also falls back to the default estimate for
variables.

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc

Attachment

signature.asc