Thread: sunquery and estimated rows
Hi, When I included a subquery, the estimated rows (1240) is way too high as shown in the following example. Can someone explain why? Because of this behavior, some of our queries use hash join instead of nested loop. Thanks, select version(); version ------------------------------------------------------------- PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC 2.96 (1 row) \d test Table "public.test" Column | Type | Modifiers ---------+--------------------------+----------- id | integer | name | character varying(255) | d_id | integer | c_id | integer | r_id | integer | u_id | integer | scope | integer | active | integer | created | timestamp with time zone | typ | integer | Indexes: test_scope_idx btree (scope) reindex table test; vacuum full analyze test; select count(*) from test; count ------- 4959 (1 row) select count(*) from test where scope=10; count ------- 10 (1 row) explain analyze select * from test where scope=10; -- so far so good, estimate 12 rows, actual 10 rows QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Index Scan using test_scope_idx on test (cost=0.00..4.35 rows=12 width=59) (actual time=0.04..0.11 rows=10 loops=1) Index Cond: (scope = 10) Total runtime: 0.23 msec (3 rows) explain analyze select * from test where scope=(select 10); -- estimate rows is way too high, do not why???? QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using test_scope_idx on test (cost=0.00..40.74 rows=1240 width=59) (actual time=0.06..0.13 rows=10 loops=1) Index Cond: (scope = $0) InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1) Total runtime: 0.22 msec (5 rows) __________________________________ Do you Yahoo!? Yahoo! Tax Center - File online by April 15th http://taxes.yahoo.com/filing.html
Litao Wu <litaowu@yahoo.com> writes: > When I included a subquery, the estimated rows (1240) > is way too high as shown in the following example. > select * from test > where scope=(select 10); The planner sees that as "where scope = <some complicated expression>" and falls back to a default estimate. It won't simplify a sub-select to a constant. (Some people consider that a feature ;-).) The estimate should still be derived from the statistics for the scope column, but it will just depend on the number of distinct values for the column and not on the specific comparison constant. regards, tom lane
В Сбт, 17.04.2004, в 01:45, Tom Lane пишет: > The planner sees that as "where scope = <some complicated expression>" > and falls back to a default estimate. It won't simplify a sub-select > to a constant. (Some people consider that a feature ;-).) Why? Thanks -- Markus Bertheau <twanger@bluetwanger.de>
Markus Bertheau <twanger@bluetwanger.de> writes: > В Сбт, 17.04.2004, в 01:45, Tom Lane пишет: >> The planner sees that as "where scope = <some complicated expression>" >> and falls back to a default estimate. It won't simplify a sub-select >> to a constant. (Some people consider that a feature ;-).) > Why? It's the only way to prevent it from simplifying when you don't want it to. regards, tom lane
Rod Taylor <rbt@rbt.ca> writes: >> It's the only way to prevent it from simplifying when you don't want it >> to. > I'm having a difficult time coming up with a circumstance where that is > beneficial except when stats are out of whack. Try trawling the archives --- I recall several cases in which people were using sub-selects for this purpose. In any case, I don't see the value of having the planner check to see if a sub-select is just a trivial arithmetic expression. The cases where people write that and expect it to be simplified are so few and far between that I can't believe it'd be a good use of planner cycles. regards, tom lane
Well, the example shown is simplified version. Now, let's see a little 'real' example (still simplified version): Table test is same as before: \d test Table "public.test" Column | Type | Modifiers ---------+--------------------------+----------- id | integer | ... scope | integer | ... Indexes: test_scope_idx btree (scope) select count(*) from test; count ------- 4959 (1 row) select count(*) from test where scope=10; count ------- 10 (1 row) create table scope_def (scope int primary key, name varchar(30) unique); insert into scope_def values (10, 'TEST_SCOPE'); -- This is not a trivial arithmetic expression explain analyze select * from test where scope=(select scope from scope_def where name = 'TEST_SCOPE'); -- estimated row is 1653, returned rows is 10 QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Index Scan using test_scope_idx on test (cost=0.00..49.91 rows=1653 width=59) (actual time=0.08..0.15 rows=10 loops=1) Index Cond: (scope = $0) InitPlan -> Index Scan using scope_def_name_key on scope_def (cost=0.00..4.82 rows=1 width=4) (actual time=0.04..0.04 rows=1 loops=1) Index Cond: (name = 'TEST_SCOPE'::character varying) Total runtime: 0.22 msec (6 rows) -- trivial arithmetic expression -- estimated row is 1653, returned rows is 10 explain analyze select * from test where scope=(select 10); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using test_scope_idx on test (cost=0.00..49.91 rows=1653 width=59) (actual time=0.06..0.14 rows=10 loops=1) Index Cond: (scope = $0) InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1) Total runtime: 0.20 msec (5 rows) -- This is the plan I expect to see: estimated rows is -- close the actual returned rows. -- Do I have to devide the sub-select into two -- queries? explain analyze select * from test where scope=10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Index Scan using test_scope_idx on test (cost=0.00..3.77 rows=10 width=59) (actual time=0.05..0.12 rows=10 loops=1) Index Cond: (scope = 10) Total runtime: 0.18 msec (3 rows) -- Rewritten query using join in this case explain analyze select test.* from test JOIN scope_def using (scope) where scope_def.name = 'TEST_SCOPE'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..75.39 rows=5 width=63) (actual time=0.07..0.19 rows=10 loops=1) -> Index Scan using scope_def_name_key on scope_def (cost=0.00..4.82 rows=1 width=4) (actual time=0.04..0.04 rows=1 loops=1) Index Cond: (name = 'TEST_SCOPE'::character varying) -> Index Scan using test_scope_idx on test (cost=0.00..49.91 rows=1653 width=59) (actual time=0.02..0.09 rows=10 loops=1) Index Cond: (test.scope = "outer".scope) Total runtime: 0.28 msec (6 rows) __________________________________ Do you Yahoo!? Yahoo! Tax Center - File online by April 15th http://taxes.yahoo.com/filing.html
On Sun, 2004-04-18 at 19:09, Tom Lane wrote: > Markus Bertheau <twanger@bluetwanger.de> writes: > > , 17.04.2004, 01:45, Tom Lane : > >> The planner sees that as "where scope = <some complicated expression>" > >> and falls back to a default estimate. It won't simplify a sub-select > >> to a constant. (Some people consider that a feature ;-).) > > > Why? > > It's the only way to prevent it from simplifying when you don't want it > to. I'm having a difficult time coming up with a circumstance where that is beneficial except when stats are out of whack. Doesn't a prepared statement also falls back to the default estimate for variables. -- Rod Taylor <rbt [at] rbt [dot] ca> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc