how can I direct the planner ? - Mailing list pgsql-general
From | Andrei Popescu-Belis |
---|---|
Subject | how can I direct the planner ? |
Date | |
Msg-id | 3A252C57.523FC8B@issco.unige.ch Whole thread Raw |
Responses |
Re: how can I direct the planner ?
|
List | pgsql-general |
Hello everybody, I would like to know why, in the situation described below, the execution planner fails to choose IndexScan, and how it could be forced to choose this option rather than SeqScan. The latter takes about 3 minutes to execute, as opposed to 1-2 seconds for the former. Details below. Thank you very much, Andrei Popescu-Belis -- ISSCO, Universite de Geneve tel: (41 22) 705 86 81 40, bd du Pont d'Arve fax: (41 22) 705 86 89 CH-1211 Geneve 4 http://www.issco.unige.ch/staff/andrei ----------------------------------------------------------- I have a table T1 with about 4 million lines ; none of the columns provides a unique index, but the combination of the first three columns does - that is, each triple is unique. Using the first column as an index seems quite counter- productive, as some values correspond to only one entry, and others to tens of thousands. So, I have created a unique index on T1 using the first three columns: it has about 17 MB, as compared to the 250 MB of T1. Question 1 ++++++++++ Given the two indexes (on C1 and on C1-2-3), how come the following plan is generated ? sylex2000=# explain select * from T1 where C1 = 17000 ; NOTICE: QUERY PLAN: Seq Scan on sol_d (cost=0.00..82357.56 rows=41431 width=30) EXPLAIN This query takes about 3 minutes to solve. To force the use of Index Scan I found the following query (C2 > 0 and C3 > 0 are always true). sylex2000=# explain select * from T1 where C1 = 17000 and C2 > 0 and C3 > 0 ; NOTICE: QUERY PLAN: Index Scan using T1_idx on T1 (cost=0.00..104052.07 rows=4603 width=30) EXPLAIN Note that the estimated cost is higher than previous, but the query is solved in ca. 2 seconds. Question 2 ++++++++++ Is it possible to *force* the planner to always choose the Index Scan ? Question 3 ++++++++++ I was looking for a "trick" to make the planner use both indexes when joining two such tables. The only one I found is to use a temporary table, as shown below. Is there any method to do it in only one query, and have two Index Scans ? sylex2000=# explain select * from T1 , T2 where T1.C1 = T2.C2 and T2.C1 = 'Basel' ; NOTICE: QUERY PLAN: Merge Join (cost=4194.68..707265.19 rows=73512414 width=70) -> Index Scan using T1_idx on T1 (cost=0.00..651259.77 rows=4143085 width=30) -> Sort (cost=4194.68..4194.68 rows=1774 width=40) -> Seq Scan on T2 (cost=0.00..4098.93 rows=1774 width=40) ===> This takes about 1 minute (T2 is smaller) sylex2000=# explain select * from T1 , T2 where T2.C1 = 'Basel' and T1.C1 = T2.C2 and T1.C2 > 0 and T1.C3 > 1 ; NOTICE: QUERY PLAN: Merge Join (cost=174668.31..180444.77 rows=8168046 width=70) -> Sort (cost=170473.63..170473.63 rows=460343 width=30) -> Seq Scan on T1 (cost=0.00..103072.99 rows=460343 width=30) -> Sort (cost=4194.68..4194.68 rows=1774 width=40) -> Seq Scan on T2 (cost=0.00..4098.93 rows=1774 width=40) ===> This takes about 3 minutes ===> The only (poor) solution, which uses IndexScan twice: create temporary table T3 as (select C1 from T1 where C1 = 'Basel' ) ; create index T3_idx on T3 ( C1 ) ; select * from T3 , T2 where T3.C1 = T2.C1 ; drop table T3 ; REMARK ++++++ I have found this: "A Tour of PostgreSQL Internals" at the address: http://www.postgresql.org/osdn/tour.pdf Author: Tom Lane The basic idea of the planner is cost- estimate- based selection of the best plan tree for a given query. Simple example: SELECT * FROM t WHERE f1 < 100 Assuming there is an index on t( f1), two possible execution plans are considered: " sequential scan over all of t " index scan with index restriction f1 < 100 Costs of each plan (in disk page fetches and CPU time) are estimated and the plan with lower estimated cost is selected. Note that the indexscan is not an automatic winner, and should not be. The choice will depend on what fraction of the rows of t are estimated to be retrieved. --------- But this doesn't really answer my questions... Thanks for any help! APB
pgsql-general by date: