how can I direct the planner ? - Mailing list pgsql-general

From Andrei Popescu-Belis
Subject how can I direct the planner ?
Date
Msg-id 3A252C57.523FC8B@issco.unige.ch
Whole thread Raw
Responses Re: how can I direct the planner ?
List pgsql-general
Hello everybody,

I would like to know why, in the situation described below,
the execution planner fails to choose IndexScan, and how it
could be forced to choose this option rather than SeqScan.
The latter takes about 3 minutes to execute, as opposed to
1-2 seconds for the former. Details below.

Thank you very much,
Andrei Popescu-Belis

--
ISSCO, Universite de Geneve
tel: (41 22) 705 86 81       40, bd du Pont d'Arve
fax: (41 22) 705 86 89       CH-1211 Geneve 4
   http://www.issco.unige.ch/staff/andrei

-----------------------------------------------------------

I have a table T1 with about 4 million lines ; none of the
columns provides a unique index, but the combination of the
first three columns does - that is, each triple is unique.
Using the first column as an index seems quite counter-
productive, as some values correspond to only one entry, and
others to tens of thousands.
So, I have created a unique index on T1 using the first three
columns: it has about 17 MB, as compared to the 250 MB of T1.

Question 1
++++++++++
Given the two indexes (on C1 and on C1-2-3), how come the
following plan is generated ?

sylex2000=# explain select * from T1 where C1 = 17000 ;
NOTICE:  QUERY PLAN:
Seq Scan on sol_d  (cost=0.00..82357.56 rows=41431 width=30)
EXPLAIN

This query takes about 3 minutes to solve. To force the
use of Index Scan I found the following query (C2 > 0 and
C3 > 0 are always true).

sylex2000=# explain select * from T1 where C1 = 17000 and
C2 > 0 and C3 > 0 ;
NOTICE:  QUERY PLAN:
Index Scan using T1_idx on T1 (cost=0.00..104052.07 rows=4603 width=30)
EXPLAIN

Note that the estimated cost is higher than previous, but the
query is solved in ca. 2 seconds.

Question 2
++++++++++
Is it possible to *force* the planner to always choose
the Index Scan ?

Question 3
++++++++++
I was looking for a "trick" to make the planner use both
indexes when joining two such tables. The only one I found is
to use a temporary table, as shown below. Is there any method
to do it in only one query, and have two Index Scans ?


sylex2000=# explain select *
from T1 , T2
where T1.C1 = T2.C2 and T2.C1 = 'Basel' ;

NOTICE:  QUERY PLAN:
Merge Join  (cost=4194.68..707265.19 rows=73512414 width=70)
  ->  Index Scan using T1_idx on T1  (cost=0.00..651259.77 rows=4143085
width=30)
  ->  Sort  (cost=4194.68..4194.68 rows=1774 width=40)
        ->  Seq Scan on T2  (cost=0.00..4098.93 rows=1774 width=40)

===> This takes about 1 minute (T2 is smaller)


sylex2000=# explain select * from T1 , T2 where T2.C1 = 'Basel'
and T1.C1 = T2.C2 and T1.C2 > 0 and T1.C3 > 1 ;
NOTICE:  QUERY PLAN:

Merge Join  (cost=174668.31..180444.77 rows=8168046 width=70)
  ->  Sort  (cost=170473.63..170473.63 rows=460343 width=30)
        ->  Seq Scan on T1  (cost=0.00..103072.99 rows=460343 width=30)
  ->  Sort  (cost=4194.68..4194.68 rows=1774 width=40)
        ->  Seq Scan on T2  (cost=0.00..4098.93 rows=1774 width=40)

===> This takes about 3 minutes

===> The only (poor) solution, which uses IndexScan twice:

  create temporary table T3 as (select C1 from T1 where C1 = 'Basel' ) ;
  create index T3_idx on T3 ( C1 ) ;
  select * from T3 , T2 where T3.C1 = T2.C1 ;
  drop table T3 ;


REMARK
++++++
I have found this: "A Tour of PostgreSQL Internals"
at the address:     http://www.postgresql.org/osdn/tour.pdf
Author:             Tom Lane

The basic idea of the planner is cost- estimate- based selection
of the best plan tree for a given query.
Simple example:
SELECT * FROM t WHERE f1 < 100
Assuming there is an index on t( f1), two possible execution plans are
considered:
" sequential scan over all of t
" index scan with index restriction f1 < 100
Costs of each plan (in disk page fetches and CPU time) are estimated and
the plan with lower estimated cost is selected.
Note that the indexscan is not an automatic winner, and should not be.
The choice will depend on what fraction of the rows of t are estimated
to
be retrieved.
---------
But this doesn't really answer my questions...
Thanks for any help!
APB

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Sorting with relevant blank spaces
Next
From: Tom Lane
Date:
Subject: Re: how can I direct the planner ?