Home > mailing lists

How does the query planner make its plan? - Mailing list pgsql-general

From	Christian Schröder
Subject	How does the query planner make its plan?
Date	November 6, 2007 11:30:09
Msg-id	47308755.2050509@deriva.de Whole thread Raw
Responses	Re: How does the query planner make its plan? Re: How does the query planner make its plan?
List	pgsql-general

Tree view

Hi list,
once again I do not understand how the query planner works and why it
apparently does not find the best result.
I have a table with about 125 million rows. There is a char(5) column
with a (non-unique) index. When I try to find the distinct values in
this column using the following sql statement:

select distinct exchange from foo

the query planner chooses not to use the index, but performs a
sequential scan. When I disfavour the use of sequential scans ("set
enable_seqscan = off") the performance is more than 6 times better. Why
does the query planner's plan go wrong? The table has been vacuum
analyzed just before I ran the queries.

Here is the plan when I let the query planner alone:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Unique  (cost=23057876.40..23683350.48 rows=4 width=9)
   ->  Sort  (cost=23057876.40..23370613.44 rows=125094816 width=9)
         Sort Key: exchange
         ->  Seq Scan on quotes  (cost=0.00..3301683.16 rows=125094816
width=9)
(4 rows)

This is what really happens:

                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23057876.40..23683350.48 rows=4 width=9) (actual
time=1577159.744..1968911.024 rows=4 loops=1)
   ->  Sort  (cost=23057876.40..23370613.44 rows=125094816 width=9)
(actual time=1577159.742..1927400.118 rows=125094818 loops=1)
         Sort Key: exchange
         ->  Seq Scan on quotes  (cost=0.00..3301683.16 rows=125094816
width=9) (actual time=0.022..169744.162 rows=125094818 loops=1)
 Total runtime: 1969844.753 ms
(5 rows)

With "enable_seqscan = off" I get this plan:

                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..89811549.81 rows=4 width=9)
   ->  Index Scan using quotes_exchange_key on quotes
(cost=0.00..89498812.77 rows=125094816 width=9)
(2 rows)

And again with execution times:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..89811549.81 rows=4 width=9) (actual
time=0.079..313068.922 rows=4 loops=1)
   ->  Index Scan using quotes_exchange_key on quotes
(cost=0.00..89498812.77 rows=125094816 width=9) (actual
time=0.078..273787.493 rows=125094818 loops=1)
 Total runtime: 313068.967 ms
(3 rows)

I understand that from looking at the estimations (89811549.81 with
index scan vs. 23683350.48 with sequential scan) the query planner had
to choose the sequential scan. So maybe I have to tune the planner cost
constants? Indeed I did some changes to these values, but in my opinion
this should make index scans preferable:

#seq_page_cost = 1.0                    # measured on an arbitrary scale
#random_page_cost = 4.0                 # same scale as above
random_page_cost = 1.0
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025             # same scale as above
#effective_cache_size = 128MB
effective_cache_size = 4GB

The machine is a dedicated database server with two dual-core xeon
processors and 8 GB memory.

Thanks for your help,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer

pgsql-general by date:

From: Richard Huxton
Date: 06 November 2007, 11:22:59
Subject: Re: Postgresql simple query performance question

From: "John Smith"
Date: 06 November 2007, 11:40:34
Subject: external editor for psql

How does the query planner make its plan? - Mailing list pgsql-general

Previous

Next