Thread: How does the query planner make its plan?

How does the query planner make its plan?

From
Christian Schröder
Date:
Hi list,
once again I do not understand how the query planner works and why it
apparently does not find the best result.
I have a table with about 125 million rows. There is a char(5) column
with a (non-unique) index. When I try to find the distinct values in
this column using the following sql statement:

select distinct exchange from foo

the query planner chooses not to use the index, but performs a
sequential scan. When I disfavour the use of sequential scans ("set
enable_seqscan = off") the performance is more than 6 times better. Why
does the query planner's plan go wrong? The table has been vacuum
analyzed just before I ran the queries.

Here is the plan when I let the query planner alone:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Unique  (cost=23057876.40..23683350.48 rows=4 width=9)
   ->  Sort  (cost=23057876.40..23370613.44 rows=125094816 width=9)
         Sort Key: exchange
         ->  Seq Scan on quotes  (cost=0.00..3301683.16 rows=125094816
width=9)
(4 rows)

This is what really happens:

                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23057876.40..23683350.48 rows=4 width=9) (actual
time=1577159.744..1968911.024 rows=4 loops=1)
   ->  Sort  (cost=23057876.40..23370613.44 rows=125094816 width=9)
(actual time=1577159.742..1927400.118 rows=125094818 loops=1)
         Sort Key: exchange
         ->  Seq Scan on quotes  (cost=0.00..3301683.16 rows=125094816
width=9) (actual time=0.022..169744.162 rows=125094818 loops=1)
 Total runtime: 1969844.753 ms
(5 rows)

With "enable_seqscan = off" I get this plan:

                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..89811549.81 rows=4 width=9)
   ->  Index Scan using quotes_exchange_key on quotes
(cost=0.00..89498812.77 rows=125094816 width=9)
(2 rows)

And again with execution times:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..89811549.81 rows=4 width=9) (actual
time=0.079..313068.922 rows=4 loops=1)
   ->  Index Scan using quotes_exchange_key on quotes
(cost=0.00..89498812.77 rows=125094816 width=9) (actual
time=0.078..273787.493 rows=125094818 loops=1)
 Total runtime: 313068.967 ms
(3 rows)

I understand that from looking at the estimations (89811549.81 with
index scan vs. 23683350.48 with sequential scan) the query planner had
to choose the sequential scan. So maybe I have to tune the planner cost
constants? Indeed I did some changes to these values, but in my opinion
this should make index scans preferable:

#seq_page_cost = 1.0                    # measured on an arbitrary scale
#random_page_cost = 4.0                 # same scale as above
random_page_cost = 1.0
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025             # same scale as above
#effective_cache_size = 128MB
effective_cache_size = 4GB

The machine is a dedicated database server with two dual-core xeon
processors and 8 GB memory.

Thanks for your help,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer



Re: How does the query planner make its plan?

From
Reg Me Please
Date:
It may depend on the index itself against the locales.

1. define the insex with the text_pattern_ops operato class
2. run vacuum analyze on the table
3. re-run the explain

See chapter 11 (especiallu 11.8) for v8.2


Il Tuesday 06 November 2007 16:25:09 Christian Schröder ha scritto:
> Hi list,
> once again I do not understand how the query planner works and why it
> apparently does not find the best result.
> I have a table with about 125 million rows. There is a char(5) column
> with a (non-unique) index. When I try to find the distinct values in
> this column using the following sql statement:
>
> select distinct exchange from foo
>
> the query planner chooses not to use the index, but performs a
> sequential scan. When I disfavour the use of sequential scans ("set
> enable_seqscan = off") the performance is more than 6 times better. Why
> does the query planner's plan go wrong? The table has been vacuum
> analyzed just before I ran the queries.
>
> Here is the plan when I let the query planner alone:
>
>                                    QUERY PLAN
> ---------------------------------------------------------------------------
>----- Unique  (cost=23057876.40..23683350.48 rows=4 width=9)
>    ->  Sort  (cost=23057876.40..23370613.44 rows=125094816 width=9)
>          Sort Key: exchange
>          ->  Seq Scan on quotes  (cost=0.00..3301683.16 rows=125094816
> width=9)
> (4 rows)
>
> This is what really happens:
>
>                                                               QUERY PLAN
> ---------------------------------------------------------------------------
>------------------------------------------------------------ Unique
> (cost=23057876.40..23683350.48 rows=4 width=9) (actual
> time=1577159.744..1968911.024 rows=4 loops=1)
>    ->  Sort  (cost=23057876.40..23370613.44 rows=125094816 width=9)
> (actual time=1577159.742..1927400.118 rows=125094818 loops=1)
>          Sort Key: exchange
>          ->  Seq Scan on quotes  (cost=0.00..3301683.16 rows=125094816
> width=9) (actual time=0.022..169744.162 rows=125094818 loops=1)
>  Total runtime: 1969844.753 ms
> (5 rows)
>
> With "enable_seqscan = off" I get this plan:
>
>                                               QUERY PLAN
> ---------------------------------------------------------------------------
>---------------------------- Unique  (cost=0.00..89811549.81 rows=4 width=9)
>    ->  Index Scan using quotes_exchange_key on quotes
> (cost=0.00..89498812.77 rows=125094816 width=9)
> (2 rows)
>
> And again with execution times:
>
>
> QUERY PLAN
> ---------------------------------------------------------------------------
>----------------------------------------------------------------------------
>------- Unique  (cost=0.00..89811549.81 rows=4 width=9) (actual
> time=0.079..313068.922 rows=4 loops=1)
>    ->  Index Scan using quotes_exchange_key on quotes
> (cost=0.00..89498812.77 rows=125094816 width=9) (actual
> time=0.078..273787.493 rows=125094818 loops=1)
>  Total runtime: 313068.967 ms
> (3 rows)
>
> I understand that from looking at the estimations (89811549.81 with
> index scan vs. 23683350.48 with sequential scan) the query planner had
> to choose the sequential scan. So maybe I have to tune the planner cost
> constants? Indeed I did some changes to these values, but in my opinion
> this should make index scans preferable:
>
> #seq_page_cost = 1.0                    # measured on an arbitrary scale
> #random_page_cost = 4.0                 # same scale as above
> random_page_cost = 1.0
> #cpu_tuple_cost = 0.01                  # same scale as above
> #cpu_index_tuple_cost = 0.005           # same scale as above
> cpu_index_tuple_cost = 0.001
> #cpu_operator_cost = 0.0025             # same scale as above
> #effective_cache_size = 128MB
> effective_cache_size = 4GB
>
> The machine is a dedicated database server with two dual-core xeon
> processors and 8 GB memory.
>
> Thanks for your help,
>     Christian



--
Reg me Please

Re: How does the query planner make its plan?

From
"Scott Marlowe"
Date:
On 11/6/07, Christian Schröder <cs@deriva.de> wrote:
> Hi list,
> once again I do not understand how the query planner works and why it
> apparently does not find the best result.
> I have a table with about 125 million rows. There is a char(5) column
> with a (non-unique) index. When I try to find the distinct values in
> this column using the following sql statement:
>
> select distinct exchange from foo
>
> the query planner chooses not to use the index, but performs a
> sequential scan. When I disfavour the use of sequential scans ("set
> enable_seqscan = off") the performance is more than 6 times better. Why
> does the query planner's plan go wrong? The table has been vacuum
> analyzed just before I ran the queries.

Does the sequential scan stay slow the second time you run it?  It's
possible that if you always run the seq scan first, then the index
scan second, the index scan will benefit from caching.

Assuming that repeated runs of each type shows the index scan to be
faster, then it's likely that it is both fitting into memory AND that
the table data is better ordered than the db thinks it is.

Have you tried upping the stats target on the exchange column and
re-running analyze to see if that helps?

Generally, random_page_cost should not really be 1 unless you're
running a db that wholly fits into memory or is on a SSD.

Note that even then index fetches cost more than seq scan fetches
because with an index fetch you hit the index THEN hit the table (two
fetches) where in a seq fetch you just hit the table.

I'd also try clustiner the table on exchange.