Thread: CREATE INDEX spoils IndexScan planns

CREATE INDEX spoils IndexScan planns

From

Nitz

Date:

31 October 2003, 08:31:25

Hi!

SOME BACKGROUND:
I am having a rather annoying problem which I have no explonation to so
I am calling it a bug.
Currenty I am designing a rather huge database with 700+ tables and
unfortunately I am quite late to discover
this as my test data is imported via \i
the_whole_database_definition_and_test_data.sql

The definition file is getting pretty big, so when I noticed that I
forgot to define one of the indexes I did a
CREATE INDEX by hand. It was than that I noticed that all planns to use
IndexScan on that table had been canceled
and that everything was to be done via SeqScan (full table scan?). Which
renders my table and the database useless
in real life. There are about 4.3 million records added to that table,
per month. SeqScan BAD - IndexScan GOOD! :)

Google time... Users complained about VACUUM (analyze | full)
"corrupting" the index or at least it's
planned usage. There have been some suggestions about strict datatype
casting (helps the optimizer), locale and
collation settings, etc. I've tried every resolution suggestion I was
able to find. Just to be absolutely sure,
I've even donwloaded the 7.4 BETA 5 and verified that the problem is
still present. No luck.


PROBLEM (btree index):
Index Scan plannes are ignored after ANY of the following:
CREATE INDEX... , VACUUM, VACUUM FULL, VACUUM ANALYZE, ANALYZE, REINDEX,
REINDEX TABLE, REINDEX INDEX
and ./contrib/reindexdb (yes, I've tried that too).

I have droped the whole database for each of the above commands and did
traces for all of them
as the one described bellow. What is even stranger is that the even the
new index is never used?


PLATFORM (scope):
- The original problem was discovered on postgres 7.3.4 running on
RedHat 7.3 (Intel)
- This has been verified on another server running postgres 7.3.4 RedHat
7.2 (Intel)
- Problem still present in postgres 7.4.beta5


RESOLUTION ATTEMPTS:
- droping and recreating the database makes it OK (duh?!) until any of
the above commands are executed,
- reindex doesn't help, but is actually a part of the problem,
- iso8859-1 LATIN1 (no change)
- iso8859-2 LATIN2 (no change)
- --collate=C (no change)
- strict type casting the id SERIAL as ::INTEGER and ::INT4 (no change)
- type casting the SELECT query with id::int4 (no change)
- problem persistant throughout all column data types (INT, SERIAL8,
TEXT, VARCHAR, ...)


RESOLUTION (a really bad one):
Someone suggested SET -ing the enable_seqscan to OFF / FALSE, which
seems to do the trick, but I generaly
consider this to be bad idea.


TRACE:
The original tables are much bigger, so I've tried to simplify things here.
Please let me know if there is anything that I could help you with.


In desperate need of help,

Kind regards,

Vince.


Here's the step-by-step trace on an (out of the box clean) 7.4.beta5:
-------------------------------------------------------------------------

[root@charlie test]# psql -U postgres template1
Welcome to psql 7.4beta5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=# create database test;
CREATE DATABASE

template1=# \c test
You are now connected to database "test".

test=# \d
No relations found.

test=# CREATE table test(
test(#  id SERIAL not NULL PRIMARY KEY,
test(#  first_name VARCHAR(50),
test(#  last_name  VARCHAR(50)
test(# )with OIDS;
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for
"serial" column "test.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE

test=# INSERT INTO test(first_name, last_name) VALUES('John', 'Smith');
INSERT INTO test(first_name, last_name) VALUES('Rob', 'Roy');
INSERT INTO test(first_name, last_name) VALUES('William', 'McAndrews');
INSERT INTO test(first_name, last_name) VALUES('Sean', 'O''Neil');
INSERT INTO test(first_name, last_name) VALUES('Terrance', 'Phillup');
INSERT 17164 1
test=# INSERT INTO test(first_name, last_name) VALUES('Rob', 'Roy');
INSERT 17165 1
test=# INSERT INTO test(first_name, last_name) VALUES('William',
'McAndrews');
INSERT 17166 1
test=# INSERT INTO test(first_name, last_name) VALUES('Sean', 'O''Neil');
INSERT 17167 1
test=# INSERT INTO test(first_name, last_name) VALUES('Terrance',
'Phillup');
INSERT 17168 1
test=#

test=# select * from test;
 id | first_name | last_name
----+------------+-----------
  1 | John       | Smith
  2 | Rob        | Roy
  3 | William    | McAndrews
  4 | Sean       | O'Neil
  5 | Terrance   | Phillup
(5 rows)

test=# explain analyze select * from test;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..20.00 rows=1000 width=90) (actual
time=0.040..0.077 rows=5 loops=1)
 Total runtime: 0.347 ms
(2 rows)

test=# explain analyze select * from test where id = 3;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..4.82 rows=2 width=90)
(actual time=0.095..0.110 rows=1 loops=1)
   Index Cond: (id = 3)
 Total runtime: 0.466 ms
(3 rows)

test=# select * from test where id = 3;
 id | first_name | last_name
----+------------+-----------
  3 | William    | McAndrews
(1 row)



-- PROBLEM STARTS HERE!
--------------------------


test=# create index my_index on test(last_name);
CREATE INDEX

test=# explain analyze select * from test where id = 3;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=90) (actual
time=0.098..0.122 rows=1 loops=1)
   Filter: (id = 3)
 Total runtime: 0.524 ms
(3 rows)

test=# \d test
                                    Table "public.test"
   Column   |         Type          |                      Modifiers
------------+-----------------------+------------------------------------------------------
 id         | integer               | not null default
nextval('public.test_id_seq'::text)
 first_name | character varying(50) |
 last_name  | character varying(50) |
Indexes:
    "test_pkey" primary key, btree (id)
    "my_index" btree (last_name)

test=# reindex table test;
REINDEX

test=# explain analyze select * from test where id = 3;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=90) (actual
time=0.088..0.112 rows=1 loops=1)
   Filter: (id = 3)
 Total runtime: 0.361 ms
(3 rows)

test=# vacuum full;
VACUUM

test=# explain analyze select * from test where id = 3;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=90) (actual
time=0.099..0.122 rows=1 loops=1)
   Filter: (id = 3)
 Total runtime: 0.527 ms
(3 rows)

test=# vacuum analyze;
VACUUM

test=# explain analyze select * from test where id = 3;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=23) (actual
time=0.090..0.113 rows=1 loops=1)
   Filter: (id = 3)
 Total runtime: 0.405 ms
(3 rows)

test=# analyze;
ANALYZE

test=# explain analyze select * from test where id = 3;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=23) (actual
time=0.097..0.121 rows=1 loops=1)
   Filter: (id = 3)
 Total runtime: 2.669 ms
(3 rows)

test=# reindex index test_pkey;
REINDEX

test=# explain analyze select * from test where id = 3;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=23) (actual
time=0.089..0.113 rows=1 loops=1)
   Filter: (id = 3)
 Total runtime: 0.366 ms
(3 rows)

test=# explain analyze select * from test where last_name = 'McAndrews';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=23) (actual
time=0.107..0.140 rows=1 loops=1)
   Filter: ((last_name)::text = 'McAndrews'::text)
 Total runtime: 0.393 ms
(3 rows)

test=# insert into test(first_name, last_name) values('Vince', 'K.');
INSERT 17173 1

test=# explain analyze select * from test where last_name = 'K.';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.06 rows=2 width=23) (actual
time=0.136..0.147 rows=1 loops=1)
   Filter: ((last_name)::text = 'K.'::text)
 Total runtime: 0.393 ms
(3 rows)

-- END: trace
=================================================================================

Re: CREATE INDEX spoils IndexScan planns

From

Rod Taylor

Date:

31 October 2003, 09:30:12

> TRACE:
> The original tables are much bigger, so I've tried to simplify things her=
e.
> Please let me know if there is anything that I could help you with.

You can't do that and expect to get reasonable results. The plans will
change with the volume of data.

Send an explain analyze of the true problem after ANALYZE with seqscan
on and one with it off.

Thanks.

Re: CREATE INDEX spoils IndexScan planns

From

Nitz

Date:

31 October 2003, 13:26:14

Hi Rod,

here is the actual production trace of the problem.
This is a table of mobile network cells and code-names devided into LAC's.

Two test cells to test with are:
test cell id #1: 900 4900035
test cell id #2: 300 5080140

You were right, the volume of the data changes the optimizer's
willingness to use indexes.
Another  funny thing though... I actually did two tests. One with the
actual production
data and the other one using only a slice of that (just 1000 rows). On
the second smaller
test the optimizer insisted to go with the SeqScan eventhou IndexScan
(after forcing it)
turned out to be about 10 times faster. Here are the both traces...

Many thanks for your effort,


Kind regards,

Vince


TRACE #1 (the big one):
-----------------------------------------------------------------------------------------------------------------------
Welcome to psql 7.4beta5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

stats=# select count(*) from omc_cell;
 count
-------
 42843
(1 row)

stats=# show enable_seqscan;
 enable_seqscan
----------------
 on
(1 row)

stats=# explain analyze select * from omc_cell where lac = '900' and
cellid = '4900035';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..4.83 rows=1
width=72) (actual time=0.177..0.194 rows=1 loops=1)
   Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text =
'4900035'::text))
 Total runtime: 0.604 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '300' and
cellid = '5080140';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..4.83 rows=1
width=72) (actual time=0.176..0.194 rows=1 loops=1)
   Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text =
'5080140'::text))
 Total runtime: 0.541 ms
(3 rows)


stats=# analyze;
ANALYZE

stats=# explain analyze select * from omc_cell where lac = '900' and
cellid = '4900035';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..5.00 rows=1
width=34) (actual time=0.175..0.194 rows=1 loops=1)
   Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text =
'4900035'::text))
 Total runtime: 2.044 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '300' and
cellid = '5080140';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..5.00 rows=1
width=34) (actual time=0.179..0.197 rows=1 loops=1)
   Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text =
'5080140'::text))
 Total runtime: 0.549 ms
(3 rows)

stats=# set enable_seqscan to off;
SET

stats=# show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

stats=# analyze;
ANALYZE

stats=# explain analyze select * from omc_cell where lac = '900' and
cellid = '4900035';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..5.00 rows=1
width=34) (actual time=0.173..0.192 rows=1 loops=1)
   Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text =
'4900035'::text))
 Total runtime: 1.954 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '300' and
cellid = '5080140';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..5.00 rows=1
width=34) (actual time=0.173..0.191 rows=1 loops=1)
   Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text =
'5080140'::text))
 Total runtime: 0.544 ms
(3 rows)

stats=# create index test_x on omc_cell(cellid);
CREATE INDEX
stats=# explain analyze select * from omc_cell where lac = '900' and
cellid = '4900035';
                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..6.00 rows=1
width=34) (actual time=31.507..31.533 rows=1 loops=1)
   Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text =
'4900035'::text))
 Total runtime: 31.899 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '300' and
cellid = '5080140';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..6.00 rows=1
width=34) (actual time=0.175..0.193 rows=1 loops=1)
   Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text =
'5080140'::text))
 Total runtime: 0.550 ms
(3 rows)

stats=# explain analyze select * from omc_cell where cellid = '4900035';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using test_x on omc_cell  (cost=0.00..6.01 rows=1 width=34)
(actual time=0.156..0.175 rows=1 loops=1)
   Index Cond: ((cellid)::text = '4900035'::text)
 Total runtime: 0.524 ms
(3 rows)







TRACE #2 (the small, funny one)
------------------------------------------------------------------------------------------------------------------
Welcome to psql 7.4beta5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

stats=# explain analyze select * from omc_cell where lac = '500';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..13.32 rows=4
width=72) (actual time=0.345..2.941 rows=150 loops=1)
   Index Cond: ((lac)::text = '500'::text)
 Total runtime: 3.976 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '600';
                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..13.32 rows=4
width=72) (actual time=0.136..1.112 rows=61 loops=1)
   Index Cond: ((lac)::text = '600'::text)
 Total runtime: 1.714 ms
(3 rows)

stats=# show enable_seqscan;
 enable_seqscan
----------------
 on
(1 row)

stats=# analyze;
ANALYZE
stats=# explain analyze select * from omc_cell where lac = '500';
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on omc_cell  (cost=0.00..14.80 rows=150 width=40) (actual
time=5.330..8.302 rows=150 loops=1)
   Filter: ((lac)::text = '500'::text)
 Total runtime: 10.810 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '600';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on omc_cell  (cost=0.00..14.80 rows=62 width=40) (actual
time=6.599..7.624 rows=61 loops=1)
   Filter: ((lac)::text = '600'::text)
 Total runtime: 8.142 ms
(3 rows)

stats=# set enable_seqscan to off;
SET
stats=# show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

stats=# analyze;
ANALYZE
stats=# explain analyze select * from omc_cell where lac = '500';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..21.05 rows=150
width=40) (actual time=0.133..2.711 rows=150 loops=1)
   Index Cond: ((lac)::text = '500'::text)
 Total runtime: 4.999 ms
(3 rows)

stats=# explain analyze select * from omc_cell where lac = '600';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_omc_cell on omc_cell  (cost=0.00..19.85 rows=62
width=40) (actual time=0.127..1.106 rows=61 loops=1)
   Index Cond: ((lac)::text = '600'::text)
 Total runtime: 1.642 ms
(3 rows)




Rod Taylor wrote:

>>TRACE:
>>The original tables are much bigger, so I've tried to simplify things here.
>>Please let me know if there is anything that I could help you with.
>>
>>
>
>You can't do that and expect to get reasonable results. The plans will
>change with the volume of data.
>
>Send an explain analyze of the true problem after ANALYZE with seqscan
>on and one with it off.
>
>Thanks.
>
>

Re: CREATE INDEX spoils IndexScan planns

From

Neil Conway

Date:

31 October 2003, 17:35:20

On Fri, 2003-10-31 at 12:25, Nitz wrote:
> You were right, the volume of the data changes the optimizer's
> willingness to use indexes.

AFAICS, the optimizer seems to be making exactly the right guesses for
the production data -- i.e. there's no problem/bug.

> Another  funny thing though... I actually did two tests. One with the
> actual production
> data and the other one using only a slice of that (just 1000 rows). On
> the second smaller
> test the optimizer insisted to go with the SeqScan eventhou IndexScan
> (after forcing it)
> turned out to be about 10 times faster. Here are the both traces...

Well, the principle the optimizer is using (that as the size of the
table shrinks, the usefulness of an index scan shrinks) is correct, but
it's just being a little too enthusiastic about switching to a seqscan.
Try decreasing the random_page_cost GUC variable until the optimizer
starts to use index scans more appropriately for the small table.

-Neil