Thread: CREATE INDEX spoils IndexScan planns
Hi! SOME BACKGROUND: I am having a rather annoying problem which I have no explonation to so I am calling it a bug. Currenty I am designing a rather huge database with 700+ tables and unfortunately I am quite late to discover this as my test data is imported via \i the_whole_database_definition_and_test_data.sql The definition file is getting pretty big, so when I noticed that I forgot to define one of the indexes I did a CREATE INDEX by hand. It was than that I noticed that all planns to use IndexScan on that table had been canceled and that everything was to be done via SeqScan (full table scan?). Which renders my table and the database useless in real life. There are about 4.3 million records added to that table, per month. SeqScan BAD - IndexScan GOOD! :) Google time... Users complained about VACUUM (analyze | full) "corrupting" the index or at least it's planned usage. There have been some suggestions about strict datatype casting (helps the optimizer), locale and collation settings, etc. I've tried every resolution suggestion I was able to find. Just to be absolutely sure, I've even donwloaded the 7.4 BETA 5 and verified that the problem is still present. No luck. PROBLEM (btree index): Index Scan plannes are ignored after ANY of the following: CREATE INDEX... , VACUUM, VACUUM FULL, VACUUM ANALYZE, ANALYZE, REINDEX, REINDEX TABLE, REINDEX INDEX and ./contrib/reindexdb (yes, I've tried that too). I have droped the whole database for each of the above commands and did traces for all of them as the one described bellow. What is even stranger is that the even the new index is never used? PLATFORM (scope): - The original problem was discovered on postgres 7.3.4 running on RedHat 7.3 (Intel) - This has been verified on another server running postgres 7.3.4 RedHat 7.2 (Intel) - Problem still present in postgres 7.4.beta5 RESOLUTION ATTEMPTS: - droping and recreating the database makes it OK (duh?!) until any of the above commands are executed, - reindex doesn't help, but is actually a part of the problem, - iso8859-1 LATIN1 (no change) - iso8859-2 LATIN2 (no change) - --collate=C (no change) - strict type casting the id SERIAL as ::INTEGER and ::INT4 (no change) - type casting the SELECT query with id::int4 (no change) - problem persistant throughout all column data types (INT, SERIAL8, TEXT, VARCHAR, ...) RESOLUTION (a really bad one): Someone suggested SET -ing the enable_seqscan to OFF / FALSE, which seems to do the trick, but I generaly consider this to be bad idea. TRACE: The original tables are much bigger, so I've tried to simplify things here. Please let me know if there is anything that I could help you with. In desperate need of help, Kind regards, Vince. Here's the step-by-step trace on an (out of the box clean) 7.4.beta5: ------------------------------------------------------------------------- [root@charlie test]# psql -U postgres template1 Welcome to psql 7.4beta5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit template1=# create database test; CREATE DATABASE template1=# \c test You are now connected to database "test". test=# \d No relations found. test=# CREATE table test( test(# id SERIAL not NULL PRIMARY KEY, test(# first_name VARCHAR(50), test(# last_name VARCHAR(50) test(# )with OIDS; NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for "serial" column "test.id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test" CREATE TABLE test=# INSERT INTO test(first_name, last_name) VALUES('John', 'Smith'); INSERT INTO test(first_name, last_name) VALUES('Rob', 'Roy'); INSERT INTO test(first_name, last_name) VALUES('William', 'McAndrews'); INSERT INTO test(first_name, last_name) VALUES('Sean', 'O''Neil'); INSERT INTO test(first_name, last_name) VALUES('Terrance', 'Phillup'); INSERT 17164 1 test=# INSERT INTO test(first_name, last_name) VALUES('Rob', 'Roy'); INSERT 17165 1 test=# INSERT INTO test(first_name, last_name) VALUES('William', 'McAndrews'); INSERT 17166 1 test=# INSERT INTO test(first_name, last_name) VALUES('Sean', 'O''Neil'); INSERT 17167 1 test=# INSERT INTO test(first_name, last_name) VALUES('Terrance', 'Phillup'); INSERT 17168 1 test=# test=# select * from test; id | first_name | last_name ----+------------+----------- 1 | John | Smith 2 | Rob | Roy 3 | William | McAndrews 4 | Sean | O'Neil 5 | Terrance | Phillup (5 rows) test=# explain analyze select * from test; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..20.00 rows=1000 width=90) (actual time=0.040..0.077 rows=5 loops=1) Total runtime: 0.347 ms (2 rows) test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Index Scan using test_pkey on test (cost=0.00..4.82 rows=2 width=90) (actual time=0.095..0.110 rows=1 loops=1) Index Cond: (id = 3) Total runtime: 0.466 ms (3 rows) test=# select * from test where id = 3; id | first_name | last_name ----+------------+----------- 3 | William | McAndrews (1 row) -- PROBLEM STARTS HERE! -------------------------- test=# create index my_index on test(last_name); CREATE INDEX test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=90) (actual time=0.098..0.122 rows=1 loops=1) Filter: (id = 3) Total runtime: 0.524 ms (3 rows) test=# \d test Table "public.test" Column | Type | Modifiers ------------+-----------------------+------------------------------------------------------ id | integer | not null default nextval('public.test_id_seq'::text) first_name | character varying(50) | last_name | character varying(50) | Indexes: "test_pkey" primary key, btree (id) "my_index" btree (last_name) test=# reindex table test; REINDEX test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=90) (actual time=0.088..0.112 rows=1 loops=1) Filter: (id = 3) Total runtime: 0.361 ms (3 rows) test=# vacuum full; VACUUM test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=90) (actual time=0.099..0.122 rows=1 loops=1) Filter: (id = 3) Total runtime: 0.527 ms (3 rows) test=# vacuum analyze; VACUUM test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=23) (actual time=0.090..0.113 rows=1 loops=1) Filter: (id = 3) Total runtime: 0.405 ms (3 rows) test=# analyze; ANALYZE test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=23) (actual time=0.097..0.121 rows=1 loops=1) Filter: (id = 3) Total runtime: 2.669 ms (3 rows) test=# reindex index test_pkey; REINDEX test=# explain analyze select * from test where id = 3; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=23) (actual time=0.089..0.113 rows=1 loops=1) Filter: (id = 3) Total runtime: 0.366 ms (3 rows) test=# explain analyze select * from test where last_name = 'McAndrews'; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=23) (actual time=0.107..0.140 rows=1 loops=1) Filter: ((last_name)::text = 'McAndrews'::text) Total runtime: 0.393 ms (3 rows) test=# insert into test(first_name, last_name) values('Vince', 'K.'); INSERT 17173 1 test=# explain analyze select * from test where last_name = 'K.'; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..1.06 rows=2 width=23) (actual time=0.136..0.147 rows=1 loops=1) Filter: ((last_name)::text = 'K.'::text) Total runtime: 0.393 ms (3 rows) -- END: trace =================================================================================
> TRACE: > The original tables are much bigger, so I've tried to simplify things her= e. > Please let me know if there is anything that I could help you with. You can't do that and expect to get reasonable results. The plans will change with the volume of data. Send an explain analyze of the true problem after ANALYZE with seqscan on and one with it off. Thanks.
Hi Rod, here is the actual production trace of the problem. This is a table of mobile network cells and code-names devided into LAC's. Two test cells to test with are: test cell id #1: 900 4900035 test cell id #2: 300 5080140 You were right, the volume of the data changes the optimizer's willingness to use indexes. Another funny thing though... I actually did two tests. One with the actual production data and the other one using only a slice of that (just 1000 rows). On the second smaller test the optimizer insisted to go with the SeqScan eventhou IndexScan (after forcing it) turned out to be about 10 times faster. Here are the both traces... Many thanks for your effort, Kind regards, Vince TRACE #1 (the big one): ----------------------------------------------------------------------------------------------------------------------- Welcome to psql 7.4beta5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit stats=# select count(*) from omc_cell; count ------- 42843 (1 row) stats=# show enable_seqscan; enable_seqscan ---------------- on (1 row) stats=# explain analyze select * from omc_cell where lac = '900' and cellid = '4900035'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..4.83 rows=1 width=72) (actual time=0.177..0.194 rows=1 loops=1) Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text = '4900035'::text)) Total runtime: 0.604 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '300' and cellid = '5080140'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..4.83 rows=1 width=72) (actual time=0.176..0.194 rows=1 loops=1) Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text = '5080140'::text)) Total runtime: 0.541 ms (3 rows) stats=# analyze; ANALYZE stats=# explain analyze select * from omc_cell where lac = '900' and cellid = '4900035'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..5.00 rows=1 width=34) (actual time=0.175..0.194 rows=1 loops=1) Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text = '4900035'::text)) Total runtime: 2.044 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '300' and cellid = '5080140'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..5.00 rows=1 width=34) (actual time=0.179..0.197 rows=1 loops=1) Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text = '5080140'::text)) Total runtime: 0.549 ms (3 rows) stats=# set enable_seqscan to off; SET stats=# show enable_seqscan; enable_seqscan ---------------- off (1 row) stats=# analyze; ANALYZE stats=# explain analyze select * from omc_cell where lac = '900' and cellid = '4900035'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..5.00 rows=1 width=34) (actual time=0.173..0.192 rows=1 loops=1) Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text = '4900035'::text)) Total runtime: 1.954 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '300' and cellid = '5080140'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..5.00 rows=1 width=34) (actual time=0.173..0.191 rows=1 loops=1) Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text = '5080140'::text)) Total runtime: 0.544 ms (3 rows) stats=# create index test_x on omc_cell(cellid); CREATE INDEX stats=# explain analyze select * from omc_cell where lac = '900' and cellid = '4900035'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..6.00 rows=1 width=34) (actual time=31.507..31.533 rows=1 loops=1) Index Cond: (((lac)::text = '900'::text) AND ((cellid)::text = '4900035'::text)) Total runtime: 31.899 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '300' and cellid = '5080140'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..6.00 rows=1 width=34) (actual time=0.175..0.193 rows=1 loops=1) Index Cond: (((lac)::text = '300'::text) AND ((cellid)::text = '5080140'::text)) Total runtime: 0.550 ms (3 rows) stats=# explain analyze select * from omc_cell where cellid = '4900035'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Index Scan using test_x on omc_cell (cost=0.00..6.01 rows=1 width=34) (actual time=0.156..0.175 rows=1 loops=1) Index Cond: ((cellid)::text = '4900035'::text) Total runtime: 0.524 ms (3 rows) TRACE #2 (the small, funny one) ------------------------------------------------------------------------------------------------------------------ Welcome to psql 7.4beta5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit stats=# explain analyze select * from omc_cell where lac = '500'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..13.32 rows=4 width=72) (actual time=0.345..2.941 rows=150 loops=1) Index Cond: ((lac)::text = '500'::text) Total runtime: 3.976 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '600'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..13.32 rows=4 width=72) (actual time=0.136..1.112 rows=61 loops=1) Index Cond: ((lac)::text = '600'::text) Total runtime: 1.714 ms (3 rows) stats=# show enable_seqscan; enable_seqscan ---------------- on (1 row) stats=# analyze; ANALYZE stats=# explain analyze select * from omc_cell where lac = '500'; QUERY PLAN -------------------------------------------------------------------------------------------------------- Seq Scan on omc_cell (cost=0.00..14.80 rows=150 width=40) (actual time=5.330..8.302 rows=150 loops=1) Filter: ((lac)::text = '500'::text) Total runtime: 10.810 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '600'; QUERY PLAN ------------------------------------------------------------------------------------------------------ Seq Scan on omc_cell (cost=0.00..14.80 rows=62 width=40) (actual time=6.599..7.624 rows=61 loops=1) Filter: ((lac)::text = '600'::text) Total runtime: 8.142 ms (3 rows) stats=# set enable_seqscan to off; SET stats=# show enable_seqscan; enable_seqscan ---------------- off (1 row) stats=# analyze; ANALYZE stats=# explain analyze select * from omc_cell where lac = '500'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..21.05 rows=150 width=40) (actual time=0.133..2.711 rows=150 loops=1) Index Cond: ((lac)::text = '500'::text) Total runtime: 4.999 ms (3 rows) stats=# explain analyze select * from omc_cell where lac = '600'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using pk_omc_cell on omc_cell (cost=0.00..19.85 rows=62 width=40) (actual time=0.127..1.106 rows=61 loops=1) Index Cond: ((lac)::text = '600'::text) Total runtime: 1.642 ms (3 rows) Rod Taylor wrote: >>TRACE: >>The original tables are much bigger, so I've tried to simplify things here. >>Please let me know if there is anything that I could help you with. >> >> > >You can't do that and expect to get reasonable results. The plans will >change with the volume of data. > >Send an explain analyze of the true problem after ANALYZE with seqscan >on and one with it off. > >Thanks. > >
On Fri, 2003-10-31 at 12:25, Nitz wrote: > You were right, the volume of the data changes the optimizer's > willingness to use indexes. AFAICS, the optimizer seems to be making exactly the right guesses for the production data -- i.e. there's no problem/bug. > Another funny thing though... I actually did two tests. One with the > actual production > data and the other one using only a slice of that (just 1000 rows). On > the second smaller > test the optimizer insisted to go with the SeqScan eventhou IndexScan > (after forcing it) > turned out to be about 10 times faster. Here are the both traces... Well, the principle the optimizer is using (that as the size of the table shrinks, the usefulness of an index scan shrinks) is correct, but it's just being a little too enthusiastic about switching to a seqscan. Try decreasing the random_page_cost GUC variable until the optimizer starts to use index scans more appropriately for the small table. -Neil