Thread: Doesn't use index, why?
Hello,

I have a question about indexes. Look at the following example:

naslog=# \d desconexao
                            Table "desconexao"
   Attribute    |    Type     |                    Modifier
----------------+-------------+--------------------------------------------------
 id             | integer     | not null default nextval('id_desconexao'::text)
 time           | timestamp   |
 client_user    | varchar(20) |
 client         | varchar(40) |
 ip_nas         | inet        |
 ip_client_user | inet        |
 disconnect     | smallint    |
 user_time      | interval    |
 data_rate      | integer     |
 called_number  | varchar(14) |
 calling_number | varchar(14) |
 filtrado       | boolean     | default 'f'::bool
Indices: client_desconexao_idx,
         desconexao_pkey,
         filtro_idx,
         time_idx

naslog=# \d time_idx
     Index "time_idx"
 Attribute |   Type
-----------+-----------
 time      | timestamp
btree

naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE:  QUERY PLAN:

Index Scan using time_idx on desconexao  (cost=0.00..20.18 rows=5 width=103)

EXPLAIN

Everything is fine so far, but when I execute the following query:

naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=657958 width=103)

EXPLAIN

Why doesn't PostgreSQL use the time_idx index? I only changed the
operator from "=" to ">" in the WHERE clause.

Thank you.
Regards
-- Marco Catunda
Marco Catunda wrote:
> naslog=# explain select * from desconexao where time = '2000-12-01';
> NOTICE:  QUERY PLAN:
>
> Index Scan using time_idx on desconexao  (cost=0.00..20.18 rows=5 width=103)
>
> naslog=# explain select * from desconexao where time > '2000-12-01';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=657958 width=103)
>
> Why doesn't PostgreSQL use the time_idx index? I only changed the
> operator from "=" to ">" in the WHERE clause.

First, you should include your PostgreSQL version along with the report.

Two thoughts: did you run VACUUM? And of the total number of records, how
many are greater than 2000-12-01? It could be that PostgreSQL figures a
large portion of the table is greater than this date, and that it is
therefore faster to do a sequential scan than to use the index.
Marco Catunda <catunda@pobox.com> writes:
> naslog=# explain select * from desconexao where time = '2000-12-01';
> NOTICE:  QUERY PLAN:
> Index Scan using time_idx on desconexao  (cost=0.00..20.18 rows=5 width=103)

> naslog=# explain select * from desconexao where time > '2000-12-01';
> NOTICE:  QUERY PLAN:
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=657958 width=103)

> Why doesn't PostgreSQL use the time_idx index? I only changed the
> operator from "=" to ">" in the WHERE clause.

Because the '>' condition is much less restrictive (notice the difference
in estimated row counts: 5 versus 657958).  Unless the planner's row count
estimates are way off, it very likely *is* faster to do the second query
by sequential scan.

			regards, tom lane
On 04 Jan 2001 11:55:14 -0500, Dave Smith wrote:
> First, you should include your PostgreSQL version along with the report.
>
> Two thoughts: did you run VACUUM? And of the total number of records, how
> many are greater than 2000-12-01? It could be that PostgreSQL figures a
> large portion of the table is greater than this date, and that it is
> therefore faster to do a sequential scan than to use the index.

Sorry, the PostgreSQL version is 7.0.3.

This is the number of records in the database:

naslog=# select count(*) from desconexao;
 count
--------
 658617
(1 row)

I picked a bad date for the example, because all records are later than
'2000-12-01', sorry. But the same thing happens when I try '2000-12-10':

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=231489 width=103)

EXPLAIN

naslog=# explain select * from desconexao where time >= '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=427128 width=103)

EXPLAIN

The actual numbers of records are:

naslog=# select count(*) from desconexao where time >= '2000-12-10';
 count
--------
 585789
(1 row)

naslog=# select count(*) from desconexao where time < '2000-12-10';
 count
-------
 72828
(1 row)

I think the estimated row count (231489) for the query with < '2000-12-10'
is far from the real value (72828), so I ran vacuum analyze:

naslog=# vacuum verbose analyze desconexao;
NOTICE:  --Relation desconexao--
NOTICE:  Pages 11318: Changed 0, reaped 2616, Empty 0, New 0; Tup 658617:
Vac 200, Keep/VTL 0/0, Crash 0, UnUsed 8007, MinLen 105, MaxLen 166;
Re-using: Free/Avail. Space 150084/35492; EndEmpty/Avail. Pages 0/105.
CPU 3.08s/27.90u sec.
NOTICE:  Index client_desconexao_idx: Pages 9786; Tuples 658617: Deleted 200.
CPU 2.50s/3.33u sec.
NOTICE:  Index filtro_idx: Pages 2946; Tuples 658617: Deleted 200.
CPU 0.70s/3.20u sec.
NOTICE:  Index time_idx: Pages 5503; Tuples 658617: Deleted 200.
CPU 1.52s/3.04u sec.
NOTICE:  Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted 200.
CPU 1.59s/3.12u sec.
NOTICE:  Rel desconexao: Pages: 11318 --> 11315; Tuple(s) moved: 199.
CPU 5.41s/1.03u sec.
NOTICE:  Index client_desconexao_idx: Pages 9786; Tuples 658617: Deleted 199.
CPU 2.85s/1.79u sec.
NOTICE:  Index filtro_idx: Pages 2946; Tuples 658617: Deleted 199.
CPU 0.71s/1.60u sec.
NOTICE:  Index time_idx: Pages 5503; Tuples 658617: Deleted 199.
CPU 1.51s/1.65u sec.
NOTICE:  Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted 199.
CPU 1.59s/1.58u sec.
VACUUM

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=231489 width=103)

EXPLAIN

naslog=# explain select * from desconexao where time >= '2000-12-10';
NOTICE:  QUERY PLAN:

Seq Scan on desconexao  (cost=0.00..19547.71 rows=427128 width=103)

EXPLAIN

The estimates stay the same. Is there a way to force use of the index?

Thank you
-- Marco Catunda
Marco Catunda <catunda@pobox.com> writes:
> naslog=# explain select * from desconexao where time < '2000-12-10';
> NOTICE:  QUERY PLAN:
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=231489 width=103)

> naslog=# explain select * from desconexao where time >= '2000-12-10';
> NOTICE:  QUERY PLAN:
> Seq Scan on desconexao  (cost=0.00..19547.71 rows=427128 width=103)

> The actual numbers of records are:

> naslog=# select count(*) from desconexao where time >= '2000-12-10';
>  count
> --------
>  585789
> (1 row)

> naslog=# select count(*) from desconexao where time < '2000-12-10';
>  count
> -------
>  72828
> (1 row)

In this case the planner is doing *exactly* the right thing; it is
smarter than you are.  If you want to prove it, force the planner to
use an indexscan by doing SET ENABLE_SEQSCAN TO OFF.  Then time the
query, and compare the runtime against the seqscan version.

The bottom line here is that a query that needs to touch more than a
few percent of the rows in a table is better off being done as a
seqscan.

			regards, tom lane
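[A minimal sketch of the comparison Tom describes, assuming the naslog
database from this thread; enable_seqscan only affects the current session,
so it can simply be switched back on afterwards. The shell command for
timing is an illustration, not part of the original thread.]

-- Force the planner to avoid sequential scans for this session only.
SET ENABLE_SEQSCAN TO OFF;
EXPLAIN SELECT * FROM desconexao WHERE time >= '2000-12-10';
-- Run and time the real query from the shell, e.g.:
--   time psql naslog -c "select * from desconexao where time >= '2000-12-10'" > /dev/null

-- Re-enable sequential scans and repeat, so the two runtimes can be compared.
SET ENABLE_SEQSCAN TO ON;
EXPLAIN SELECT * FROM desconexao WHERE time >= '2000-12-10';
--   time psql naslog -c "select * from desconexao where time >= '2000-12-10'" > /dev/null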
> In this case the planner is doing *exactly* the right thing; it is > smarter than you are. If you want to prove it, force the planner to > use an indexscan by doing SET ENABLE_SEQSCAN TO OFF. Then time the > query, and compare the runtime against the seqscan version. > > The bottom line here is that a query that needs to touch more than a > few percent of the rows in a table is better off being done as a > seqscan. The only other workaround is to CLUSTER the table on an index, then force an index scan. That _may_ be faster. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
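[A hedged sketch of Bruce's suggestion, using the table and index names from
this thread. CLUSTER on 7.0 rewrites the whole table, and per the
documentation of that era other indexes and grants may need to be recreated
afterwards, so this should be tried on a copy first.]

-- Physically reorder the table in time_idx order, so rows with nearby
-- timestamps end up on nearby pages and an index scan touches fewer pages.
CLUSTER time_idx ON desconexao;

-- Then force the index scan again and compare against the seqscan timing.
SET ENABLE_SEQSCAN TO OFF;
EXPLAIN SELECT * FROM desconexao WHERE time < '2000-12-10';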