Thread: Why does it not use the index?
This one is irritating. Here's some psql output: ================================================================================ dumps=# \d fal_profdel Table "fal_profdel" Attribute | Type | Modifier -----------+--------------------------+---------- sid | character(4) | not null card_num | character(19) | not null date_del | timestamp with time zone | filename | character varying(30) | Indices: fal_prfdel_cn, fal_prfdel_date, fal_prfdel_pk dumps=# \d fal_prfdel_cn Index "fal_prfdel_cn" Attribute | Type -----------+--------------- card_num | character(19) unique btree dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy'; NOTICE: QUERY PLAN: Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12) EXPLAIN ================================================================================ Now, why the heck is the select query not using the index? I've tried it by having an exact 19 character card_num as well- still explains as a 'Seq Scan' (tablespace scan) - and each query takes up to 37 seconds (thus confirming that it isindeed doing scans and not using the index). I've tried dropping and re-creating the indexes, still it explains as tablespace scans. I am running postgresql 7.1.3 - a bit old, I know, but I have had no reason to upgrade just yet. By the way, the fal_profdel table has 4,664,867 rows in it currently - thus I really don't want full table scans! -- ----------------------------------------------------------------------------- PG.. philip@tildesoftware.com Law of probable dispersal: Whatever it is that hits the fan will not be evenly distributed.
On Mon, 21 Jul 2003, Philip Greer wrote: > dumps=# \d fal_profdel > Table "fal_profdel" > Attribute | Type | Modifier > -----------+--------------------------+---------- > sid | character(4) | not null > card_num | character(19) | not null > date_del | timestamp with time zone | > filename | character varying(30) | > Indices: fal_prfdel_cn, > fal_prfdel_date, > fal_prfdel_pk > > dumps=# \d fal_prfdel_cn > Index "fal_prfdel_cn" > Attribute | Type > -----------+--------------- > card_num | character(19) > unique btree > > dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy'; > NOTICE: QUERY PLAN: > > Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12) > > EXPLAIN > ================================================================================ > > Now, why the heck is the select query not using the index? I've tried > it by having an exact 19 character card_num as well - still explains > as a 'Seq Scan' (tablespace scan) - and each query takes up to 37 > seconds (thus confirming that it is indeed doing scans and not using > the index). Have you vacuum analyzed the table recently? What does explain show if you do set enable_seqscan=off; before the explain and then how long does the query actually take to run with seqscan disabled.
Thanks for the response: I took a look at the table with 'vacuum verbose analyze', here's the results: dumps=# vacuum verbose analyze fal_profdel; NOTICE: --Relation fal_profdel-- NOTICE: Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen103, MaxLen 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. CPU 2.53s/0.58u sec. NOTICE: Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec. NOTICE: Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec. NOTICE: Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec. NOTICE: Analyzing... VACUUM Then - afterwards, I ran the explain again: dumps=# explain select card_num from fal_profdel where card_num = '4828820006970'; NOTICE: QUERY PLAN: Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 width=12) EXPLAIN WTF? Why would a vacuum be necessary in order for it to start using the index? By the way, the actual query takes subseconds to return now. MUCH better. So - let me know why one would have to use vacuum in order for the scans to cease and index use begin. Is it a continualthing? Or does vacuum need to be done after a 'create index' in order for it to begin using the index? Thanks for your assistance, much apprecaited! On Mon, Jul 21, 2003 at 11:00:56AM -0700, Stephan Szabo filled up my inbox with the following: > On Mon, 21 Jul 2003, Philip Greer wrote: > > > dumps=# \d fal_profdel > > Table "fal_profdel" > > Attribute | Type | Modifier > > -----------+--------------------------+---------- > > sid | character(4) | not null > > card_num | character(19) | not null > > date_del | timestamp with time zone | > > filename | character varying(30) | > > Indices: fal_prfdel_cn, > > fal_prfdel_date, > > fal_prfdel_pk > > > > dumps=# \d fal_prfdel_cn > > Index "fal_prfdel_cn" > > Attribute | Type > > -----------+--------------- > > card_num | character(19) > > unique btree > > > > dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy'; > > NOTICE: QUERY PLAN: > > > > Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12) > > > > EXPLAIN > > ================================================================================ > > > > Now, why the heck is the select query not using the index? I've tried > > it by having an exact 19 character card_num as well - still explains > > as a 'Seq Scan' (tablespace scan) - and each query takes up to 37 > > seconds (thus confirming that it is indeed doing scans and not using > > the index). > > Have you vacuum analyzed the table recently? What does explain show if you > do set enable_seqscan=off; before the explain and then how long does the > query actually take to run with seqscan disabled. > -- ----------------------------------------------------------------------------- PG.. philip@tildesoftware.com Law of probable dispersal: Whatever it is that hits the fan will not be evenly distributed.
On Monday 21 July 2003 19:51, Philip Greer wrote: > Thanks for the response: > > I took a look at the table with 'vacuum verbose analyze', here's the > results: > > dumps=# vacuum verbose analyze fal_profdel; [snip] > Then - afterwards, I ran the explain again: > > dumps=# explain select card_num from fal_profdel where card_num = > '4828820006970'; NOTICE: QUERY PLAN: > > Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 > width=12) [snip] > WTF? Why would a vacuum be necessary in order for it to start using the > index? It's not the vacuum - it's the analyse. That builds up statistics on the table in question so the planner knows how many rows there are, what the most common values are etc. That way it can make a "best guess" as to whether scanning the whole table or using the index will be faster. > So - let me know why one would have to use vacuum in order for the scans to > cease and index use begin. Is it a continual thing? Or does vacuum need to > be done after a 'create index' in order for it to begin using the index? You should vacuum to reclaim "deleted" space. You should analyse to update statistics on the table. They both tend to depend on the amount of activity you have. -- Richard Huxton Archonet Ltd
Philip Greer <philip@tildesoftware.com> writes: > WTF? Why would a vacuum be necessary in order for it to start using the index? It's not the VACUUM that's necessary; it's the ANALYZE. The query planner uses table statistics to make its decisions, and ANALYZE is what collects those statistics. Without an ANALYZE the planner will make default assumptions that are rarely correct. :) VACUUM should also be run regularly, of course. -Doug
On Mon, 21 Jul 2003, Philip Greer wrote: > Thanks for the response: > > I took a look at the table with 'vacuum verbose analyze', here's the results: > > dumps=# vacuum verbose analyze fal_profdel; > NOTICE: --Relation fal_profdel-- > NOTICE: Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup > 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen 103, MaxLen > 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. > CPU 2.53s/0.58u sec. > NOTICE: Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec. > NOTICE: Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec. > NOTICE: Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec. > NOTICE: Analyzing... > VACUUM > > Then - afterwards, I ran the explain again: > > dumps=# explain select card_num from fal_profdel where card_num = '4828820006970'; > NOTICE: QUERY PLAN: > > Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 width=12) > > EXPLAIN > > > WTF? Why would a vacuum be necessary in order for it to start using the index? It was the analyze that was important for this probably. If you compare the explain above with the one from before, you'll notice that before it was estimating that around 46000 rows were going to be returned. If that were true (and there weren't clustering effects going on) it's possible that the sequence scan would actually have been faster than scanning the index. Analyze gave it hopefully more reasonable data for the estimate and so it's now guessing that 1 row is returned which is certainly better for the index scan. > So - let me know why one would have to use vacuum in order for the > scans to cease and index use begin. Is it a continual thing? Or does > vacuum need to be done after a 'create index' in order for it to begin > using the index? Well, if you do updates/deletes, vacuum is necessary to reclaim space, so you should probably do it on some scheduled basis for that purpose. A good reason to upgrade is that in 7.1 vacuum gets an exclusive lock whereas in recent versions it doesn't by default and you can analyze without a vacuum. Analyze is necessary to keep the statistics up to date and should also be run on a periodic basis (daily isn't bad).
Philip, In order for psql to work out the most effective way to run queries it relies on statistics about the size of your tables (amounst other things). Running VACUUM ANALYZE does two things: The vacuum removes any tuples you have deleted from the database (before this i think they are just marked as deleted) The analzye bit then rebuilds the statistics which the query planner uses - making your queries faster. If you are only ever adding tuples to the database then you don't need to Vacuum as often, although i think you still need to vacuum every X million queries. I made the same mistake last week... On Mon, 2003-07-21 at 19:51, Philip Greer wrote: > Thanks for the response: > > I took a look at the table with 'vacuum verbose analyze', here's the results: > > dumps=# vacuum verbose analyze fal_profdel; > NOTICE: --Relation fal_profdel-- > NOTICE: Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen103, MaxLen 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. CPU 2.53s/0.58u sec. > NOTICE: Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec. > NOTICE: Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec. > NOTICE: Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec. > NOTICE: Analyzing... > VACUUM > > > Then - afterwards, I ran the explain again: > > dumps=# explain select card_num from fal_profdel where card_num = '4828820006970'; > NOTICE: QUERY PLAN: > > Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 width=12) > > EXPLAIN > > > WTF? Why would a vacuum be necessary in order for it to start using the index? > > > By the way, the actual query takes subseconds to return now. MUCH better. > > > So - let me know why one would have to use vacuum in order for the scans to cease and index use begin. Is it a continualthing? Or does vacuum need to be done after a 'create index' in order for it to begin using the index? > > Thanks for your assistance, much apprecaited! > > > On Mon, Jul 21, 2003 at 11:00:56AM -0700, Stephan Szabo filled up my inbox with the following: > > On Mon, 21 Jul 2003, Philip Greer wrote: > > > > > dumps=# \d fal_profdel > > > Table "fal_profdel" > > > Attribute | Type | Modifier > > > -----------+--------------------------+---------- > > > sid | character(4) | not null > > > card_num | character(19) | not null > > > date_del | timestamp with time zone | > > > filename | character varying(30) | > > > Indices: fal_prfdel_cn, > > > fal_prfdel_date, > > > fal_prfdel_pk > > > > > > dumps=# \d fal_prfdel_cn > > > Index "fal_prfdel_cn" > > > Attribute | Type > > > -----------+--------------- > > > card_num | character(19) > > > unique btree > > > > > > dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy'; > > > NOTICE: QUERY PLAN: > > > > > > Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12) > > > > > > EXPLAIN > > > ================================================================================ > > > > > > Now, why the heck is the select query not using the index? I've tried > > > it by having an exact 19 character card_num as well - still explains > > > as a 'Seq Scan' (tablespace scan) - and each query takes up to 37 > > > seconds (thus confirming that it is indeed doing scans and not using > > > the index). > > > > Have you vacuum analyzed the table recently? What does explain show if you > > do set enable_seqscan=off; before the explain and then how long does the > > query actually take to run with seqscan disabled. > >
Doug McNaught wrote: > It's not the VACUUM that's necessary; it's the ANALYZE. The query > planner uses table statistics to make its decisions, and ANALYZE is > what collects those statistics. Without an ANALYZE the planner will > make default assumptions that are rarely correct. :) I am not the original poster, but I am a PG newbie, so: So - are you saying that if you have a table, and you create an index on that table, you need to perform an ANALYZE in order for PG to use the index. Otherwise, the index goes unused (or used improperly)? I am just trying to understand this for a database I have set up, where I set up a variety of indexes, assumming (wrongly?) that the indexes would be used once they were created. Andrew L. Ayers Phoenix, Arizona -- CONFIDENTIALITY NOTICE -- This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain informationthat is privileged, confidential and exempt from disclosure under applicable law. If you are not the intendedaddressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use, copy,disclose or distribute to anyone the message or any information contained in the message. If you have received thismessage in error, please immediately advise the sender by reply email, and delete the message. Thank you.
Andrew Ayers <aayers@eldocomp.com> writes: > Doug McNaught wrote: > > It's not the VACUUM that's necessary; it's the ANALYZE. The query > > planner uses table statistics to make its decisions, and ANALYZE is > > what collects those statistics. Without an ANALYZE the planner will > > make default assumptions that are rarely correct. :) > > I am not the original poster, but I am a PG newbie, so: > > So - are you saying that if you have a table, and you create an index on > that table, you need to perform an ANALYZE in order for PG to use the > index. Otherwise, the index goes unused (or used improperly)? Actually, the time to ANALYZE is after you make significant changes in the table (bulk load a bunch of data, delete a bunch of rows etc). If you create an index on a table with correct statistics, the planner should start using it without necessarily needing ANALYZE to be run. So run it by hand after making major data changes, and run it periodically depending on how fast your data changes in regular use. Clear? -Doug
Well duh. Crap. I remember that NOW that I've read everyones comments (thanks by the way!). I do remember reading that, andhave placed the 'vacumedb -z' command in crontabs on databases in the past. Just forgot all about that - I guess it isbecause most of the databases I've set up on postgresql get to the point where they just simply run, and run, and run,and run. A nice thing about postgresql - wrap it in some automated maintenance and it becomes so stable it is nearlyforgotten about! Vacuum analyze is one of those items where if you don't use it (manually) often, you lose it (andforget the reason why to use it). Thanks again! -- ----------------------------------------------------------------------------- PG.. philip@tildesoftware.com Law of probable dispersal: Whatever it is that hits the fan will not be evenly distributed.
> So - are you saying that if you have a table, and you create an index on > that table, you need to perform an ANALYZE in order for PG to use the > index. Otherwise, the index goes unused (or used improperly)? it is easy enough to demonstrate that creating an index will result in immediate improvements in query times. The internals wizards would have to answer the question as to whether 'create index' also creates the initial stats on that index, though. (If it doesn't, maybe that should be a configuration option.) However, you do need the stats to take best advantage of the index over time and a wide range of queries, so regularly scheduled 'vacuum analyze's are desirable. What I don't know is whether there is a way pass along hints to the optimizer or to write a query to FORCE it to use an index on some part of a query despite what the optimizer decides. -- Mike Nolan