Thread: Why does it not use the index?

Why does it not use the index?

From
Philip Greer
Date:
This one is irritating. Here's some psql output:

================================================================================
dumps=# \d fal_profdel
               Table "fal_profdel"
 Attribute |           Type           | Modifier
-----------+--------------------------+----------
 sid       | character(4)             | not null
 card_num  | character(19)            | not null
 date_del  | timestamp with time zone |
 filename  | character varying(30)    |
Indices: fal_prfdel_cn,
         fal_prfdel_date,
         fal_prfdel_pk

dumps=# \d fal_prfdel_cn
   Index "fal_prfdel_cn"
 Attribute |     Type
-----------+---------------
 card_num  | character(19)
unique btree

dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
NOTICE:  QUERY PLAN:

Seq Scan on fal_profdel  (cost=0.00..120546.39 rows=46649 width=12)

EXPLAIN
================================================================================

Now, why the heck is the select query not using the index? I've tried it by having an exact 19 character card_num as
well- still explains as a 'Seq Scan' (tablespace scan) - and each query takes up to 37 seconds (thus confirming that it
isindeed doing scans and not using the index). 

I've tried dropping and re-creating the indexes, still it explains as tablespace scans.

I am running postgresql 7.1.3 - a bit old, I know, but I have had no reason to upgrade just yet.

By the way, the fal_profdel table has 4,664,867 rows in it currently - thus I really don't want full table scans!

--
-----------------------------------------------------------------------------
PG..                                                 philip@tildesoftware.com
Law of probable dispersal: Whatever it is that hits the fan will not be
evenly distributed.

Re: Why does it not use the index?

From
Stephan Szabo
Date:
On Mon, 21 Jul 2003, Philip Greer wrote:

> dumps=# \d fal_profdel
>                Table "fal_profdel"
>  Attribute |           Type           | Modifier
> -----------+--------------------------+----------
>  sid       | character(4)             | not null
>  card_num  | character(19)            | not null
>  date_del  | timestamp with time zone |
>  filename  | character varying(30)    |
> Indices: fal_prfdel_cn,
>          fal_prfdel_date,
>          fal_prfdel_pk
>
> dumps=# \d fal_prfdel_cn
>    Index "fal_prfdel_cn"
>  Attribute |     Type
> -----------+---------------
>  card_num  | character(19)
> unique btree
>
> dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on fal_profdel  (cost=0.00..120546.39 rows=46649 width=12)
>
> EXPLAIN
> ================================================================================
>
> Now, why the heck is the select query not using the index? I've tried
> it by having an exact 19 character card_num as well - still explains
> as a 'Seq Scan' (tablespace scan) - and each query takes up to 37
> seconds (thus confirming that it is indeed doing scans and not using
> the index).

Have you vacuum analyzed the table recently? What does explain show if you
do set enable_seqscan=off; before the explain and then how long does the
query actually take to run with seqscan disabled.


Re: Why does it not use the index?

From
Philip Greer
Date:
Thanks for the response:

I took a look at the table with 'vacuum verbose analyze', here's the results:

dumps=# vacuum verbose analyze fal_profdel;
NOTICE:  --Relation fal_profdel--
NOTICE:  Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447,
MinLen103, MaxLen 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. CPU 2.53s/0.58u sec. 
NOTICE:  Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec.
NOTICE:  Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec.
NOTICE:  Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec.
NOTICE:  Analyzing...
VACUUM


Then - afterwards, I ran the explain again:

dumps=# explain select card_num from fal_profdel where card_num = '4828820006970';
NOTICE:  QUERY PLAN:

Index Scan using fal_prfdel_cn on fal_profdel  (cost=0.00..4.95 rows=1 width=12)

EXPLAIN


WTF? Why would a vacuum be necessary in order for it to start using the index?


By the way, the actual query takes subseconds to return now. MUCH better.


So - let me know why one would have to use vacuum in order for the scans to cease and index use begin. Is it a
continualthing? Or does vacuum need to be done after a 'create index' in order for it to begin using the index? 

Thanks for your assistance, much apprecaited!


On Mon, Jul 21, 2003 at 11:00:56AM -0700, Stephan Szabo filled up my inbox with the following:
> On Mon, 21 Jul 2003, Philip Greer wrote:
>
> > dumps=# \d fal_profdel
> >                Table "fal_profdel"
> >  Attribute |           Type           | Modifier
> > -----------+--------------------------+----------
> >  sid       | character(4)             | not null
> >  card_num  | character(19)            | not null
> >  date_del  | timestamp with time zone |
> >  filename  | character varying(30)    |
> > Indices: fal_prfdel_cn,
> >          fal_prfdel_date,
> >          fal_prfdel_pk
> >
> > dumps=# \d fal_prfdel_cn
> >    Index "fal_prfdel_cn"
> >  Attribute |     Type
> > -----------+---------------
> >  card_num  | character(19)
> > unique btree
> >
> > dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
> > NOTICE:  QUERY PLAN:
> >
> > Seq Scan on fal_profdel  (cost=0.00..120546.39 rows=46649 width=12)
> >
> > EXPLAIN
> > ================================================================================
> >
> > Now, why the heck is the select query not using the index? I've tried
> > it by having an exact 19 character card_num as well - still explains
> > as a 'Seq Scan' (tablespace scan) - and each query takes up to 37
> > seconds (thus confirming that it is indeed doing scans and not using
> > the index).
>
> Have you vacuum analyzed the table recently? What does explain show if you
> do set enable_seqscan=off; before the explain and then how long does the
> query actually take to run with seqscan disabled.
>

--
-----------------------------------------------------------------------------
PG..                                                 philip@tildesoftware.com
Law of probable dispersal: Whatever it is that hits the fan will not be
evenly distributed.

Re: Why does it not use the index?

From
Richard Huxton
Date:
On Monday 21 July 2003 19:51, Philip Greer wrote:
> Thanks for the response:
>
> I took a look at the table with 'vacuum verbose analyze', here's the
> results:
>
> dumps=# vacuum verbose analyze fal_profdel;
[snip]
> Then - afterwards, I ran the explain again:
>
> dumps=# explain select card_num from fal_profdel where card_num =
> '4828820006970'; NOTICE:  QUERY PLAN:
>
> Index Scan using fal_prfdel_cn on fal_profdel  (cost=0.00..4.95 rows=1
> width=12)
[snip]
> WTF? Why would a vacuum be necessary in order for it to start using the
> index?

It's not the vacuum - it's the analyse. That builds up statistics on the table
in question so the planner knows how many rows there are, what the most
common values are etc. That way it can make a "best guess" as to whether
scanning the whole table or using the index will be faster.

> So - let me know why one would have to use vacuum in order for the scans to
> cease and index use begin. Is it a continual thing? Or does vacuum need to
> be done after a 'create index' in order for it to begin using the index?

You should vacuum to reclaim "deleted" space. You should analyse to update
statistics on the table. They both tend to depend on the amount of activity
you have.

--
  Richard Huxton
  Archonet Ltd

Re: Why does it not use the index?

From
Doug McNaught
Date:
Philip Greer <philip@tildesoftware.com> writes:

> WTF? Why would a vacuum be necessary in order for it to start using the index?

It's not the VACUUM that's necessary; it's the ANALYZE.  The query
planner uses table statistics to make its decisions, and ANALYZE is
what collects those statistics.  Without an ANALYZE the planner will
make default assumptions that are rarely correct.  :)

VACUUM should also be run regularly, of course.

-Doug

Re: Why does it not use the index?

From
Stephan Szabo
Date:
On Mon, 21 Jul 2003, Philip Greer wrote:

> Thanks for the response:
>
> I took a look at the table with 'vacuum verbose analyze', here's the results:
>
> dumps=# vacuum verbose analyze fal_profdel;
> NOTICE:  --Relation fal_profdel--
> NOTICE:  Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup
> 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen 103, MaxLen
> 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0.
> CPU 2.53s/0.58u sec.
> NOTICE:  Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec.
> NOTICE:  Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec.
> NOTICE:  Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec.
> NOTICE:  Analyzing...
> VACUUM
>
> Then - afterwards, I ran the explain again:
>
> dumps=# explain select card_num from fal_profdel where card_num = '4828820006970';
> NOTICE:  QUERY PLAN:
>
> Index Scan using fal_prfdel_cn on fal_profdel  (cost=0.00..4.95 rows=1 width=12)
>
> EXPLAIN
>
>
> WTF? Why would a vacuum be necessary in order for it to start using the index?

It was the analyze that was important for this probably.  If you compare
the explain above with the one from before, you'll notice that before it
was estimating that around 46000 rows were going to be returned.  If that
were true (and there weren't clustering effects going on) it's possible
that the sequence scan would actually have been faster than scanning the
index.  Analyze gave it hopefully more reasonable data for the estimate
and so it's now guessing that 1 row is returned which is certainly better
for the index scan.

> So - let me know why one would have to use vacuum in order for the
> scans to cease and index use begin. Is it a continual thing? Or does
> vacuum need to be done after a 'create index' in order for it to begin
> using the index?

Well, if you do updates/deletes, vacuum is necessary to reclaim space, so
you should probably do it on some scheduled basis for that purpose.  A
good reason to upgrade is that in 7.1 vacuum gets an exclusive lock
whereas in recent versions it doesn't by default and you can analyze
without a vacuum.  Analyze is necessary to keep the statistics up to date
and should also be run on a periodic basis (daily isn't bad).




Re: Why does it not use the index?

From
Mat
Date:
Philip,
In order for psql to work out the most effective way to run queries it
relies on statistics about the size of your tables (amounst other
things).
Running VACUUM ANALYZE does two things:
The vacuum removes any tuples you have deleted from the database (before
this i think they are just marked as deleted)
The analzye bit then rebuilds the statistics which the query planner
uses - making your queries faster.

If you are only ever adding tuples to the database then you don't need
to Vacuum as often, although i think you still need to vacuum every X
million queries.

I made the same mistake last week...


On Mon, 2003-07-21 at 19:51, Philip Greer wrote:
> Thanks for the response:
>
> I took a look at the table with 'vacuum verbose analyze', here's the results:
>
> dumps=# vacuum verbose analyze fal_profdel;
> NOTICE:  --Relation fal_profdel--
> NOTICE:  Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447,
MinLen103, MaxLen 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. CPU 2.53s/0.58u sec. 
> NOTICE:  Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec.
> NOTICE:  Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec.
> NOTICE:  Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec.
> NOTICE:  Analyzing...
> VACUUM
>
>
> Then - afterwards, I ran the explain again:
>
> dumps=# explain select card_num from fal_profdel where card_num = '4828820006970';
> NOTICE:  QUERY PLAN:
>
> Index Scan using fal_prfdel_cn on fal_profdel  (cost=0.00..4.95 rows=1 width=12)
>
> EXPLAIN
>
>
> WTF? Why would a vacuum be necessary in order for it to start using the index?
>
>
> By the way, the actual query takes subseconds to return now. MUCH better.
>
>
> So - let me know why one would have to use vacuum in order for the scans to cease and index use begin. Is it a
continualthing? Or does vacuum need to be done after a 'create index' in order for it to begin using the index? 
>
> Thanks for your assistance, much apprecaited!
>
>
> On Mon, Jul 21, 2003 at 11:00:56AM -0700, Stephan Szabo filled up my inbox with the following:
> > On Mon, 21 Jul 2003, Philip Greer wrote:
> >
> > > dumps=# \d fal_profdel
> > >                Table "fal_profdel"
> > >  Attribute |           Type           | Modifier
> > > -----------+--------------------------+----------
> > >  sid       | character(4)             | not null
> > >  card_num  | character(19)            | not null
> > >  date_del  | timestamp with time zone |
> > >  filename  | character varying(30)    |
> > > Indices: fal_prfdel_cn,
> > >          fal_prfdel_date,
> > >          fal_prfdel_pk
> > >
> > > dumps=# \d fal_prfdel_cn
> > >    Index "fal_prfdel_cn"
> > >  Attribute |     Type
> > > -----------+---------------
> > >  card_num  | character(19)
> > > unique btree
> > >
> > > dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
> > > NOTICE:  QUERY PLAN:
> > >
> > > Seq Scan on fal_profdel  (cost=0.00..120546.39 rows=46649 width=12)
> > >
> > > EXPLAIN
> > > ================================================================================
> > >
> > > Now, why the heck is the select query not using the index? I've tried
> > > it by having an exact 19 character card_num as well - still explains
> > > as a 'Seq Scan' (tablespace scan) - and each query takes up to 37
> > > seconds (thus confirming that it is indeed doing scans and not using
> > > the index).
> >
> > Have you vacuum analyzed the table recently? What does explain show if you
> > do set enable_seqscan=off; before the explain and then how long does the
> > query actually take to run with seqscan disabled.
> >


Re: Why does it not use the index?

From
Andrew Ayers
Date:
Doug McNaught wrote:
> It's not the VACUUM that's necessary; it's the ANALYZE.  The query
> planner uses table statistics to make its decisions, and ANALYZE is
> what collects those statistics.  Without an ANALYZE the planner will
> make default assumptions that are rarely correct.  :)

I am not the original poster, but I am a PG newbie, so:

So - are you saying that if you have a table, and you create an index on
that table, you need to perform an ANALYZE in order for PG to use the
index. Otherwise, the index goes unused (or used improperly)?

I am just trying to understand this for a database I have set up, where
I set up a variety of indexes, assumming (wrongly?) that the indexes
would be used once they were created.

Andrew L. Ayers
Phoenix, Arizona

-- CONFIDENTIALITY NOTICE --

This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain
informationthat is privileged, confidential and exempt from disclosure under applicable law. If you are not the
intendedaddressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use,
copy,disclose or distribute to anyone the message or any information contained in the message. If you have received
thismessage in error, please immediately advise the sender by reply email, and delete the message. Thank you. 

Re: Why does it not use the index?

From
Doug McNaught
Date:
Andrew Ayers <aayers@eldocomp.com> writes:

> Doug McNaught wrote:
> > It's not the VACUUM that's necessary; it's the ANALYZE.  The query
> > planner uses table statistics to make its decisions, and ANALYZE is
> > what collects those statistics.  Without an ANALYZE the planner will
> > make default assumptions that are rarely correct.  :)
>
> I am not the original poster, but I am a PG newbie, so:
>
> So - are you saying that if you have a table, and you create an index on
> that table, you need to perform an ANALYZE in order for PG to use the
> index. Otherwise, the index goes unused (or used improperly)?

Actually, the time to ANALYZE is after you make significant changes in
the table (bulk load a bunch of data, delete a bunch of rows etc).  If
you create an index on a table with correct statistics, the planner
should start using it without necessarily needing ANALYZE to be run.

So run it by hand after making major data changes, and run it
periodically depending on how fast your data changes in regular use.

Clear?

-Doug

Re: Why does it not use the index?

From
Philip Greer
Date:
Well duh. Crap. I remember that NOW that I've read everyones comments (thanks by the way!). I do remember reading that,
andhave placed the 'vacumedb -z' command in crontabs on databases in the past. Just forgot all about that - I guess it
isbecause most of the databases I've set up on postgresql get to the point where they just simply run, and run, and
run,and run. A nice thing about postgresql - wrap it in some automated maintenance and it becomes so stable it is
nearlyforgotten about! Vacuum analyze is one of those items where if you don't use it (manually) often, you lose it
(andforget the reason why to use it). 

Thanks again!

--
-----------------------------------------------------------------------------
PG..                                                 philip@tildesoftware.com
Law of probable dispersal: Whatever it is that hits the fan will not be
evenly distributed.

Re: Why does it not use the index?

From
nolan@celery.tssi.com
Date:
> So - are you saying that if you have a table, and you create an index on
> that table, you need to perform an ANALYZE in order for PG to use the
> index. Otherwise, the index goes unused (or used improperly)?

it is easy enough to demonstrate that creating an index will result
in immediate improvements in query times.  The internals wizards would
have to answer the question as to whether 'create index' also creates
the initial stats on that index, though.  (If it doesn't, maybe that
should be a configuration option.)

However, you do need the stats to take best advantage of the index over
time and a wide range of queries, so regularly scheduled 'vacuum analyze's
are desirable.

What I don't know is whether there is a way pass along hints to the
optimizer or to write a query to FORCE it to use an index on some part
of a query despite what the optimizer decides.
--
Mike Nolan