Thread: Index not being used unless enable_seqscan=false
Hello all,

I am working with a simple table and query but cannot seem
to get it to use the index I have created.  However, if I
set enable_seqscan=false, the index is used and the query
is much faster.  I have tried a vacuum analyze, but to no
avail.

Table layout:

                 Table "public.seen"
  Column  |              Type              | Modifiers
----------+--------------------------------+-----------
 group_id | integer                        | not null
 msgid    | text                           | not null
 msgtime  | timestamp(0) without time zone | not null
Indexes:
    "seen_group_id_key" unique, btree (group_id, msgid)
    "seen_msgtime" btree (msgtime)
Foreign-key constraints:
    "$1" FOREIGN KEY (group_id) REFERENCES groups(id) ON UPDATE CASCADE ON DELETE CASCADE

explain analyze with enable_seqscan=true:

explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on seen  (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
   Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
 Total runtime: 27096.337 ms
(3 rows)

Same query with enable_seqscan=false:

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using seen_msgtime on seen  (cost=0.00..3818325.78 rows=1081044 width=46) (actual time=0.140..156.222 rows=28907 loops=1)
   Index Cond: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
 Total runtime: 248.737 ms
(3 rows)

Any ideas on how I can fix this?  I get this problem now
and again with other databases, but a vacuum usually fixes
it.

Thanks,
Shane
On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:
> Hello all,
>
> I am working with a simple table and query but cannot seem
> to get it to use the index I have created.  However, if I
> set enable_seqscan=false, the index is used and the query
> is much faster.  I have tried a vacuum analyze, but to no
> avail.

[snip table layout and explain analyze output]

> Any ideas on how I can fix this?  I get this problem now
> and again with other databases, but a vacuum usually fixes
> it.

Right off the bat (if I am interpreting the results of your explain
analyze correctly), it looks like the planner is choosing a seqscan
because it thinks it needs to filter over 1 million rows (versus the
roughly 29,000 rows actually pulled).  Perhaps increasing the stats
on msgtime and then analyzing the table may help.  Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Sven
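P.S. To experiment without touching postgresql.conf, you can try the
cost change per-session first; something along these lines (untested,
and 3.0 is just a guess to start from):

    SET random_page_cost = 3.0;  -- default is 4.0; lower favors index scans
    EXPLAIN ANALYZE
    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
    RESET random_page_cost;      -- back to the configured value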
On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:
> Hello all,
>
> I am working with a simple table and query but cannot seem
> to get it to use the index I have created.  However, if I
> set enable_seqscan=false, the index is used and the query
> is much faster.  I have tried a vacuum analyze, but to no
> avail.

[snip]

> explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
>                                                       QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Seq Scan on seen  (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
>    Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
>  Total runtime: 27096.337 ms
> (3 rows)

> Same query with enable_seqscan=false
[snip faster plan]

> Any ideas on how I can fix this?  I get this problem now
> and again with other databases, but a vacuum usually fixes
> it.

The planner is not very good at estimating the selectivity of a single
inequality.  If you can specify a range in the WHERE clause, you may
have better luck:

  ... WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone)
        AND msgtime >= '2000-01-01'

Also, you might want to try increasing the STATISTICS target of
msgtime.

Sometimes an ORDER BY clause can help the planner choose an indexscan,
although in this case the difference in estimated cost is so large
that I doubt it would.

gnari
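P.S. The ORDER BY variant I had in mind would be roughly this (just a
sketch, and as I said, I doubt it helps when the estimates are this
far off):

    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone)
    ORDER BY msgtime;  -- ordering on the indexed column can favor seen_msgtime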
On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:
> Right off the bat (if I am interpreting the results of your explain
> analyze correctly), it looks like the planner is choosing a seqscan
> because it thinks it needs to filter over 1 million rows (versus the
> roughly 29,000 rows actually pulled).  Perhaps increasing the stats
> on msgtime and then analyzing the table may help.  Depending on your
> hardware, decreasing random_page_cost in your postgresql.conf just a
> touch may help too.

Thanks for the pointers.

I tried increasing the stats from the default of 10 to 25
with no change.  How high would you bring it?  Also, I've
never played with the various cost variables.  The database
sits on a RAID5 partition composed of 4 15k U320 SCSI
drives, dual Xeon 2.8 (HT enabled), 2GB RAM.  I suppose this
might actually increase the cost of fetching a random disk
page, as it may well be on another physical disk and
wouldn't be in the readahead cache.  Any idea as to what it
should be on this sort of system?
On Wed, 2005-08-10 at 12:58 -0700, Shane wrote:
> On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:
> > Perhaps increasing the stats on msgtime and then analyzing the
> > table may help.  Depending on your hardware, decreasing
> > random_page_cost in your postgresql.conf just a touch may help too.
>
> Thanks for the pointers.
>
> I tried increasing the stats from the default of 10 to 25
> with no change.  How high would you bring it?  Also, I've
> never played with the various cost variables.  The database
> sits on a RAID5 partition composed of 4 15k U320 SCSI
> drives, dual Xeon 2.8 (HT enabled), 2GB RAM.  I suppose this
> might actually increase the cost of fetching a random disk
> page, as it may well be on another physical disk and
> wouldn't be in the readahead cache.  Any idea as to what it
> should be on this sort of system?

Try increasing the stats to 100 on just the msgtime column, not the
default (changing the default only affects newly created columns --
you may want to change the default back to 10):

ALTER TABLE seen ALTER msgtime SET STATISTICS 100;

After running that command, analyze the table again and see if that
helps.  I am assuming the culprit is this particular column, as your
index and search criteria are based on it.

The default random_page_cost is, I believe, 4.0; on your system you
could probably easily drop it to 3, possibly lower, and see how that
performs.

Sven
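P.S. Putting it all together, the experiment would look something like
this (untested; the SET only lasts for the current session -- edit
postgresql.conf and reload for a permanent change):

    ALTER TABLE seen ALTER msgtime SET STATISTICS 100;
    ANALYZE seen;                 -- refresh the stats for the new target
    SET random_page_cost = 3.0;
    EXPLAIN ANALYZE
    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);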
On Wed, Aug 10, 2005 at 04:24:51PM -0400, Sven Willenberger wrote:
> Try increasing the stats to 100 on just the msgtime column, not the
> default (changing the default only affects newly created columns --
> you may want to change the default back to 10):

Hi,

I brought the statistics on msgtime up to 100, vacuum
analyzed, and brought random_page_cost down to 2.
Unfortunately, explain analyze still wants to seqscan and
estimates 1M returned rows.

Is there a way to simply force index usage for this
particular query?

S
On Wed, 2005-08-10 at 13:31 -0700, Shane wrote:
> I brought the statistics on msgtime up to 100, vacuum
> analyzed, and brought random_page_cost down to 2.
> Unfortunately, explain analyze still wants to seqscan and
> estimates 1M returned rows.
>
> Is there a way to simply force index usage for this
> particular query?

What version of PostgreSQL are you running?

Also, what happens if you explain analyze choosing

  where msgtime > cast(now() - interval '6 months' as timestamp(0) without time zone)

(greater than instead of less than)?

Depending on how you are connecting to run this query (script, webpage,
psql), you could always do

  set enable_seqscan = off;
  select ....;
  set enable_seqscan = on;

scriptomagically.

Sven
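P.S. For your particular query, that wrapper would be roughly (untested):

    SET enable_seqscan = off;
    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
    SET enable_seqscan = on;   -- restore, so other queries can still seqscan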
Shane <shane-pgsql@cm.nu> writes:
> I am working with a simple table and query but cannot seem
> to get it to use the index I have created.
> ...
> explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);

As some other people already pointed out, the problem is the horrible
misestimate of the number of matching rows.  You did not say your
Postgres version, but I'm betting it's pre-8.0.  Versions before 8.0
would not assume that they could get any useful statistical info from
an expression involving now() (or, in general, any non-immutable
function).  The default assumption in such cases is that a lot of
rows are retrieved --- too many for an indexscan.

If you cannot update to 8.0.* at the moment, a workaround is to do the
timestamp calculation on the client side, so that you send over a query
that's just a comparison to a constant:

        ... where msgtime < '2005-02-14 ...'::timestamp;

regards, tom lane
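P.S. Concretely, if the client works out the cutoff itself (the date
below is just an illustration), the planner sees a plain constant and
can use the column statistics:

    SELECT msgid FROM seen
    WHERE msgtime < '2005-02-10 00:00:00'::timestamp;  -- cutoff computed by the client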
On Thu, Aug 11, 2005 at 12:10:33AM -0400, Tom Lane wrote:
> As some other people already pointed out, the problem is the horrible
> misestimate of the number of matching rows.  You did not say your
> Postgres version, but I'm betting it's pre-8.0.  Versions before 8.0
> would not assume that they could get any useful statistical info from
> an expression involving now() (or, in general, any non-immutable
> function).  The default assumption in such cases is that a lot of
> rows are retrieved --- too many for an indexscan.

Hi Tom,

The version being used is 7.4.7.  However, as an experiment, I ran pg8
on a different port and loaded the dataset.  It needed a vacuum
analyze, but after that pg8 was using the index and got the row
estimate correct.

Thanks for the pointer,
Shane