Thread: Index not being used unless enable_seqscan=false
Hello all,

I am working with a simple table and query but cannot seem
to get it to use the index I have created.  However, if I
set enable_seqscan=false, the index is used and the query
is much faster.  I have tried a vacuum analyze, but to no
avail.

Table layout:

                 Table "public.seen"
  Column  |              Type              | Modifiers
----------+--------------------------------+-----------
 group_id | integer                        | not null
 msgid    | text                           | not null
 msgtime  | timestamp(0) without time zone | not null
Indexes:
    "seen_group_id_key" unique, btree (group_id, msgid)
    "seen_msgtime" btree (msgtime)
Foreign-key constraints:
    "$1" FOREIGN KEY (group_id) REFERENCES groups(id) ON UPDATE CASCADE ON DELETE CASCADE

explain analyze with enable_seqscan=true:

explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on seen  (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
   Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
 Total runtime: 27096.337 ms
(3 rows)

Same query with enable_seqscan=false:

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using seen_msgtime on seen  (cost=0.00..3818325.78 rows=1081044 width=46) (actual time=0.140..156.222 rows=28907 loops=1)
   Index Cond: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
 Total runtime: 248.737 ms
(3 rows)

Any ideas on how I can fix this?  I get this problem now
and again with other databases, but a vacuum usually fixes
it.

Thanks,
Shane
On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:
> Hello all,
>
> I am working with a simple table and query but cannot seem
> to get it to use the index I have created.  However, if I
> set enable_seqscan=false, the index is used and the query
> is much faster.  I have tried a vacuum analyze, but to no
> avail.

[snip table layout and explain analyze output]

> Any ideas on how I can fix this?  I get this problem now
> and again with other databases, but a vacuum usually fixes
> it.

Right off the bat (if I am interpreting the results of your explain
analyze correctly), it looks like the planner is choosing a seqscan
because it thinks it needs to filter over 1 million rows (versus the
roughly 29,000 rows actually pulled).  Perhaps increasing the stats
on msgtime and then analyzing the table may help.  Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Sven
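P.S. To experiment without touching postgresql.conf, you can try the
cost change per-session first; something along these lines (untested,
and 3.0 is just a guess to start from):

    SET random_page_cost = 3.0;  -- default is 4.0; lower favors index scans
    EXPLAIN ANALYZE
    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
    RESET random_page_cost;      -- back to the configured value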
On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:
> Hello all,
>
> I am working with a simple table and query but cannot seem
> to get it to use the index I have created.  However, if I
> set enable_seqscan=false, the index is used and the query
> is much faster.  I have tried a vacuum analyze, but to no
> avail.

[snip]

> explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
>                                                       QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Seq Scan on seen  (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
>    Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
>  Total runtime: 27096.337 ms
> (3 rows)

> Same query with enable_seqscan=false
[snip faster plan]

> Any ideas on how I can fix this?  I get this problem now
> and again with other databases, but a vacuum usually fixes
> it.

The planner is not very good at estimating the selectivity of a single
inequality.  If you can specify a range in the WHERE clause, you may
have better luck:

  ... WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone)
        AND msgtime >= '2000-01-01'

Also, you might want to try increasing the STATISTICS target of
msgtime.

Sometimes an ORDER BY clause can help the planner choose an indexscan,
although in this case the difference in estimated cost is so large
that I doubt it would.

gnari
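P.S. The ORDER BY variant I had in mind would be roughly this (just a
sketch, and as I said, I doubt it helps when the estimates are this
far off):

    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone)
    ORDER BY msgtime;  -- ordering on the indexed column can favor seen_msgtime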
On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:
> Right off the bat (if I am interpreting the results of your explain
> analyze correctly), it looks like the planner is choosing a seqscan
> because it thinks it needs to filter over 1 million rows (versus the
> roughly 29,000 rows actually pulled).  Perhaps increasing the stats
> on msgtime and then analyzing the table may help.  Depending on your
> hardware, decreasing random_page_cost in your postgresql.conf just a
> touch may help too.

Thanks for the pointers.

I tried increasing the stats from the default of 10 to 25
with no change.  How high would you bring it?  Also, I've
never played with the various cost variables.  The database
sits on a RAID5 partition composed of 4 15k U320 SCSI
drives, dual Xeon 2.8 (HT enabled), 2GB RAM.  I suppose this
might actually increase the cost of fetching a random disk
page, as it may well be on another physical disk and
wouldn't be in the readahead cache.  Any idea as to what it
should be on this sort of system?
On Wed, 2005-08-10 at 12:58 -0700, Shane wrote:
> On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:
> > Perhaps increasing the stats on msgtime and then analyzing the
> > table may help.  Depending on your hardware, decreasing
> > random_page_cost in your postgresql.conf just a touch may help too.
>
> Thanks for the pointers.
>
> I tried increasing the stats from the default of 10 to 25
> with no change.  How high would you bring it?  Also, I've
> never played with the various cost variables.  The database
> sits on a RAID5 partition composed of 4 15k U320 SCSI
> drives, dual Xeon 2.8 (HT enabled), 2GB RAM.  I suppose this
> might actually increase the cost of fetching a random disk
> page, as it may well be on another physical disk and
> wouldn't be in the readahead cache.  Any idea as to what it
> should be on this sort of system?

Try increasing the stats to 100 on just the msgtime column, not the
default (changing the default only affects newly created columns --
you may want to change the default back to 10):

ALTER TABLE seen ALTER msgtime SET STATISTICS 100;

After running that command, analyze the table again and see if that
helps.  I am assuming the culprit is this particular column, as your
index and search criteria are based on it.

The default random_page_cost is, I believe, 4.0; on your system you
could probably easily drop it to 3, possibly lower, and see how that
performs.

Sven
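P.S. Putting it all together, the experiment would look something like
this (untested; the SET only lasts for the current session -- edit
postgresql.conf and reload for a permanent change):

    ALTER TABLE seen ALTER msgtime SET STATISTICS 100;
    ANALYZE seen;                 -- refresh the stats for the new target
    SET random_page_cost = 3.0;
    EXPLAIN ANALYZE
    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);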
On Wed, Aug 10, 2005 at 04:24:51PM -0400, Sven Willenberger wrote:
> Try increasing the stats to 100 on just the msgtime column, not the
> default (changing the default only affects newly created columns --
> you may want to change the default back to 10):

Hi,

I brought the statistics on msgtime up to 100, vacuum
analyzed, and brought random_page_cost down to 2.
Unfortunately, explain analyze still wants to seqscan and
estimates 1M returned rows.

Is there a way to simply force index usage for this
particular query?

S
On Wed, 2005-08-10 at 13:31 -0700, Shane wrote:
> I brought the statistics on msgtime up to 100, vacuum
> analyzed, and brought random_page_cost down to 2.
> Unfortunately, explain analyze still wants to seqscan and
> estimates 1M returned rows.
>
> Is there a way to simply force index usage for this
> particular query?

What version of PostgreSQL are you running?

Also, what happens if you explain analyze choosing

  where msgtime > cast(now() - interval '6 months' as timestamp(0) without time zone)

(greater than instead of less than)?

Depending on how you are connecting to run this query (script, webpage,
psql), you could always do

  set enable_seqscan = off;
  select ....;
  set enable_seqscan = on;

scriptomagically.

Sven
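P.S. For your particular query, that wrapper would be roughly (untested):

    SET enable_seqscan = off;
    SELECT msgid FROM seen
    WHERE msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
    SET enable_seqscan = on;   -- restore, so other queries can still seqscan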
Shane <shane-pgsql@cm.nu> writes:
> I am working with a simple table and query but cannot seem
> to get it to use the index I have created.
> ...
> explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);

As some other people already pointed out, the problem is the horrible
misestimate of the number of matching rows.  You did not say your
Postgres version, but I'm betting it's pre-8.0.  Versions before 8.0
would not assume that they could get any useful statistical info from
an expression involving now() (or, in general, any non-immutable
function).  The default assumption in such cases is that a lot of
rows are retrieved --- too many for an indexscan.

If you cannot update to 8.0.* at the moment, a workaround is to do the
timestamp calculation on the client side, so that you send over a query
that's just a comparison to a constant:

        ... where msgtime < '2005-02-14 ...'::timestamp;

regards, tom lane
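P.S. Concretely, if the client works out the cutoff itself (the date
below is just an illustration), the planner sees a plain constant and
can use the column statistics:

    SELECT msgid FROM seen
    WHERE msgtime < '2005-02-10 00:00:00'::timestamp;  -- cutoff computed by the client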
On Thu, Aug 11, 2005 at 12:10:33AM -0400, Tom Lane wrote:
> As some other people already pointed out, the problem is the horrible
> misestimate of the number of matching rows.  You did not say your
> Postgres version, but I'm betting it's pre-8.0.  Versions before 8.0
> would not assume that they could get any useful statistical info from
> an expression involving now() (or, in general, any non-immutable
> function).  The default assumption in such cases is that a lot of
> rows are retrieved --- too many for an indexscan.

Hi Tom,

The version being used is 7.4.7.  However, as an experiment, I ran pg8
on a different port and loaded the dataset.  It needed a vacuum
analyze, but after that pg8 was using the index and got the row
estimate correct.

Thanks for the pointer,
Shane