Thread: index and queries using '<' '>'
I'm using PG 7.3.4. I have a table with a column of type int8 where I store date-based values, and an index exists on it. The problem is that the index is almost never used with the '>' test.

# explain SELECT date FROM album WHERE (date='1093989600');
Index Scan using date_album_key on album  (cost=0.00..86.31 rows=21 width=8)
  Index Cond: (date = 1093989600::bigint)

# explain SELECT date FROM album WHERE (date>'1093989600');
Seq Scan on album  (cost=0.00..907.91 rows=447 width=8)
  Filter: (date > 1093989600::bigint)

# explain SELECT date FROM album WHERE (date>'1099989600');
Index Scan using date_album_key on album  (cost=0.00..323.09 rows=84 width=8)
  Index Cond: (date > 1099989600::bigint)

The index is used when the query is expected to return a low number of rows. The problem is that the execution time is much longer with a sequential scan. How can I force the use of this index?

--
Marc
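For reference, a minimal sketch of the schema implied by the plans above. The table, column, and index names come from the EXPLAIN output; the column list and index definition are assumptions (the "_key" suffix suggests the index may actually back a UNIQUE constraint):

CREATE TABLE album (
    -- only the indexed column is visible in the plans; other columns omitted
    date int8
);

-- assumed definition; could equally be UNIQUE (date) behind the scenes
CREATE INDEX date_album_key ON album (date);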
The system seems to think that a scan is cheap because the table is so small. Have you ever ANALYZEd that table? Also, EXPLAIN ANALYZE gives a much better idea of what is going on...

On Thu, Nov 18, 2004 at 03:55:12PM +0100, Marc Boucher wrote:
> I'm using PG 7.3.4
>
> I have a table with a column of type int8 where I store date-based values,
> and an index exists on it.
> The problem is that the index is almost never used with the '>' test.
>
> # explain SELECT date FROM album WHERE (date='1093989600');
> Index Scan using date_album_key on album  (cost=0.00..86.31 rows=21 width=8)
>   Index Cond: (date = 1093989600::bigint)
>
> # explain SELECT date FROM album WHERE (date>'1093989600');
> Seq Scan on album  (cost=0.00..907.91 rows=447 width=8)
>   Filter: (date > 1093989600::bigint)
>
> # explain SELECT date FROM album WHERE (date>'1099989600');
> Index Scan using date_album_key on album  (cost=0.00..323.09 rows=84 width=8)
>   Index Cond: (date > 1099989600::bigint)
>
> The index is used when the query is expected to return a low number of
> rows. The problem is that the execution time is much longer with a
> sequential scan. How can I force the use of this index?
>
> --
> Marc

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
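A sketch of what that advice looks like in practice, using the problem query from the original post:

ANALYZE album;   -- refresh the planner's statistics for this table

EXPLAIN ANALYZE
SELECT date FROM album WHERE (date > '1093989600');
-- EXPLAIN ANALYZE actually runs the query and reports real times and real
-- row counts next to the planner's estimates, so bad estimates stand out.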
At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> The system seems to think that a scan is cheap because the table is so
> small.

The table currently contains just over 10000 elements, so 238 rows is a small part of it.

> Have you ever ANALYZEd that table?

Yes. That, and REINDEX (in case of a problem). I have also tried ALTER TABLE ... ALTER COLUMN ... SET STATISTICS, but without improvement.

> Also, EXPLAIN ANALYZE gives a much better idea of what is going on...

You mean ANALYZE VERBOSE?

Since the table is still growing, and the number of rows returned by the query is quite uniform (it does not depend on the size of the table), I hope that the statistics will evolve to a state that will make the planner use the index.

thanks

--
Marc
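For completeness, a sketch of the statistics-target change Marc mentions, assuming the column is the date column from the plans above. The value 100 is only an example; the 7.3 default target is 10:

ALTER TABLE album ALTER COLUMN date SET STATISTICS 100;  -- raise the per-column statistics target
ANALYZE album;   -- SET STATISTICS only takes effect at the next ANALYZE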
On Fri, Nov 19, 2004 at 14:45:04 +0100, Marc Boucher <pgml@gmx.net> wrote:
> At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> > The system seems to think that a scan is cheap because the table is so
> > small.
> The table currently contains just over 10000 elements, so 238 rows is a
> small part of it.

No, small is typically less than 1%. This depends on the size of the rows, how much better accessing disk blocks sequentially is in your environment, and the size of your cache. Because your table is so small, it will probably all be cached after being read through once, so you may want to tune your config settings to say that random disk access costs only a little more than sequential access. However, you need to be careful if your table is going to grow a lot larger.

> Since the table is still growing, and the number of rows returned by the
> query is quite uniform (it does not depend on the size of the table), I
> hope that the statistics will evolve to a state that will make the planner
> use the index.

Index scans aren't always faster than sequential scans.
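A quick way to see how big the table actually is on disk (the pg_class figures are maintained by VACUUM/ANALYZE, so they are estimates):

SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'album';
-- relpages * 8 kB gives a rough on-disk size; if that fits comfortably in
-- RAM, repeated index scans mostly hit cache rather than random disk I/O.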
At 08:16 19/11/2004 -0600, Bruno Wolff III wrote:
> > The table currently contains just over 10000 elements, so 238 rows is a
> > small part of it.
> No, small is typically less than 1%. This depends on the size of the rows,
> how much better accessing disk blocks sequentially is in your environment,
> and the size of your cache.

PG runs on an old computer (200 MHz, 64 MB RAM); this is probably part of my "problem". With a modern hard drive, a sequential scan could be faster.

> Because your table is so small, it will probably all be cached after being
> read through once, so you may want to tune your config settings to say
> that random disk access costs only a little more than sequential access.

I think that the indexes are all cached after a while, but I doubt that the tables can be.

> However, you need to be careful if your table is going to grow
> a lot larger.

The whole database is quite large (for the computer it is on, that is).

> > Since the table is still growing, and the number of rows returned by the
> > query is quite uniform (it does not depend on the size of the table), I
> > hope that the statistics will evolve to a state that will make the
> > planner use the index.
> Index scans aren't always faster than sequential scans.

I know that, but I have made some comparisons with other queries. Someone also advised me to try "set enable_seqscan=off;". Using the index takes 50-60% less time (I just re-checked). Unfortunately I can't use this setting, because the query is part of a larger query (joins), and the time gained on this particular index is partially lost on the joins.

--
Marc
On Fri, Nov 19, 2004 at 15:47:21 +0100, Marc Boucher <pgml@gmx.net> wrote:
> I know that, but I have made some comparisons with other queries. Someone
> also advised me to try "set enable_seqscan=off;". Using the index takes
> 50-60% less time (I just re-checked). Unfortunately I can't use this
> setting, because the query is part of a larger query (joins), and the time
> gained on this particular index is partially lost on the joins.

Tweaking random_page_cost may help. The default is 4, which is often too high. It is supposed to represent the relative cost of retrieving disk blocks in a random order as compared to retrieving them in sequential order. Lowering this value will make index scans look better. That value should not be lowered below 1.
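A sketch of how one might experiment with this per session, before touching postgresql.conf (2 is only a trial value, in line with the advice in this thread):

SHOW random_page_cost;     -- default is 4
SET random_page_cost = 2;  -- trial value; never go below 1
EXPLAIN ANALYZE SELECT date FROM album WHERE (date > '1093989600');
-- if the planner now picks the index scan and the real time improves,
-- consider making the setting permanent in postgresql.conf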
Marc Boucher <pgml@gmx.net> writes:
>> Index scans aren't always faster than sequential scans.

> I know that, but I have made some comparisons with other queries. Someone
> also advised me to try "set enable_seqscan=off;". Using the index takes
> 50-60% less time (I just re-checked). Unfortunately I can't use this
> setting, because the query is part of a larger query (joins), and the time
> gained on this particular index is partially lost on the joins.

A less brute-force way of encouraging the planner to use indexscans is to reduce the random_page_cost setting. It defaults to 4, but many people find that values nearer 2 are more representative of what happens in their environments.

			regards, tom lane
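To make such a change stick across sessions, the usual route is to edit postgresql.conf and have the server re-read its configuration; the path below is only an example:

-- in postgresql.conf (location depends on the installation):
--     random_page_cost = 2
-- then reload the configuration, for example with:
--     pg_ctl reload -D /path/to/data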
On Fri, 19 Nov 2004, Marc Boucher wrote:

> At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> > The system seems to think that a scan is cheap because the table is so
> > small.
> The table currently contains just over 10000 elements, so 238 rows is a
> small part of it.
>
> > Have you ever ANALYZEd that table?
> Yes. That, and REINDEX (in case of a problem). I have also tried
> ALTER TABLE ... ALTER COLUMN ... SET STATISTICS, but without improvement.
>
> > Also, EXPLAIN ANALYZE gives a much better idea of what is going on...
> You mean ANALYZE VERBOSE?

EXPLAIN ANALYZE is similar to EXPLAIN, except that it actually runs the query and gives the real time per step and the real row counts rather than the estimates. It's very handy.

If random page reads are consistently less expensive than postgres currently thinks, you may want to lower random_page_cost. EXPLAIN ANALYZE is useful here because it lets you compare the real times versus the estimated costs of both scan types when used with enable_seqscan=off.
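A sketch of that comparison, using the problem query from earlier in the thread:

-- plan and real timing the planner prefers today
EXPLAIN ANALYZE SELECT date FROM album WHERE (date > '1093989600');

-- force the index scan and measure it for comparison
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT date FROM album WHERE (date > '1093989600');
SET enable_seqscan = on;   -- restore the default afterwards

-- if the forced index scan is consistently faster despite a higher
-- estimated cost, lower random_page_cost until the estimates line up
-- with reality.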