Thread: index and queries using '<' '>'

index and queries using '<' '>'

From
Marc Boucher
Date:
I'm using PG 7.3.4

I've a table with a column of type int8 where I store date-based values,
and an index exists for it.
The problem is that the index is almost never used with the '>' test.

# explain SELECT date FROM album WHERE (date='1093989600');
 Index Scan using date_album_key on album  (cost=0.00..86.31 rows=21 width=8)
   Index Cond: (date = 1093989600::bigint)

# explain SELECT date FROM album WHERE (date>'1093989600');
 Seq Scan on album  (cost=0.00..907.91 rows=447 width=8)
   Filter: (date > 1093989600::bigint)

# explain SELECT date FROM album WHERE (date>'1099989600');
 Index Scan using date_album_key on album  (cost=0.00..323.09 rows=84 width=8)
   Index Cond: (date > 1099989600::bigint)


It works when the query is supposed to generate low number of rows. The
problem is that the execution time is much longer with a scan.
How can I force the use of this index?


--
Marc


Re: index and queries using '<' '>'

From
Martijn van Oosterhout
Date:
The system seems to think that a scan is cheap because the table is so
small. Have you ever ANALYZEd that table? Also, EXPLAIN ANALYZE gives a
much better idea of what is going on...

On Thu, Nov 18, 2004 at 03:55:12PM +0100, Marc Boucher wrote:
> I'm using PG 7.3.4
>
> I've a table with a column of type int8 where I store date-based values,
> and an index exists for it.
> The problem is that the index is almost never used with the '>' test.
>
> # explain SELECT date FROM album WHERE (date='1093989600');
>  Index Scan using date_album_key on album  (cost=0.00..86.31 rows=21 width=8)
>    Index Cond: (date = 1093989600::bigint)
>
> # explain SELECT date FROM album WHERE (date>'1093989600');
>  Seq Scan on album  (cost=0.00..907.91 rows=447 width=8)
>    Filter: (date > 1093989600::bigint)
>
> # explain SELECT date FROM album WHERE (date>'1099989600');
>  Index Scan using date_album_key on album  (cost=0.00..323.09 rows=84 width=8)
>    Index Cond: (date > 1099989600::bigint)
>
>
> It works when the query is supposed to generate low number of rows. The
> problem is that the execution time is much longer with a scan.
> How can I force the use of this index?
>
>
> --
> Marc
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: index and queries using '<' '>'

From
Marc Boucher
Date:
At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> The system seems to think that a scan is cheap because the table is so
> small.
The table currently contains just over 10000 elements. So 238 rows is a
small part of it.

> Have you ever ANALYZEd that table?
Yes. That and reindex (in case of a problem).
I have also tried ALTER TABLE ALTER COLUMN SET STATISTICS, but without
improvement.

> Also, EXPLAIN ANALYZE gives a much better idea of what is going on...
You mean ANALYZE VERBOSE ?


Since the table is still growing, and the amount of rows in the reply of
the query is quite uniform (it's not dependant on the size of the table), I
hope that the statistics will evolve in a state that will force the use of
the index.


thanks

--
Marc


Re: index and queries using '<' '>'

From
Bruno Wolff III
Date:
On Fri, Nov 19, 2004 at 14:45:04 +0100,
  Marc Boucher <pgml@gmx.net> wrote:
> At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> > The system seems to think that a scan is cheap because the table is so
> > small.
> The table currently contains just over 10000 elements. So 238 rows is a
> small part of it.

No, small is typically less than 1%. This depends on the size of the rows
and how much better accessing disk blocks sequentially is in your
enviroment and the size of your cache.

Because your table is so small it will probably all be cached after being
read through once, so you may want to tune your config settings to
say than random disk access costs only a little more than sequential
access. However, you need to be careful if your table is going to grow
a lot larger.

> Since the table is still growing, and the amount of rows in the reply of
> the query is quite uniform (it's not dependant on the size of the table), I
> hope that the statistics will evolve in a state that will force the use of
> the index.

Index scans aren't always faster than sequential scans.

Re: index and queries using '<' '>'

From
Marc Boucher
Date:
At 08:16 19/11/2004 -0600, Bruno Wolff III wrote:
>> The table currently contains just over 10000 elements. So 238 rows is a
>> small part of it.
> No, small is typically less than 1%. This depends on the size of the rows
> and how much better accessing disk blocks sequentially is in your
> enviroment and the size of your cache.
PG runs on an old computer (200Mhz, 64MB ram); this is probably part of my
"problem". With modern hard drives, sequential scan could be faster.

> Because your table is so small it will probably all be cached after being
> read through once, so you may want to tune your config settings to
> say than random disk access costs only a little more than sequential
> access.
I think that the indexes are all cached after a while, but I doubt that the
tables can.

> However, you need to be careful if your table is going to grow
> a lot larger.
The whole database is quite large (that is for the computer it is on).

>> Since the table is still growing, and the amount of rows in the reply of
>> the query is quite uniform (it's not dependant on the size of the table), I
>> hope that the statistics will evolve in a state that will force the use of
>> the index.
> Index scans aren't always faster than sequential scans.
I know that, but I've some comparisons with other queries. And someone
advised me to try "set enable_seqscan=off;". It takes 50-60% (after
checking right now) less to use the index. Unfortunately I can't use this
setting, the query being part of a larger query (joins), and the time
gained on this particular index is partially lost on the joins.


--
Marc


Re: index and queries using '<' '>'

From
Bruno Wolff III
Date:
On Fri, Nov 19, 2004 at 15:47:21 +0100,
  Marc Boucher <pgml@gmx.net> wrote:
> I know that, but I've some comparisons with other queries. And someone
> advised me to try "set enable_seqscan=off;". It takes 50-60% (after
> checking right now) less to use the index. Unfortunately I can't use this
> setting, the query being part of a larger query (joins), and the time
> gained on this particular index is partially lost on the joins.

Tweaking random_page_cost may help. The default is 4, which is often too
high. It is supposed to represent the relative cost of retrieving disk
blocks in a random order as compared to retrieving them in sequential
order. Lowering this value will make index scans look better. Than value
should not be lowered below 1.

Re: index and queries using '<' '>'

From
Tom Lane
Date:
Marc Boucher <pgml@gmx.net> writes:
>> Index scans aren't always faster than sequential scans.

> I know that, but I've some comparisons with other queries. And someone
> advised me to try "set enable_seqscan=off;". It takes 50-60% (after
> checking right now) less to use the index. Unfortunately I can't use this
> setting, the query being part of a larger query (joins), and the time
> gained on this particular index is partially lost on the joins.

A less brute-force way of encouraging the planner to use indexscans is
to reduce the random_page_cost setting.  It defaults to 4 but many
people find that values nearer 2 are more representative of what happens
in their environments.

            regards, tom lane

Re: index and queries using '<' '>'

From
Stephan Szabo
Date:
On Fri, 19 Nov 2004, Marc Boucher wrote:

> At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> > The system seems to think that a scan is cheap because the table is so
> > small.
> The table currently contains just over 10000 elements. So 238 rows is a
> small part of it.
>
> > Have you ever ANALYZEd that table?
> Yes. That and reindex (in case of a problem).
> I have also tried ALTER TABLE ALTER COLUMN SET STATISTICS, but without
> improvement.
>
> > Also, EXPLAIN ANALYZE gives a much better idea of what is going on...
> You mean ANALYZE VERBOSE ?

EXPLAIN ANALYZE is similar to EXPLAIN except that it runs the query
and gives real time per step and real row counts rather than the
estimates.  It's very handy.

If random page reads are consistently less expensive than postgres
currently thinks, you may want to lower random_page_cost.  EXPLAIN ANALYZE
is useful here because it lets you compare the real times versus the
estimated costs of both scan types when used with enable_seqscan=off.

Re: index and queries using '<' '>'

From
Marc Boucher
Date:
At 16:23 18/11/2004 +0100, Martijn van Oosterhout wrote:
> The system seems to think that a scan is cheap because the table is so
> small.
The table currently contains just over 10000 elements. So 238 rows is a
small part of it.

> Have you ever ANALYZEd that table?
Yes. That and reindex (in case of a problem).

> Also, EXPLAIN ANALYZE gives a much better idea of what is going on...
You mean ANALYZE VERBOSE ?


Since the table is still growing, and the amount of rows in the reply of
the query is quite uniform (it's not dependant on the size of the table), I
hope that the statistics will evolve in a state that will force the use of
the index.


thanks

--
Marc