Thread: using hash index when BETWEEN is specified

using hash index when BETWEEN is specified

From

Zdenek Kotala

Date:

10 September 2008, 07:39:04

I has played with new hash index implementation and I tried following 
command:

postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;                       QUERY PLAN
--------------------------------------------------------- Seq Scan on test  (cost=0.00..141681.00 rows=1 width=4)
Filter:((id >= 1) AND (id <= 5))
 
(2 rows)


Hash index is created on id column. However when I use

postgres=# explain select * from test where id in (1,2,3,4,5);                               QUERY PLAN
------------------------------------------------------------------------- Bitmap Heap Scan on test  (cost=22.24..332.53
rows=83width=4)   Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))   ->  Bitmap Index Scan on test_idx
(cost=0.00..22.22rows=83 width=0)         Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
 
(4 rows)

Time: 1,352 ms

I'm not planner guru but it seems to me that BETWEEN clause could be 
rewritten as a IN clause for integer data types and small interval.

    Zdenek

Re: using hash index when BETWEEN is specified

From

"Asko Oja"

Date:

10 September 2008, 07:52:28

On Wed, Sep 10, 2008 at 1:39 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:

I has played with new hash index implementation and I tried following command:

postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;
QUERY PLAN
---------------------------------------------------------
Seq Scan on test (cost=0.00..141681.00 rows=1 width=4)
Filter: ((id >= 1) AND (id <= 5))
(2 rows)

Hash index is created on id column. However when I use

postgres=# explain select * from test where id in (1,2,3,4,5);
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=22.24..332.53 rows=83 width=4)
Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
-> Bitmap Index Scan on test_idx (cost=0.00..22.22 rows=83 width=0)
Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
(4 rows)

Time: 1,352 ms

I'm not planner guru but it seems to me that BETWEEN clause could be rewritten as a IN clause for integer data types and small interval.

Where should the line be drawn.
Define small :)

Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: using hash index when BETWEEN is specified

From

"Robert Haas"

Date:

10 September 2008, 08:24:11

>> I'm not planner guru but it seems to me that BETWEEN clause could be
>> rewritten as a IN clause for integer data types and small interval.
>
> Where should the line be drawn.
> Define small :)

When the estimated cost is lower?

...Robert

Re: using hash index when BETWEEN is specified

From

Hannu Krosing

Date:

10 September 2008, 09:25:46

On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:
> >> I'm not planner guru but it seems to me that BETWEEN clause could be
> >> rewritten as a IN clause for integer data types and small interval.
> >
> > Where should the line be drawn.
> > Define small :)
> 
> When the estimated cost is lower?

You still need to draw a line for when to even try estimating the cost .

Will this be interval of 10 ? or 100 ? or 10000 ?

----------------
Hannu

Re: using hash index when BETWEEN is specified

From

Zdenek Kotala

Date:

10 September 2008, 11:31:09

Hannu Krosing napsal(a):
> On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:
>>>> I'm not planner guru but it seems to me that BETWEEN clause could be
>>>> rewritten as a IN clause for integer data types and small interval.
>>> Where should the line be drawn.
>>> Define small :)
>> When the estimated cost is lower?
> 
> You still need to draw a line for when to even try estimating the cost .
> 
> Will this be interval of 10 ? or 100 ? or 10000 ?

I think it depends of ration of unique integer number in a table and 
numbers of requested interval, number distribution and total number of rows.

For example if you have 10 distinct number and each has 100 occurrence 
then full scan is better (for between 1 and 5). But if each number 
occurs 100000x. Then using hash index should be effective.
    Zdenek

Re: using hash index when BETWEEN is specified

From

Tom Lane

Date:

10 September 2008, 13:13:08

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I think it depends of ration of unique integer number in a table and 
> numbers of requested interval, number distribution and total number of rows.

> For example if you have 10 distinct number and each has 100 occurrence 
> then full scan is better (for between 1 and 5). But if each number 
> occurs 100000x. Then using hash index should be effective.

I think this discussion is a complete waste of time.  Hash indexes don't
win against btrees for single indexscans currently.  Even if that ever
gets fixed, it's highly unlikely that they'd win for N separate
indexscans versus 1 indexscan, which is what a query rewrite of this
sort would produce.  Remember that the btree will have the desired range
of keys stored adjacently, whereas in a hash they are almost certainly
in distinct buckets, and likely not even close-together buckets if the
hash function is doing its job well.  So you really are talking about a
factor of N both in indexscan setup overhead and in I/O costs.
        regards, tom lane