Thread: Index use in BETWEEN statement...

Index use in BETWEEN statement...

From

"Cristian Prieto"

Date:

26 September 2005, 12:29:26

Hello pals, I have the following table in Postgresql 8.0.1

Mydb# \d geoip_block
Table "public.geoip_block"
   Column    |  Type  | Modifiers
-------------+--------+-----------
 locid       | bigint |
 start_block | inet   |
 end_block   | inet   |

mydb# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
                                                      QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
 Seq Scan on geoip_block  (cost=0.00..142772.86 rows=709688 width=8) (actual
time=14045.384..14706.927 rows=1 loops=1)
   Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
 Total runtime: 14707.038 ms

Ok, now I decided to create a index to "speed" a little the query

Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
CREATE INDEX

clickad=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
                                                      QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
 Seq Scan on geoip_block  (cost=0.00..78033.96 rows=230141 width=8) (actual
time=12107.919..12610.199 rows=1 loops=1)
   Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
 Total runtime: 12610.329 ms
(3 rows)

I guess the planner is doing a sequential scan in the table, why not use the
compound index? Do you have any idea in how to speed up this query?

Thanks a lot!

Re: Index use in BETWEEN statement...

From

Sean Davis

Date:

26 September 2005, 13:24:07

On 9/26/05 11:26 AM, "Cristian Prieto" <cristian@clickdiario.com> wrote:

>
> Hello pals, I have the following table in Postgresql 8.0.1
>
> Mydb# \d geoip_block
> Table "public.geoip_block"
>  Column    |  Type  | Modifiers
> -------------+--------+-----------
> locid       | bigint |
> start_block | inet   |
> end_block   | inet   |
>
> mydb# explain analyze select locid from geoip_block where
> '216.230.158.50'::inet between start_block and end_block;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------
> -------------------------------------------
> Seq Scan on geoip_block  (cost=0.00..142772.86 rows=709688 width=8) (actual
> time=14045.384..14706.927 rows=1 loops=1)
>  Filter: (('216.230.158.50'::inet >= start_block) AND
> ('216.230.158.50'::inet <= end_block))
> Total runtime: 14707.038 ms
>
> Ok, now I decided to create a index to "speed" a little the query
>
> Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
> CREATE INDEX
>
> clickad=# explain analyze select locid from geoip_block where
> '216.230.158.50'::inet between start_block and end_block;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------
> Seq Scan on geoip_block  (cost=0.00..78033.96 rows=230141 width=8) (actual
> time=12107.919..12610.199 rows=1 loops=1)
>  Filter: (('216.230.158.50'::inet >= start_block) AND
> ('216.230.158.50'::inet <= end_block))
> Total runtime: 12610.329 ms
> (3 rows)
>
> I guess the planner is doing a sequential scan in the table, why not use the
> compound index? Do you have any idea in how to speed up this query?

Did you vacuum analyze the table after creating the index?

Sean

Re: Index use in BETWEEN statement...

From

"Cristian Prieto"

Date:

26 September 2005, 15:45:23

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
                                                      QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
 Seq Scan on geoip_block  (cost=0.00..78033.96 rows=230141 width=8) (actual
time=13015.538..13508.708 rows=1 loops=1)
   Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
 Total runtime: 13508.905 ms
(3 rows)

mydb=# alter table geoip_block add constraint pkey_geoip_block primary key
(start_block, end_block);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pkey_geoip_block" for table "geoip_block"
ALTER TABLE

mydb=# vacuum analyze geoip_block;

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
                                                      QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
 Seq Scan on geoip_block  (cost=0.00..101121.01 rows=308324 width=8) (actual
time=12128.190..12631.550 rows=1 loops=1)
   Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
 Total runtime: 12631.679 ms
(3 rows)

mydb=#


As you see it still using a sequential scan in the table and ignores the
index, any other suggestion?

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sean Davis
Sent: Lunes, 26 de Septiembre de 2005 10:24 a.m.
To: Cristian Prieto; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index use in BETWEEN statement...

On 9/26/05 11:26 AM, "Cristian Prieto" <cristian@clickdiario.com> wrote:

>
> Hello pals, I have the following table in Postgresql 8.0.1
>
> Mydb# \d geoip_block
> Table "public.geoip_block"
>  Column    |  Type  | Modifiers
> -------------+--------+-----------
> locid       | bigint |
> start_block | inet   |
> end_block   | inet   |
>
> mydb# explain analyze select locid from geoip_block where
> '216.230.158.50'::inet between start_block and end_block;
>                                                     QUERY PLAN
>
----------------------------------------------------------------------------
> -------------------------------------------
> Seq Scan on geoip_block  (cost=0.00..142772.86 rows=709688 width=8)
(actual
> time=14045.384..14706.927 rows=1 loops=1)
>  Filter: (('216.230.158.50'::inet >= start_block) AND
> ('216.230.158.50'::inet <= end_block))
> Total runtime: 14707.038 ms
>
> Ok, now I decided to create a index to "speed" a little the query
>
> Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
> CREATE INDEX
>
> clickad=# explain analyze select locid from geoip_block where
> '216.230.158.50'::inet between start_block and end_block;
>                                                     QUERY PLAN
>
----------------------------------------------------------------------------
> ------------------------------------------
> Seq Scan on geoip_block  (cost=0.00..78033.96 rows=230141 width=8) (actual
> time=12107.919..12610.199 rows=1 loops=1)
>  Filter: (('216.230.158.50'::inet >= start_block) AND
> ('216.230.158.50'::inet <= end_block))
> Total runtime: 12610.329 ms
> (3 rows)
>
> I guess the planner is doing a sequential scan in the table, why not use
the
> compound index? Do you have any idea in how to speed up this query?

Did you vacuum analyze the table after creating the index?

Sean


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: Index use in BETWEEN statement...

From

Tom Lane

Date:

26 September 2005, 16:41:39

"Cristian Prieto" <cristian@clickdiario.com> writes:
> mydb=# explain analyze select locid from geoip_block where
> '216.230.158.50'::inet between start_block and end_block;

> As you see it still using a sequential scan in the table and ignores the
> index, any other suggestion?

That two-column index is entirely useless for this query; in fact btree
indexes of any sort are pretty useless.  You really need some sort of
multidimensional index type like rtree or gist.  There was discussion
just a week or three ago of how to optimize searches for intervals
overlapping a specified point, which is identical to your problem.
Can't remember if the question was about timestamp intervals or plain
intervals, but try checking the list archives.

            regards, tom lane

Re: Index use in BETWEEN statement...

From

Don Isgitt

Date:

26 September 2005, 18:07:42


Cristian Prieto wrote:

>mydb=# explain analyze select locid from geoip_block where
>'216.230.158.50'::inet between start_block and end_block;
>                                                      QUERY PLAN
>----------------------------------------------------------------------------
>------------------------------------------
> Seq Scan on geoip_block  (cost=0.00..78033.96 rows=230141 width=8) (actual
>time=13015.538..13508.708 rows=1 loops=1)
>   Filter: (('216.230.158.50'::inet >= start_block) AND
>('216.230.158.50'::inet <= end_block))
> Total runtime: 13508.905 ms
>(3 rows)
>
>mydb=# alter table geoip_block add constraint pkey_geoip_block primary key
>(start_block, end_block);
>NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
>"pkey_geoip_block" for table "geoip_block"
>ALTER TABLE
>
>mydb=# vacuum analyze geoip_block;
>
>mydb=# explain analyze select locid from geoip_block where
>'216.230.158.50'::inet between start_block and end_block;
>                                                      QUERY PLAN
>----------------------------------------------------------------------------
>-------------------------------------------
> Seq Scan on geoip_block  (cost=0.00..101121.01 rows=308324 width=8) (actual
>time=12128.190..12631.550 rows=1 loops=1)
>   Filter: (('216.230.158.50'::inet >= start_block) AND
>('216.230.158.50'::inet <= end_block))
> Total runtime: 12631.679 ms
>(3 rows)
>
>mydb=#
>
>
>As you see it still using a sequential scan in the table and ignores the
>index, any other suggestion?
>
>Cristian,
>
>
Please note that the planner thinks 308324 rows are being returned,
while there is actually only 1 (one!). You might try altering statistics
for the relevant column(s), analyzing the table, and then try again. If
that doesn't give you a more accurate row estimate, though, it won't help.

Don

Re: Index use in BETWEEN statement...

From

Yonatan Ben-Nes

Date:

27 September 2005, 07:46:27

Tom Lane wrote:
> "Cristian Prieto" <cristian@clickdiario.com> writes:
>
>>mydb=# explain analyze select locid from geoip_block where
>>'216.230.158.50'::inet between start_block and end_block;
>
>
>>As you see it still using a sequential scan in the table and ignores the
>>index, any other suggestion?
>
>
> That two-column index is entirely useless for this query; in fact btree
> indexes of any sort are pretty useless.  You really need some sort of
> multidimensional index type like rtree or gist.  There was discussion
> just a week or three ago of how to optimize searches for intervals
> overlapping a specified point, which is identical to your problem.
> Can't remember if the question was about timestamp intervals or plain
> intervals, but try checking the list archives.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

I think that Tom is talking about a discussion which I started entitled
"Planner create a slow plan without an available index" search for it
maybe it will help you.
At the end I created an RTREE index and it did solved my problem though
my data was 2 INT fields and not INET fields as yours so im not sure how
can you work with that... To solve my problem I created boxes from the 2
numbers and with them I did overlapping.

Re: Index use in BETWEEN statement...

From

Sean Davis

Date:

27 September 2005, 07:56:16

On 9/27/05 7:45 AM, "Yonatan Ben-Nes" <da@canaan.co.il> wrote:

> Tom Lane wrote:
>> "Cristian Prieto" <cristian@clickdiario.com> writes:
>>
>>> mydb=# explain analyze select locid from geoip_block where
>>> '216.230.158.50'::inet between start_block and end_block;
>>
>>
>>> As you see it still using a sequential scan in the table and ignores the
>>> index, any other suggestion?
>>
>>
>> That two-column index is entirely useless for this query; in fact btree
>> indexes of any sort are pretty useless.  You really need some sort of
>> multidimensional index type like rtree or gist.  There was discussion
>> just a week or three ago of how to optimize searches for intervals
>> overlapping a specified point, which is identical to your problem.
>> Can't remember if the question was about timestamp intervals or plain
>> intervals, but try checking the list archives.
>>
>> regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>>                http://archives.postgresql.org
>
> I think that Tom is talking about a discussion which I started entitled
> "Planner create a slow plan without an available index" search for it
> maybe it will help you.
> At the end I created an RTREE index and it did solved my problem though
> my data was 2 INT fields and not INET fields as yours so im not sure how
> can you work with that... To solve my problem I created boxes from the 2
> numbers and with them I did overlapping.

There is some code in this thread that shows the box approach explicitly:

http://archives.postgresql.org/pgsql-sql/2005-09/msg00189.php

Sean