Thread: Index use in BETWEEN statement...
Hello pals, I have the following table in Postgresql 8.0.1 Mydb# \d geoip_block Table "public.geoip_block" Column | Type | Modifiers -------------+--------+----------- locid | bigint | start_block | inet | end_block | inet | mydb# explain analyze select locid from geoip_block where '216.230.158.50'::inet between start_block and end_block; QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------- Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8) (actual time=14045.384..14706.927 rows=1 loops=1) Filter: (('216.230.158.50'::inet >= start_block) AND ('216.230.158.50'::inet <= end_block)) Total runtime: 14707.038 ms Ok, now I decided to create a index to "speed" a little the query Mydb# create index idx_ipblocks on geoip_block(start_block, end_block); CREATE INDEX clickad=# explain analyze select locid from geoip_block where '216.230.158.50'::inet between start_block and end_block; QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------ Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual time=12107.919..12610.199 rows=1 loops=1) Filter: (('216.230.158.50'::inet >= start_block) AND ('216.230.158.50'::inet <= end_block)) Total runtime: 12610.329 ms (3 rows) I guess the planner is doing a sequential scan in the table, why not use the compound index? Do you have any idea in how to speed up this query? Thanks a lot!
On 9/26/05 11:26 AM, "Cristian Prieto" <cristian@clickdiario.com> wrote: > > Hello pals, I have the following table in Postgresql 8.0.1 > > Mydb# \d geoip_block > Table "public.geoip_block" > Column | Type | Modifiers > -------------+--------+----------- > locid | bigint | > start_block | inet | > end_block | inet | > > mydb# explain analyze select locid from geoip_block where > '216.230.158.50'::inet between start_block and end_block; > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------- > Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8) (actual > time=14045.384..14706.927 rows=1 loops=1) > Filter: (('216.230.158.50'::inet >= start_block) AND > ('216.230.158.50'::inet <= end_block)) > Total runtime: 14707.038 ms > > Ok, now I decided to create a index to "speed" a little the query > > Mydb# create index idx_ipblocks on geoip_block(start_block, end_block); > CREATE INDEX > > clickad=# explain analyze select locid from geoip_block where > '216.230.158.50'::inet between start_block and end_block; > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------ > Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual > time=12107.919..12610.199 rows=1 loops=1) > Filter: (('216.230.158.50'::inet >= start_block) AND > ('216.230.158.50'::inet <= end_block)) > Total runtime: 12610.329 ms > (3 rows) > > I guess the planner is doing a sequential scan in the table, why not use the > compound index? Do you have any idea in how to speed up this query? Did you vacuum analyze the table after creating the index? Sean
mydb=# explain analyze select locid from geoip_block where '216.230.158.50'::inet between start_block and end_block; QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------ Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual time=13015.538..13508.708 rows=1 loops=1) Filter: (('216.230.158.50'::inet >= start_block) AND ('216.230.158.50'::inet <= end_block)) Total runtime: 13508.905 ms (3 rows) mydb=# alter table geoip_block add constraint pkey_geoip_block primary key (start_block, end_block); NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pkey_geoip_block" for table "geoip_block" ALTER TABLE mydb=# vacuum analyze geoip_block; mydb=# explain analyze select locid from geoip_block where '216.230.158.50'::inet between start_block and end_block; QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------- Seq Scan on geoip_block (cost=0.00..101121.01 rows=308324 width=8) (actual time=12128.190..12631.550 rows=1 loops=1) Filter: (('216.230.158.50'::inet >= start_block) AND ('216.230.158.50'::inet <= end_block)) Total runtime: 12631.679 ms (3 rows) mydb=# As you see it still using a sequential scan in the table and ignores the index, any other suggestion? -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sean Davis Sent: Lunes, 26 de Septiembre de 2005 10:24 a.m. To: Cristian Prieto; pgsql-general@postgresql.org Subject: Re: [GENERAL] Index use in BETWEEN statement... On 9/26/05 11:26 AM, "Cristian Prieto" <cristian@clickdiario.com> wrote: > > Hello pals, I have the following table in Postgresql 8.0.1 > > Mydb# \d geoip_block > Table "public.geoip_block" > Column | Type | Modifiers > -------------+--------+----------- > locid | bigint | > start_block | inet | > end_block | inet | > > mydb# explain analyze select locid from geoip_block where > '216.230.158.50'::inet between start_block and end_block; > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------- > Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8) (actual > time=14045.384..14706.927 rows=1 loops=1) > Filter: (('216.230.158.50'::inet >= start_block) AND > ('216.230.158.50'::inet <= end_block)) > Total runtime: 14707.038 ms > > Ok, now I decided to create a index to "speed" a little the query > > Mydb# create index idx_ipblocks on geoip_block(start_block, end_block); > CREATE INDEX > > clickad=# explain analyze select locid from geoip_block where > '216.230.158.50'::inet between start_block and end_block; > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------ > Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual > time=12107.919..12610.199 rows=1 loops=1) > Filter: (('216.230.158.50'::inet >= start_block) AND > ('216.230.158.50'::inet <= end_block)) > Total runtime: 12610.329 ms > (3 rows) > > I guess the planner is doing a sequential scan in the table, why not use the > compound index? Do you have any idea in how to speed up this query? Did you vacuum analyze the table after creating the index? Sean ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
"Cristian Prieto" <cristian@clickdiario.com> writes: > mydb=# explain analyze select locid from geoip_block where > '216.230.158.50'::inet between start_block and end_block; > As you see it still using a sequential scan in the table and ignores the > index, any other suggestion? That two-column index is entirely useless for this query; in fact btree indexes of any sort are pretty useless. You really need some sort of multidimensional index type like rtree or gist. There was discussion just a week or three ago of how to optimize searches for intervals overlapping a specified point, which is identical to your problem. Can't remember if the question was about timestamp intervals or plain intervals, but try checking the list archives. regards, tom lane
Cristian Prieto wrote: >mydb=# explain analyze select locid from geoip_block where >'216.230.158.50'::inet between start_block and end_block; > QUERY PLAN >---------------------------------------------------------------------------- >------------------------------------------ > Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual >time=13015.538..13508.708 rows=1 loops=1) > Filter: (('216.230.158.50'::inet >= start_block) AND >('216.230.158.50'::inet <= end_block)) > Total runtime: 13508.905 ms >(3 rows) > >mydb=# alter table geoip_block add constraint pkey_geoip_block primary key >(start_block, end_block); >NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index >"pkey_geoip_block" for table "geoip_block" >ALTER TABLE > >mydb=# vacuum analyze geoip_block; > >mydb=# explain analyze select locid from geoip_block where >'216.230.158.50'::inet between start_block and end_block; > QUERY PLAN >---------------------------------------------------------------------------- >------------------------------------------- > Seq Scan on geoip_block (cost=0.00..101121.01 rows=308324 width=8) (actual >time=12128.190..12631.550 rows=1 loops=1) > Filter: (('216.230.158.50'::inet >= start_block) AND >('216.230.158.50'::inet <= end_block)) > Total runtime: 12631.679 ms >(3 rows) > >mydb=# > > >As you see it still using a sequential scan in the table and ignores the >index, any other suggestion? > >Cristian, > > Please note that the planner thinks 308324 rows are being returned, while there is actually only 1 (one!). You might try altering statistics for the relevant column(s), analyzing the table, and then try again. If that doesn't give you a more accurate row estimate, though, it won't help. Don
Tom Lane wrote: > "Cristian Prieto" <cristian@clickdiario.com> writes: > >>mydb=# explain analyze select locid from geoip_block where >>'216.230.158.50'::inet between start_block and end_block; > > >>As you see it still using a sequential scan in the table and ignores the >>index, any other suggestion? > > > That two-column index is entirely useless for this query; in fact btree > indexes of any sort are pretty useless. You really need some sort of > multidimensional index type like rtree or gist. There was discussion > just a week or three ago of how to optimize searches for intervals > overlapping a specified point, which is identical to your problem. > Can't remember if the question was about timestamp intervals or plain > intervals, but try checking the list archives. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org I think that Tom is talking about a discussion which I started entitled "Planner create a slow plan without an available index" search for it maybe it will help you. At the end I created an RTREE index and it did solved my problem though my data was 2 INT fields and not INET fields as yours so im not sure how can you work with that... To solve my problem I created boxes from the 2 numbers and with them I did overlapping.
On 9/27/05 7:45 AM, "Yonatan Ben-Nes" <da@canaan.co.il> wrote: > Tom Lane wrote: >> "Cristian Prieto" <cristian@clickdiario.com> writes: >> >>> mydb=# explain analyze select locid from geoip_block where >>> '216.230.158.50'::inet between start_block and end_block; >> >> >>> As you see it still using a sequential scan in the table and ignores the >>> index, any other suggestion? >> >> >> That two-column index is entirely useless for this query; in fact btree >> indexes of any sort are pretty useless. You really need some sort of >> multidimensional index type like rtree or gist. There was discussion >> just a week or three ago of how to optimize searches for intervals >> overlapping a specified point, which is identical to your problem. >> Can't remember if the question was about timestamp intervals or plain >> intervals, but try checking the list archives. >> >> regards, tom lane >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 4: Have you searched our list archives? >> >> http://archives.postgresql.org > > I think that Tom is talking about a discussion which I started entitled > "Planner create a slow plan without an available index" search for it > maybe it will help you. > At the end I created an RTREE index and it did solved my problem though > my data was 2 INT fields and not INET fields as yours so im not sure how > can you work with that... To solve my problem I created boxes from the 2 > numbers and with them I did overlapping. There is some code in this thread that shows the box approach explicitly: http://archives.postgresql.org/pgsql-sql/2005-09/msg00189.php Sean