Thread: Help optimizing a slow index scan

From: Dan Harris
explain analyze
select distinct eventmain.incidentid, eventmain.entrydate,
eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy
from eventmain, eventgeo
where
    eventmain.incidentid = eventgeo.incidentid and
    ( long > -104.998027962962 and long < -104.985957781349 ) and
    ( lat > 39.7075542720006 and lat < 39.7186195832938 ) and
    eventmain.entrydate > '2006-1-1 00:00' and
    eventmain.entrydate <= '2006-3-17 00:00'
order by
    eventmain.entrydate;

                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique  (cost=121313.81..121330.72 rows=451 width=178) (actual
time=723719.761..723726.875 rows=1408 loops=1)
   ->  Sort  (cost=121313.81..121314.94 rows=451 width=178) (actual
time=723719.755..723721.807 rows=1408 loops=1)
         Sort Key: eventmain.entrydate, eventmain.disposition,
eventmain.incidentid, eventgeo.reportingarea, eventgeo.beatid,
eventmain.finaltype, eventmain.casenumber, eventgeo.eventlocation,
eventmain.insertdate, eventmain.priority, eventgeo.long, eventgeo.lat,
eventgeo.geox, eventgeo.geoy
         ->  Nested Loop  (cost=0.00..121293.93 rows=451 width=178)
(actual time=1916.230..723712.900 rows=1408 loops=1)
               ->  Index Scan using eventgeo_lat_idx on eventgeo
(cost=0.00..85488.05 rows=10149 width=76) (actual time=0.402..393376.129
rows=22937 loops=1)
                     Index Cond: ((lat > 39.7075542720006::double
precision) AND (lat < 39.7186195832938::double precision))
                     Filter: ((long > -104.998027962962::double
precision) AND (long < -104.985957781349::double precision))
               ->  Index Scan using eventmain_incidentid_idx on
eventmain  (cost=0.00..3.52 rows=1 width=119) (actual
time=14.384..14.392 rows=0 loops=22937)
                     Index Cond: ((eventmain.incidentid)::text =
("outer".incidentid)::text)
                     Filter: ((entrydate > '2006-01-01
00:00:00'::timestamp without time zone) AND (entrydate <= '2006-03-17
00:00:00'::timestamp without time zone))

 Total runtime:  >>> 723729.238 ms(!) <<<



I'm trying to figure out why it's consuming so much time on the index
scan for eventgeo_lat_idx.  Also, I have an index on "long" that the
planner does not appear to find helpful.

There are 3.3 million records in eventmain and eventgeo.  The server has
a reasonably fast RAID10 setup with 16x 15k RPM drives and 12GB of RAM (
11GB listed as "cache" by vmstat ).  Running version 8.0.2 on linux
kernel 2.6.12.

I have just vacuum analyze'd both tables, rebuilt the eventgeo_lat_idx
index and reran the query multiple times to see if caching helped ( it
didn't help much ).   The server seems to be fine utilizing other fields
from this table, but using "long" and "lat" seems to drag it down
significantly.

  Is it because there's such slight differences between the records,
since they are all within a few hundredths of a degree from each other?

Thanks for your time and ideas.

-Dan

Re: Help optimizing a slow index scan

From: Dan Harris
Dan Harris wrote:
> explain analyze
.... doh.. sorry to reply to my own post.  But I messed up copying some
of the fields into the select statement that you'll see in the "Sort
Key" section of the analyze results.  The mistake was mine.  Everything
else is "normal" between the query and the plan.

-Dan

Re: Help optimizing a slow index scan

From: Dan Harris
Markus Bertheau wrote:
> Have you tried using a GIST index on lat & long? These things are
> meant for two-dimensional data, whereas btree doesn't handle
> two-dimensional data that well. How many rows satisfy either of the
> long / lat condition?
According to the analyze, less than 500 rows matched.  I'll look into
GIST indexes, thanks for the feedback.
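Something like this, perhaps? ( an untested sketch using the built-in
point and box types with the old rtree access method; for an expression
index, the expression in the where clause has to match the index
expression exactly ):

create index eventgeo_coords_rtree_idx on eventgeo
    using rtree ( box(point(long, lat), point(long, lat)) );

select incidentid from eventgeo
where box(point(long, lat), point(long, lat)) &&
      box(point(-104.998, 39.7075), point(-104.986, 39.7186));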

-Dan

Re: Help optimizing a slow index scan

From: Dan Harris
Dan Harris wrote:
> Markus Bertheau wrote:
>> Have you tried using a GIST index on lat & long? These things are
>> meant for two-dimensional data, whereas btree doesn't handle
>> two-dimensional data that well. How many rows satisfy either of the
>> long / lat condition?
> According to the analyze, less than 500 rows matched.  I'll look into
> GIST indexes, thanks for the feedback.
>
> -Dan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

When I try to create a GIST index, I get the following error:

create index eventgeo_lat_idx on eventgeo using GIST (lat);

ERROR:  data type double precision has no default operator class for
access method "gist"
HINT:  You must specify an operator class for the index or define a
default operator class for the data type.

I'm not sure what a "default operator class" is, exactly..

-Dan

Re: Help optimizing a slow index scan

From: Merlin Moncure
On 3/16/06, Dan Harris <fbsd@drivefaster.net> wrote:
> explain analyze
> select distinct eventmain.incidentid, eventmain.entrydate,
> eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy
> from eventmain, eventgeo
> where
>     eventmain.incidentid = eventgeo.incidentid and
>     ( long > -104.998027962962 and long < -104.985957781349 ) and
>     ( lat > 39.7075542720006 and lat < 39.7186195832938 ) and
>     eventmain.entrydate > '2006-1-1 00:00' and
>     eventmain.entrydate <= '2006-3-17 00:00'
> order by
>     eventmain.entrydate;

As others will probably mention, effective queries on lat/long, which
is a spatial problem, will require r-tree or gist.  I don't have a lot
of experience with exotic indexes but this may be the way to go.

One easy optimization to consider is making an index on either
(incidentid, entrydate) or (incidentid, long), whichever is more
selective.
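For example ( the index name is illustrative ):

create index eventmain_inc_date_idx on eventmain (incidentid, entrydate);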

This is 'yet another query' that would be fun to try out and tweak
using the upcoming 8.2 row-wise comparison.

merlin

Re: Help optimizing a slow index scan

From: Bruno Wolff III
On Fri, Mar 17, 2006 at 08:34:26 -0700,
  Dan Harris <fbsd@drivefaster.net> wrote:
> Markus Bertheau wrote:
> >Have you tried using a GIST index on lat & long? These things are
> >meant for two-dimensional data, whereas btree doesn't handle
> >two-dimensional data that well. How many rows satisfy either of the
> >long / lat condition?
> According to the analyze, less than 500 rows matched.  I'll look into
> GIST indexes, thanks for the feedback.

Have you looked at using the Earth Distance contrib module? If a spherical
model of the earth is suitable for your application, then it may work for you
and might be easier than trying to create Gist indexes yourself.
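A sketch of the cube-based interface ( assumes contrib/cube and
contrib/earthdistance are built and installed; the radius is in meters
and the center point here is illustrative ):

create index eventgeo_earth_idx on eventgeo
    using gist ( ll_to_earth(lat, long) );

select incidentid from eventgeo
where earth_box(ll_to_earth(39.713, -104.992), 1000)
      @ ll_to_earth(lat, long);

Note that earth_box produces a bounding box, so a few points slightly
outside the radius can come back; earth_distance() can be used to
filter them exactly.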

Re: Help optimizing a slow index scan

From: Merlin Moncure
On 3/17/06, Bruno Wolff III <bruno@wolff.to> wrote:
> Have you looked at using the Earth Distance contrib module? If a spherical
> model of the earth is suitable for your application, then it may work for you
> and might be easier than trying to create Gist indexes yourself.

earth distance = great stuff.  If the maximum error is known then you
can just pad the distance and filter the result on the client if exact
precision is needed.
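In SQL that might look something like this ( a sketch: a 1000m target
radius padded by a guessed maximum error, with an exact recheck that
could just as well be done on the client ):

select incidentid from eventgeo
where earth_box(ll_to_earth(39.713, -104.992), 1000 + 50)
      @ ll_to_earth(lat, long)
  and earth_distance(ll_to_earth(39.713, -104.992),
                     ll_to_earth(lat, long)) <= 1000;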

Merlin

Re: Help optimizing a slow index scan

From: Dan Harris
Merlin Moncure wrote:

> As others will probably mention, effective queries on lat/long, which
> is a spatial problem, will require r-tree or gist.  I don't have a lot
> of experience with exotic indexes but this may be the way to go.
>
> One easy optimization to consider is making an index on either
> (incidentid, entrydate) or (incidentid, long), whichever is more
> selective.
>
> This is 'yet another query' that would be fun to try out and tweak
> using the upcoming 8.2 row-wise comparison.
>
> merlin
>
Thanks to everyone for your suggestions.  One problem I ran into is that
apparently my version doesn't support the GIST index that was
mentioned ( "function 'box' doesn't exist" ).  So I'm guessing that both
this and the Earth Distance contrib require me to add on some more
pieces that aren't there.

Furthermore, by doing so, I am tying my queries directly to
"postgres-isms".  One of the long term goals of this project is to be
able to fairly transparently support any ANSI SQL-compliant back end
with the same code base.  If I had full control over the query designs,
I could make stored procedures to abstract this.  However, I have to
deal with a "gray box" third-party reporting library that isn't so
flexible.  I'll certainly consider going with something
Postgres-specific, but only as a last resort.

I tried the multi-column index as mentioned above but didn't see any
noticeable improvement in elapsed time, although the planner did use the
new index.

What is the real reason for the index not being very effective on these
columns?  Although the numbers are in a very limited range, it seems
that the records would be very selective as it's not terribly common for
multiple rows to share the same coords.

Is the "8.2 upcoming row-wise comparison" something that would be
likely to help me?

Thanks again for your input

Re: Help optimizing a slow index scan

From: Merlin Moncure
On 3/17/06, Dan Harris <fbsd@drivefaster.net> wrote:
> Merlin Moncure wrote:
> Thanks to everyone for your suggestions.  One problem I ran into is that
> apparently my version doesn't support the GIST index that was
> mentioned ( "function 'box' doesn't exist" ).  So I'm guessing that both
> this and the Earth Distance contrib require me to add on some more
> pieces that aren't there.

earth distance is a contrib module that has to be built and installed.
it does use some pg-isms so I guess that can be ruled out.  GIST is a
bit more complex and I would consider reading the documentation very
carefully regarding them and make your own determination.

> Furthermore, by doing so, I am tying my queries directly to
> "postgres-isms".  [snip]

> I tried the multi-column index as mentioned above but didn't see any
> noticeable improvement in elapsed time, although the planner did use the
> new index.

did you try both flavors of the multiple-key index I suggested? (there
were other possibilities, please experiment)

> Is the "8.2 upcoming row-wise comparison" something that would be
> likely to help me?

possibly. good news is that rwc is ansi sql.  you can see my blog
about it here: http://people.planetpostgresql.org/merlin/

Specifically, if you can order your table with an order by statement
such that the records you want are contiguous, then yes.  However,
even though it's ansi sql, various commercial databases implement rwc
improperly or not at all (mysql, to their credit, gets it right), and I
still feel like an exotic index or some other nifty pg trick might be
the best performance approach here.
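For example, paging through this thread's table in entrydate order
might look like this ( 8.2 syntax; the literal values are illustrative,
and it wants an index on (entrydate, incidentid) behind it ):

select incidentid, entrydate from eventmain
where (entrydate, incidentid) > ('2006-01-01 00:00', 'PDX04-123')
order by entrydate, incidentid
limit 100;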

Merlin

Re: Help optimizing a slow index scan

From: Tom Lane
Dan Harris <fbsd@drivefaster.net> writes:
> Furthermore, by doing so, I am tying my queries directly to
> "postgres-isms".  One of the long term goals of this project is to be
> able to fairly transparently support any ANSI SQL-compliant back end
> with the same code base.

Unfortunately, there isn't any portable or standard (not exactly the
same thing ;-)) SQL functionality for dealing gracefully with
two-dimensional searches, which is what your lat/long queries are.
You should accept right now that you can have portability or you can
have good performance, not both.

Merlin's enthusiasm for row-comparison queries is understandable because
that fix definitely helped a common problem.  But row comparison has
nothing to do with searches in two independent dimensions.  Row
comparison basically makes it easier to exploit the natural behavior of
multicolumn btree indexes ... but a multicolumn btree index does not
efficiently support queries that involve separate range limitations on
each index column.  (If you think about the index storage order you'll
see why: the answer entries are not contiguous in the index.)
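(To illustrate with a btree on (lat, long): the entries are sorted by
lat first, so the rows satisfying both range conditions, marked with *,
are interleaved with rows that match only the lat condition:

    (39.7076, -104.990)  *
    (39.7076,  -89.500)
    (39.7077, -104.991)  *
    (39.7077,  -77.100)

and the scan has to visit all of them.)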

To support two-dimensional searches you really need a non-btree index
structure, such as GIST.  Since this isn't standard, demanding a
portable answer won't get you anywhere.  (I don't mean to suggest that
Postgres is the only database that has such functionality, just that
the DBs that do have it don't agree on any common API.)

            regards, tom lane

Re: Help optimizing a slow index scan

From: Michael Fuhr
On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote:
> Dan Harris <fbsd@drivefaster.net> writes:
> > Furthermore, by doing so, I am tying my queries directly to
> > "postgres-isms".  One of the long term goals of this project is to be
> > able to fairly transparently support any ANSI SQL-compliant back end
> > with the same code base.
>
> Unfortunately, there isn't any portable or standard (not exactly the
> same thing ;-)) SQL functionality for dealing gracefully with
> two-dimensional searches, which is what your lat/long queries are.

The OpenGIS Simple Features Specification[1] is a step in that
direction, no?  PostGIS[2], MySQL[3], and Oracle Spatial[4] implement
to varying degrees.  With PostGIS you do have to add non-standard
operators to a query's predicate to benefit from GiST indexes on
spatial columns, but the rest of the query can be straight out of
the SQL and OGC standards.
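For example, the non-standard part of a PostGIS query can be confined
to the bounding-box operator in the predicate ( a sketch; assumes a
geometry column named the_geom and PostGIS 1.x function names ):

create index eventgeo_geom_idx on eventgeo using gist (the_geom);

select incidentid from eventgeo
where the_geom &&
      SetSRID('BOX3D(-104.998 39.707, -104.986 39.719)'::box3d, 4326);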

[1] http://www.opengeospatial.org/docs/99-049.pdf
[2] http://www.postgis.org/
[3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
[4] http://www.oracle.com/technology/products/spatial/index.html

--
Michael Fuhr

Re: Help optimizing a slow index scan

From: Oleg Bartunov
I may be wrong, but we in astronomy have several sky indexing schemes
which allow effective use of a classical btree index.  See
http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization
for details.  Sergei Koposov has developed the Q3C contrib module for
PostgreSQL 8.1+ and we use it with billion-size astronomical catalogs.
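Usage is roughly like this ( a sketch; Q3C is designed for ra/dec but
the same idea applies to long/lat, and the radius is in degrees ):

create index eventgeo_q3c_idx on eventgeo ( q3c_ang2ipix(long, lat) );

select incidentid from eventgeo
where q3c_radial_query(long, lat, -104.992, 39.713, 0.01);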


     Oleg

On Fri, 17 Mar 2006, Michael Fuhr wrote:

> On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote:
>> Dan Harris <fbsd@drivefaster.net> writes:
>>> Furthermore, by doing so, I am tying my queries directly to
>>> "postgres-isms".  One of the long term goals of this project is to be
>>> able to fairly transparently support any ANSI SQL-compliant back end
>>> with the same code base.
>>
>> Unfortunately, there isn't any portable or standard (not exactly the
>> same thing ;-)) SQL functionality for dealing gracefully with
>> two-dimensional searches, which is what your lat/long queries are.
>
> The OpenGIS Simple Features Specification[1] is a step in that
> direction, no?  PostGIS[2], MySQL[3], and Oracle Spatial[4] implement
> to varying degrees.  With PostGIS you do have to add non-standard
> operators to a query's predicate to benefit from GiST indexes on
> spatial columns, but the rest of the query can be straight out of
> the SQL and OGC standards.
>
> [1] http://www.opengeospatial.org/docs/99-049.pdf
> [2] http://www.postgis.org/
> [3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
> [4] http://www.oracle.com/technology/products/spatial/index.html
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Help optimizing a slow index scan

From: Bruno Wolff III
On Sat, Mar 18, 2006 at 11:50:48 +0300,
  Oleg Bartunov <oleg@sai.msu.su> wrote:
> I may be wrong, but we in astronomy have several sky indexing schemes
> which allow effective use of a classical btree index.  See
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization
> for details.  Sergei Koposov has developed the Q3C contrib module for
> PostgreSQL 8.1+ and we use it with billion-size astronomical catalogs.

Note that Earth Distance can also be used for astronomy. If you use an
appropriate radius, distances will be in degrees.

Re: Help optimizing a slow index scan

From: Evgeny Gridasov
Try contrib/btree_gist.
I've tried that one, but for my case it didn't help much.
The performance was almost equal or even slower than built-in btree.
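For reference, that attempt looked something like this ( with
contrib/btree_gist installed, GiST gains a default operator class for
float8 columns ):

create index eventgeo_latlong_gist_idx on eventgeo
    using gist (lat, long);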

On Fri, 17 Mar 2006 08:53:44 -0700
Dan Harris <fbsd@drivefaster.net> wrote:

> Dan Harris wrote:
> > Markus Bertheau wrote:
> >> Have you tried using a GIST index on lat & long? These things are
> >> meant for two-dimensional data, whereas btree doesn't handle
> >> two-dimensional data that well. How many rows satisfy either of the
> >> long / lat condition?
> > According to the analyze, less than 500 rows matched.  I'll look into
> > GIST indexes, thanks for the feedback.
> >
> > -Dan
> >
>
> When I try to create a GIST index, I get the following error:
>
> create index eventgeo_lat_idx on eventgeo using GIST (lat);
>
> ERROR:  data type double precision has no default operator class for
> access method "gist"
> HINT:  You must specify an operator class for the index or define a
> default operator class for the data type.
>
> I'm not sure what a "default operator class" is, exactly..
>
> -Dan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


--
Evgeny Gridasov
Software Engineer
I-Free, Russia

Re: Help optimizing a slow index scan

From: Oleg Bartunov
On Fri, 17 Mar 2006, Evgeny Gridasov wrote:

> Try contrib/btree_gist.

contrib/btree_gist does nothing more than the built-in btree - it's
just there to support multicolumn GiST indices.

> I've tried that one, but for my case it didn't help much.
> The performance was almost equal or even slower than built-in btree.
>
> On Fri, 17 Mar 2006 08:53:44 -0700
> Dan Harris <fbsd@drivefaster.net> wrote:
>
>> Dan Harris wrote:
>>> Markus Bertheau wrote:
>>>> Have you tried using a GIST index on lat & long? These things are
>>>> meant for two-dimensional data, whereas btree doesn't handle
>>>> two-dimensional data that well. How many rows satisfy either of the
>>>> long / lat condition?
>>> According to the analyze, less than 500 rows matched.  I'll look into
>>> GIST indexes, thanks for the feedback.
>>>
>>> -Dan
>>>
>>
>> When I try to create a GIST index, I get the following error:
>>
>> create index eventgeo_lat_idx on eventgeo using GIST (lat);
>>
>> ERROR:  data type double precision has no default operator class for
>> access method "gist"
>> HINT:  You must specify an operator class for the index or define a
>> default operator class for the data type.
>>
>> I'm not sure what a "default operator class" is, exactly..
>>
>> -Dan
>>
>>

     Regards,
         Oleg