Thread: Help optimizing a slow index scan
explain analyze select distinct eventmain.incidentid, eventmain.entrydate, eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy from eventmain, eventgeo where eventmain.incidentid = eventgeo.incidentid and ( long > -104.998027962962 and long < -104.985957781349 ) and ( lat > 39.7075542720006 and lat < 39.7186195832938 ) and eventmain.entrydate > '2006-1-1 00:00' and eventmain.entrydate <= '2006-3-17 00:00' order by eventmain.entrydate; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=121313.81..121330.72 rows=451 width=178) (actual time=723719.761..723726.875 rows=1408 loops=1) -> Sort (cost=121313.81..121314.94 rows=451 width=178) (actual time=723719.755..723721.807 rows=1408 loops=1) Sort Key: eventmain.entrydate, eventmain.disposition, eventmain.incidentid, eventgeo.reportingarea, eventgeo.beatid, eventmain.finaltype, eventmain.casenumber, eventgeo.eventlocation, eventmain.insertdate, eventmain.priority, eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy -> Nested Loop (cost=0.00..121293.93 rows=451 width=178) (actual time=1916.230..723712.900 rows=1408 loops=1) -> Index Scan using eventgeo_lat_idx on eventgeo (cost=0.00..85488.05 rows=10149 width=76) (actual time=0.402..393376.129 rows=22937 loops=1) Index Cond: ((lat > 39.7075542720006::double precision) AND (lat < 39.7186195832938::double precision)) Filter: ((long > -104.998027962962::double precision) AND (long < -104.985957781349::double precision)) -> Index Scan using eventmain_incidentid_idx on eventmain (cost=0.00..3.52 rows=1 width=119) (actual time=14.384..14.392 rows=0 loops=22937) Index Cond: ((eventmain.incidentid)::text = ("outer".incidentid)::text) Filter: ((entrydate > '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2006-03-17 00:00:00'::timestamp without time zone)) Total runtime: >>> 723729.238 ms(!) <<< I'm trying to figure out why it's consuming so much time on the index scan for eventgeo_lat_idx. Also, I have an index on "long" that the planner does not appear to find helpful. There are 3.3 million records in eventmain and eventgeo. The server has a reasonably fast RAID10 setup with 16x 15k RPM drives and 12GB of RAM ( 11GB listed as "cache" by vmstat ). Running version 8.0.2 on linux kernel 2.6.12. I have just vacuum analyze'd both tables, rebuilt the eventgeo_lat_idx index and reran the query multiple times to see if caching helped ( it didn't help much ). The server seems to be fine utilizing other fields from this table but using "long" and "lat" seem to drag it down significantly. Is it because there's such slight differences between the records, since they are all within a few hundredths of a degree from each other? Thanks for your time and ideas. -Dan
Dan Harris wrote: > explain analyze .... doh.. sorry to reply to my own post. But I messed up copying some of the fields into the select statement that you'll see in the "Sort Key" section of the analyze results. The mistake was mine. Everything else is "normal" between the query and the plan. -Dan
Markus Bertheau wrote: > Have you tried using a GIST index on lat & long? These things are > meant for two-dimensional data, whereas btree doesn't handle > two-dimensional data that well. How many rows satisfy either of the > long / lat condition? > > >> According to the analyze, less than 500 rows matched. I'll look into GIST indexes, thanks for the feedback. -Dan
Dan Harris wrote: > Markus Bertheau wrote: >> Have you tried using a GIST index on lat & long? These things are >> meant for two-dimensional data, whereas btree doesn't handle >> two-dimensional data that well. How many rows satisfy either of the >> long / lat condition? >> >> >>> > According to the analyze, less than 500 rows matched. I'll look into > GIST indexes, thanks for the feedback. > > -Dan > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster When I try to create a GIST index, I get the following error: create index eventgeo_lat_idx on eventgeo using GIST (lat); ERROR: data type double precision has no default operator class for access method "gist" HINT: You must specify an operator class for the index or define a default operator class for the data type. I'm not sure what a "default operator class" is, exactly.. -Dan
On 3/16/06, Dan Harris <fbsd@drivefaster.net> wrote: > explain analyze > select distinct eventmain.incidentid, eventmain.entrydate, > eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy > from eventmain, eventgeo > where > eventmain.incidentid = eventgeo.incidentid and > ( long > -104.998027962962 and long < -104.985957781349 ) and > ( lat > 39.7075542720006 and lat < 39.7186195832938 ) and > eventmain.entrydate > '2006-1-1 00:00' and > eventmain.entrydate <= '2006-3-17 00:00' > order by > eventmain.entrydate; As others will probably mention, effective queries on lot/long which is a spatial problem will require r-tree or gist. I don't have a lot of experience with exotic indexes but this may be the way to go. One easy optimization to consider making is to make an index on either (incidentid, entrydate) or (incident_id,long) which ever is more selective. This is 'yet another query' that would be fun to try out and tweak using the 8.2 upcoming row-wise comparison. merlin
On Fri, Mar 17, 2006 at 08:34:26 -0700, Dan Harris <fbsd@drivefaster.net> wrote: > Markus Bertheau wrote: > >Have you tried using a GIST index on lat & long? These things are > >meant for two-dimensional data, whereas btree doesn't handle > >two-dimensional data that well. How many rows satisfy either of the > >long / lat condition? > > > > > >> > According to the analyze, less than 500 rows matched. I'll look into > GIST indexes, thanks for the feedback. Have you looked at using the Earth Distance contrib module? If a spherical model of the earth is suitable for your application, then it may work for you and might be easier than trying to create Gist indexes yourself.
On 3/17/06, Bruno Wolff III <bruno@wolff.to> wrote: > Have you looked at using the Earth Distance contrib module? If a spherical > model of the earth is suitable for your application, then it may work for you > and might be easier than trying to create Gist indexes yourself. earth distance = great stuff. If the maximum error is known then you can just pad the distance and filter the result on the client if exact precision is needed. Merlin
Merlin Moncure wrote: > As others will probably mention, effective queries on lot/long which > is a spatial problem will require r-tree or gist. I don't have a lot > of experience with exotic indexes but this may be the way to go. > > One easy optimization to consider making is to make an index on either > (incidentid, entrydate) or (incident_id,long) which ever is more > selective. > > This is 'yet another query' that would be fun to try out and tweak > using the 8.2 upcoming row-wise comparison. > > merlin > Thanks to everyone for your suggestions. One problem I ran into is that apparently my version doesn't support the GIST index that was mentioned. "function 'box' doesn't exist" ).. So I'm guessing that both this as well as the Earth Distance contrib require me to add on some more pieces that aren't there. Furthermore, by doing so, I am tying my queries directly to "postgres-isms". One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. If I had full control over the query designs, I could make stored procedures to abstract this. However, I have to deal with a "gray box" third-party reporting library that isn't so flexible. I'll certainly consider going with something postgre-specific, but only as a last resort. I tried the multi-column index as mentioned above but didn't see any noticeable improvement in elapsed time, although the planner did use the new index. What is the real reason for the index not being very effective on these columns? Although the numbers are in a very limited range, it seems that the records would be very selective as it's not terribly common for multiple rows to share the same coords. Is the "8.2. upcoming row-wise comparison" something that would be likely to help me? Thanks again for your input
On 3/17/06, Dan Harris <fbsd@drivefaster.net> wrote: > Merlin Moncure wrote: > Thanks to everyone for your suggestions. One problem I ran into is that > apparently my version doesn't support the GIST index that was > mentioned. "function 'box' doesn't exist" ).. So I'm guessing that both > this as well as the Earth Distance contrib require me to add on some > more pieces that aren't there. earth distance is a contrib module that has to be built and installed. it does use some pg-isms so I guess that can be ruled out. GIST is a bit more complex and I would consider reading the documentation very carefully regarding them and make your own determination. > Furthermore, by doing so, I am tying my queries directly to > "postgres-isms". [snip] > I tried the multi-column index as mentioned above but didn't see any > noticeable improvement in elapsed time, although the planner did use the > new index. did you try both flavors of the multiple key index I suggested? (there were other possiblities, please experiment) > Is the "8.2. upcoming row-wise comparison" something that would be > likely to help me? possibly. good news is that rwc is ansi sql. you can see my blog about it here: http://people.planetpostgresql.org/merlin/ Specifically, if you can order your table with an order by statement such that the records you want are contingous, then yes. However, even though it's ansi sql, various commercial databases implement rwc improperly or not at all (mysql, to their credit, gets it right) and I still feel like an exotic index or some other nifty pg trick might be the best performance approach here). Merlin
Dan Harris <fbsd@drivefaster.net> writes: > Furthermore, by doing so, I am tying my queries directly to > "postgres-isms". One of the long term goals of this project is to be > able to fairly transparently support any ANSI SQL-compliant back end > with the same code base. Unfortunately, there isn't any portable or standard (not exactly the same thing ;-)) SQL functionality for dealing gracefully with two-dimensional searches, which is what your lat/long queries are. You should accept right now that you can have portability or you can have good performance, not both. Merlin's enthusiasm for row-comparison queries is understandable because that fix definitely helped a common problem. But row comparison has nothing to do with searches in two independent dimensions. Row comparison basically makes it easier to exploit the natural behavior of multicolumn btree indexes ... but a multicolumn btree index does not efficiently support queries that involve separate range limitations on each index column. (If you think about the index storage order you'll see why: the answer entries are not contiguous in the index.) To support two-dimensional searches you really need a non-btree index structure, such as GIST. Since this isn't standard, demanding a portable answer won't get you anywhere. (I don't mean to suggest that Postgres is the only database that has such functionality, just that the DBs that do have it don't agree on any common API.) regards, tom lane
On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote: > Dan Harris <fbsd@drivefaster.net> writes: > > Furthermore, by doing so, I am tying my queries directly to > > "postgres-isms". One of the long term goals of this project is to be > > able to fairly transparently support any ANSI SQL-compliant back end > > with the same code base. > > Unfortunately, there isn't any portable or standard (not exactly the > same thing ;-)) SQL functionality for dealing gracefully with > two-dimensional searches, which is what your lat/long queries are. The OpenGIS Simple Features Specification[1] is a step in that direction, no? PostGIS[2], MySQL[3], and Oracle Spatial[4] implement to varying degrees. With PostGIS you do have to add non-standard operators to a query's predicate to benefit from GiST indexes on spatial columns, but the rest of the query can be straight out of the SQL and OGC standards. [1] http://www.opengeospatial.org/docs/99-049.pdf [2] http://www.postgis.org/ [3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html [4] http://www.oracle.com/technology/products/spatial/index.html -- Michael Fuhr
I may be wrong but we in astronomy have several sky indexing schemes, which allows to effectively use classical btree index. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization for details. Sergei Koposov has developed Q3C contrib module for PostgreSQL 8.1+ and we use it with billiard size astronomical catalogs. Oleg On Fri, 17 Mar 2006, Michael Fuhr wrote: > On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote: >> Dan Harris <fbsd@drivefaster.net> writes: >>> Furthermore, by doing so, I am tying my queries directly to >>> "postgres-isms". One of the long term goals of this project is to be >>> able to fairly transparently support any ANSI SQL-compliant back end >>> with the same code base. >> >> Unfortunately, there isn't any portable or standard (not exactly the >> same thing ;-)) SQL functionality for dealing gracefully with >> two-dimensional searches, which is what your lat/long queries are. > > The OpenGIS Simple Features Specification[1] is a step in that > direction, no? PostGIS[2], MySQL[3], and Oracle Spatial[4] implement > to varying degrees. With PostGIS you do have to add non-standard > operators to a query's predicate to benefit from GiST indexes on > spatial columns, but the rest of the query can be straight out of > the SQL and OGC standards. > > [1] http://www.opengeospatial.org/docs/99-049.pdf > [2] http://www.postgis.org/ > [3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html > [4] http://www.oracle.com/technology/products/spatial/index.html > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Sat, Mar 18, 2006 at 11:50:48 +0300, Oleg Bartunov <oleg@sai.msu.su> wrote: > I may be wrong but we in astronomy have several sky indexing schemes, which > allows to effectively use classical btree index. See > http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization > for details. Sergei Koposov has developed Q3C contrib module for > PostgreSQL 8.1+ and we use it with billiard size astronomical catalogs. Note that Earth Distance can also be used for astronomy. If you use an appropiate radius, distances will be in degrees.
Try contrib/btree_gist. I've tried that one, but for my case it didn't help much. The performance was almost equal or even slower than built-in btree. On Fri, 17 Mar 2006 08:53:44 -0700 Dan Harris <fbsd@drivefaster.net> wrote: > Dan Harris wrote: > > Markus Bertheau wrote: > >> Have you tried using a GIST index on lat & long? These things are > >> meant for two-dimensional data, whereas btree doesn't handle > >> two-dimensional data that well. How many rows satisfy either of the > >> long / lat condition? > >> > >> > >>> > > According to the analyze, less than 500 rows matched. I'll look into > > GIST indexes, thanks for the feedback. > > > > -Dan > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 2: Don't 'kill -9' the postmaster > > When I try to create a GIST index, I get the following error: > > create index eventgeo_lat_idx on eventgeo using GIST (lat); > > ERROR: data type double precision has no default operator class for > access method "gist" > HINT: You must specify an operator class for the index or define a > default operator class for the data type. > > I'm not sure what a "default operator class" is, exactly.. > > -Dan > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > -- Evgeny Gridasov Software Engineer I-Free, Russia
On Fri, 17 Mar 2006, Evgeny Gridasov wrote: > Try contrib/btree_gist. contrib/btree_gist does nothing more than built-in btree - it's just an support for multicolumn GiST indices. > I've tried that one, but for my case it didn't help much. > The performance was almost equal or even slower than built-in btree. > > On Fri, 17 Mar 2006 08:53:44 -0700 > Dan Harris <fbsd@drivefaster.net> wrote: > >> Dan Harris wrote: >>> Markus Bertheau wrote: >>>> Have you tried using a GIST index on lat & long? These things are >>>> meant for two-dimensional data, whereas btree doesn't handle >>>> two-dimensional data that well. How many rows satisfy either of the >>>> long / lat condition? >>>> >>>> >>>>> >>> According to the analyze, less than 500 rows matched. I'll look into >>> GIST indexes, thanks for the feedback. >>> >>> -Dan >>> >>> ---------------------------(end of broadcast)--------------------------- >>> TIP 2: Don't 'kill -9' the postmaster >> >> When I try to create a GIST index, I get the following error: >> >> create index eventgeo_lat_idx on eventgeo using GIST (lat); >> >> ERROR: data type double precision has no default operator class for >> access method "gist" >> HINT: You must specify an operator class for the index or define a >> default operator class for the data type. >> >> I'm not sure what a "default operator class" is, exactly.. >> >> -Dan >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 5: don't forget to increase your free space map settings >> > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83