Re: Strange issue with GiST index scan taking far too long - Mailing list pgsql-hackers
From | A. Kretschmer |
---|---|
Subject | Re: Strange issue with GiST index scan taking far too long |
Date | |
Msg-id | 20080609142408.GA9409@a-kretschmer.de Whole thread Raw |
In response to | Strange issue with GiST index scan taking far too long (Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>) |
List | pgsql-hackers |
am Mon, dem 09.06.2008, um 14:18:50 +0100 mailte Mark Cave-Ayland folgendes: > Hi there, > > I'm currently looking at a bug report in PostGIS where we are getting > extremely long index scan times querying an index in one case, but the > same scan can take much less time if the input geometry is calculated as > the result of another function. > > First of all, I include the EXPLAIN ANALYZE of the basic query which > looks like this: > > > postgis=# explain analyze select count(*) from geography where centroid > && (select the_geom from geography where id=69495); > > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual > time=2691.783..2691.784 rows=1 loops=1) > InitPlan > -> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559) > (actual time=60.987..61.373 rows=1 loops=1) > Filter: (id = 69495::numeric) > -> Index Scan using geography_geom_centroid_idx on geography > (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722 > rows=32880 loops=1) > Index Cond: (centroid && $0) > Filter: (centroid && $0) > Total runtime: 2692.288 ms > (8 rows) > There are a BIG difference between estimated rows and real rows (1 vs. 32880). Why? Do you have recent statistics? > > The only real thing to know about the query is that the id field within > the geography table is a primary key, and hence only a single geometry Sure? I can't believe this because the rows=32880. > is being returned from within the subselect. Note that most of the time > is disappearing into the index scan. > > Where things start getting really strange is when we add an extra > function called force_2d() into the mix. All this function does is scan > through the single geometry returned from the subselect and remove any > 3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this > query looks like this: > > > postgis=# explain analyze select count(*) from geography where centroid > && (select force_2d(the_geom) from geography where id=69495); > > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual > time=343.004..343.005 rows=1 loops=1) > InitPlan > -> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559) > (actual time=48.714..49.016 rows=1 loops=1) > Filter: (id = 69495::numeric) > -> Index Scan using geography_geom_centroid_idx on geography > (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880 > loops=1) > Index Cond: (centroid && $0) > Filter: (centroid && $0) > Total runtime: 343.084 ms > (8 rows) > > > So by adding in an extra function around the subselect result, we have > speeded up the index lookup by several orders of magnitude, and the Wild guess: you have a big cache. But i'm not a PostGIS-Expert. Regards, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
pgsql-hackers by date: