Re: Strange issue with GiST index scan taking far too long - Mailing list pgsql-hackers

From A. Kretschmer
Subject Re: Strange issue with GiST index scan taking far too long
Date
Msg-id 20080609142408.GA9409@a-kretschmer.de
Whole thread Raw
In response to Strange issue with GiST index scan taking far too long  (Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>)
List pgsql-hackers
am  Mon, dem 09.06.2008, um 14:18:50 +0100 mailte Mark Cave-Ayland folgendes:
> Hi there,
> 
> I'm currently looking at a bug report in PostGIS where we are getting 
> extremely long index scan times querying an index in one case, but the 
> same scan can take much less time if the input geometry is calculated as 
> the result of another function.
> 
> First of all, I include the EXPLAIN ANALYZE of the basic query which 
> looks like this:
> 
> 
> postgis=# explain analyze select count(*) from geography where centroid 
> && (select the_geom from geography where id=69495);
> 
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
> time=2691.783..2691.784 rows=1 loops=1)
>    InitPlan
>      ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
> (actual time=60.987..61.373 rows=1 loops=1)
>            Filter: (id = 69495::numeric)
>    ->  Index Scan using geography_geom_centroid_idx on geography 
> (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722 
> rows=32880 loops=1)
>          Index Cond: (centroid && $0)
>          Filter: (centroid && $0)
>  Total runtime: 2692.288 ms
> (8 rows)
> 


There are a BIG difference between estimated rows and real rows (1 vs.
32880). Why? Do you have recent statistics?


> 
> The only real thing to know about the query is that the id field within 
> the geography table is a primary key, and hence only a single geometry 

Sure? I can't believe this because the rows=32880.



> is being returned from within the subselect. Note that most of the time 
> is disappearing into the index scan.
> 
> Where things start getting really strange is when we add an extra 
> function called force_2d() into the mix. All this function does is scan 
> through the single geometry returned from the subselect and remove any 
> 3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this 
> query looks like this:
> 
> 
> postgis=# explain analyze select count(*) from geography where centroid 
> && (select force_2d(the_geom) from geography where id=69495);
> 
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
> time=343.004..343.005 rows=1 loops=1)
>    InitPlan
>      ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
> (actual time=48.714..49.016 rows=1 loops=1)
>            Filter: (id = 69495::numeric)
>    ->  Index Scan using geography_geom_centroid_idx on geography 
> (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880 
> loops=1)
>          Index Cond: (centroid && $0)
>          Filter: (centroid && $0)
>  Total runtime: 343.084 ms
> (8 rows)
> 
> 
> So by adding in an extra function around the subselect result, we have 
> speeded up the index lookup by several orders of magnitude, and the 

Wild guess: you have a big cache.


But i'm not a PostGIS-Expert.



Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: pg_dump restore time and Foreign Keys
Next
From: Simon Riggs
Date:
Subject: Re: Strange issue with GiST index scan taking far too long