Thread: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Stefan Keller
Date:
Hi I have an interesting query to be optimized related to this one [1]. The query definition is: Select all buildings that have more than 1 pharmacies and more than 1 schools within a radius of 1000m. The problem is that I think that this query is inherently O(n^2). In fact the solution I propose below takes forever... My questions: 1. Any comments about the nature of this problem? 2. ... on how to speed it up ? 3. In the original query [1] there's a count which contains a subquery. According to my tests PostgreSQL does not allow this despite the documentation which says "count(expression)". Remarks: I know that "count(*)" could be faster on PostgreSQL but "count(osm_id)" does not change the query plan and this does not seem to be the bottleneck here anyway. Yours, S. [1] http://gis.stackexchange.com/questions/11445/selecting-pois-around-specific-buildings-using-postgis Here's my query: -- Select all buildings that have >1 pharmacies and >1 schools within 1000m: SELECT osm_id AS building_id FROM (SELECT osm_id, way FROM osm_polygon WHERE tags @> hstore('building','yes') ) AS b WHERE (SELECT count(*) > 1 FROM osm_poi AS p WHERE p.tags @> hstore('amenity','pharmacy') AND ST_DWithin(b.way,p.way,1000) ) AND (SELECT count(*) > 1 FROM osm_poi AS p WHERE p.tags @> hstore('amenity','school') AND ST_DWithin(b.way,p.way,1000) ) -- Total query runtime: 4308488 ms. 66345 rows retrieved. Here's the query plan (from EXPLAIN): "Index Scan using osm_polygon_tags_idx on osm_polygon (cost=0.00..406812.81 rows=188 width=901)" " Index Cond: (tags @> '"building"=>"yes"'::hstore)" " Filter: ((SubPlan 1) AND (SubPlan 2))" " SubPlan 1" " -> Aggregate (cost=269.19..269.20 rows=1 width=0)" " -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 rows=1 width=0)" " Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision))" " Filter: ((tags @> '"amenity"=>"pharmacy"'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision))" " -> Bitmap Index Scan on osm_poi_way_idx (cost=0.00..7.76 rows=62 width=0)" " Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision))" " SubPlan 2" " -> Aggregate (cost=269.19..269.20 rows=1 width=0)" " -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 rows=1 width=0)" " Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision))" " Filter: ((tags @> '"amenity"=>"school"'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision))" " -> Bitmap Index Scan on osm_poi_way_idx (cost=0.00..7.76 rows=62 width=0)" " Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision))" ***
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
"Tomas Vondra"
Date:
On 7 Srpen 2012, 14:01, Stefan Keller wrote: > Hi > > I have an interesting query to be optimized related to this one [1]. > > The query definition is: Select all buildings that have more than 1 > pharmacies and more than 1 schools within a radius of 1000m. > > The problem is that I think that this query is inherently O(n^2). In > fact the solution I propose below takes forever... What about plain INTERSECT? Something like SELECT osm_id FROM osm_poi AS p, osm_polygon b WHERE p.tags @> hstore('amenity','pharmacy') AND ST_DWithin(b.way,p.way,1000) INTERSECT SELECT osm_id FROM osm_poi AS p, osm_polygon b WHERE p.tags @> hstore('amenity','school') AND ST_DWithin(b.way,p.way,1000) Or something like that. But maybe it's a complete nonsense ... Tomas
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Stefan Keller
Date:
Your proposal lacks the requirement that it's the same building from where pharmacies and schools are reachable. But I think about. Yours, S. 2012/8/7 Tomas Vondra <tv@fuzzy.cz>: > On 7 Srpen 2012, 14:01, Stefan Keller wrote: >> Hi >> >> I have an interesting query to be optimized related to this one [1]. >> >> The query definition is: Select all buildings that have more than 1 >> pharmacies and more than 1 schools within a radius of 1000m. >> >> The problem is that I think that this query is inherently O(n^2). In >> fact the solution I propose below takes forever... > > What about plain INTERSECT? Something like > > SELECT osm_id FROM osm_poi AS p, osm_polygon b > WHERE p.tags @> hstore('amenity','pharmacy') > AND ST_DWithin(b.way,p.way,1000) > INTERSECT > SELECT osm_id FROM osm_poi AS p, osm_polygon b > WHERE p.tags @> hstore('amenity','school') > AND ST_DWithin(b.way,p.way,1000) > > Or something like that. But maybe it's a complete nonsense ... > > Tomas >
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
"Tomas Vondra"
Date:
On 7 Srpen 2012, 14:22, Stefan Keller wrote: > Your proposal lacks the requirement that it's the same building from > where pharmacies and schools are reachable. > But I think about. I don't know the dataset so I've expected the osm_id to identify the building - then the intersect should work as AND for the conditions. And I see I've forgot to include the 'is building' condition, so it should be like this: SELECT b.osm_id FROM osm_poi AS p, osm_polygon b WHERE p.tags @> hstore('amenity','pharmacy') AND b.tags @> hstore('building','yes') AND ST_DWithin(b.way,p.way,1000) INTERSECT SELECT b.osm_id FROM osm_poi AS p, osm_polygon b WHERE p.tags @> hstore('amenity','school') AND b.tags @> hstore('building','yes') AND ST_DWithin(b.way,p.way,1000) Tomas
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Craig James
Date:
On Tue, Aug 7, 2012 at 5:01 AM, Stefan Keller <sfkeller@gmail.com> wrote:
Maybe you could get rid of the O(n^2) aspect like this:
Select all buildings that have more than 1
pharmacies and more than 1 schools within a radius of 1000m
from
(Select all buildings that have more than four (pharmacy or school)
within a radius of 1000m)
The inner select should be fast -- you could make it fast by creating a new property like "building of interest" that was "pharmacy or school" and build an index on the "building of interest" property.
The inner query would reduce your sample set to a much smaller set of buildings, and presumably the outer query could handle that pretty quickly.
Craig James
Hi
I have an interesting query to be optimized related to this one [1].
The query definition is: Select all buildings that have more than 1
pharmacies and more than 1 schools within a radius of 1000m.
The problem is that I think that this query is inherently O(n^2). In
fact the solution I propose below takes forever...
Maybe you could get rid of the O(n^2) aspect like this:
Select all buildings that have more than 1
pharmacies and more than 1 schools within a radius of 1000m
from
(Select all buildings that have more than four (pharmacy or school)
within a radius of 1000m)
The inner select should be fast -- you could make it fast by creating a new property like "building of interest" that was "pharmacy or school" and build an index on the "building of interest" property.
The inner query would reduce your sample set to a much smaller set of buildings, and presumably the outer query could handle that pretty quickly.
Craig James
My questions:
1. Any comments about the nature of this problem?
2. ... on how to speed it up ?
3. In the original query [1] there's a count which contains a
subquery. According to my tests PostgreSQL does not allow this despite
the documentation which says "count(expression)".
Remarks: I know that "count(*)" could be faster on PostgreSQL but
"count(osm_id)" does not change the query plan and this does not seem
to be the bottleneck here anyway.
Yours, S.
[1] http://gis.stackexchange.com/questions/11445/selecting-pois-around-specific-buildings-using-postgis
Here's my query:
-- Select all buildings that have >1 pharmacies and >1 schools within 1000m:
SELECT osm_id AS building_id
FROM
(SELECT osm_id, way
FROM osm_polygon
WHERE tags @> hstore('building','yes')
) AS b
WHERE
(SELECT count(*) > 1
FROM osm_poi AS p
WHERE p.tags @> hstore('amenity','pharmacy')
AND ST_DWithin(b.way,p.way,1000)
)
AND
(SELECT count(*) > 1
FROM osm_poi AS p
WHERE p.tags @> hstore('amenity','school')
AND ST_DWithin(b.way,p.way,1000)
)
-- Total query runtime: 4308488 ms. 66345 rows retrieved.
Here's the query plan (from EXPLAIN):
"Index Scan using osm_polygon_tags_idx on osm_polygon
(cost=0.00..406812.81 rows=188 width=901)"
" Index Cond: (tags @> '"building"=>"yes"'::hstore)"
" Filter: ((SubPlan 1) AND (SubPlan 2))"
" SubPlan 1"
" -> Aggregate (cost=269.19..269.20 rows=1 width=0)"
" -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19
rows=1 width=0)"
" Recheck Cond: (way && st_expand(osm_polygon.way,
1000::double precision))"
" Filter: ((tags @> '"amenity"=>"pharmacy"'::hstore)
AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND
_st_dwithin(osm_polygon.way, way, 1000::double precision))"
" -> Bitmap Index Scan on osm_poi_way_idx
(cost=0.00..7.76 rows=62 width=0)"
" Index Cond: (way && st_expand(osm_polygon.way,
1000::double precision))"
" SubPlan 2"
" -> Aggregate (cost=269.19..269.20 rows=1 width=0)"
" -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19
rows=1 width=0)"
" Recheck Cond: (way && st_expand(osm_polygon.way,
1000::double precision))"
" Filter: ((tags @> '"amenity"=>"school"'::hstore) AND
(osm_polygon.way && st_expand(way, 1000::double precision)) AND
_st_dwithin(osm_polygon.way, way, 1000::double precision))"
" -> Bitmap Index Scan on osm_poi_way_idx
(cost=0.00..7.76 rows=62 width=0)"
" Index Cond: (way && st_expand(osm_polygon.way,
1000::double precision))"
***
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Stefan Keller
Date:
Hi Craig Clever proposal! I slightly tried to adapt it to the hstore involved. Now I'm having a weird problem that PG says that "relation 'p' does not exist". Why does PG recognize table b in the subquery but not table p? Any ideas? -- Stefan SELECT b.way AS building_geometry FROM (SELECT way FROM osm_polygon WHERE tags @> hstore('building','yes') ) AS b, (SELECT way, tags->'amenity' as value FROM osm_poi WHERE tags ? 'amenity' ) AS p WHERE (SELECT count(*) > 1 FROM p WHERE p.value = 'pharmacy' AND ST_DWithin(b.way,p.way,1000) ) AND (SELECT count(*) > 1 FROM p WHERE p.value = 'school' AND ST_DWithin(b.way,p.way,1000) ) ERROR: relation "p" does not exist LINE 14: FROM p 2012/8/7 Craig James <cjames@emolecules.com>: > On Tue, Aug 7, 2012 at 5:01 AM, Stefan Keller <sfkeller@gmail.com> wrote: >> >> Hi >> >> I have an interesting query to be optimized related to this one [1]. >> >> The query definition is: Select all buildings that have more than 1 >> pharmacies and more than 1 schools within a radius of 1000m. >> >> The problem is that I think that this query is inherently O(n^2). In >> fact the solution I propose below takes forever... > > > Maybe you could get rid of the O(n^2) aspect like this: > > > Select all buildings that have more than 1 > pharmacies and more than 1 schools within a radius of 1000m > from > (Select all buildings that have more than four (pharmacy or school) > within a radius of 1000m) > > The inner select should be fast -- you could make it fast by creating a new > property like "building of interest" that was "pharmacy or school" and build > an index on the "building of interest" property. > > The inner query would reduce your sample set to a much smaller set of > buildings, and presumably the outer query could handle that pretty quickly. > > Craig James > >> >> >> My questions: >> >> 1. Any comments about the nature of this problem? >> >> 2. ... on how to speed it up ? >> >> 3. In the original query [1] there's a count which contains a >> subquery. According to my tests PostgreSQL does not allow this despite >> the documentation which says "count(expression)". >> >> Remarks: I know that "count(*)" could be faster on PostgreSQL but >> "count(osm_id)" does not change the query plan and this does not seem >> to be the bottleneck here anyway. >> >> Yours, S. >> >> [1] >> http://gis.stackexchange.com/questions/11445/selecting-pois-around-specific-buildings-using-postgis >> >> >> Here's my query: >> >> -- Select all buildings that have >1 pharmacies and >1 schools within >> 1000m: >> SELECT osm_id AS building_id >> FROM >> (SELECT osm_id, way >> FROM osm_polygon >> WHERE tags @> hstore('building','yes') >> ) AS b >> WHERE >> (SELECT count(*) > 1 >> FROM osm_poi AS p >> WHERE p.tags @> hstore('amenity','pharmacy') >> AND ST_DWithin(b.way,p.way,1000) >> ) >> AND >> (SELECT count(*) > 1 >> FROM osm_poi AS p >> WHERE p.tags @> hstore('amenity','school') >> AND ST_DWithin(b.way,p.way,1000) >> ) >> -- Total query runtime: 4308488 ms. 66345 rows retrieved. >> >> Here's the query plan (from EXPLAIN): >> "Index Scan using osm_polygon_tags_idx on osm_polygon >> (cost=0.00..406812.81 rows=188 width=901)" >> " Index Cond: (tags @> '"building"=>"yes"'::hstore)" >> " Filter: ((SubPlan 1) AND (SubPlan 2))" >> " SubPlan 1" >> " -> Aggregate (cost=269.19..269.20 rows=1 width=0)" >> " -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 >> rows=1 width=0)" >> " Recheck Cond: (way && st_expand(osm_polygon.way, >> 1000::double precision))" >> " Filter: ((tags @> '"amenity"=>"pharmacy"'::hstore) >> AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND >> _st_dwithin(osm_polygon.way, way, 1000::double precision))" >> " -> Bitmap Index Scan on osm_poi_way_idx >> (cost=0.00..7.76 rows=62 width=0)" >> " Index Cond: (way && st_expand(osm_polygon.way, >> 1000::double precision))" >> " SubPlan 2" >> " -> Aggregate (cost=269.19..269.20 rows=1 width=0)" >> " -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 >> rows=1 width=0)" >> " Recheck Cond: (way && st_expand(osm_polygon.way, >> 1000::double precision))" >> " Filter: ((tags @> '"amenity"=>"school"'::hstore) AND >> (osm_polygon.way && st_expand(way, 1000::double precision)) AND >> _st_dwithin(osm_polygon.way, way, 1000::double precision))" >> " -> Bitmap Index Scan on osm_poi_way_idx >> (cost=0.00..7.76 rows=62 width=0)" >> " Index Cond: (way && st_expand(osm_polygon.way, >> 1000::double precision))" >> >> *** >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance > >
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Jeff Janes
Date:
On Tue, Aug 7, 2012 at 5:07 PM, Stefan Keller <sfkeller@gmail.com> wrote: > Hi Craig > > Clever proposal! > I slightly tried to adapt it to the hstore involved. > Now I'm having a weird problem that PG says that "relation 'p' does not exist". > Why does PG recognize table b in the subquery but not table p? > Any ideas? I don't think it does recognize b, either. It just fell over on p before it had a chance to fall over on b. I think you have to use WITH if you want to reference the same subquery in multiple FROMs. Another approach would be to add explicit conditions for there being at least 1 school and 1 pharmacy within distance. There can't be >1 unless there is >=1, but the join possibilities for >=1 (i.e. "where exists" rather than "where (select count(*)...)>1" ) are much more attractive than the ones for >1. Cheers, Jeff
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Stefan Keller
Date:
Hi 2012/8/8 Jeff Janes <jeff.janes@gmail.com>: > On Tue, Aug 7, 2012 at 5:07 PM, Stefan Keller <sfkeller@gmail.com> wrote: >> Hi Craig >> >> Clever proposal! >> I slightly tried to adapt it to the hstore involved. >> Now I'm having a weird problem that PG says that "relation 'p' does not exist". >> Why does PG recognize table b in the subquery but not table p? >> Any ideas? > > I don't think it does recognize b, either. It just fell over on p > before it had a chance to fall over on b. No, the b get's recognized. See my original query. That's a strange behaviour of the SQL parser which I can't understand. > I think you have to use WITH if you want to reference the same > subquery in multiple FROMs. I'll try that with CTE too. > Another approach would be to add explicit conditions for there being > at least 1 school and 1 pharmacy within distance. There can't be >1 > unless there is >=1, but the join possibilities for >=1 (i.e. "where > exists" rather than "where (select count(*)...)>1" ) are much more > attractive than the ones for >1. > > Cheers, > > Jeff You mean, first doing a select on existence and then apply the count condition later? Stefan
Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
From
Jeff Janes
Date:
On Thu, Aug 9, 2012 at 4:00 AM, Stefan Keller <sfkeller@gmail.com> wrote: > Hi > > 2012/8/8 Jeff Janes <jeff.janes@gmail.com>: >> On Tue, Aug 7, 2012 at 5:07 PM, Stefan Keller <sfkeller@gmail.com> wrote: >>> Hi Craig >>> >>> Clever proposal! >>> I slightly tried to adapt it to the hstore involved. >>> Now I'm having a weird problem that PG says that "relation 'p' does not exist". >>> Why does PG recognize table b in the subquery but not table p? >>> Any ideas? >> >> I don't think it does recognize b, either. It just fell over on p >> before it had a chance to fall over on b. > > No, the b get's recognized. See my original query. > That's a strange behaviour of the SQL parser which I can't understand. Oh, I see. You are referencing b only as the qualifier for a column name, while you are trying to reference p as a an entire query. I initially misread it and thought you referencing both b and p in both ways each. > >> I think you have to use WITH if you want to reference the same >> subquery in multiple FROMs. > > I'll try that with CTE too. > >> Another approach would be to add explicit conditions for there being >> at least 1 school and 1 pharmacy within distance. There can't be >1 >> unless there is >=1, but the join possibilities for >=1 (i.e. "where >> exists" rather than "where (select count(*)...)>1" ) are much more >> attractive than the ones for >1. >> >> Cheers, >> >> Jeff > > You mean, first doing a select on existence and then apply the count > condition later? Yes, exactly. Of course this won't help if most buildings do have at least one of each within distance, as then the prefilter is not very selective. Cheers, Jeff