Thread: Performance on multiple OR conditions inside ()

Performance on multiple OR conditions inside ()

From
Håvard Wahl Kongsgård
Date:
Hi, I have a spatial query with multiple OR statements, for large tables it's very slow. Is it possible to skip the spatial lookup on the other conditions if first(previous) condition equal 1, and thereby increase the performance?
 
SELECT vciia_main.sitrp,vciia_main.date_time from vciia_main, south_vietnam72, roads, rails, houses, city where st_within(vciia_main.geom, south_vietnam72.geom) and date_time is not null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR st_dwithin(vciia_main.geom, rails.geom, 500) or st_dwithin(vciia_main.geom, city.geom, 800) or st_dwithin(vciia_main.geom, houses.the_geom, 500))

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

Re: Performance on multiple OR conditions inside ()

From
Håvard Wahl Kongsgård
Date:
Hi, so one solution is to use the most common feature first (), in this case the houses ?

|What version of PostGIS are you using?

1.5.1


2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
Hallo Håvard

The planner is supposed to take care of that. It estimates ehat is the
cheapest part of the OR statements and checks that. If true, then
nothing is done with the others.

Do you have spatial indexes on the geometry columns?

Do you know if they are used by the planner?

Even if the indexes is in place and are used this query will probably be
slow because of how the spatial index works.

What the spatial index does for ST_Within and ST_DWithin is to tell if
the geometries has overlapping bounding boxes (or expanded bounding
boxes in ST_DWithin case). If they do the index is of no more help and
the rest of the calculation has to be done vertex by vertex which is
costly.

So, the problem is when the bounding boxes covers many geoemtries, then
the part of the work that the index can help with is small.

There is techniques to slice the big geometry in smaller pieces, build a
new index and things will go faster.

Long roads often have this problem. If you want to find all houses along
a road the bounding box test will find many more houses than those close
to the road (If the road is not going just north/south or east/west)

I don't think it should do any difference for the planner but I would
test to build the query with joins instead.

What version of PostGIS are you using?




 


Regards

Nicklas Avén


On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård wrote:
> Hi, I have a spatial query with multiple OR statements, for large
> tables it's very slow. Is it possible to skip the spatial lookup on
> the other conditions if first(previous) condition equal 1, and thereby
> increase the performance?
>
> SELECT vciia_main.sitrp,vciia_main.date_time from vciia_main,
> south_vietnam72, roads, rails, houses, city where
> st_within(vciia_main.geom, south_vietnam72.geom) and date_time is not
> null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
> st_dwithin(vciia_main.geom, rails.geom, 500) or
> st_dwithin(vciia_main.geom, city.geom, 800) or
> st_dwithin(vciia_main.geom, houses.the_geom, 500))
>
> --
> Håvard Wahl Kongsgård
> Peace Research Institute Oslo (PRIO)
>
> http://havard.security-review.net/
>





--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

Re: Performance on multiple OR conditions inside ()

From
Nicklas Avén
Date:
No, the order is not supposed to make any difference. The planner plans
the query from costs and statistics and shouldn't care about the order
(I think)

What you should do is to divide the big geometries and build new index.

That can be done by building a grid with generate_series and cutting the
geometry against that grid by using ST_Intersection between the grid and
your geometry. I have not tried it myself so I don't have any query
written for it.

Another way that might be simplier is to use some other map than taht
south Vietnam map. If you have some map with smaller regions instead the
index will be more efficiency.

Also use ST_Dump to tear apart all geometries as much as possible.

Then build a new index and analyse for the planner to know what you have
done.

One thing to remember is that you in many cases will get the same
geometry many times because it is within the given distance to many of
your regions like ST_Intersects(Mjøsa, fylker) will give many
fylke-mjøsa combinations.

Regards

Nicklas




On Thu, 2011-01-13 at 21:51 +0100, Håvard Wahl Kongsgård wrote:
> Hi, so one solution is to use the most common feature first (), in
> this case the houses ?
>
> |What version of PostGIS are you using?
>
> 1.5.1
>
>
> 2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
>         Hallo Håvard
>
>         The planner is supposed to take care of that. It estimates
>         ehat is the
>         cheapest part of the OR statements and checks that. If true,
>         then
>         nothing is done with the others.
>
>         Do you have spatial indexes on the geometry columns?
>
>         Do you know if they are used by the planner?
>
>         Even if the indexes is in place and are used this query will
>         probably be
>         slow because of how the spatial index works.
>
>         What the spatial index does for ST_Within and ST_DWithin is to
>         tell if
>         the geometries has overlapping bounding boxes (or expanded
>         bounding
>         boxes in ST_DWithin case). If they do the index is of no more
>         help and
>         the rest of the calculation has to be done vertex by vertex
>         which is
>         costly.
>
>         So, the problem is when the bounding boxes covers many
>         geoemtries, then
>         the part of the work that the index can help with is small.
>
>         There is techniques to slice the big geometry in smaller
>         pieces, build a
>         new index and things will go faster.
>
>         Long roads often have this problem. If you want to find all
>         houses along
>         a road the bounding box test will find many more houses than
>         those close
>         to the road (If the road is not going just north/south or
>         east/west)
>
>         I don't think it should do any difference for the planner but
>         I would
>         test to build the query with joins instead.
>
>         What version of PostGIS are you using?
>
>
>
>
>
>
>
>         Regards
>
>         Nicklas Avén
>
>
>
>         On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård
>         wrote:
>         > Hi, I have a spatial query with multiple OR statements, for
>         large
>         > tables it's very slow. Is it possible to skip the spatial
>         lookup on
>         > the other conditions if first(previous) condition equal 1,
>         and thereby
>         > increase the performance?
>         >
>         > SELECT vciia_main.sitrp,vciia_main.date_time from
>         vciia_main,
>         > south_vietnam72, roads, rails, houses, city where
>         > st_within(vciia_main.geom, south_vietnam72.geom) and
>         date_time is not
>         > null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
>         > st_dwithin(vciia_main.geom, rails.geom, 500) or
>         > st_dwithin(vciia_main.geom, city.geom, 800) or
>         > st_dwithin(vciia_main.geom, houses.the_geom, 500))
>         >
>         > --
>         > Håvard Wahl Kongsgård
>         > Peace Research Institute Oslo (PRIO)
>         >
>         > http://havard.security-review.net/
>         >
>
>
>
>
>
>
> --
> Håvard Wahl Kongsgård
> Peace Research Institute Oslo (PRIO)
>
> http://havard.security-review.net/
>



Re: Performance on multiple OR conditions inside ()

From
Håvard Wahl Kongsgård
Date:
This alternative was a little faster and does not return duplicated rows. It will take 41 hours on my system, but it will have to do for now.

SELECT vciia_main.sitrp,vciia_main.date_time,ST_AsText(vciia_main.geom) from vciia_main, south_vietnam72 where st_within(vciia_main.geom, south_vietnam72.geom) and (st_dwithin(vciia_main.geom, (select the_geom from houses order by st_distance(vciia_main.geom, the_geom)  limit 1), 500) OR st_dwithin(vciia_main.geom, (select geom from roads order by st_distance(vciia_main.geom, geom)  limit 1), 500) OR st_dwithin(vciia_main.geom, (select geom from rails order by st_distance(vciia_main.geom, geom)  limit 1), 500) OR st_dwithin(vciia_main.geom, (select geom from city order by st_distance(vciia_main.geom, geom)  limit 1), 800)) and vciia_main.date_time is not null;

-Håvard

2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
No, the order is not supposed to make any difference. The planner plans
the query from costs and statistics and shouldn't care about the order
(I think)

What you should do is to divide the big geometries and build new index.

That can be done by building a grid with generate_series and cutting the
geometry against that grid by using ST_Intersection between the grid and
your geometry. I have not tried it myself so I don't have any query
written for it.

Another way that might be simplier is to use some other map than taht
south Vietnam map. If you have some map with smaller regions instead the
index will be more efficiency.

Also use ST_Dump to tear apart all geometries as much as possible.

Then build a new index and analyse for the planner to know what you have
done.

One thing to remember is that you in many cases will get the same
geometry many times because it is within the given distance to many of
your regions like ST_Intersects(Mjøsa, fylker) will give many
fylke-mjøsa combinations.

Regards

Nicklas




On Thu, 2011-01-13 at 21:51 +0100, Håvard Wahl Kongsgård wrote:
> Hi, so one solution is to use the most common feature first (), in
> this case the houses ?
>
> |What version of PostGIS are you using?
>
> 1.5.1
>
>
> 2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
>         Hallo Håvard
>
>         The planner is supposed to take care of that. It estimates
>         ehat is the
>         cheapest part of the OR statements and checks that. If true,
>         then
>         nothing is done with the others.
>
>         Do you have spatial indexes on the geometry columns?
>
>         Do you know if they are used by the planner?
>
>         Even if the indexes is in place and are used this query will
>         probably be
>         slow because of how the spatial index works.
>
>         What the spatial index does for ST_Within and ST_DWithin is to
>         tell if
>         the geometries has overlapping bounding boxes (or expanded
>         bounding
>         boxes in ST_DWithin case). If they do the index is of no more
>         help and
>         the rest of the calculation has to be done vertex by vertex
>         which is
>         costly.
>
>         So, the problem is when the bounding boxes covers many
>         geoemtries, then
>         the part of the work that the index can help with is small.
>
>         There is techniques to slice the big geometry in smaller
>         pieces, build a
>         new index and things will go faster.
>
>         Long roads often have this problem. If you want to find all
>         houses along
>         a road the bounding box test will find many more houses than
>         those close
>         to the road (If the road is not going just north/south or
>         east/west)
>
>         I don't think it should do any difference for the planner but
>         I would
>         test to build the query with joins instead.
>
>         What version of PostGIS are you using?
>
>
>
>
>
>
>
>         Regards
>
>         Nicklas Avén
>
>
>
>         On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård
>         wrote:
>         > Hi, I have a spatial query with multiple OR statements, for
>         large
>         > tables it's very slow. Is it possible to skip the spatial
>         lookup on
>         > the other conditions if first(previous) condition equal 1,
>         and thereby
>         > increase the performance?
>         >
>         > SELECT vciia_main.sitrp,vciia_main.date_time from
>         vciia_main,
>         > south_vietnam72, roads, rails, houses, city where
>         > st_within(vciia_main.geom, south_vietnam72.geom) and
>         date_time is not
>         > null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
>         > st_dwithin(vciia_main.geom, rails.geom, 500) or
>         > st_dwithin(vciia_main.geom, city.geom, 800) or
>         > st_dwithin(vciia_main.geom, houses.the_geom, 500))
>         >
>         > --
>         > Håvard Wahl Kongsgård
>         > Peace Research Institute Oslo (PRIO)
>         >
>         > http://havard.security-review.net/
>         >
>
>
>
>
>
>
> --
> Håvard Wahl Kongsgård
> Peace Research Institute Oslo (PRIO)
>
> http://havard.security-review.net/
>





--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/